2024-08-03T04:43:15.826386155Z /opt/miniconda3/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
2024-08-03T04:43:15.826422135Z   warnings.warn(
2024-08-03T04:43:16.171177373Z The following values were not passed to `accelerate launch` and had defaults used instead:
2024-08-03T04:43:16.171193964Z 		More than one GPU was found, enabling multi-GPU training.
2024-08-03T04:43:16.171196208Z 		If this was unintended please pass in `--num_processes=1`.
2024-08-03T04:43:16.171197625Z 	`--dynamo_backend` was set to a value of `'no'`
2024-08-03T04:43:16.171199260Z To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
2024-08-03T04:43:17.521790191Z [2024-08-02 21:43:17,521] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-08-03T04:43:17.565018073Z df: /root/.triton/autotune: No such file or directory
2024-08-03T04:43:18.235509620Z W0802 21:43:18.235000 140343159166784 torch/distributed/run.py:757] 
2024-08-03T04:43:18.235518704Z W0802 21:43:18.235000 140343159166784 torch/distributed/run.py:757] *****************************************
2024-08-03T04:43:18.235521050Z W0802 21:43:18.235000 140343159166784 torch/distributed/run.py:757] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
2024-08-03T04:43:18.235523618Z W0802 21:43:18.235000 140343159166784 torch/distributed/run.py:757] *****************************************
2024-08-03T04:43:24.284628462Z [2024-08-02 21:43:24,284] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-08-03T04:43:24.941000014Z [2024-08-02 21:43:24,940] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-08-03T04:43:25.013411560Z [2024-08-02 21:43:25,012] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-08-03T04:43:25.055771936Z /opt/miniconda3/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
2024-08-03T04:43:25.055784948Z   warnings.warn(
2024-08-03T04:43:25.057366255Z [2024-08-02 21:43:25,057] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-08-03T04:43:25.062048178Z [2024-08-02 21:43:25,061] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-08-03T04:43:25.082734616Z [2024-08-02 21:43:25,082] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-08-03T04:43:25.116629615Z [2024-08-02 21:43:25,116] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-08-03T04:43:25.157895482Z [2024-08-02 21:43:25,157] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-08-03T04:43:25.341183807Z /opt/miniconda3/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
2024-08-03T04:43:25.341210697Z   warnings.warn(
2024-08-03T04:43:25.408099456Z [WARNING] async_io requires the dev libaio .so object and headers but these were not found.
2024-08-03T04:43:25.408232594Z /opt/miniconda3/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
2024-08-03T04:43:25.408236827Z   warnings.warn(
2024-08-03T04:43:25.415477791Z [WARNING] async_io: please install the libaio-dev package with apt
2024-08-03T04:43:25.415498136Z [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
2024-08-03T04:43:25.415518182Z [WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
2024-08-03T04:43:25.453922263Z /opt/miniconda3/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
2024-08-03T04:43:25.453939972Z   warnings.warn(
2024-08-03T04:43:25.456347456Z /opt/miniconda3/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
2024-08-03T04:43:25.456350810Z   warnings.warn(
2024-08-03T04:43:25.493852549Z /opt/miniconda3/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
2024-08-03T04:43:25.493872699Z   warnings.warn(
2024-08-03T04:43:25.521106972Z /opt/miniconda3/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
2024-08-03T04:43:25.521129433Z   warnings.warn(
2024-08-03T04:43:25.550696204Z /opt/miniconda3/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
2024-08-03T04:43:25.550719063Z   warnings.warn(
2024-08-03T04:43:25.658638643Z [WARNING] async_io requires the dev libaio .so object and headers but these were not found.
2024-08-03T04:43:25.666060508Z [WARNING] async_io: please install the libaio-dev package with apt
2024-08-03T04:43:25.666083639Z [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
2024-08-03T04:43:25.666096987Z [WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
2024-08-03T04:43:25.724935572Z [WARNING] async_io requires the dev libaio .so object and headers but these were not found.
2024-08-03T04:43:25.732450432Z [WARNING] async_io: please install the libaio-dev package with apt
2024-08-03T04:43:25.732473648Z [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
2024-08-03T04:43:25.732491666Z [WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
2024-08-03T04:43:25.782293256Z [WARNING] async_io requires the dev libaio .so object and headers but these were not found.
2024-08-03T04:43:25.786870487Z [WARNING] async_io requires the dev libaio .so object and headers but these were not found.
2024-08-03T04:43:25.790089689Z [WARNING] async_io: please install the libaio-dev package with apt
2024-08-03T04:43:25.790116555Z [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
2024-08-03T04:43:25.790137413Z [WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
2024-08-03T04:43:25.794864480Z [WARNING] async_io: please install the libaio-dev package with apt
2024-08-03T04:43:25.794888725Z [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
2024-08-03T04:43:25.794908026Z [WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
2024-08-03T04:43:25.854138325Z [WARNING] async_io requires the dev libaio .so object and headers but these were not found.
2024-08-03T04:43:25.858523756Z [WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
2024-08-03T04:43:25.858585192Z [WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
2024-08-03T04:43:25.861956290Z [WARNING] async_io: please install the libaio-dev package with apt
2024-08-03T04:43:25.861979220Z [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
2024-08-03T04:43:25.861999433Z [WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
2024-08-03T04:43:25.879868426Z [WARNING] async_io requires the dev libaio .so object and headers but these were not found.
2024-08-03T04:43:25.883065211Z [WARNING] async_io requires the dev libaio .so object and headers but these were not found.
2024-08-03T04:43:25.887872074Z [WARNING] async_io: please install the libaio-dev package with apt
2024-08-03T04:43:25.887897731Z [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
2024-08-03T04:43:25.887911378Z [WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
2024-08-03T04:43:25.890754145Z [WARNING] async_io: please install the libaio-dev package with apt
2024-08-03T04:43:25.890777964Z [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
2024-08-03T04:43:25.890784548Z [WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
2024-08-03T04:43:26.107417643Z [WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
2024-08-03T04:43:26.107438708Z [WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
2024-08-03T04:43:26.179297850Z [WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
2024-08-03T04:43:26.179322023Z [WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
2024-08-03T04:43:26.212745694Z [WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
2024-08-03T04:43:26.212760691Z [WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
2024-08-03T04:43:26.256867455Z [WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
2024-08-03T04:43:26.256883754Z [WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
2024-08-03T04:43:26.257414001Z [WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
2024-08-03T04:43:26.257460818Z [WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
2024-08-03T04:43:26.278436649Z [WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
2024-08-03T04:43:26.278451229Z [WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
2024-08-03T04:43:26.363954720Z [WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
2024-08-03T04:43:26.363973450Z [WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
2024-08-03T04:43:26.653830618Z [2024-08-02 21:43:26,653] [INFO] [comm.py:637:init_distributed] cdb=None
2024-08-03T04:43:26.846123863Z [2024-08-02 21:43:26,845] [INFO] [comm.py:637:init_distributed] cdb=None
2024-08-03T04:43:26.846144946Z [2024-08-02 21:43:26,845] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
2024-08-03T04:43:26.914398518Z 08/02/2024 21:43:26 - INFO - __main__ - Distributed environment: DEEPSPEED  Backend: nccl
2024-08-03T04:43:26.914419196Z Num processes: 32
2024-08-03T04:43:26.914421036Z Process index: 2
2024-08-03T04:43:26.914422381Z Local process index: 2
2024-08-03T04:43:26.914435788Z Device: cuda:2
2024-08-03T04:43:26.914437128Z 
2024-08-03T04:43:26.914438406Z Mixed precision type: bf16
2024-08-03T04:43:26.914440259Z ds_config: {'bf16': {'enabled': True}, 'zero_optimization': {'stage': 3, 'overlap_comm': True, 'contiguous_gradients': True, 'sub_group_size': 1000000000.0, 'reduce_bucket_size': 'auto', 'stage3_prefetch_bucket_size': 'auto', 'stage3_param_persistence_threshold': 'auto', 'stage3_max_live_parameters': 1000000000.0, 'stage3_max_reuse_distance': 1000000000.0, 'stage3_gather_16bit_weights_on_model_save': True}, 'gradient_accumulation_steps': 'auto', 'gradient_clipping': 'auto', 'steps_per_print': inf, 'train_batch_size': 'auto', 'train_micro_batch_size_per_gpu': 'auto', 'wall_clock_breakdown': False, 'fp16': {'enabled': False}}
2024-08-03T04:43:26.914444720Z 
2024-08-03T04:43:26.923110122Z [2024-08-02 21:43:26,923] [INFO] [comm.py:637:init_distributed] cdb=None
2024-08-03T04:43:26.952859095Z 08/02/2024 21:43:26 - INFO - __main__ - Distributed environment: DEEPSPEED  Backend: nccl
2024-08-03T04:43:26.952878949Z Num processes: 32
2024-08-03T04:43:26.952880981Z Process index: 0
2024-08-03T04:43:26.952882487Z Local process index: 0
2024-08-03T04:43:26.952883848Z Device: cuda:0
2024-08-03T04:43:26.952885300Z 
2024-08-03T04:43:26.952886824Z Mixed precision type: bf16
2024-08-03T04:43:26.952888363Z ds_config: {'bf16': {'enabled': True}, 'zero_optimization': {'stage': 3, 'overlap_comm': True, 'contiguous_gradients': True, 'sub_group_size': 1000000000.0, 'reduce_bucket_size': 'auto', 'stage3_prefetch_bucket_size': 'auto', 'stage3_param_persistence_threshold': 'auto', 'stage3_max_live_parameters': 1000000000.0, 'stage3_max_reuse_distance': 1000000000.0, 'stage3_gather_16bit_weights_on_model_save': True}, 'gradient_accumulation_steps': 'auto', 'gradient_clipping': 'auto', 'steps_per_print': inf, 'train_batch_size': 'auto', 'train_micro_batch_size_per_gpu': 'auto', 'wall_clock_breakdown': False, 'fp16': {'enabled': False}}
2024-08-03T04:43:26.952891244Z 
2024-08-03T04:43:26.956426649Z [2024-08-02 21:43:26,956] [INFO] [comm.py:637:init_distributed] cdb=None
2024-08-03T04:43:27.005178505Z [2024-08-02 21:43:27,004] [INFO] [comm.py:637:init_distributed] cdb=None
2024-08-03T04:43:27.009034350Z [2024-08-02 21:43:27,008] [INFO] [comm.py:637:init_distributed] cdb=None
2024-08-03T04:43:27.019683974Z [2024-08-02 21:43:27,019] [INFO] [comm.py:637:init_distributed] cdb=None
2024-08-03T04:43:27.149157113Z [2024-08-02 21:43:27,149] [INFO] [comm.py:637:init_distributed] cdb=None
2024-08-03T04:43:27.309466775Z jupiter-cs-aus-207:95:95 [0] NCCL INFO Bootstrap : Using ibs255:10.246.2.90<0>
2024-08-03T04:43:27.309843253Z jupiter-cs-aus-207:95:95 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
2024-08-03T04:43:27.310100779Z jupiter-cs-aus-207:95:95 [0] NCCL INFO cudaDriverVersion 12010
2024-08-03T04:43:27.310105087Z NCCL version 2.20.5+cuda12.4
2024-08-03T04:43:27.322638530Z jupiter-cs-aus-207:97:97 [2] NCCL INFO cudaDriverVersion 12010
2024-08-03T04:43:27.323030911Z jupiter-cs-aus-207:97:97 [2] NCCL INFO Bootstrap : Using ibs255:10.246.2.90<0>
2024-08-03T04:43:27.323218596Z jupiter-cs-aus-207:97:97 [2] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
2024-08-03T04:43:27.567389471Z 08/02/2024 21:43:27 - INFO - __main__ - Distributed environment: DEEPSPEED  Backend: nccl
2024-08-03T04:43:27.567408719Z Num processes: 32
2024-08-03T04:43:27.567411249Z Process index: 7
2024-08-03T04:43:27.567412846Z Local process index: 7
2024-08-03T04:43:27.567414322Z Device: cuda:7
2024-08-03T04:43:27.567415645Z 
2024-08-03T04:43:27.567417036Z Mixed precision type: bf16
2024-08-03T04:43:27.567421156Z ds_config: {'bf16': {'enabled': True}, 'zero_optimization': {'stage': 3, 'overlap_comm': True, 'contiguous_gradients': True, 'sub_group_size': 1000000000.0, 'reduce_bucket_size': 'auto', 'stage3_prefetch_bucket_size': 'auto', 'stage3_param_persistence_threshold': 'auto', 'stage3_max_live_parameters': 1000000000.0, 'stage3_max_reuse_distance': 1000000000.0, 'stage3_gather_16bit_weights_on_model_save': True}, 'gradient_accumulation_steps': 'auto', 'gradient_clipping': 'auto', 'steps_per_print': inf, 'train_batch_size': 'auto', 'train_micro_batch_size_per_gpu': 'auto', 'wall_clock_breakdown': False, 'fp16': {'enabled': False}}
2024-08-03T04:43:27.567425021Z 
2024-08-03T04:43:27.610570605Z jupiter-cs-aus-207:102:102 [7] NCCL INFO cudaDriverVersion 12010
2024-08-03T04:43:27.611744569Z jupiter-cs-aus-207:102:102 [7] NCCL INFO Bootstrap : Using ibs255:10.246.2.90<0>
2024-08-03T04:43:27.611921610Z jupiter-cs-aus-207:102:102 [7] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
2024-08-03T04:43:27.662357015Z 08/02/2024 21:43:27 - INFO - __main__ - Distributed environment: DEEPSPEED  Backend: nccl
2024-08-03T04:43:27.662360734Z Num processes: 32
2024-08-03T04:43:27.662362125Z Process index: 5
2024-08-03T04:43:27.662363353Z Local process index: 5
2024-08-03T04:43:27.662364678Z Device: cuda:5
2024-08-03T04:43:27.662365915Z 
2024-08-03T04:43:27.662367256Z Mixed precision type: bf16
2024-08-03T04:43:27.662368497Z ds_config: {'bf16': {'enabled': True}, 'zero_optimization': {'stage': 3, 'overlap_comm': True, 'contiguous_gradients': True, 'sub_group_size': 1000000000.0, 'reduce_bucket_size': 'auto', 'stage3_prefetch_bucket_size': 'auto', 'stage3_param_persistence_threshold': 'auto', 'stage3_max_live_parameters': 1000000000.0, 'stage3_max_reuse_distance': 1000000000.0, 'stage3_gather_16bit_weights_on_model_save': True}, 'gradient_accumulation_steps': 'auto', 'gradient_clipping': 'auto', 'steps_per_print': inf, 'train_batch_size': 'auto', 'train_micro_batch_size_per_gpu': 'auto', 'wall_clock_breakdown': False, 'fp16': {'enabled': False}}
2024-08-03T04:43:27.662370958Z 
2024-08-03T04:43:27.662372105Z 08/02/2024 21:43:27 - INFO - __main__ - Distributed environment: DEEPSPEED  Backend: nccl
2024-08-03T04:43:27.662383727Z Num processes: 32
2024-08-03T04:43:27.662384956Z Process index: 6
2024-08-03T04:43:27.662386191Z Local process index: 6
2024-08-03T04:43:27.662387357Z Device: cuda:6
2024-08-03T04:43:27.662388651Z 
2024-08-03T04:43:27.662389755Z Mixed precision type: bf16
2024-08-03T04:43:27.662391123Z ds_config: {'bf16': {'enabled': True}, 'zero_optimization': {'stage': 3, 'overlap_comm': True, 'contiguous_gradients': True, 'sub_group_size': 1000000000.0, 'reduce_bucket_size': 'auto', 'stage3_prefetch_bucket_size': 'auto', 'stage3_param_persistence_threshold': 'auto', 'stage3_max_live_parameters': 1000000000.0, 'stage3_max_reuse_distance': 1000000000.0, 'stage3_gather_16bit_weights_on_model_save': True}, 'gradient_accumulation_steps': 'auto', 'gradient_clipping': 'auto', 'steps_per_print': inf, 'train_batch_size': 'auto', 'train_micro_batch_size_per_gpu': 'auto', 'wall_clock_breakdown': False, 'fp16': {'enabled': False}}
2024-08-03T04:43:27.662393153Z 
2024-08-03T04:43:27.663564752Z 08/02/2024 21:43:27 - INFO - __main__ - Distributed environment: DEEPSPEED  Backend: nccl
2024-08-03T04:43:27.663586706Z Num processes: 32
2024-08-03T04:43:27.663588956Z Process index: 4
2024-08-03T04:43:27.663590534Z Local process index: 4
2024-08-03T04:43:27.663592045Z Device: cuda:4
2024-08-03T04:43:27.663593543Z 
2024-08-03T04:43:27.663594908Z Mixed precision type: bf16
2024-08-03T04:43:27.663600583Z ds_config: {'bf16': {'enabled': True}, 'zero_optimization': {'stage': 3, 'overlap_comm': True, 'contiguous_gradients': True, 'sub_group_size': 1000000000.0, 'reduce_bucket_size': 'auto', 'stage3_prefetch_bucket_size': 'auto', 'stage3_param_persistence_threshold': 'auto', 'stage3_max_live_parameters': 1000000000.0, 'stage3_max_reuse_distance': 1000000000.0, 'stage3_gather_16bit_weights_on_model_save': True}, 'gradient_accumulation_steps': 'auto', 'gradient_clipping': 'auto', 'steps_per_print': inf, 'train_batch_size': 'auto', 'train_micro_batch_size_per_gpu': 'auto', 'wall_clock_breakdown': False, 'fp16': {'enabled': False}}
2024-08-03T04:43:27.663604741Z 
2024-08-03T04:43:27.664858945Z 08/02/2024 21:43:27 - INFO - __main__ - Distributed environment: DEEPSPEED  Backend: nccl
2024-08-03T04:43:27.664870906Z Num processes: 32
2024-08-03T04:43:27.664872991Z Process index: 1
2024-08-03T04:43:27.664874428Z Local process index: 1
2024-08-03T04:43:27.664875972Z Device: cuda:1
2024-08-03T04:43:27.664877307Z 
2024-08-03T04:43:27.664878773Z Mixed precision type: bf16
2024-08-03T04:43:27.664880231Z ds_config: {'bf16': {'enabled': True}, 'zero_optimization': {'stage': 3, 'overlap_comm': True, 'contiguous_gradients': True, 'sub_group_size': 1000000000.0, 'reduce_bucket_size': 'auto', 'stage3_prefetch_bucket_size': 'auto', 'stage3_param_persistence_threshold': 'auto', 'stage3_max_live_parameters': 1000000000.0, 'stage3_max_reuse_distance': 1000000000.0, 'stage3_gather_16bit_weights_on_model_save': True}, 'gradient_accumulation_steps': 'auto', 'gradient_clipping': 'auto', 'steps_per_print': inf, 'train_batch_size': 'auto', 'train_micro_batch_size_per_gpu': 'auto', 'wall_clock_breakdown': False, 'fp16': {'enabled': False}}
2024-08-03T04:43:27.664896707Z 
2024-08-03T04:43:27.665354325Z 08/02/2024 21:43:27 - INFO - __main__ - Distributed environment: DEEPSPEED  Backend: nccl
2024-08-03T04:43:27.665358137Z Num processes: 32
2024-08-03T04:43:27.665359581Z Process index: 3
2024-08-03T04:43:27.665361139Z Local process index: 3
2024-08-03T04:43:27.665362585Z Device: cuda:3
2024-08-03T04:43:27.665363953Z 
2024-08-03T04:43:27.665365342Z Mixed precision type: bf16
2024-08-03T04:43:27.665366789Z ds_config: {'bf16': {'enabled': True}, 'zero_optimization': {'stage': 3, 'overlap_comm': True, 'contiguous_gradients': True, 'sub_group_size': 1000000000.0, 'reduce_bucket_size': 'auto', 'stage3_prefetch_bucket_size': 'auto', 'stage3_param_persistence_threshold': 'auto', 'stage3_max_live_parameters': 1000000000.0, 'stage3_max_reuse_distance': 1000000000.0, 'stage3_gather_16bit_weights_on_model_save': True}, 'gradient_accumulation_steps': 'auto', 'gradient_clipping': 'auto', 'steps_per_print': inf, 'train_batch_size': 'auto', 'train_micro_batch_size_per_gpu': 'auto', 'wall_clock_breakdown': False, 'fp16': {'enabled': False}}
2024-08-03T04:43:27.665370269Z 
2024-08-03T04:43:27.668912222Z jupiter-cs-aus-207:100:100 [5] NCCL INFO cudaDriverVersion 12010
2024-08-03T04:43:27.669406437Z jupiter-cs-aus-207:100:100 [5] NCCL INFO Bootstrap : Using ibs255:10.246.2.90<0>
2024-08-03T04:43:27.669601227Z jupiter-cs-aus-207:100:100 [5] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
2024-08-03T04:43:27.675842521Z jupiter-cs-aus-207:101:101 [6] NCCL INFO cudaDriverVersion 12010
2024-08-03T04:43:27.676352080Z jupiter-cs-aus-207:101:101 [6] NCCL INFO Bootstrap : Using ibs255:10.246.2.90<0>
2024-08-03T04:43:27.676546240Z jupiter-cs-aus-207:101:101 [6] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
2024-08-03T04:43:27.678066496Z jupiter-cs-aus-207:96:96 [1] NCCL INFO cudaDriverVersion 12010
2024-08-03T04:43:27.679075507Z jupiter-cs-aus-207:98:98 [3] NCCL INFO cudaDriverVersion 12010
2024-08-03T04:43:27.679422594Z jupiter-cs-aus-207:96:96 [1] NCCL INFO Bootstrap : Using ibs255:10.246.2.90<0>
2024-08-03T04:43:27.679595246Z jupiter-cs-aus-207:96:96 [1] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
2024-08-03T04:43:27.679644032Z jupiter-cs-aus-207:98:98 [3] NCCL INFO Bootstrap : Using ibs255:10.246.2.90<0>
2024-08-03T04:43:27.679848077Z jupiter-cs-aus-207:98:98 [3] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
2024-08-03T04:43:27.686827922Z jupiter-cs-aus-207:99:99 [4] NCCL INFO cudaDriverVersion 12010
2024-08-03T04:43:27.687344905Z jupiter-cs-aus-207:99:99 [4] NCCL INFO Bootstrap : Using ibs255:10.246.2.90<0>
2024-08-03T04:43:27.687560018Z jupiter-cs-aus-207:99:99 [4] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
2024-08-03T04:43:27.729187926Z jupiter-cs-aus-207:97:316 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_bond_0:1/RoCE [4]mlx5_3:1/IB [5]mlx5_4:1/IB [6]mlx5_5:1/IB [7]mlx5_6:1/IB [8]mlx5_7:1/IB [RO]; OOB ibs255:10.246.2.90<0>
2024-08-03T04:43:27.729200209Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Using non-device net plugin version 0
2024-08-03T04:43:27.729202197Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Using network IB
2024-08-03T04:43:27.755389810Z jupiter-cs-aus-207:95:315 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_bond_0:1/RoCE [4]mlx5_3:1/IB [5]mlx5_4:1/IB [6]mlx5_5:1/IB [7]mlx5_6:1/IB [8]mlx5_7:1/IB [RO]; OOB ibs255:10.246.2.90<0>
2024-08-03T04:43:27.755401331Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Using non-device net plugin version 0
2024-08-03T04:43:27.755403466Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Using network IB
2024-08-03T04:43:27.968475842Z jupiter-cs-aus-207:102:322 [7] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_bond_0:1/RoCE [4]mlx5_3:1/IB [5]mlx5_4:1/IB [6]mlx5_5:1/IB [7]mlx5_6:1/IB [8]mlx5_7:1/IB [RO]; OOB ibs255:10.246.2.90<0>
2024-08-03T04:43:27.968501288Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Using non-device net plugin version 0
2024-08-03T04:43:27.968503866Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Using network IB
2024-08-03T04:43:28.025526284Z jupiter-cs-aus-207:100:327 [5] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_bond_0:1/RoCE [4]mlx5_3:1/IB [5]mlx5_4:1/IB [6]mlx5_5:1/IB [7]mlx5_6:1/IB [8]mlx5_7:1/IB [RO]; OOB ibs255:10.246.2.90<0>
2024-08-03T04:43:28.025538911Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Using non-device net plugin version 0
2024-08-03T04:43:28.025540897Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Using network IB
2024-08-03T04:43:28.047940278Z jupiter-cs-aus-207:98:333 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_bond_0:1/RoCE [4]mlx5_3:1/IB [5]mlx5_4:1/IB [6]mlx5_5:1/IB [7]mlx5_6:1/IB [8]mlx5_7:1/IB [RO]; OOB ibs255:10.246.2.90<0>
2024-08-03T04:43:28.047951929Z jupiter-cs-aus-207:99:335 [4] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_bond_0:1/RoCE [4]mlx5_3:1/IB [5]mlx5_4:1/IB [6]mlx5_5:1/IB [7]mlx5_6:1/IB [8]mlx5_7:1/IB [RO]; OOB ibs255:10.246.2.90<0>
2024-08-03T04:43:28.047954264Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Using non-device net plugin version 0
2024-08-03T04:43:28.047955818Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Using network IB
2024-08-03T04:43:28.047957139Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Using non-device net plugin version 0
2024-08-03T04:43:28.047958670Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Using network IB
2024-08-03T04:43:28.071147854Z jupiter-cs-aus-207:101:331 [6] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_bond_0:1/RoCE [4]mlx5_3:1/IB [5]mlx5_4:1/IB [6]mlx5_5:1/IB [7]mlx5_6:1/IB [8]mlx5_7:1/IB [RO]; OOB ibs255:10.246.2.90<0>
2024-08-03T04:43:28.071266101Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Using non-device net plugin version 0
2024-08-03T04:43:28.071268960Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Using network IB
2024-08-03T04:43:28.072035792Z jupiter-cs-aus-207:96:332 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_bond_0:1/RoCE [4]mlx5_3:1/IB [5]mlx5_4:1/IB [6]mlx5_5:1/IB [7]mlx5_6:1/IB [8]mlx5_7:1/IB [RO]; OOB ibs255:10.246.2.90<0>
2024-08-03T04:43:28.072176330Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Using non-device net plugin version 0
2024-08-03T04:43:28.072178253Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Using network IB
2024-08-03T04:43:28.972342005Z jupiter-cs-aus-207:95:315 [0] NCCL INFO comm 0xa3860f0 rank 0 nranks 32 cudaDev 0 nvmlDev 0 busId 18000 commId 0xe138317c315c387 - Init START
2024-08-03T04:43:28.972372270Z jupiter-cs-aus-207:96:332 [1] NCCL INFO comm 0xb44bbd0 rank 1 nranks 32 cudaDev 1 nvmlDev 1 busId 2a000 commId 0xe138317c315c387 - Init START
2024-08-03T04:43:28.972379248Z jupiter-cs-aus-207:98:333 [3] NCCL INFO comm 0xb892530 rank 3 nranks 32 cudaDev 3 nvmlDev 3 busId 5d000 commId 0xe138317c315c387 - Init START
2024-08-03T04:43:28.972381046Z jupiter-cs-aus-207:97:316 [2] NCCL INFO comm 0xab5c5a0 rank 2 nranks 32 cudaDev 2 nvmlDev 2 busId 3a000 commId 0xe138317c315c387 - Init START
2024-08-03T04:43:28.972382593Z jupiter-cs-aus-207:100:327 [5] NCCL INFO comm 0xa142060 rank 5 nranks 32 cudaDev 5 nvmlDev 5 busId 8b000 commId 0xe138317c315c387 - Init START
2024-08-03T04:43:28.972384755Z jupiter-cs-aus-207:99:335 [4] NCCL INFO comm 0xb068620 rank 4 nranks 32 cudaDev 4 nvmlDev 4 busId 84000 commId 0xe138317c315c387 - Init START
2024-08-03T04:43:28.972461428Z jupiter-cs-aus-207:101:331 [6] NCCL INFO comm 0x9d1ef00 rank 6 nranks 32 cudaDev 6 nvmlDev 6 busId 91000 commId 0xe138317c315c387 - Init START
2024-08-03T04:43:28.972575131Z jupiter-cs-aus-207:102:322 [7] NCCL INFO comm 0xa2679d0 rank 7 nranks 32 cudaDev 7 nvmlDev 7 busId e4000 commId 0xe138317c315c387 - Init START
2024-08-03T04:43:32.544584774Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Setting affinity for GPU 3 to ffff,fffffffd,00000000,0000ffff,fffffffd
2024-08-03T04:43:32.544603325Z jupiter-cs-aus-207:98:333 [3] NCCL INFO NVLS multicast support is not available on dev 3
2024-08-03T04:43:32.599564520Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Setting affinity for GPU 1 to ffff,fffffffd,00000000,0000ffff,fffffffd
2024-08-03T04:43:32.599576116Z jupiter-cs-aus-207:96:332 [1] NCCL INFO NVLS multicast support is not available on dev 1
2024-08-03T04:43:32.612546282Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Setting affinity for GPU 5 to ffffffff,fffe0000,00000000,ffffffff,fffe0000,00000000
2024-08-03T04:43:32.612557291Z jupiter-cs-aus-207:100:327 [5] NCCL INFO NVLS multicast support is not available on dev 5
2024-08-03T04:43:32.627282466Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Setting affinity for GPU 0 to ffff,fffffffd,00000000,0000ffff,fffffffd
2024-08-03T04:43:32.627293427Z jupiter-cs-aus-207:95:315 [0] NCCL INFO NVLS multicast support is not available on dev 0
2024-08-03T04:43:32.638831767Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Setting affinity for GPU 4 to ffffffff,fffe0000,00000000,ffffffff,fffe0000,00000000
2024-08-03T04:43:32.638854154Z jupiter-cs-aus-207:99:335 [4] NCCL INFO NVLS multicast support is not available on dev 4
2024-08-03T04:43:32.669438490Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Setting affinity for GPU 2 to ffff,fffffffd,00000000,0000ffff,fffffffd
2024-08-03T04:43:32.669950568Z jupiter-cs-aus-207:97:316 [2] NCCL INFO NVLS multicast support is not available on dev 2
2024-08-03T04:43:32.670726800Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Setting affinity for GPU 6 to ffffffff,fffe0000,00000000,ffffffff,fffe0000,00000000
2024-08-03T04:43:32.670728968Z jupiter-cs-aus-207:101:331 [6] NCCL INFO NVLS multicast support is not available on dev 6
2024-08-03T04:43:32.684470444Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Setting affinity for GPU 7 to ffffffff,fffe0000,00000000,ffffffff,fffe0000,00000000
2024-08-03T04:43:32.684481619Z jupiter-cs-aus-207:102:322 [7] NCCL INFO NVLS multicast support is not available on dev 7
2024-08-03T04:43:32.792936374Z jupiter-cs-aus-207:102:322 [7] NCCL INFO comm 0xa2679d0 rank 7 nRanks 32 nNodes 4 localRanks 8 localRank 7 MNNVL 0
2024-08-03T04:43:32.792948343Z jupiter-cs-aus-207:101:331 [6] NCCL INFO comm 0x9d1ef00 rank 6 nRanks 32 nNodes 4 localRanks 8 localRank 6 MNNVL 0
2024-08-03T04:43:32.792950439Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] 0/-1/-1->7->6 [2] 0/-1/-1->7->6 [3] 0/-1/-1->7->6 [4] 0/-1/-1->7->6 [5] 0/-1/-1->7->6 [6] 0/-1/-1->7->6 [7] 0/23/-1->7->-1 [8] -1/-1/-1->7->6 [9] 0/-1/-1->7->6 [10] 0/-1/-1->7->6 [11] 0/-1/-1->7->6 [12] 0/-1/-1->7->6 [13] 0/-1/-1->7->6 [14] 0/-1/-1->7->6 [15] 0/-1/-1->7->15
2024-08-03T04:43:32.792953808Z jupiter-cs-aus-207:102:322 [7] NCCL INFO P2P Chunksize set to 131072
2024-08-03T04:43:32.792955226Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/22/-1->6->-1 [7] -1/-1/-1->6->5 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 7/-1/-1->6->5 [11] 7/-1/-1->6->5 [12] 7/-1/-1->6->5 [13] 7/-1/-1->6->5 [14] 7/-1/-1->6->14 [15] -1/-1/-1->6->5
2024-08-03T04:43:32.792957514Z jupiter-cs-aus-207:101:331 [6] NCCL INFO P2P Chunksize set to 131072
2024-08-03T04:43:32.792959057Z jupiter-cs-aus-207:100:327 [5] NCCL INFO comm 0xa142060 rank 5 nRanks 32 nNodes 4 localRanks 8 localRank 5 MNNVL 0
2024-08-03T04:43:32.792960436Z jupiter-cs-aus-207:97:316 [2] NCCL INFO comm 0xab5c5a0 rank 2 nRanks 32 nNodes 4 localRanks 8 localRank 2 MNNVL 0
2024-08-03T04:43:32.792961946Z jupiter-cs-aus-207:96:332 [1] NCCL INFO comm 0xb44bbd0 rank 1 nRanks 32 nNodes 4 localRanks 8 localRank 1 MNNVL 0
2024-08-03T04:43:32.792963320Z jupiter-cs-aus-207:99:335 [4] NCCL INFO comm 0xb068620 rank 4 nRanks 32 nNodes 4 localRanks 8 localRank 4 MNNVL 0
2024-08-03T04:43:32.792965133Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/21/-1->5->-1 [6] -1/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13] 6/-1/-1->5->13 [14] -1/-1/-1->5->4 [15] 6/-1/-1->5->4
2024-08-03T04:43:32.792973166Z jupiter-cs-aus-207:100:327 [5] NCCL INFO P2P Chunksize set to 131072
2024-08-03T04:43:32.792975221Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/20/-1->4->-1 [5] -1/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->12 [13] -1/-1/-1->4->3 [14] 5/-1/-1->4->3 [15] 5/-1/-1->4->3
2024-08-03T04:43:32.792977381Z jupiter-cs-aus-207:99:335 [4] NCCL INFO P2P Chunksize set to 131072
2024-08-03T04:43:32.792978746Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/17/-1->1->-1 [2] -1/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->9 [10] -1/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0
2024-08-03T04:43:32.792980843Z jupiter-cs-aus-207:96:332 [1] NCCL INFO P2P Chunksize set to 131072
2024-08-03T04:43:32.792982346Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/18/-1->2->-1 [3] -1/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->10 [11] -1/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1
2024-08-03T04:43:32.792984470Z jupiter-cs-aus-207:97:316 [2] NCCL INFO P2P Chunksize set to 131072
2024-08-03T04:43:32.792985806Z jupiter-cs-aus-207:95:315 [0] NCCL INFO comm 0xa3860f0 rank 0 nRanks 32 nNodes 4 localRanks 8 localRank 0 MNNVL 0
2024-08-03T04:43:32.792991178Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 00/16 :    0   7   6   5   4   3   2   1   9  10  11  12  13  14  15   8  16  23  22  21
2024-08-03T04:43:32.792992852Z jupiter-cs-aus-207:98:333 [3] NCCL INFO comm 0xb892530 rank 3 nRanks 32 nNodes 4 localRanks 8 localRank 3 MNNVL 0
2024-08-03T04:43:32.792994220Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 01/16 :    0   8  15  14  13  12  11  10   9  17  18  19  20  21  22  23  16  24  31  30
2024-08-03T04:43:32.792995647Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 02/16 :    0   7   6   5   4   3  11  12  13  14  15   8   9  10  18  17  16  23  22  21
2024-08-03T04:43:32.792997066Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 03/16 :    0   1   2  10   9   8  15  14  13  12  11  19  20  21  22  23  16  17  18  26
2024-08-03T04:43:32.792998661Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 04/16 :    0   7   6   5  13  14  15   8   9  10  11  12  20  19  18  17  16  23  22  21
2024-08-03T04:43:32.793000050Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 05/16 :    0   1   2   3   4  12  11  10   9   8  15  14  13  21  22  23  16  17  18  19
2024-08-03T04:43:32.793003099Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 06/16 :    0   7  15   8   9  10  11  12  13  14  22  21  20  19  18  17  16  23  31  24
2024-08-03T04:43:32.793004491Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 07/16 :    0   1   2   3   4   5   6  14  13  12  11  10   9   8  15  23  16  17  18  19
2024-08-03T04:43:32.793006080Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 08/16 :    0   7   6   5   4   3   2   1   9  10  11  12  13  14  15   8  16  23  22  21
2024-08-03T04:43:32.793007554Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 09/16 :    0   8  15  14  13  12  11  10   9  17  18  19  20  21  22  23  16  24  31  30
2024-08-03T04:43:32.793008968Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 10/16 :    0   7   6   5   4   3  11  12  13  14  15   8   9  10  18  17  16  23  22  21
2024-08-03T04:43:32.793010437Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 11/16 :    0   1   2  10   9   8  15  14  13  12  11  19  20  21  22  23  16  17  18  26
2024-08-03T04:43:32.793014114Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 12/16 :    0   7   6   5  13  14  15   8   9  10  11  12  20  19  18  17  16  23  22  21
2024-08-03T04:43:32.793015602Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 13/16 :    0   1   2   3   4  12  11  10   9   8  15  14  13  21  22  23  16  17  18  19
2024-08-03T04:43:32.793017023Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 14/16 :    0   7  15   8   9  10  11  12  13  14  22  21  20  19  18  17  16  23  31  24
2024-08-03T04:43:32.793018474Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 15/16 :    0   1   2   3   4   5   6  14  13  12  11  10   9   8  15  23  16  17  18  19
2024-08-03T04:43:32.793038901Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Trees [0] 1/16/-1->0->-1 [1] -1/-1/-1->0->7 [2] 1/-1/-1->0->7 [3] 1/-1/-1->0->7 [4] 1/-1/-1->0->7 [5] 1/-1/-1->0->7 [6] 1/-1/-1->0->7 [7] 1/-1/-1->0->7 [8] 1/-1/-1->0->8 [9] -1/-1/-1->0->7 [10] 1/-1/-1->0->7 [11] 1/-1/-1->0->7 [12] 1/-1/-1->0->7 [13] 1/-1/-1->0->7 [14] 1/-1/-1->0->7 [15] 1/-1/-1->0->7
2024-08-03T04:43:32.793041219Z jupiter-cs-aus-207:95:315 [0] NCCL INFO P2P Chunksize set to 131072
2024-08-03T04:43:32.793376777Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/19/-1->3->-1 [4] -1/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->11 [12] -1/-1/-1->3->2 [13] 4/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2
2024-08-03T04:43:32.793380403Z jupiter-cs-aus-207:98:333 [3] NCCL INFO P2P Chunksize set to 131072
2024-08-03T04:43:32.830072067Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:32.830674400Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:32.831588830Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:32.832464271Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 09/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:32.833355843Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 11/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:32.834228250Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 15/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:32.835700043Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:32.836453555Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 03/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:32.837187124Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:32.837218685Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:32.837949760Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:32.838046163Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:32.838466557Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:32.838614851Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:32.838912448Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:32.839233094Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:32.839574504Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:32.839894636Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:32.840314240Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:32.841759648Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:32.842228604Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:32.842689266Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:32.842930550Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:32.843121714Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 09/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:32.843517085Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 11/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:32.843671882Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 11/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:32.844065800Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 13/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:32.844161500Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 13/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:32.844510212Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:32.844665684Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 15/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:32.844691472Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 15/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:32.845144510Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 01/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:32.845368028Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:32.845525227Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:32.845676031Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:32.845843838Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 05/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:32.845990937Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:32.846158958Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 09/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:32.846304709Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:32.846408117Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:32.846467077Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 11/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:32.846630563Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:32.846718279Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:32.846793511Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 13/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:32.847027891Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:32.847048428Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 00/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:32.847230564Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:32.847292703Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 02/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:32.847546088Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 13/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:32.847609370Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 04/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:32.847862467Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:32.847927694Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 06/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:32.848251432Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 08/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:32.848467923Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 10/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:32.848684462Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 12/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:32.848909340Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 14/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:32.849354208Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 04/0 : 28[4] -> 4[4] [receive] via NET/IB/5/GDRDMA
2024-08-03T04:43:32.849413837Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 12/0 : 28[4] -> 4[4] [receive] via NET/IB/5/GDRDMA
2024-08-03T04:43:32.849464560Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 05/0 : 4[4] -> 12[4] [send] via NET/IB/5/GDRDMA
2024-08-03T04:43:32.849516966Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 13/0 : 4[4] -> 12[4] [send] via NET/IB/5/GDRDMA
2024-08-03T04:43:32.850979224Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 05/0 : 29[5] -> 5[5] [receive] via NET/IB/6/GDRDMA
2024-08-03T04:43:32.850981374Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 06/0 : 30[6] -> 6[6] [receive] via NET/IB/7/GDRDMA
2024-08-03T04:43:32.851041213Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 13/0 : 29[5] -> 5[5] [receive] via NET/IB/6/GDRDMA
2024-08-03T04:43:32.851042579Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 14/0 : 30[6] -> 6[6] [receive] via NET/IB/7/GDRDMA
2024-08-03T04:43:32.851102138Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 04/0 : 5[5] -> 13[5] [send] via NET/IB/6/GDRDMA
2024-08-03T04:43:32.851118144Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 07/0 : 6[6] -> 14[6] [send] via NET/IB/7/GDRDMA
2024-08-03T04:43:32.851147718Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 12/0 : 5[5] -> 13[5] [send] via NET/IB/6/GDRDMA
2024-08-03T04:43:32.851168383Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 15/0 : 6[6] -> 14[6] [send] via NET/IB/7/GDRDMA
2024-08-03T04:43:32.852150213Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 03/0 : 27[3] -> 3[3] [receive] via NET/IB/4/GDRDMA
2024-08-03T04:43:32.852151858Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 02/0 : 26[2] -> 2[2] [receive] via NET/IB/2/GDRDMA
2024-08-03T04:43:32.852207735Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 01/0 : 25[1] -> 1[1] [receive] via NET/IB/1/GDRDMA
2024-08-03T04:43:32.852209092Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 10/0 : 26[2] -> 2[2] [receive] via NET/IB/2/GDRDMA
2024-08-03T04:43:32.852228044Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 11/0 : 27[3] -> 3[3] [receive] via NET/IB/4/GDRDMA
2024-08-03T04:43:32.852268088Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 03/0 : 2[2] -> 10[2] [send] via NET/IB/2/GDRDMA
2024-08-03T04:43:32.852269708Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 09/0 : 25[1] -> 1[1] [receive] via NET/IB/1/GDRDMA
2024-08-03T04:43:32.852303137Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 02/0 : 3[3] -> 11[3] [send] via NET/IB/4/GDRDMA
2024-08-03T04:43:32.852323403Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 11/0 : 2[2] -> 10[2] [send] via NET/IB/2/GDRDMA
2024-08-03T04:43:32.852327306Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 00/0 : 1[1] -> 9[1] [send] via NET/IB/1/GDRDMA
2024-08-03T04:43:32.852373651Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 10/0 : 3[3] -> 11[3] [send] via NET/IB/4/GDRDMA
2024-08-03T04:43:32.852394553Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 08/0 : 1[1] -> 9[1] [send] via NET/IB/1/GDRDMA
2024-08-03T04:43:32.853307286Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 07/0 : 31[7] -> 7[7] [receive] via NET/IB/8/GDRDMA
2024-08-03T04:43:32.853308715Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 00/0 : 24[0] -> 0[0] [receive] via NET/IB/0/GDRDMA
2024-08-03T04:43:32.853370593Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 15/0 : 31[7] -> 7[7] [receive] via NET/IB/8/GDRDMA
2024-08-03T04:43:32.853391425Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 08/0 : 24[0] -> 0[0] [receive] via NET/IB/0/GDRDMA
2024-08-03T04:43:32.853427327Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 06/0 : 7[7] -> 15[7] [send] via NET/IB/8/GDRDMA
2024-08-03T04:43:32.853468375Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 01/0 : 0[0] -> 8[0] [send] via NET/IB/0/GDRDMA
2024-08-03T04:43:32.853484169Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 14/0 : 7[7] -> 15[7] [send] via NET/IB/8/GDRDMA
2024-08-03T04:43:32.853542310Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 09/0 : 0[0] -> 8[0] [send] via NET/IB/0/GDRDMA
2024-08-03T04:43:32.856674873Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 01/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:32.856795846Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 00/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:32.857614005Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:32.857691003Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 02/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:32.858471968Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 05/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:32.858541716Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 04/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:32.858700721Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 00/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:32.859304121Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:32.859305680Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 07/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:32.859362890Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 06/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:32.859509015Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 02/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:32.860071822Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:32.860153956Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 09/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:32.860235161Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 08/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:32.860395385Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 04/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:32.860691565Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 00/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:32.860903763Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:32.860970986Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:32.861011626Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 11/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:32.861095443Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 10/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:32.861206384Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 06/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:32.861492427Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:32.861526549Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 02/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:32.861761593Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:32.861851414Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:32.861944058Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 13/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:32.862018775Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 12/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:32.862068315Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 08/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:32.862372892Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:32.862393282Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 06/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:32.862693442Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 08/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:32.862755798Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 06/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:32.862878523Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 15/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:32.862922026Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 14/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:32.862977728Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 10/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:32.863276456Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:32.863296930Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 08/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:32.863526236Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 10/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:32.863650204Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 08/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:32.863864892Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 12/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:32.864118028Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 10/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:32.864139648Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 10/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:32.864175841Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 00/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:32.864288209Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 12/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:32.864341189Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 12/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:32.864531604Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 14/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:32.864813711Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 12/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:32.864870124Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 14/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:32.864943163Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 02/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:32.865076915Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 14/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:32.865126792Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 14/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:32.865579286Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 14/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:32.865708494Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 04/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:32.866324574Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 08/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:32.867015466Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 10/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:32.867660351Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 12/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:33.267311947Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Connected all rings
2024-08-03T04:43:33.270322788Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Connected all rings
2024-08-03T04:43:33.270325650Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Connected all rings
2024-08-03T04:43:33.270346582Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:33.271665539Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:33.272125719Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:33.272591236Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:33.272757041Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:33.272935379Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:33.274316294Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:33.274404360Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:33.274640517Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:33.274733119Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:33.274810806Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Connected all rings
2024-08-03T04:43:33.274824919Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Connected all rings
2024-08-03T04:43:33.274928124Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:33.275071953Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:33.275127951Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Connected all rings
2024-08-03T04:43:33.275153330Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Connected all rings
2024-08-03T04:43:33.275173938Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Connected all rings
2024-08-03T04:43:33.275412373Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:33.275613873Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:33.276357634Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:33.276469715Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 10/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:33.277110871Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:33.277235783Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:33.277869988Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:33.278032740Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 14/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:33.278558090Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:33.279183375Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:33.279875225Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:33.279972330Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 09/0 : 1[1] -> 9[1] [send] via NET/IB/1/GDRDMA
2024-08-03T04:43:33.280042745Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:33.280169758Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:33.280231626Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:33.280445746Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:33.280447554Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:33.280527365Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:33.280711710Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 01/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:33.280841200Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 02/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:33.280941555Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 04/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:33.280995897Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 10/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:33.281213264Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 03/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:33.281387414Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:33.281461227Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 06/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:33.281538603Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 12/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:33.281751072Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 05/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:33.281885536Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:33.281920299Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:33.282000102Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 08/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:33.282232581Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 07/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:33.282356528Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 02/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:33.282428569Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 08/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:33.282449302Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 10/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:33.282644433Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 09/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:33.282791490Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 06/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:33.282863626Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 10/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:33.282923723Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 12/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:33.283049998Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 11/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:33.283238974Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:33.283288544Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 12/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:33.283352485Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 14/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:33.283480611Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 13/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:33.283662571Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 10/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:33.283712580Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 14/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:33.283903165Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 15/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:43:33.283906767Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 01/0 : 17[1] -> 1[1] [receive] via NET/IB/1/GDRDMA
2024-08-03T04:43:33.283940273Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 14/0 : 6[6] -> 14[6] [send] via NET/IB/7/GDRDMA
2024-08-03T04:43:33.283958758Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 01/0 : 1[1] -> 17[1] [send] via NET/IB/1/GDRDMA
2024-08-03T04:43:33.284074675Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 14/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:33.284244026Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 09/0 : 9[1] -> 1[1] [receive] via NET/IB/1/GDRDMA
2024-08-03T04:43:33.284245824Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 13/0 : 5[5] -> 13[5] [send] via NET/IB/6/GDRDMA
2024-08-03T04:43:33.284525376Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 06/0 : 22[6] -> 6[6] [receive] via NET/IB/7/GDRDMA
2024-08-03T04:43:33.284574278Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 06/0 : 6[6] -> 22[6] [send] via NET/IB/7/GDRDMA
2024-08-03T04:43:33.284575798Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 10/0 : 2[2] -> 10[2] [send] via NET/IB/2/GDRDMA
2024-08-03T04:43:33.284619483Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 12/0 : 4[4] -> 12[4] [send] via NET/IB/5/GDRDMA
2024-08-03T04:43:33.284669736Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 11/0 : 3[3] -> 11[3] [send] via NET/IB/4/GDRDMA
2024-08-03T04:43:33.285146058Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 05/0 : 21[5] -> 5[5] [receive] via NET/IB/6/GDRDMA
2024-08-03T04:43:33.285191044Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 05/0 : 5[5] -> 21[5] [send] via NET/IB/6/GDRDMA
2024-08-03T04:43:33.285786502Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 04/0 : 20[4] -> 4[4] [receive] via NET/IB/5/GDRDMA
2024-08-03T04:43:33.285833412Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 04/0 : 4[4] -> 20[4] [send] via NET/IB/5/GDRDMA
2024-08-03T04:43:33.285858088Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 03/0 : 19[3] -> 3[3] [receive] via NET/IB/4/GDRDMA
2024-08-03T04:43:33.285915697Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 03/0 : 3[3] -> 19[3] [send] via NET/IB/4/GDRDMA
2024-08-03T04:43:33.285917221Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 02/0 : 18[2] -> 2[2] [receive] via NET/IB/2/GDRDMA
2024-08-03T04:43:33.285985574Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 02/0 : 2[2] -> 18[2] [send] via NET/IB/2/GDRDMA
2024-08-03T04:43:33.286197456Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 15/0 : 7[7] -> 15[7] [send] via NET/IB/8/GDRDMA
2024-08-03T04:43:33.286199170Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 08/0 : 0[0] -> 8[0] [send] via NET/IB/0/GDRDMA
2024-08-03T04:43:33.286269492Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 10/0 : 10[2] -> 2[2] [receive] via NET/IB/2/GDRDMA
2024-08-03T04:43:33.286374587Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 14/0 : 14[6] -> 6[6] [receive] via NET/IB/7/GDRDMA
2024-08-03T04:43:33.286376226Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:33.286578030Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:33.286783262Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:33.287006082Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:33.287265615Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:33.287593852Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 11/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:33.287912974Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 13/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:33.288238812Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Channel 15/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:33.288801012Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 01/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:33.289012264Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 00/0 : 16[0] -> 0[0] [receive] via NET/IB/0/GDRDMA
2024-08-03T04:43:33.289014123Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 03/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:33.289065557Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 00/0 : 0[0] -> 16[0] [send] via NET/IB/0/GDRDMA
2024-08-03T04:43:33.289191315Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 07/0 : 23[7] -> 7[7] [receive] via NET/IB/8/GDRDMA
2024-08-03T04:43:33.289211740Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Channel 08/0 : 8[0] -> 0[0] [receive] via NET/IB/0/GDRDMA
2024-08-03T04:43:33.289238205Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 07/0 : 7[7] -> 23[7] [send] via NET/IB/8/GDRDMA
2024-08-03T04:43:33.289273430Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 05/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:33.289385998Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 15/0 : 15[7] -> 7[7] [receive] via NET/IB/8/GDRDMA
2024-08-03T04:43:33.289473262Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:33.289515603Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 07/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:33.289798089Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 02/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:33.289799731Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 03/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:33.289864543Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 09/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:33.290217185Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 04/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:33.290280613Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 05/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:33.290306907Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 11/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:33.290665449Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 06/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:33.290862477Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 10/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:33.291026609Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 12/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:33.291174822Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 14/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:43:33.292006574Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:33.292251540Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 13/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:33.292253216Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 09/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:33.292526712Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 11/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:33.292549235Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Channel 15/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:43:33.292760783Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 13/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:33.292948240Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Channel 15/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:43:33.293014078Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 13/0 : 13[5] -> 5[5] [receive] via NET/IB/6/GDRDMA
2024-08-03T04:43:33.293179900Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 11/0 : 11[3] -> 3[3] [receive] via NET/IB/4/GDRDMA
2024-08-03T04:43:33.293782156Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 12/0 : 12[4] -> 4[4] [receive] via NET/IB/5/GDRDMA
2024-08-03T04:43:33.294198063Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 01/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:33.297003595Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 03/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:33.297300663Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 05/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:33.297709616Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 06/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:33.297873879Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 01/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:33.298172494Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 09/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:33.298279245Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 03/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:33.298605929Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:33.298618828Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 11/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:33.298792210Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 04/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:33.298996835Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 02/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:33.299043075Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 13/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:33.299227474Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 07/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:33.299552873Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:33.299588024Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Channel 14/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:43:33.299805082Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 09/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:33.300099402Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 07/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:33.300348795Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 11/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:33.300667041Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 12/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:33.300993931Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Channel 15/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:43:33.301209464Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 09/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:33.301657381Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 10/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:33.304301186Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 01/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:33.304392555Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 13/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:33.305489843Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 03/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:33.305549199Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Channel 15/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:43:33.305819902Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 05/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:33.309328506Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 07/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:33.309806642Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 09/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:33.318086858Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 11/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:33.318459236Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 13/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:33.321010199Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Channel 15/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:43:33.707564878Z jupiter-cs-aus-207:97:316 [2] NCCL INFO Connected all trees
2024-08-03T04:43:33.707579008Z jupiter-cs-aus-207:97:316 [2] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512
2024-08-03T04:43:33.707581188Z jupiter-cs-aus-207:97:316 [2] NCCL INFO 16 coll channels, 0 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer
2024-08-03T04:43:33.707949990Z jupiter-cs-aus-207:98:333 [3] NCCL INFO Connected all trees
2024-08-03T04:43:33.707973229Z jupiter-cs-aus-207:98:333 [3] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512
2024-08-03T04:43:33.707975145Z jupiter-cs-aus-207:98:333 [3] NCCL INFO 16 coll channels, 0 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer
2024-08-03T04:43:33.710594052Z jupiter-cs-aus-207:99:335 [4] NCCL INFO Connected all trees
2024-08-03T04:43:33.710604928Z jupiter-cs-aus-207:99:335 [4] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512
2024-08-03T04:43:33.710606707Z jupiter-cs-aus-207:99:335 [4] NCCL INFO 16 coll channels, 0 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer
2024-08-03T04:43:33.710657362Z jupiter-cs-aus-207:100:327 [5] NCCL INFO Connected all trees
2024-08-03T04:43:33.710671980Z jupiter-cs-aus-207:100:327 [5] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512
2024-08-03T04:43:33.710674256Z jupiter-cs-aus-207:100:327 [5] NCCL INFO 16 coll channels, 0 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer
2024-08-03T04:43:33.712723137Z jupiter-cs-aus-207:101:331 [6] NCCL INFO Connected all trees
2024-08-03T04:43:33.712726354Z jupiter-cs-aus-207:102:322 [7] NCCL INFO Connected all trees
2024-08-03T04:43:33.712727874Z jupiter-cs-aus-207:101:331 [6] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512
2024-08-03T04:43:33.712729293Z jupiter-cs-aus-207:101:331 [6] NCCL INFO 16 coll channels, 0 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer
2024-08-03T04:43:33.712731005Z jupiter-cs-aus-207:102:322 [7] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512
2024-08-03T04:43:33.712732656Z jupiter-cs-aus-207:102:322 [7] NCCL INFO 16 coll channels, 0 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer
2024-08-03T04:43:33.712750851Z jupiter-cs-aus-207:95:315 [0] NCCL INFO Connected all trees
2024-08-03T04:43:33.712754660Z jupiter-cs-aus-207:96:332 [1] NCCL INFO Connected all trees
2024-08-03T04:43:33.712767048Z jupiter-cs-aus-207:96:332 [1] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512
2024-08-03T04:43:33.712770587Z jupiter-cs-aus-207:96:332 [1] NCCL INFO 16 coll channels, 0 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer
2024-08-03T04:43:33.712835530Z jupiter-cs-aus-207:95:315 [0] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512
2024-08-03T04:43:33.712840667Z jupiter-cs-aus-207:95:315 [0] NCCL INFO 16 coll channels, 0 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer
2024-08-03T04:43:33.829534976Z jupiter-cs-aus-207:100:327 [5] NCCL INFO comm 0xa142060 rank 5 nranks 32 cudaDev 5 nvmlDev 5 busId 8b000 commId 0xe138317c315c387 - Init COMPLETE
2024-08-03T04:43:33.829558428Z jupiter-cs-aus-207:102:322 [7] NCCL INFO comm 0xa2679d0 rank 7 nranks 32 cudaDev 7 nvmlDev 7 busId e4000 commId 0xe138317c315c387 - Init COMPLETE
2024-08-03T04:43:33.829560597Z jupiter-cs-aus-207:98:333 [3] NCCL INFO comm 0xb892530 rank 3 nranks 32 cudaDev 3 nvmlDev 3 busId 5d000 commId 0xe138317c315c387 - Init COMPLETE
2024-08-03T04:43:33.829562070Z jupiter-cs-aus-207:96:332 [1] NCCL INFO comm 0xb44bbd0 rank 1 nranks 32 cudaDev 1 nvmlDev 1 busId 2a000 commId 0xe138317c315c387 - Init COMPLETE
2024-08-03T04:43:33.829563740Z jupiter-cs-aus-207:99:335 [4] NCCL INFO comm 0xb068620 rank 4 nranks 32 cudaDev 4 nvmlDev 4 busId 84000 commId 0xe138317c315c387 - Init COMPLETE
2024-08-03T04:43:33.829565272Z jupiter-cs-aus-207:95:315 [0] NCCL INFO comm 0xa3860f0 rank 0 nranks 32 cudaDev 0 nvmlDev 0 busId 18000 commId 0xe138317c315c387 - Init COMPLETE
2024-08-03T04:43:33.829580247Z jupiter-cs-aus-207:101:331 [6] NCCL INFO comm 0x9d1ef00 rank 6 nranks 32 cudaDev 6 nvmlDev 6 busId 91000 commId 0xe138317c315c387 - Init COMPLETE
2024-08-03T04:43:33.829581806Z jupiter-cs-aus-207:97:316 [2] NCCL INFO comm 0xab5c5a0 rank 2 nranks 32 cudaDev 2 nvmlDev 2 busId 3a000 commId 0xe138317c315c387 - Init COMPLETE
2024-08-03T04:43:34.587772667Z 
Downloading readme:   0%|          | 0.00/2.04k [00:00<?, ?B/s]
Downloading readme: 100%|██████████| 2.04k/2.04k [00:00<00:00, 18.3MB/s]
2024-08-03T04:43:39.513429198Z 
Downloading data:   0%|          | 0.00/222M [00:00<?, ?B/s]
Downloading data:   5%|▍         | 10.5M/222M [00:00<00:05, 38.5MB/s]
Downloading data:   9%|▉         | 21.0M/222M [00:00<00:03, 58.8MB/s]
Downloading data:  19%|█▉        | 41.9M/222M [00:00<00:02, 89.8MB/s]
Downloading data:  28%|██▊       | 62.9M/222M [00:00<00:01, 106MB/s] 
Downloading data:  38%|███▊      | 83.9M/222M [00:00<00:01, 101MB/s]
Downloading data:  47%|████▋     | 105M/222M [00:01<00:01, 99.5MB/s]
Downloading data:  52%|█████▏    | 115M/222M [00:01<00:01, 98.5MB/s]
Downloading data:  57%|█████▋    | 126M/222M [00:01<00:00, 98.4MB/s]
Downloading data:  61%|██████▏   | 136M/222M [00:01<00:00, 96.7MB/s]
Downloading data:  66%|██████▌   | 147M/222M [00:01<00:00, 97.5MB/s]
Downloading data:  71%|███████   | 157M/222M [00:01<00:00, 97.9MB/s]
Downloading data:  76%|███████▌  | 168M/222M [00:01<00:00, 98.8MB/s]
Downloading data:  80%|████████  | 178M/222M [00:01<00:00, 98.3MB/s]
Downloading data:  85%|████████▌ | 189M/222M [00:01<00:00, 98.9MB/s]
Downloading data:  90%|████████▉ | 199M/222M [00:02<00:00, 99.8MB/s]
Downloading data:  94%|█████████▍| 210M/222M [00:02<00:00, 99.5MB/s]
Downloading data:  99%|█████████▉| 220M/222M [00:02<00:00, 99.6MB/s]
Downloading data: 100%|██████████| 222M/222M [00:02<00:00, 90.6MB/s]
2024-08-03T04:43:41.971215825Z 
Downloading data:   0%|          | 0.00/211M [00:00<?, ?B/s]
Downloading data:   5%|▍         | 10.5M/211M [00:00<00:05, 39.6MB/s]
Downloading data:  10%|▉         | 21.0M/211M [00:00<00:03, 61.6MB/s]
Downloading data:  20%|█▉        | 41.9M/211M [00:00<00:01, 93.1MB/s]
Downloading data:  30%|██▉       | 62.9M/211M [00:00<00:01, 120MB/s] 
Downloading data:  40%|███▉      | 83.9M/211M [00:00<00:00, 144MB/s]
Downloading data:  50%|████▉     | 105M/211M [00:00<00:00, 158MB/s] 
Downloading data:  60%|█████▉    | 126M/211M [00:00<00:00, 172MB/s]
Downloading data:  70%|██████▉   | 147M/211M [00:01<00:00, 180MB/s]
Downloading data:  80%|███████▉  | 168M/211M [00:01<00:00, 178MB/s]
Downloading data:  90%|████████▉ | 189M/211M [00:01<00:00, 156MB/s]
Downloading data: 100%|█████████▉| 210M/211M [00:01<00:00, 148MB/s]
Downloading data: 100%|██████████| 211M/211M [00:01<00:00, 129MB/s]
2024-08-03T04:43:44.119175345Z 
Downloading data:   0%|          | 0.00/210M [00:00<?, ?B/s]
Downloading data:   5%|▌         | 10.5M/210M [00:00<00:03, 52.4MB/s]
Downloading data:  15%|█▌        | 31.5M/210M [00:00<00:01, 105MB/s] 
Downloading data:  30%|███       | 62.9M/210M [00:00<00:00, 152MB/s]
Downloading data:  45%|████▌     | 94.4M/210M [00:00<00:00, 177MB/s]
Downloading data:  60%|██████    | 126M/210M [00:00<00:00, 193MB/s] 
Downloading data:  70%|███████   | 147M/210M [00:00<00:00, 194MB/s]
Downloading data:  80%|████████  | 168M/210M [00:00<00:00, 196MB/s]
Downloading data:  90%|█████████ | 189M/210M [00:01<00:00, 194MB/s]
Downloading data: 100%|██████████| 210M/210M [00:01<00:00, 185MB/s]
Downloading data: 100%|██████████| 210M/210M [00:01<00:00, 159MB/s]
2024-08-03T04:43:47.247478005Z 
Downloading data:   0%|          | 0.00/210M [00:00<?, ?B/s]
Downloading data:   5%|▌         | 10.5M/210M [00:00<00:07, 25.6MB/s]
Downloading data:  10%|█         | 21.0M/210M [00:00<00:04, 41.7MB/s]
Downloading data:  15%|█▌        | 31.5M/210M [00:00<00:03, 57.2MB/s]
Downloading data:  25%|██▌       | 52.4M/210M [00:00<00:01, 83.1MB/s]
Downloading data:  35%|███▌      | 73.4M/210M [00:00<00:01, 107MB/s] 
Downloading data:  45%|████▌     | 94.4M/210M [00:01<00:00, 131MB/s]
Downloading data:  60%|██████    | 126M/210M [00:01<00:00, 155MB/s] 
Downloading data:  70%|███████   | 147M/210M [00:01<00:00, 129MB/s]
Downloading data:  80%|████████  | 168M/210M [00:01<00:00, 115MB/s]
Downloading data:  90%|█████████ | 189M/210M [00:01<00:00, 108MB/s]
Downloading data: 100%|██████████| 210M/210M [00:02<00:00, 92.5MB/s]
Downloading data: 100%|██████████| 210M/210M [00:02<00:00, 91.6MB/s]
2024-08-03T04:43:50.883249257Z 
Generating train split:   0%|          | 0/608042 [00:00<?, ? examples/s]
Generating train split:   2%|▏         | 11000/608042 [00:00<00:06, 99378.46 examples/s]
Generating train split:   4%|▍         | 26000/608042 [00:00<00:04, 122756.64 examples/s]
Generating train split:   7%|▋         | 44000/608042 [00:00<00:03, 146027.44 examples/s]
Generating train split:  10%|█         | 62000/608042 [00:00<00:03, 158294.46 examples/s]
Generating train split:  13%|█▎        | 80000/608042 [00:00<00:03, 163958.53 examples/s]
Generating train split:  16%|█▌        | 98000/608042 [00:00<00:03, 166809.03 examples/s]
Generating train split:  19%|█▉        | 116000/608042 [00:00<00:02, 168864.39 examples/s]
Generating train split:  22%|██▏       | 134000/608042 [00:00<00:02, 169266.86 examples/s]
Generating train split:  25%|██▍       | 152000/608042 [00:00<00:02, 168823.60 examples/s]
Generating train split:  28%|██▊       | 170011/608042 [00:01<00:02, 166231.87 examples/s]
Generating train split:  31%|███       | 188011/608042 [00:01<00:02, 167590.62 examples/s]
Generating train split:  34%|███▍      | 206011/608042 [00:01<00:02, 168832.14 examples/s]
Generating train split:  37%|███▋      | 224011/608042 [00:01<00:02, 169363.67 examples/s]
Generating train split:  40%|███▉      | 242011/608042 [00:01<00:02, 168702.95 examples/s]
Generating train split:  43%|████▎     | 260011/608042 [00:01<00:02, 168713.21 examples/s]
Generating train split:  46%|████▌     | 278011/608042 [00:01<00:01, 169707.89 examples/s]
Generating train split:  49%|████▊     | 296011/608042 [00:01<00:01, 169002.32 examples/s]
Generating train split:  52%|█████▏    | 314022/608042 [00:01<00:01, 170247.01 examples/s]
Generating train split:  55%|█████▍    | 332022/608042 [00:02<00:01, 170028.00 examples/s]
Generating train split:  58%|█████▊    | 350022/608042 [00:02<00:01, 171094.16 examples/s]
Generating train split:  61%|██████    | 368022/608042 [00:02<00:01, 171867.21 examples/s]
Generating train split:  63%|██████▎   | 386022/608042 [00:02<00:01, 170950.89 examples/s]
Generating train split:  66%|██████▋   | 404022/608042 [00:02<00:01, 170282.73 examples/s]
Generating train split:  69%|██████▉   | 422022/608042 [00:02<00:01, 171092.29 examples/s]
Generating train split:  72%|███████▏  | 440022/608042 [00:02<00:00, 172059.55 examples/s]
Generating train split:  75%|███████▌  | 458032/608042 [00:02<00:00, 169808.83 examples/s]
Generating train split:  78%|███████▊  | 475032/608042 [00:02<00:00, 166956.02 examples/s]
Generating train split:  81%|████████  | 492032/608042 [00:02<00:00, 166483.79 examples/s]
Generating train split:  84%|████████▍ | 510032/608042 [00:03<00:00, 167740.14 examples/s]
Generating train split:  87%|████████▋ | 528032/608042 [00:03<00:00, 168550.26 examples/s]
Generating train split:  90%|████████▉ | 546032/608042 [00:03<00:00, 169974.54 examples/s]
Generating train split:  93%|█████████▎| 564032/608042 [00:03<00:00, 171290.38 examples/s]
Generating train split:  96%|█████████▌| 582032/608042 [00:03<00:00, 171028.52 examples/s]
Generating train split:  99%|█████████▊| 600032/608042 [00:03<00:00, 171583.80 examples/s]
Generating train split: 100%|██████████| 608042/608042 [00:03<00:00, 167417.26 examples/s]
2024-08-03T04:43:51.096969636Z loading configuration file config.json from cache at ./cache/models--allenai--OLMoE-7B-A1B/snapshots/18050db312a6c9714b636537149672b0d77df3e1/config.json
2024-08-03T04:43:51.097691739Z Model config OlmoeConfig {
2024-08-03T04:43:51.097693395Z   "_name_or_path": "allenai/OLMoE-7B-A1B",
2024-08-03T04:43:51.097694974Z   "architectures": [
2024-08-03T04:43:51.097696283Z     "OlmoeForCausalLM"
2024-08-03T04:43:51.097697675Z   ],
2024-08-03T04:43:51.097698934Z   "attention_bias": false,
2024-08-03T04:43:51.097700321Z   "attention_dropout": 0.0,
2024-08-03T04:43:51.097701888Z   "clip_qkv": null,
2024-08-03T04:43:51.097703250Z   "eos_token_id": 50279,
2024-08-03T04:43:51.097704774Z   "hidden_act": "silu",
2024-08-03T04:43:51.097706143Z   "hidden_size": 2048,
2024-08-03T04:43:51.097707463Z   "initializer_range": 0.02,
2024-08-03T04:43:51.097708764Z   "intermediate_size": 1024,
2024-08-03T04:43:51.097710041Z   "max_position_embeddings": 4096,
2024-08-03T04:43:51.097711390Z   "model_type": "olmoe",
2024-08-03T04:43:51.097712757Z   "norm_topk_prob": false,
2024-08-03T04:43:51.097714219Z   "num_attention_heads": 16,
2024-08-03T04:43:51.097715532Z   "num_experts": 64,
2024-08-03T04:43:51.097716785Z   "num_experts_per_tok": 8,
2024-08-03T04:43:51.097718073Z   "num_hidden_layers": 16,
2024-08-03T04:43:51.097719358Z   "num_key_value_heads": 16,
2024-08-03T04:43:51.097720626Z   "output_router_logits": true,
2024-08-03T04:43:51.097721933Z   "pad_token_id": 1,
2024-08-03T04:43:51.097723198Z   "rope_scaling": null,
2024-08-03T04:43:51.097727114Z   "rope_theta": 10000.0,
2024-08-03T04:43:51.097728511Z   "router_aux_loss_coef": 0.001,
2024-08-03T04:43:51.097729835Z   "tie_word_embeddings": false,
2024-08-03T04:43:51.097731148Z   "torch_dtype": "bfloat16",
2024-08-03T04:43:51.097732503Z   "transformers_version": "4.43.0.dev0",
2024-08-03T04:43:51.097733843Z   "use_cache": true,
2024-08-03T04:43:51.097735139Z   "vocab_size": 50304
2024-08-03T04:43:51.097736450Z }
2024-08-03T04:43:51.097737719Z 
2024-08-03T04:43:52.020450281Z loading file vocab.json from cache at None
2024-08-03T04:43:52.020472567Z loading file merges.txt from cache at None
2024-08-03T04:43:52.020474819Z loading file tokenizer.json from cache at ./cache/models--allenai--OLMoE-7B-A1B/snapshots/18050db312a6c9714b636537149672b0d77df3e1/tokenizer.json
2024-08-03T04:43:52.020476862Z loading file added_tokens.json from cache at None
2024-08-03T04:43:52.020478409Z loading file special_tokens_map.json from cache at ./cache/models--allenai--OLMoE-7B-A1B/snapshots/18050db312a6c9714b636537149672b0d77df3e1/special_tokens_map.json
2024-08-03T04:43:52.020480140Z loading file tokenizer_config.json from cache at ./cache/models--allenai--OLMoE-7B-A1B/snapshots/18050db312a6c9714b636537149672b0d77df3e1/tokenizer_config.json
2024-08-03T04:43:52.366323069Z 
Downloading shards:   0%|          | 0/3 [00:00<?, ?it/s]
Downloading shards:   0%|          | 0/3 [00:00<?, ?it/s]
Downloading shards:   0%|          | 0/3 [00:00<?, ?it/s]
Downloading shards:   0%|          | 0/3 [00:00<?, ?it/s]
Downloading shards:   0%|          | 0/3 [00:00<?, ?it/s]
loading weights file model.safetensors from cache at ./cache/models--allenai--OLMoE-7B-A1B/snapshots/18050db312a6c9714b636537149672b0d77df3e1/model.safetensors.index.json
2024-08-03T04:44:36.880775423Z 
Downloading shards:   0%|          | 0/3 [00:00<?, ?it/s]
Downloading shards:   0%|          | 0/3 [00:00<?, ?it/s]
Downloading shards:   0%|          | 0/3 [00:00<?, ?it/s]
Downloading shards:  33%|███▎      | 1/3 [00:16<00:33, 17.00s/it]
Downloading shards:  33%|███▎      | 1/3 [00:16<00:33, 16.98s/it]
Downloading shards:  33%|███▎      | 1/3 [00:16<00:33, 16.99s/it]
Downloading shards:  33%|███▎      | 1/3 [00:16<00:33, 17.00s/it]
Downloading shards:  33%|███▎      | 1/3 [00:16<00:33, 16.99s/it]
Downloading shards:  33%|███▎      | 1/3 [00:16<00:33, 17.00s/it]
Downloading shards:  33%|███▎      | 1/3 [00:16<00:33, 17.00s/it]
Downloading shards:  33%|███▎      | 1/3 [00:17<00:34, 17.05s/it]
Downloading shards:  67%|██████▋   | 2/3 [00:32<00:16, 16.22s/it]
Downloading shards:  67%|██████▋   | 2/3 [00:32<00:16, 16.23s/it]
Downloading shards:  67%|██████▋   | 2/3 [00:32<00:16, 16.22s/it]
Downloading shards:  67%|██████▋   | 2/3 [00:32<00:16, 16.23s/it]
Downloading shards:  67%|██████▋   | 2/3 [00:32<00:16, 16.23s/it]
Downloading shards:  67%|██████▋   | 2/3 [00:32<00:16, 16.23s/it]
Downloading shards:  67%|██████▋   | 2/3 [00:32<00:16, 16.23s/it]
Downloading shards:  67%|██████▋   | 2/3 [00:32<00:16, 16.23s/it]
Downloading shards: 100%|██████████| 3/3 [00:44<00:00, 14.23s/it]
Downloading shards: 100%|██████████| 3/3 [00:44<00:00, 14.85s/it]
2024-08-03T04:44:36.893497833Z 
Downloading shards: 100%|██████████| 3/3 [00:44<00:00, 14.23s/it]
Downloading shards: 100%|██████████| 3/3 [00:44<00:00, 14.85s/it]
2024-08-03T04:44:36.897124526Z 
Downloading shards: 100%|██████████| 3/3 [00:44<00:00, 14.23s/it]
Downloading shards: 100%|██████████| 3/3 [00:44<00:00, 14.84s/it]
2024-08-03T04:44:36.897554051Z 
Downloading shards: 100%|██████████| 3/3 [00:44<00:00, 14.23s/it]
Downloading shards: 100%|██████████| 3/3 [00:44<00:00, 14.84s/it]
2024-08-03T04:44:36.903698548Z 
Downloading shards: 100%|██████████| 3/3 [00:44<00:00, 14.23s/it]
Downloading shards: 100%|██████████| 3/3 [00:44<00:00, 14.85s/it]
2024-08-03T04:44:36.904515410Z 
Downloading shards: 100%|██████████| 3/3 [00:44<00:00, 14.23s/it]
Downloading shards: 100%|██████████| 3/3 [00:44<00:00, 14.85s/it]
2024-08-03T04:44:36.913398714Z 
Downloading shards: 100%|██████████| 3/3 [00:44<00:00, 14.24s/it]
Downloading shards: 100%|██████████| 3/3 [00:44<00:00, 14.86s/it]
2024-08-03T04:44:36.947496528Z 
Downloading shards: 100%|██████████| 3/3 [00:44<00:00, 14.25s/it]
Downloading shards: 100%|██████████| 3/3 [00:44<00:00, 14.86s/it]
2024-08-03T04:44:36.947610947Z Detected DeepSpeed ZeRO-3: activating zero.init() for this model
2024-08-03T04:44:36.949377948Z The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
2024-08-03T04:44:36.950575625Z You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
2024-08-03T04:44:36.950662210Z You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
2024-08-03T04:44:36.956833095Z Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in OlmoeForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
2024-08-03T04:44:36.957238663Z Generate config GenerationConfig {
2024-08-03T04:44:36.957240387Z   "eos_token_id": 50279,
2024-08-03T04:44:36.957241894Z   "pad_token_id": 1
2024-08-03T04:44:36.957243083Z }
2024-08-03T04:44:36.957244247Z 
2024-08-03T04:44:36.958325051Z Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in OlmoeModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
2024-08-03T04:44:38.267058426Z [2024-08-02 21:44:38,266] [INFO] [partition_parameters.py:345:__exit__] finished initializing model - num_params = 3219, num_elems = 6.92B
2024-08-03T04:44:48.574763432Z 
Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards:  33%|███▎      | 1/3 [00:02<00:05,  2.84s/it]
Loading checkpoint shards:  33%|███▎      | 1/3 [00:02<00:05,  2.77s/it]
Loading checkpoint shards:  33%|███▎      | 1/3 [00:02<00:05,  2.74s/it]
Loading checkpoint shards:  33%|███▎      | 1/3 [00:02<00:05,  2.74s/it]
Loading checkpoint shards:  33%|███▎      | 1/3 [00:02<00:05,  2.76s/it]
Loading checkpoint shards:  33%|███▎      | 1/3 [00:02<00:05,  2.81s/it]
Loading checkpoint shards:  33%|███▎      | 1/3 [00:02<00:05,  2.84s/it]
Loading checkpoint shards:  33%|███▎      | 1/3 [00:03<00:06,  3.09s/it]
Loading checkpoint shards:  67%|██████▋   | 2/3 [00:05<00:02,  2.92s/it]
Loading checkpoint shards:  67%|██████▋   | 2/3 [00:05<00:02,  2.91s/it]
Loading checkpoint shards:  67%|██████▋   | 2/3 [00:05<00:02,  2.90s/it]
Loading checkpoint shards:  67%|██████▋   | 2/3 [00:05<00:02,  2.92s/it]
Loading checkpoint shards:  67%|██████▋   | 2/3 [00:05<00:02,  2.91s/it]
Loading checkpoint shards:  67%|██████▋   | 2/3 [00:05<00:02,  2.91s/it]
Loading checkpoint shards:  67%|██████▋   | 2/3 [00:05<00:02,  2.93s/it]
Loading checkpoint shards:  67%|██████▋   | 2/3 [00:06<00:03,  3.04s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:08<00:00,  2.61s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:08<00:00,  2.68s/it]
2024-08-03T04:44:48.590360646Z 
Loading checkpoint shards: 100%|██████████| 3/3 [00:08<00:00,  2.66s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:08<00:00,  2.72s/it]
2024-08-03T04:44:48.590683383Z 
Loading checkpoint shards: 100%|██████████| 3/3 [00:08<00:00,  2.62s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:08<00:00,  2.68s/it]
2024-08-03T04:44:48.596579088Z 
Loading checkpoint shards: 100%|██████████| 3/3 [00:08<00:00,  2.63s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:08<00:00,  2.69s/it]
2024-08-03T04:44:48.600686722Z 
Loading checkpoint shards: 100%|██████████| 3/3 [00:08<00:00,  2.63s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:08<00:00,  2.70s/it]
2024-08-03T04:44:48.605181732Z 
Loading checkpoint shards: 100%|██████████| 3/3 [00:08<00:00,  2.62s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:08<00:00,  2.68s/it]
2024-08-03T04:44:48.718400859Z 
Loading checkpoint shards: 100%|██████████| 3/3 [00:08<00:00,  2.68s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:08<00:00,  2.74s/it]
2024-08-03T04:44:48.830150769Z 
Loading checkpoint shards: 100%|██████████| 3/3 [00:08<00:00,  2.67s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:08<00:00,  2.77s/it]
2024-08-03T04:44:48.830247749Z All model checkpoint weights were used when initializing OlmoeForCausalLM.
2024-08-03T04:44:48.830250858Z 
2024-08-03T04:44:48.830266187Z All the weights of OlmoeForCausalLM were initialized from the model checkpoint at allenai/OLMoE-7B-A1B.
2024-08-03T04:44:48.830267818Z If your task is similar to the task the model of the checkpoint was trained on, you can already use OlmoeForCausalLM for predictions without further training.
2024-08-03T04:44:48.921571825Z loading configuration file generation_config.json from cache at ./cache/models--allenai--OLMoE-7B-A1B/snapshots/18050db312a6c9714b636537149672b0d77df3e1/generation_config.json
2024-08-03T04:44:48.921712075Z Generate config GenerationConfig {
2024-08-03T04:44:48.921714356Z   "eos_token_id": 50279,
2024-08-03T04:44:48.921716096Z   "pad_token_id": 1
2024-08-03T04:44:48.921717699Z }
2024-08-03T04:44:48.921719000Z 
2024-08-03T04:45:03.832268751Z 
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 0/608042 [00:00<?, ? examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 6/608042 [00:00<4:50:40, 34.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 45/608042 [00:00<53:35, 189.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 211/608042 [00:00<13:27, 752.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 398/608042 [00:00<08:51, 1143.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 693/608042 [00:00<05:56, 1703.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 909/608042 [00:00<05:32, 1824.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1238/608042 [00:00<04:47, 2112.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1585/608042 [00:00<04:04, 2481.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1875/608042 [00:01<03:55, 2576.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2204/608042 [00:01<03:38, 2778.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2497/608042 [00:01<03:52, 2599.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2766/608042 [00:01<04:21, 2311.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 3022/608042 [00:01<04:16, 2355.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3287/608042 [00:01<04:08, 2432.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3545/608042 [00:01<04:06, 2454.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3823/608042 [00:01<03:57, 2541.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4089/608042 [00:01<04:06, 2453.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4355/608042 [00:02<04:07, 2441.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4745/608042 [00:02<03:32, 2844.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5043/608042 [00:02<03:35, 2793.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5331/608042 [00:02<03:47, 2651.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5602/608042 [00:02<03:51, 2607.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5872/608042 [00:02<03:53, 2583.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6133/608042 [00:02<04:04, 2465.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6390/608042 [00:02<04:05, 2447.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6642/608042 [00:02<04:10, 2397.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6883/608042 [00:03<04:25, 2267.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 7169/608042 [00:03<04:09, 2410.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 7423/608042 [00:03<04:13, 2372.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 7687/608042 [00:03<04:05, 2446.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 7939/608042 [00:03<04:07, 2423.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8234/608042 [00:03<03:53, 2572.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8526/608042 [00:03<03:45, 2658.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8795/608042 [00:03<04:17, 2324.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 9047/608042 [00:03<04:18, 2317.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9373/608042 [00:04<03:53, 2568.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9644/608042 [00:04<03:55, 2536.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9919/608042 [00:04<03:55, 2537.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10176/608042 [00:04<03:56, 2528.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10436/608042 [00:04<04:02, 2463.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10691/608042 [00:04<04:26, 2239.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10944/608042 [00:04<04:21, 2285.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11269/608042 [00:04<03:58, 2502.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11528/608042 [00:04<04:06, 2417.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11790/608042 [00:05<04:05, 2431.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12036/608042 [00:05<04:15, 2329.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12327/608042 [00:05<04:00, 2481.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12580/608042 [00:05<04:13, 2349.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12846/608042 [00:05<04:07, 2404.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13100/608042 [00:05<04:04, 2436.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13378/608042 [00:05<03:55, 2527.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13636/608042 [00:05<04:06, 2412.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13884/608042 [00:05<04:05, 2418.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14163/608042 [00:06<03:56, 2506.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14425/608042 [00:06<04:06, 2412.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14673/608042 [00:06<04:11, 2359.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14912/608042 [00:06<04:17, 2305.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 15168/608042 [00:06<04:10, 2370.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15423/608042 [00:06<04:08, 2382.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15666/608042 [00:06<04:14, 2331.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15916/608042 [00:06<04:38, 2122.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16159/608042 [00:06<04:34, 2159.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16391/608042 [00:07<04:29, 2192.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16658/608042 [00:07<04:16, 2302.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16932/608042 [00:07<04:07, 2392.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17215/608042 [00:07<03:56, 2502.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17548/608042 [00:07<03:36, 2726.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17833/608042 [00:07<03:54, 2519.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18095/608042 [00:07<04:00, 2457.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18363/608042 [00:07<03:57, 2487.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18648/608042 [00:07<03:51, 2549.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18963/608042 [00:07<03:37, 2705.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19240/608042 [00:08<03:48, 2571.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19503/608042 [00:08<04:02, 2424.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19750/608042 [00:08<04:18, 2272.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19990/608042 [00:08<04:22, 2238.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20300/608042 [00:08<03:59, 2454.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20567/608042 [00:08<04:11, 2331.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20825/608042 [00:08<04:06, 2378.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 21154/608042 [00:08<03:46, 2589.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 21438/608042 [00:09<04:00, 2437.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 21747/608042 [00:09<03:51, 2527.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22038/608042 [00:09<03:46, 2592.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22352/608042 [00:09<03:36, 2700.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22640/608042 [00:09<03:34, 2734.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 22919/608042 [00:09<03:38, 2678.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23190/608042 [00:09<03:49, 2544.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23507/608042 [00:09<03:41, 2643.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23856/608042 [00:09<03:29, 2786.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24212/608042 [00:10<03:15, 2982.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24517/608042 [00:10<03:22, 2876.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24812/608042 [00:10<03:26, 2827.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25100/608042 [00:10<03:56, 2469.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25377/608042 [00:10<03:54, 2488.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25681/608042 [00:10<03:43, 2608.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25989/608042 [00:10<03:35, 2705.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26272/608042 [00:10<03:57, 2444.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26538/608042 [00:10<03:53, 2493.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26801/608042 [00:11<04:07, 2348.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 27057/608042 [00:11<04:02, 2392.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 27343/608042 [00:11<03:50, 2518.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 27645/608042 [00:11<03:40, 2636.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 27959/608042 [00:11<03:29, 2767.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28247/608042 [00:11<03:33, 2719.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28526/608042 [00:11<03:43, 2597.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28804/608042 [00:11<03:47, 2542.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29139/608042 [00:11<03:31, 2734.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29453/608042 [00:12<03:23, 2848.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29755/608042 [00:12<03:36, 2676.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 30032/608042 [00:12<03:34, 2696.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 30311/608042 [00:12<03:35, 2678.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 30584/608042 [00:12<03:46, 2544.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 30845/608042 [00:12<03:46, 2548.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31102/608042 [00:12<03:51, 2490.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31362/608042 [00:12<03:59, 2403.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31641/608042 [00:12<03:55, 2442.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31902/608042 [00:13<04:03, 2364.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32181/608042 [00:13<03:59, 2404.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32433/608042 [00:13<03:59, 2400.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32695/608042 [00:13<04:00, 2395.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32956/608042 [00:13<03:57, 2425.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 33212/608042 [00:13<04:00, 2392.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 33454/608042 [00:13<04:04, 2346.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 33764/608042 [00:13<03:44, 2556.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34039/608042 [00:13<03:40, 2607.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34315/608042 [00:14<03:46, 2537.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34588/608042 [00:14<03:45, 2538.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34848/608042 [00:14<03:58, 2403.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35125/608042 [00:14<03:55, 2436.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35378/608042 [00:14<04:04, 2345.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35627/608042 [00:14<04:17, 2223.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35903/608042 [00:14<04:04, 2335.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36162/608042 [00:14<04:00, 2381.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36427/608042 [00:14<03:53, 2451.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36674/608042 [00:15<03:55, 2429.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36919/608042 [00:15<03:56, 2417.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37228/608042 [00:15<03:39, 2606.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37499/608042 [00:15<03:39, 2602.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37814/608042 [00:15<03:26, 2757.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38104/608042 [00:15<03:33, 2670.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38380/608042 [00:15<03:37, 2613.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38671/608042 [00:15<03:31, 2691.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38943/608042 [00:15<03:42, 2554.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 39209/608042 [00:15<03:48, 2491.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 39468/608042 [00:16<03:51, 2457.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 39737/608042 [00:16<03:53, 2433.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40065/608042 [00:16<03:38, 2601.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40454/608042 [00:16<03:12, 2955.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40768/608042 [00:16<03:27, 2737.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41048/608042 [00:16<03:26, 2741.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41329/608042 [00:16<03:43, 2536.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41589/608042 [00:16<03:54, 2414.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41923/608042 [00:17<03:37, 2605.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42195/608042 [00:17<03:51, 2444.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42470/608042 [00:17<03:44, 2518.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42772/608042 [00:17<03:33, 2649.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43089/608042 [00:17<03:26, 2731.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43370/608042 [00:17<03:37, 2595.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43635/608042 [00:17<03:40, 2562.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43905/608042 [00:17<03:40, 2562.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44203/608042 [00:17<03:31, 2660.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44525/608042 [00:17<03:20, 2810.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44815/608042 [00:18<03:20, 2805.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 45101/608042 [00:18<03:33, 2642.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 45373/608042 [00:18<03:37, 2592.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 45719/608042 [00:18<03:27, 2707.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 45992/608042 [00:18<03:30, 2669.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46269/608042 [00:18<03:37, 2585.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46610/608042 [00:18<03:20, 2798.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46897/608042 [00:18<03:24, 2746.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47184/608042 [00:19<03:44, 2499.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47445/608042 [00:19<03:42, 2521.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47701/608042 [00:19<03:47, 2465.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47951/608042 [00:19<03:47, 2462.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48200/608042 [00:19<03:51, 2421.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48445/608042 [00:19<04:01, 2317.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48706/608042 [00:19<03:57, 2358.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49009/608042 [00:19<03:41, 2525.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49277/608042 [00:19<03:52, 2398.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49532/608042 [00:20<04:11, 2219.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49791/608042 [00:20<04:01, 2307.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50071/608042 [00:20<03:52, 2403.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50347/608042 [00:20<03:43, 2496.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50610/608042 [00:20<03:50, 2418.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50866/608042 [00:20<03:56, 2351.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51114/608042 [00:20<03:55, 2363.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51368/608042 [00:20<03:57, 2343.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51683/608042 [00:20<03:37, 2561.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 51958/608042 [00:21<03:49, 2422.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52218/608042 [00:21<03:52, 2387.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52481/608042 [00:21<03:53, 2378.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52827/608042 [00:21<03:30, 2636.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 53129/608042 [00:21<03:24, 2716.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53439/608042 [00:21<03:17, 2814.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53731/608042 [00:21<03:52, 2384.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53986/608042 [00:21<03:50, 2401.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54246/608042 [00:21<03:55, 2352.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54498/608042 [00:22<04:05, 2253.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54785/608042 [00:22<03:49, 2406.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55039/608042 [00:22<03:51, 2390.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55285/608042 [00:22<03:58, 2322.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55521/608042 [00:22<04:06, 2238.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55830/608042 [00:22<03:45, 2447.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56097/608042 [00:22<03:43, 2473.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56412/608042 [00:22<03:28, 2642.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56682/608042 [00:22<03:34, 2575.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56965/608042 [00:23<03:33, 2580.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 57239/608042 [00:23<03:38, 2523.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 57518/608042 [00:23<03:34, 2563.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 57782/608042 [00:23<03:45, 2436.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58143/608042 [00:23<03:20, 2745.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58427/608042 [00:23<03:30, 2616.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58702/608042 [00:23<03:29, 2620.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58994/608042 [00:23<03:23, 2702.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59311/608042 [00:23<03:15, 2804.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59610/608042 [00:24<03:14, 2824.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59910/608042 [00:24<03:28, 2630.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60186/608042 [00:24<03:36, 2532.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60448/608042 [00:24<03:34, 2555.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60755/608042 [00:24<03:24, 2671.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61030/608042 [00:24<03:26, 2648.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61359/608042 [00:24<03:17, 2771.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61658/608042 [00:24<03:15, 2795.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61941/608042 [00:24<03:19, 2735.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62216/608042 [00:25<03:37, 2512.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62516/608042 [00:25<03:30, 2586.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62779/608042 [00:25<03:46, 2402.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63044/608042 [00:25<03:41, 2465.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63306/608042 [00:25<03:51, 2350.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63554/608042 [00:25<03:51, 2350.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63826/608042 [00:25<03:44, 2428.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64079/608042 [00:25<03:52, 2334.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64333/608042 [00:25<03:48, 2383.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64602/608042 [00:26<03:41, 2457.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64882/608042 [00:26<03:33, 2538.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65161/608042 [00:26<03:28, 2609.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65428/608042 [00:26<03:33, 2545.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65687/608042 [00:26<03:41, 2451.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66008/608042 [00:26<03:24, 2656.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66293/608042 [00:26<03:23, 2667.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66562/608042 [00:26<03:35, 2508.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66832/608042 [00:26<03:41, 2446.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67131/608042 [00:27<03:37, 2492.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67393/608042 [00:27<03:36, 2491.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67672/608042 [00:27<03:34, 2514.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67927/608042 [00:27<03:40, 2450.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 68218/608042 [00:27<03:31, 2551.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 68539/608042 [00:27<03:18, 2720.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 68837/608042 [00:27<03:12, 2794.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69140/608042 [00:27<03:22, 2666.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69410/608042 [00:27<03:28, 2583.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69676/608042 [00:27<03:37, 2475.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 69939/608042 [00:28<03:39, 2446.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70187/608042 [00:28<03:44, 2399.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70435/608042 [00:28<03:43, 2407.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70686/608042 [00:28<04:01, 2225.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70953/608042 [00:28<03:51, 2315.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71242/608042 [00:28<03:37, 2465.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71496/608042 [00:28<03:38, 2450.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71763/608042 [00:28<03:36
2024-08-03T04:45:03.832268751Z 
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 0/608042 [00:00<?, ? examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 6/608042 [00:00<4:50:40, 34.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 45/608042 [00:00<53:35, 189.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 211/608042 [00:00<13:27, 752.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 398/608042 [00:00<08:51, 1143.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 693/608042 [00:00<05:56, 1703.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 909/608042 [00:00<05:32, 1824.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1238/608042 [00:00<04:47, 2112.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1585/608042 [00:00<04:04, 2481.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1875/608042 [00:01<03:55, 2576.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2204/608042 [00:01<03:38, 2778.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2497/608042 [00:01<03:52, 2599.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2766/608042 [00:01<04:21, 2311.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 3022/608042 [00:01<04:16, 2355.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3287/608042 [00:01<04:08, 2432.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3545/608042 [00:01<04:06, 2454.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3823/608042 [00:01<03:57, 2541.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4089/608042 [00:01<04:06, 2453.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4355/608042 [00:02<04:07, 2441.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4745/608042 [00:02<03:32, 2844.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5043/608042 [00:02<03:35, 2793.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5331/608042 [00:02<03:47, 2651.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5602/608042 [00:02<03:51, 2607.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5872/608042 [00:02<03:53, 2583.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6133/608042 [00:02<04:04, 2465.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6390/608042 [00:02<04:05, 2447.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6642/608042 [00:02<04:10, 2397.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6883/608042 [00:03<04:25, 2267.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 7169/608042 [00:03<04:09, 2410.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 7423/608042 [00:03<04:13, 2372.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 7687/608042 [00:03<04:05, 2446.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 7939/608042 [00:03<04:07, 2423.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8234/608042 [00:03<03:53, 2572.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8526/608042 [00:03<03:45, 2658.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8795/608042 [00:03<04:17, 2324.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 9047/608042 [00:03<04:18, 2317.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9373/608042 [00:04<03:53, 2568.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9644/608042 [00:04<03:55, 2536.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9919/608042 [00:04<03:55, 2537.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10176/608042 [00:04<03:56, 2528.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10436/608042 [00:04<04:02, 2463.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10691/608042 [00:04<04:26, 2239.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10944/608042 [00:04<04:21, 2285.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11269/608042 [00:04<03:58, 2502.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11528/608042 [00:04<04:06, 2417.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11790/608042 [00:05<04:05, 2431.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12036/608042 [00:05<04:15, 2329.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12327/608042 [00:05<04:00, 2481.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12580/608042 [00:05<04:13, 2349.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12846/608042 [00:05<04:07, 2404.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13100/608042 [00:05<04:04, 2436.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13378/608042 [00:05<03:55, 2527.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13636/608042 [00:05<04:06, 2412.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13884/608042 [00:05<04:05, 2418.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14163/608042 [00:06<03:56, 2506.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14425/608042 [00:06<04:06, 2412.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14673/608042 [00:06<04:11, 2359.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14912/608042 [00:06<04:17, 2305.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 15168/608042 [00:06<04:10, 2370.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15423/608042 [00:06<04:08, 2382.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15666/608042 [00:06<04:14, 2331.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15916/608042 [00:06<04:38, 2122.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16159/608042 [00:06<04:34, 2159.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16391/608042 [00:07<04:29, 2192.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16658/608042 [00:07<04:16, 2302.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16932/608042 [00:07<04:07, 2392.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17215/608042 [00:07<03:56, 2502.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17548/608042 [00:07<03:36, 2726.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17833/608042 [00:07<03:54, 2519.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18095/608042 [00:07<04:00, 2457.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18363/608042 [00:07<03:57, 2487.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18648/608042 [00:07<03:51, 2549.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18963/608042 [00:07<03:37, 2705.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19240/608042 [00:08<03:48, 2571.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19503/608042 [00:08<04:02, 2424.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19750/608042 [00:08<04:18, 2272.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19990/608042 [00:08<04:22, 2238.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20300/608042 [00:08<03:59, 2454.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20567/608042 [00:08<04:11, 2331.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20825/608042 [00:08<04:06, 2378.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 21154/608042 [00:08<03:46, 2589.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 21438/608042 [00:09<04:00, 2437.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 21747/608042 [00:09<03:51, 2527.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22038/608042 [00:09<03:46, 2592.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22352/608042 [00:09<03:36, 2700.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22640/608042 [00:09<03:34, 2734.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 22919/608042 [00:09<03:38, 2678.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23190/608042 [00:09<03:49, 2544.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23507/608042 [00:09<03:41, 2643.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23856/608042 [00:09<03:29, 2786.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24212/608042 [00:10<03:15, 2982.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24517/608042 [00:10<03:22, 2876.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24812/608042 [00:10<03:26, 2827.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25100/608042 [00:10<03:56, 2469.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25377/608042 [00:10<03:54, 2488.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25681/608042 [00:10<03:43, 2608.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25989/608042 [00:10<03:35, 2705.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26272/608042 [00:10<03:57, 2444.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26538/608042 [00:10<03:53, 2493.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26801/608042 [00:11<04:07, 2348.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 27057/608042 [00:11<04:02, 2392.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 27343/608042 [00:11<03:50, 2518.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 27645/608042 [00:11<03:40, 2636.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 27959/608042 [00:11<03:29, 2767.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28247/608042 [00:11<03:33, 2719.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28526/608042 [00:11<03:43, 2597.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28804/608042 [00:11<03:47, 2542.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29139/608042 [00:11<03:31, 2734.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29453/608042 [00:12<03:23, 2848.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29755/608042 [00:12<03:36, 2676.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 30032/608042 [00:12<03:34, 2696.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 30311/608042 [00:12<03:35, 2678.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 30584/608042 [00:12<03:46, 2544.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 30845/608042 [00:12<03:46, 2548.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31102/608042 [00:12<03:51, 2490.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31362/608042 [00:12<03:59, 2403.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31641/608042 [00:12<03:55, 2442.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31902/608042 [00:13<04:03, 2364.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32181/608042 [00:13<03:59, 2404.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32433/608042 [00:13<03:59, 2400.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32695/608042 [00:13<04:00, 2395.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32956/608042 [00:13<03:57, 2425.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 33212/608042 [00:13<04:00, 2392.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 33454/608042 [00:13<04:04, 2346.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 33764/608042 [00:13<03:44, 2556.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34039/608042 [00:13<03:40, 2607.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34315/608042 [00:14<03:46, 2537.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34588/608042 [00:14<03:45, 2538.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34848/608042 [00:14<03:58, 2403.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35125/608042 [00:14<03:55, 2436.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35378/608042 [00:14<04:04, 2345.21 examples/s]
2024-08-03T04:45:03.832268751Z Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35627/608042 [00:14<04:17, 2223.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35903/608042 [00:14<04:04, 2335.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36162/608042 [00:14<04:00, 2381.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36427/608042 [00:14<03:53, 2451.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36674/608042 [00:15<03:55, 2429.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36919/608042 [00:15<03:56, 2417.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37228/608042 [00:15<03:39, 2606.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37499/608042 [00:15<03:39, 2602.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37814/608042 [00:15<03:26, 2757.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38104/608042 [00:15<03:33, 2670.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38380/608042 [00:15<03:37, 2613.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38671/608042 [00:15<03:31, 2691.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38943/608042 [00:15<03:42, 2554.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 39209/608042 [00:15<03:48, 2491.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 39468/608042 [00:16<03:51, 2457.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 39737/608042 [00:16<03:53, 2433.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40065/608042 [00:16<03:38, 2601.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40454/608042 [00:16<03:12, 2955.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40768/608042 [00:16<03:27, 2737.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41048/608042 [00:16<03:26, 2741.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41329/608042 [00:16<03:43, 2536.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41589/608042 [00:16<03:54, 2414.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41923/608042 [00:17<03:37, 2605.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42195/608042 [00:17<03:51, 2444.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42470/608042 [00:17<03:44, 2518.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42772/608042 [00:17<03:33, 2649.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43089/608042 [00:17<03:26, 2731.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43370/608042 [00:17<03:37, 2595.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43635/608042 [00:17<03:40, 2562.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43905/608042 [00:17<03:40, 2562.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44203/608042 [00:17<03:31, 2660.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44525/608042 [00:17<03:20, 2810.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44815/608042 [00:18<03:20, 2805.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 45101/608042 [00:18<03:33, 2642.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 45373/608042 [00:18<03:37, 2592.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 45719/608042 [00:18<03:27, 2707.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 45992/608042 [00:18<03:30, 2669.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46269/608042 [00:18<03:37, 2585.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46610/608042 [00:18<03:20, 2798.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46897/608042 [00:18<03:24, 2746.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47184/608042 [00:19<03:44, 2499.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47445/608042 [00:19<03:42, 2521.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47701/608042 [00:19<03:47, 2465.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47951/608042 [00:19<03:47, 2462.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48200/608042 [00:19<03:51, 2421.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48445/608042 [00:19<04:01, 2317.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48706/608042 [00:19<03:57, 2358.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49009/608042 [00:19<03:41, 2525.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49277/608042 [00:19<03:52, 2398.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49532/608042 [00:20<04:11, 2219.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49791/608042 [00:20<04:01, 2307.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50071/608042 [00:20<03:52, 2403.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50347/608042 [00:20<03:43, 2496.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50610/608042 [00:20<03:50, 2418.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50866/608042 [00:20<03:56, 2351.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51114/608042 [00:20<03:55, 2363.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51368/608042 [00:20<03:57, 2343.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51683/608042 [00:20<03:37, 2561.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 51958/608042 [00:21<03:49, 2422.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52218/608042 [00:21<03:52, 2387.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52481/608042 [00:21<03:53, 2378.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52827/608042 [00:21<03:30, 2636.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 53129/608042 [00:21<03:24, 2716.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53439/608042 [00:21<03:17, 2814.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53731/608042 [00:21<03:52, 2384.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53986/608042 [00:21<03:50, 2401.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54246/608042 [00:21<03:55, 2352.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54498/608042 [00:22<04:05, 2253.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54785/608042 [00:22<03:49, 2406.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55039/608042 [00:22<03:51, 2390.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55285/608042 [00:22<03:58, 2322.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55521/608042 [00:22<04:06, 2238.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55830/608042 [00:22<03:45, 2447.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56097/608042 [00:22<03:43, 2473.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56412/608042 [00:22<03:28, 2642.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56682/608042 [00:22<03:34, 2575.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56965/608042 [00:23<03:33, 2580.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 57239/608042 [00:23<03:38, 2523.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 57518/608042 [00:23<03:34, 2563.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 57782/608042 [00:23<03:45, 2436.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58143/608042 [00:23<03:20, 2745.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58427/608042 [00:23<03:30, 2616.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58702/608042 [00:23<03:29, 2620.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58994/608042 [00:23<03:23, 2702.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59311/608042 [00:23<03:15, 2804.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59610/608042 [00:24<03:14, 2824.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59910/608042 [00:24<03:28, 2630.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60186/608042 [00:24<03:36, 2532.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60448/608042 [00:24<03:34, 2555.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60755/608042 [00:24<03:24, 2671.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61030/608042 [00:24<03:26, 2648.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61359/608042 [00:24<03:17, 2771.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61658/608042 [00:24<03:15, 2795.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61941/608042 [00:24<03:19, 2735.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62216/608042 [00:25<03:37, 2512.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62516/608042 [00:25<03:30, 2586.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62779/608042 [00:25<03:46, 2402.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63044/608042 [00:25<03:41, 2465.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63306/608042 [00:25<03:51, 2350.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63554/608042 [00:25<03:51, 2350.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63826/608042 [00:25<03:44, 2428.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64079/608042 [00:25<03:52, 2334.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64333/608042 [00:25<03:48, 2383.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64602/608042 [00:26<03:41, 2457.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64882/608042 [00:26<03:33, 2538.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65161/608042 [00:26<03:28, 2609.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65428/608042 [00:26<03:33, 2545.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65687/608042 [00:26<03:41, 2451.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66008/608042 [00:26<03:24, 2656.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66293/608042 [00:26<03:23, 2667.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66562/608042 [00:26<03:35, 2508.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66832/608042 [00:26<03:41, 2446.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67131/608042 [00:27<03:37, 2492.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67393/608042 [00:27<03:36, 2491.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67672/608042 [00:27<03:34, 2514.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67927/608042 [00:27<03:40, 2450.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 68218/608042 [00:27<03:31, 2551.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 68539/608042 [00:27<03:18, 2720.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 68837/608042 [00:27<03:12, 2794.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69140/608042 [00:27<03:22, 2666.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69410/608042 [00:27<03:28, 2583.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69676/608042 [00:27<03:37, 2475.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 69939/608042 [00:28<03:39, 2446.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70187/608042 [00:28<03:44, 2399.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70435/608042 [00:28<03:43, 2407.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70686/608042 [00:28<04:01, 2225.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70953/608042 [00:28<03:51, 2315.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71242/608042 [00:28<03:37, 2465.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71496/608042 [00:28<03:38, 2450.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71763/608042 [00:28<03:36
2024-08-03T04:45:03.832268751Z Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 0/608042 [00:00<?, ? examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 6/608042 [00:00<4:50:40, 34.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 45/608042 [00:00<53:35, 189.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 211/608042 [00:00<13:27, 752.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 398/608042 [00:00<08:51, 1143.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 693/608042 [00:00<05:56, 1703.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 909/608042 [00:00<05:32, 1824.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1238/608042 [00:00<04:47, 2112.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1585/608042 [00:00<04:04, 2481.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1875/608042 [00:01<03:55, 2576.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2204/608042 [00:01<03:38, 2778.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2497/608042 [00:01<03:52, 2599.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2766/608042 [00:01<04:21, 2311.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 3022/608042 [00:01<04:16, 2355.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3287/608042 [00:01<04:08, 2432.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3545/608042 [00:01<04:06, 2454.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3823/608042 [00:01<03:57, 2541.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4089/608042 [00:01<04:06, 2453.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4355/608042 [00:02<04:07, 2441.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4745/608042 [00:02<03:32, 2844.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5043/608042 [00:02<03:35, 2793.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5331/608042 [00:02<03:47, 2651.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5602/608042 [00:02<03:51, 2607.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5872/608042 [00:02<03:53, 2583.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6133/608042 [00:02<04:04, 2465.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6390/608042 [00:02<04:05, 2447.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6642/608042 [00:02<04:10, 2397.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6883/608042 [00:03<04:25, 2267.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 7169/608042 [00:03<04:09, 2410.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 7423/608042 [00:03<04:13, 2372.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 7687/608042 [00:03<04:05, 2446.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 7939/608042 [00:03<04:07, 2423.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8234/608042 [00:03<03:53, 2572.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8526/608042 [00:03<03:45, 2658.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8795/608042 [00:03<04:17, 2324.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 9047/608042 [00:03<04:18, 2317.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9373/608042 [00:04<03:53, 2568.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9644/608042 [00:04<03:55, 2536.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9919/608042 [00:04<03:55, 2537.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10176/608042 [00:04<03:56, 2528.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10436/608042 [00:04<04:02, 2463.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10691/608042 [00:04<04:26, 2239.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10944/608042 [00:04<04:21, 2285.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11269/608042 [00:04<03:58, 2502.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11528/608042 [00:04<04:06, 2417.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11790/608042 [00:05<04:05, 2431.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12036/608042 [00:05<04:15, 2329.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12327/608042 [00:05<04:00, 2481.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12580/608042 [00:05<04:13, 2349.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12846/608042 [00:05<04:07, 2404.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13100/608042 [00:05<04:04, 2436.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13378/608042 [00:05<03:55, 2527.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13636/608042 [00:05<04:06, 2412.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13884/608042 [00:05<04:05, 2418.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14163/608042 [00:06<03:56, 2506.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14425/608042 [00:06<04:06, 2412.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14673/608042 [00:06<04:11, 2359.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14912/608042 [00:06<04:17, 2305.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 15168/608042 [00:06<04:10, 2370.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15423/608042 [00:06<04:08, 2382.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15666/608042 [00:06<04:14, 2331.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15916/608042 [00:06<04:38, 2122.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16159/608042 [00:06<04:34, 2159.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16391/608042 [00:07<04:29, 2192.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16658/608042 [00:07<04:16, 2302.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16932/608042 [00:07<04:07, 2392.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17215/608042 [00:07<03:56, 2502.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17548/608042 [00:07<03:36, 2726.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17833/608042 [00:07<03:54, 2519.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18095/608042 [00:07<04:00, 2457.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18363/608042 [00:07<03:57, 2487.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18648/608042 [00:07<03:51, 2549.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18963/608042 [00:07<03:37, 2705.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19240/608042 [00:08<03:48, 2571.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19503/608042 [00:08<04:02, 2424.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19750/608042 [00:08<04:18, 2272.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19990/608042 [00:08<04:22, 2238.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20300/608042 [00:08<03:59, 2454.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20567/608042 [00:08<04:11, 2331.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20825/608042 [00:08<04:06, 2378.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 21154/608042 [00:08<03:46, 2589.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 21438/608042 [00:09<04:00, 2437.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 21747/608042 [00:09<03:51, 2527.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22038/608042 [00:09<03:46, 2592.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22352/608042 [00:09<03:36, 2700.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22640/608042 [00:09<03:34, 2734.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 22919/608042 [00:09<03:38, 2678.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23190/608042 [00:09<03:49, 2544.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23507/608042 [00:09<03:41, 2643.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23856/608042 [00:09<03:29, 2786.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24212/608042 [00:10<03:15, 2982.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24517/608042 [00:10<03:22, 2876.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24812/608042 [00:10<03:26, 2827.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25100/608042 [00:10<03:56, 2469.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25377/608042 [00:10<03:54, 2488.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25681/608042 [00:10<03:43, 2608.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25989/608042 [00:10<03:35, 2705.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26272/608042 [00:10<03:57, 2444.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26538/608042 [00:10<03:53, 2493.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26801/608042 [00:11<04:07, 2348.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 27057/608042 [00:11<04:02, 2392.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 27343/608042 [00:11<03:50, 2518.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 27645/608042 [00:11<03:40, 2636.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 27959/608042 [00:11<03:29, 2767.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28247/608042 [00:11<03:33, 2719.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28526/608042 [00:11<03:43, 2597.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28804/608042 [00:11<03:47, 2542.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29139/608042 [00:11<03:31, 2734.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29453/608042 [00:12<03:23, 2848.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29755/608042 [00:12<03:36, 2676.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 30032/608042 [00:12<03:34, 2696.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 30311/608042 [00:12<03:35, 2678.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 30584/608042 [00:12<03:46, 2544.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 30845/608042 [00:12<03:46, 2548.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31102/608042 [00:12<03:51, 2490.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31362/608042 [00:12<03:59, 2403.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31641/608042 [00:12<03:55, 2442.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31902/608042 [00:13<04:03, 2364.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32181/608042 [00:13<03:59, 2404.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32433/608042 [00:13<03:59, 2400.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32695/608042 [00:13<04:00, 2395.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32956/608042 [00:13<03:57, 2425.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 33212/608042 [00:13<04:00, 2392.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 33454/608042 [00:13<04:04, 2346.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 33764/608042 [00:13<03:44, 2556.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34039/608042 [00:13<03:40, 2607.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34315/608042 [00:14<03:46, 2537.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34588/608042 [00:14<03:45, 2538.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34848/608042 [00:14<03:58, 2403.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35125/608042 [00:14<03:55, 2436.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35378/608042 [00:14<04:04, 2345.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35627/608042 [00:14<04:17, 2223.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35903/608042 [00:14<04:04, 2335.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36162/608042 [00:14<04:00, 2381.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36427/608042 [00:14<03:53, 2451.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36674/608042 [00:15<03:55, 2429.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36919/608042 [00:15<03:56, 2417.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37228/608042 [00:15<03:39, 2606.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37499/608042 [00:15<03:39, 2602.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37814/608042 [00:15<03:26, 2757.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38104/608042 [00:15<03:33, 2670.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38380/608042 [00:15<03:37, 2613.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38671/608042 [00:15<03:31, 2691.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38943/608042 [00:15<03:42, 2554.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 39209/608042 [00:15<03:48, 2491.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 39468/608042 [00:16<03:51, 2457.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 39737/608042 [00:16<03:53, 2433.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40065/608042 [00:16<03:38, 2601.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40454/608042 [00:16<03:12, 2955.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40768/608042 [00:16<03:27, 2737.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41048/608042 [00:16<03:26, 2741.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41329/608042 [00:16<03:43, 2536.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41589/608042 [00:16<03:54, 2414.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41923/608042 [00:17<03:37, 2605.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42195/608042 [00:17<03:51, 2444.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42470/608042 [00:17<03:44, 2518.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42772/608042 [00:17<03:33, 2649.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43089/608042 [00:17<03:26, 2731.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43370/608042 [00:17<03:37, 2595.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43635/608042 [00:17<03:40, 2562.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43905/608042 [00:17<03:40, 2562.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44203/608042 [00:17<03:31, 2660.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44525/608042 [00:17<03:20, 2810.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44815/608042 [00:18<03:20, 2805.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 45101/608042 [00:18<03:33, 2642.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 45373/608042 [00:18<03:37, 2592.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 45719/608042 [00:18<03:27, 2707.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 45992/608042 [00:18<03:30, 2669.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46269/608042 [00:18<03:37, 2585.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46610/608042 [00:18<03:20, 2798.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46897/608042 [00:18<03:24, 2746.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47184/608042 [00:19<03:44, 2499.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47445/608042 [00:19<03:42, 2521.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47701/608042 [00:19<03:47, 2465.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47951/608042 [00:19<03:47, 2462.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48200/608042 [00:19<03:51, 2421.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48445/608042 [00:19<04:01, 2317.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48706/608042 [00:19<03:57, 2358.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49009/608042 [00:19<03:41, 2525.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49277/608042 [00:19<03:52, 2398.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49532/608042 [00:20<04:11, 2219.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49791/608042 [00:20<04:01, 2307.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50071/608042 [00:20<03:52, 2403.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50347/608042 [00:20<03:43, 2496.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50610/608042 [00:20<03:50, 2418.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50866/608042 [00:20<03:56, 2351.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51114/608042 [00:20<03:55, 2363.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51368/608042 [00:20<03:57, 2343.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51683/608042 [00:20<03:37, 2561.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 51958/608042 [00:21<03:49, 2422.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52218/608042 [00:21<03:52, 2387.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52481/608042 [00:21<03:53, 2378.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52827/608042 [00:21<03:30, 2636.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 53129/608042 [00:21<03:24, 2716.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53439/608042 [00:21<03:17, 2814.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53731/608042 [00:21<03:52, 2384.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53986/608042 [00:21<03:50, 2401.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54246/608042 [00:21<03:55, 2352.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54498/608042 [00:22<04:05, 2253.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54785/608042 [00:22<03:49, 2406.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55039/608042 [00:22<03:51, 2390.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55285/608042 [00:22<03:58, 2322.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55521/608042 [00:22<04:06, 2238.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55830/608042 [00:22<03:45, 2447.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56097/608042 [00:22<03:43, 2473.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56412/608042 [00:22<03:28, 2642.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56682/608042 [00:22<03:34, 2575.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56965/608042 [00:23<03:33, 2580.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 57239/608042 [00:23<03:38, 2523.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 57518/608042 [00:23<03:34, 2563.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 57782/608042 [00:23<03:45, 2436.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58143/608042 [00:23<03:20, 2745.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58427/608042 [00:23<03:30, 2616.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58702/608042 [00:23<03:29, 2620.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58994/608042 [00:23<03:23, 2702.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59311/608042 [00:23<03:15, 2804.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59610/608042 [00:24<03:14, 2824.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59910/608042 [00:24<03:28, 2630.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60186/608042 [00:24<03:36, 2532.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60448/608042 [00:24<03:34, 2555.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60755/608042 [00:24<03:24, 2671.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61030/608042 [00:24<03:26, 2648.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61359/608042 [00:24<03:17, 2771.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61658/608042 [00:24<03:15, 2795.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61941/608042 [00:24<03:19, 2735.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62216/608042 [00:25<03:37, 2512.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62516/608042 [00:25<03:30, 2586.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62779/608042 [00:25<03:46, 2402.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63044/608042 [00:25<03:41, 2465.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63306/608042 [00:25<03:51, 2350.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63554/608042 [00:25<03:51, 2350.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63826/608042 [00:25<03:44, 2428.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64079/608042 [00:25<03:52, 2334.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64333/608042 [00:25<03:48, 2383.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64602/608042 [00:26<03:41, 2457.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64882/608042 [00:26<03:33, 2538.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65161/608042 [00:26<03:28, 2609.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65428/608042 [00:26<03:33, 2545.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65687/608042 [00:26<03:41, 2451.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66008/608042 [00:26<03:24, 2656.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66293/608042 [00:26<03:23, 2667.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66562/608042 [00:26<03:35, 2508.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66832/608042 [00:26<03:41, 2446.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67131/608042 [00:27<03:37, 2492.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67393/608042 [00:27<03:36, 2491.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67672/608042 [00:27<03:34, 2514.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67927/608042 [00:27<03:40, 2450.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 68218/608042 [00:27<03:31, 2551.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 68539/608042 [00:27<03:18, 2720.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 68837/608042 [00:27<03:12, 2794.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69140/608042 [00:27<03:22, 2666.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69410/608042 [00:27<03:28, 2583.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69676/608042 [00:27<03:37, 2475.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 69939/608042 [00:28<03:39, 2446.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70187/608042 [00:28<03:44, 2399.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70435/608042 [00:28<03:43, 2407.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70686/608042 [00:28<04:01, 2225.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70953/608042 [00:28<03:51, 2315.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71242/608042 [00:28<03:37, 2465.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71496/608042 [00:28<03:38, 2450.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71763/608042 [00:28<03:36
2024-08-03T04:45:03.832268751Z 
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 0/608042 [00:00<?, ? examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 6/608042 [00:00<4:50:40, 34.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 45/608042 [00:00<53:35, 189.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 211/608042 [00:00<13:27, 752.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 398/608042 [00:00<08:51, 1143.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 693/608042 [00:00<05:56, 1703.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 909/608042 [00:00<05:32, 1824.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1238/608042 [00:00<04:47, 2112.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1585/608042 [00:00<04:04, 2481.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1875/608042 [00:01<03:55, 2576.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2204/608042 [00:01<03:38, 2778.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2497/608042 [00:01<03:52, 2599.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2766/608042 [00:01<04:21, 2311.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 3022/608042 [00:01<04:16, 2355.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3287/608042 [00:01<04:08, 2432.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3545/608042 [00:01<04:06, 2454.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3823/608042 [00:01<03:57, 2541.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4089/608042 [00:01<04:06, 2453.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4355/608042 [00:02<04:07, 2441.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4745/608042 [00:02<03:32, 2844.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5043/608042 [00:02<03:35, 2793.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5331/608042 [00:02<03:47, 2651.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5602/608042 [00:02<03:51, 2607.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5872/608042 [00:02<03:53, 2583.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6133/608042 [00:02<04:04, 2465.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6390/608042 [00:02<04:05, 2447.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6642/608042 [00:02<04:10, 2397.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6883/608042 [00:03<04:25, 2267.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 7169/608042 [00:03<04:09, 2410.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 7423/608042 [00:03<04:13, 2372.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 7687/608042 [00:03<04:05, 2446.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 7939/608042 [00:03<04:07, 2423.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8234/608042 [00:03<03:53, 2572.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8526/608042 [00:03<03:45, 2658.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8795/608042 [00:03<04:17, 2324.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 9047/608042 [00:03<04:18, 2317.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9373/608042 [00:04<03:53, 2568.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9644/608042 [00:04<03:55, 2536.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9919/608042 [00:04<03:55, 2537.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10176/608042 [00:04<03:56, 2528.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10436/608042 [00:04<04:02, 2463.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10691/608042 [00:04<04:26, 2239.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10944/608042 [00:04<04:21, 2285.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11269/608042 [00:04<03:58, 2502.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11528/608042 [00:04<04:06, 2417.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11790/608042 [00:05<04:05, 2431.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12036/608042 [00:05<04:15, 2329.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12327/608042 [00:05<04:00, 2481.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12580/608042 [00:05<04:13, 2349.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12846/608042 [00:05<04:07, 2404.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13100/608042 [00:05<04:04, 2436.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13378/608042 [00:05<03:55, 2527.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13636/608042 [00:05<04:06, 2412.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13884/608042 [00:05<04:05, 2418.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14163/608042 [00:06<03:56, 2506.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14425/608042 [00:06<04:06, 2412.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14673/608042 [00:06<04:11, 2359.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14912/608042 [00:06<04:17, 2305.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 15168/608042 [00:06<04:10, 2370.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15423/608042 [00:06<04:08, 2382.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15666/608042 [00:06<04:14, 2331.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15916/608042 [00:06<04:38, 2122.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16159/608042 [00:06<04:34, 2159.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16391/608042 [00:07<04:29, 2192.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16658/608042 [00:07<04:16, 2302.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16932/608042 [00:07<04:07, 2392.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17215/608042 [00:07<03:56, 2502.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17548/608042 [00:07<03:36, 2726.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17833/608042 [00:07<03:54, 2519.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18095/608042 [00:07<04:00, 2457.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18363/608042 [00:07<03:57, 2487.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18648/608042 [00:07<03:51, 2549.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18963/608042 [00:07<03:37, 2705.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19240/608042 [00:08<03:48, 2571.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19503/608042 [00:08<04:02, 2424.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19750/608042 [00:08<04:18, 2272.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19990/608042 [00:08<04:22, 2238.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20300/608042 [00:08<03:59, 2454.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20567/608042 [00:08<04:11, 2331.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20825/608042 [00:08<04:06, 2378.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 21154/608042 [00:08<03:46, 2589.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 21438/608042 [00:09<04:00, 2437.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 21747/608042 [00:09<03:51, 2527.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22038/608042 [00:09<03:46, 2592.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22352/608042 [00:09<03:36, 2700.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22640/608042 [00:09<03:34, 2734.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 22919/608042 [00:09<03:38, 2678.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23190/608042 [00:09<03:49, 2544.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23507/608042 [00:09<03:41, 2643.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23856/608042 [00:09<03:29, 2786.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24212/608042 [00:10<03:15, 2982.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24517/608042 [00:10<03:22, 2876.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24812/608042 [00:10<03:26, 2827.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25100/608042 [00:10<03:56, 2469.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25377/608042 [00:10<03:54, 2488.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25681/608042 [00:10<03:43, 2608.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25989/608042 [00:10<03:35, 2705.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26272/608042 [00:10<03:57, 2444.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26538/608042 [00:10<03:53, 2493.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26801/608042 [00:11<04:07, 2348.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 27057/608042 [00:11<04:02, 2392.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 27343/608042 [00:11<03:50, 2518.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 27645/608042 [00:11<03:40, 2636.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 27959/608042 [00:11<03:29, 2767.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28247/608042 [00:11<03:33, 2719.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28526/608042 [00:11<03:43, 2597.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28804/608042 [00:11<03:47, 2542.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29139/608042 [00:11<03:31, 2734.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29453/608042 [00:12<03:23, 2848.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29755/608042 [00:12<03:36, 2676.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 30032/608042 [00:12<03:34, 2696.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 30311/608042 [00:12<03:35, 2678.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 30584/608042 [00:12<03:46, 2544.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 30845/608042 [00:12<03:46, 2548.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31102/608042 [00:12<03:51, 2490.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31362/608042 [00:12<03:59, 2403.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31641/608042 [00:12<03:55, 2442.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31902/608042 [00:13<04:03, 2364.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32181/608042 [00:13<03:59, 2404.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32433/608042 [00:13<03:59, 2400.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32695/608042 [00:13<04:00, 2395.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32956/608042 [00:13<03:57, 2425.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 33212/608042 [00:13<04:00, 2392.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 33454/608042 [00:13<04:04, 2346.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 33764/608042 [00:13<03:44, 2556.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34039/608042 [00:13<03:40, 2607.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34315/608042 [00:14<03:46, 2537.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34588/608042 [00:14<03:45, 2538.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34848/608042 [00:14<03:58, 2403.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35125/608042 [00:14<03:55, 2436.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35378/608042 [00:14<04:04, 2345.21 examples/s]
2024-08-03T04:45:03.832268751Z Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35627/608042 [00:14<04:17, 2223.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35903/608042 [00:14<04:04, 2335.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36162/608042 [00:14<04:00, 2381.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36427/608042 [00:14<03:53, 2451.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36674/608042 [00:15<03:55, 2429.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36919/608042 [00:15<03:56, 2417.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37228/608042 [00:15<03:39, 2606.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37499/608042 [00:15<03:39, 2602.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37814/608042 [00:15<03:26, 2757.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38104/608042 [00:15<03:33, 2670.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38380/608042 [00:15<03:37, 2613.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38671/608042 [00:15<03:31, 2691.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38943/608042 [00:15<03:42, 2554.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 39209/608042 [00:15<03:48, 2491.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 39468/608042 [00:16<03:51, 2457.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 39737/608042 [00:16<03:53, 2433.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40065/608042 [00:16<03:38, 2601.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40454/608042 [00:16<03:12, 2955.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40768/608042 [00:16<03:27, 2737.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41048/608042 [00:16<03:26, 2741.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41329/608042 [00:16<03:43, 2536.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41589/608042 [00:16<03:54, 2414.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41923/608042 [00:17<03:37, 2605.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42195/608042 [00:17<03:51, 2444.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42470/608042 [00:17<03:44, 2518.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42772/608042 [00:17<03:33, 2649.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43089/608042 [00:17<03:26, 2731.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43370/608042 [00:17<03:37, 2595.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43635/608042 [00:17<03:40, 2562.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43905/608042 [00:17<03:40, 2562.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44203/608042 [00:17<03:31, 2660.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44525/608042 [00:17<03:20, 2810.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44815/608042 [00:18<03:20, 2805.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 45101/608042 [00:18<03:33, 2642.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 45373/608042 [00:18<03:37, 2592.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 45719/608042 [00:18<03:27, 2707.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 45992/608042 [00:18<03:30, 2669.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46269/608042 [00:18<03:37, 2585.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46610/608042 [00:18<03:20, 2798.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46897/608042 [00:18<03:24, 2746.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47184/608042 [00:19<03:44, 2499.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47445/608042 [00:19<03:42, 2521.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47701/608042 [00:19<03:47, 2465.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47951/608042 [00:19<03:47, 2462.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48200/608042 [00:19<03:51, 2421.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48445/608042 [00:19<04:01, 2317.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48706/608042 [00:19<03:57, 2358.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49009/608042 [00:19<03:41, 2525.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49277/608042 [00:19<03:52, 2398.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49532/608042 [00:20<04:11, 2219.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49791/608042 [00:20<04:01, 2307.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50071/608042 [00:20<03:52, 2403.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50347/608042 [00:20<03:43, 2496.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50610/608042 [00:20<03:50, 2418.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50866/608042 [00:20<03:56, 2351.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51114/608042 [00:20<03:55, 2363.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51368/608042 [00:20<03:57, 2343.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51683/608042 [00:20<03:37, 2561.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 51958/608042 [00:21<03:49, 2422.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52218/608042 [00:21<03:52, 2387.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52481/608042 [00:21<03:53, 2378.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52827/608042 [00:21<03:30, 2636.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 53129/608042 [00:21<03:24, 2716.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53439/608042 [00:21<03:17, 2814.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53731/608042 [00:21<03:52, 2384.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53986/608042 [00:21<03:50, 2401.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54246/608042 [00:21<03:55, 2352.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54498/608042 [00:22<04:05, 2253.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54785/608042 [00:22<03:49, 2406.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55039/608042 [00:22<03:51, 2390.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55285/608042 [00:22<03:58, 2322.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55521/608042 [00:22<04:06, 2238.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55830/608042 [00:22<03:45, 2447.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56097/608042 [00:22<03:43, 2473.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56412/608042 [00:22<03:28, 2642.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56682/608042 [00:22<03:34, 2575.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56965/608042 [00:23<03:33, 2580.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 57239/608042 [00:23<03:38, 2523.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 57518/608042 [00:23<03:34, 2563.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 57782/608042 [00:23<03:45, 2436.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58143/608042 [00:23<03:20, 2745.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58427/608042 [00:23<03:30, 2616.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58702/608042 [00:23<03:29, 2620.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58994/608042 [00:23<03:23, 2702.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59311/608042 [00:23<03:15, 2804.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59610/608042 [00:24<03:14, 2824.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59910/608042 [00:24<03:28, 2630.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60186/608042 [00:24<03:36, 2532.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60448/608042 [00:24<03:34, 2555.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60755/608042 [00:24<03:24, 2671.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61030/608042 [00:24<03:26, 2648.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61359/608042 [00:24<03:17, 2771.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61658/608042 [00:24<03:15, 2795.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61941/608042 [00:24<03:19, 2735.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62216/608042 [00:25<03:37, 2512.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62516/608042 [00:25<03:30, 2586.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62779/608042 [00:25<03:46, 2402.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63044/608042 [00:25<03:41, 2465.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63306/608042 [00:25<03:51, 2350.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63554/608042 [00:25<03:51, 2350.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63826/608042 [00:25<03:44, 2428.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64079/608042 [00:25<03:52, 2334.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64333/608042 [00:25<03:48, 2383.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64602/608042 [00:26<03:41, 2457.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64882/608042 [00:26<03:33, 2538.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65161/608042 [00:26<03:28, 2609.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65428/608042 [00:26<03:33, 2545.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65687/608042 [00:26<03:41, 2451.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66008/608042 [00:26<03:24, 2656.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66293/608042 [00:26<03:23, 2667.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66562/608042 [00:26<03:35, 2508.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66832/608042 [00:26<03:41, 2446.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67131/608042 [00:27<03:37, 2492.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67393/608042 [00:27<03:36, 2491.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67672/608042 [00:27<03:34, 2514.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67927/608042 [00:27<03:40, 2450.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 68218/608042 [00:27<03:31, 2551.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 68539/608042 [00:27<03:18, 2720.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 68837/608042 [00:27<03:12, 2794.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69140/608042 [00:27<03:22, 2666.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69410/608042 [00:27<03:28, 2583.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69676/608042 [00:27<03:37, 2475.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 69939/608042 [00:28<03:39, 2446.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70187/608042 [00:28<03:44, 2399.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70435/608042 [00:28<03:43, 2407.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70686/608042 [00:28<04:01, 2225.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70953/608042 [00:28<03:51, 2315.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71242/608042 [00:28<03:37, 2465.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71496/608042 [00:28<03:38, 2450.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71763/608042 [00:28<03:36, 2479.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72013/608042 [00:28<03:40, 2429.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72262/608042 [00:29<03:44, 2389.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72521/608042 [00:29<03:41, 2422.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72807/608042 [00:29<03:30, 2544.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73074/608042 [00:29<03:33, 2504.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73344/608042 [00:29<03:30, 2538.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73608/608042 [00:29<03:40, 2418.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73920/608042 [00:29<03:26, 2580.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 74198/608042 [00:29<03:22, 2633.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 74483/608042 [00:29<03:18, 2694.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 74769/608042 [00:30<03:17, 2704.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75044/608042 [00:30<03:28, 2558.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75311/608042 [00:30<03:42, 2389.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75555/608042 [00:30<03:52, 2293.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75808/608042 [00:30<04:04, 2176.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76096/608042 [00:30<03:45, 2360.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76377/608042 [00:30<03:36, 2453.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76662/608042 [00:30<03:28, 2547.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76927/608042 [00:30<03:53, 2276.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 77176/608042 [00:31<04:03, 2183.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 77521/608042 [00:31<03:33, 2490.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 77778/608042 [00:31<03:54, 2260.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78013/608042 [00:31<03:59, 2215.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78378/608042 [00:31<03:32, 2489.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78637/608042 [00:31<03:45, 2343.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78889/608042 [00:31<03:46, 2337.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79134/608042 [00:31<03:54, 2252.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79373/608042 [00:32<03:56, 2237.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79623/608042 [00:32<03:51, 2283.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79860/608042 [00:32<03:56, 2229.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80114/608042 [00:32<03:55, 2241.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80359/608042 [00:32<03:52, 2266.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80597/608042 [00:32<03:52, 2265.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80826/608042 [00:32<04:00, 2188.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81084/608042 [00:32<04:02, 2175.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81366/608042 [00:32<03:47, 2315.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81657/608042 [00:33<03:33, 2462.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81913/608042 [00:33<03:39, 2398.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82163/608042 [00:33<03:37, 2416.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82446/608042 [00:33<03:27, 2532.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82719/608042 [00:33<03:24, 2568.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82978/608042 [00:33<03:35, 2433.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 83233/608042 [00:33<03:54, 2236.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 83511/608042 [00:33<03:40, 2374.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 83771/608042 [00:33<03:35, 2436.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84019/608042 [00:34<03:57, 2209.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84259/608042 [00:34<03:54, 2232.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84594/608042 [00:34<03:26, 2528.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84857/608042 [00:34<03:29, 2500.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85119/608042 [00:34<03:41, 2356.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85428/608042 [00:34<03:25, 2537.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85696/608042 [00:34<03:32, 2459.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85952/608042 [00:34<03:31, 2473.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 86230/608042 [00:34<03:28, 2508.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 86503/608042 [00:35<03:27, 2508.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 86829/608042 [00:35<03:11, 2719.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87133/608042 [00:35<03:25, 2536.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87405/608042 [00:35<03:28, 2501.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87662/608042 [00:35<03:35, 2416.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87914/608042 [00:35<03:32, 2444.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 88163/608042 [00:35<03:38, 2383.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 88409/608042 [00:35<03:39, 2371.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 88651/608042 [00:35<04:02, 2141.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 88882/608042 [00:36<04:01, 2146.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 89162/608042 [00:36<03:49, 2264.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 89468/608042 [00:36<03:31, 2452.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 89733/608042 [00:36<03:29, 2475.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90047/608042 [00:36<03:17, 2622.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90323/608042 [00:36<03:18, 2610.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90587/608042 [00:36<03:22, 2550.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90845/608042 [00:36<03:31, 2447.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 91171/608042 [00:36<03:14, 2662.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 91440/608042 [00:37<03:37, 2377.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 91696/608042 [00:37<03:45, 2289.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92010/608042 [00:37<03:27, 2488.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92280/608042 [00:37<03:26, 2493.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92557/608042 [00:37<03:27, 2488.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92867/608042 [00:37<03:19, 2577.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93131/608042 [00:37<03:34, 2398.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93394/608042 [00:37<03:31, 2428.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93646/608042 [00:37<03:38, 2354.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93895/608042 [00:38<03:37, 2365.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 94136/608042 [00:38<03:37, 2364.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 94389/608042 [00:38<03:39, 2342.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 94689/608042 [00:38<03:23, 2523.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 94968/608042 [00:38<03:28, 2459.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95220/608042 [00:38<03:34, 2395.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95465/608042 [00:38<03:47, 2256.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95703/608042 [00:38<03:44, 2284.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95948/608042 [00:38<03:55, 2173.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96171/608042 [00:39<03:57, 2151.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96438/608042 [00:39<03:43, 2290.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96681/608042 [00:39<03:48, 2234.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96919/608042 [00:39<03:51, 2208.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 97230/608042 [00:39<03:35, 2373.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 97513/608042 [00:39<03:25, 2484.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 97768/608042 [00:39<03:38, 2338.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 98014/608042 [00:39<04:01, 2115.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 98290/608042 [00:39<03:46, 2251.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 98534/608042 [00:40<03:42, 2291.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 98888/608042 [00:40<03:15, 2609.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 99160/608042 [00:40<03:13, 2623.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 99438/608042 [00:40<03:16, 2584.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 99704/608042 [00:40<03:19, 2546.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 100004/608042 [00:40<03:15, 2598.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 100366/608042 [00:40<02:56, 2874.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 100665/608042 [00:40<03:07, 2711.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 100941/608042 [00:40<03:17, 2573.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 101248/608042 [00:41<03:08, 2691.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 101636/608042 [00:41<02:50, 2971.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 101943/608042 [00:41<02:53, 2922.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 102247/608042 [00:41<03:09, 2672.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 102545/608042 [00:41<03:03, 2752.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 102865/608042 [00:41<02:57, 2846.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103163/608042 [00:41<03:06, 2701.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103439/608042 [00:41<03:10, 2643.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103708/608042 [00:41<03:26, 2440.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103972/608042 [00:42<03:33, 2363.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 104230/608042 [00:42<03:30, 2393.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 104541/608042 [00:42<03:15, 2573.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 104837/608042 [00:42<03:12, 2620.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105113/608042 [00:42<03:24, 2457.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105367/608042 [00:42<03:28, 2413.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105611/608042 [00:42<03:35, 2327.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105873/608042 [00:42<03:33, 2348.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 106120/608042 [00:43<03:34, 2335.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 106374/608042 [00:43<03:46, 2215.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 10
2024-08-03T04:45:03.832268751Z 
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 0/608042 [00:00<?, ? examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 6/608042 [00:00<4:50:40, 34.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 45/608042 [00:00<53:35, 189.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 211/608042 [00:00<13:27, 752.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 398/608042 [00:00<08:51, 1143.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 693/608042 [00:00<05:56, 1703.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 909/608042 [00:00<05:32, 1824.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1238/608042 [00:00<04:47, 2112.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1585/608042 [00:00<04:04, 2481.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1875/608042 [00:01<03:55, 2576.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2204/608042 [00:01<03:38, 2778.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2497/608042 [00:01<03:52, 2599.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2766/608042 [00:01<04:21, 2311.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 3022/608042 [00:01<04:16, 2355.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3287/608042 [00:01<04:08, 2432.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3545/608042 [00:01<04:06, 2454.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3823/608042 [00:01<03:57, 2541.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4089/608042 [00:01<04:06, 2453.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4355/608042 [00:02<04:07, 2441.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4745/608042 [00:02<03:32, 2844.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5043/608042 [00:02<03:35, 2793.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5331/608042 [00:02<03:47, 2651.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5602/608042 [00:02<03:51, 2607.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5872/608042 [00:02<03:53, 2583.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6133/608042 [00:02<04:04, 2465.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6390/608042 [00:02<04:05, 2447.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6642/608042 [00:02<04:10, 2397.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6883/608042 [00:03<04:25, 2267.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 7169/608042 [00:03<04:09, 2410.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 7423/608042 [00:03<04:13, 2372.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 7687/608042 [00:03<04:05, 2446.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 7939/608042 [00:03<04:07, 2423.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8234/608042 [00:03<03:53, 2572.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8526/608042 [00:03<03:45, 2658.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8795/608042 [00:03<04:17, 2324.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 9047/608042 [00:03<04:18, 2317.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9373/608042 [00:04<03:53, 2568.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9644/608042 [00:04<03:55, 2536.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9919/608042 [00:04<03:55, 2537.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10176/608042 [00:04<03:56, 2528.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10436/608042 [00:04<04:02, 2463.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10691/608042 [00:04<04:26, 2239.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10944/608042 [00:04<04:21, 2285.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11269/608042 [00:04<03:58, 2502.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11528/608042 [00:04<04:06, 2417.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11790/608042 [00:05<04:05, 2431.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12036/608042 [00:05<04:15, 2329.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12327/608042 [00:05<04:00, 2481.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12580/608042 [00:05<04:13, 2349.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12846/608042 [00:05<04:07, 2404.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13100/608042 [00:05<04:04, 2436.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13378/608042 [00:05<03:55, 2527.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13636/608042 [00:05<04:06, 2412.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13884/608042 [00:05<04:05, 2418.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14163/608042 [00:06<03:56, 2506.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14425/608042 [00:06<04:06, 2412.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14673/608042 [00:06<04:11, 2359.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14912/608042 [00:06<04:17, 2305.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 15168/608042 [00:06<04:10, 2370.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15423/608042 [00:06<04:08, 2382.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15666/608042 [00:06<04:14, 2331.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15916/608042 [00:06<04:38, 2122.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16159/608042 [00:06<04:34, 2159.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16391/608042 [00:07<04:29, 2192.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16658/608042 [00:07<04:16, 2302.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16932/608042 [00:07<04:07, 2392.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17215/608042 [00:07<03:56, 2502.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17548/608042 [00:07<03:36, 2726.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17833/608042 [00:07<03:54, 2519.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18095/608042 [00:07<04:00, 2457.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18363/608042 [00:07<03:57, 2487.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18648/608042 [00:07<03:51, 2549.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18963/608042 [00:07<03:37, 2705.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19240/608042 [00:08<03:48, 2571.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19503/608042 [00:08<04:02, 2424.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19750/608042 [00:08<04:18, 2272.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19990/608042 [00:08<04:22, 2238.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20300/608042 [00:08<03:59, 2454.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20567/608042 [00:08<04:11, 2331.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20825/608042 [00:08<04:06, 2378.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 21154/608042 [00:08<03:46, 2589.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 21438/608042 [00:09<04:00, 2437.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 21747/608042 [00:09<03:51, 2527.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22038/608042 [00:09<03:46, 2592.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22352/608042 [00:09<03:36, 2700.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22640/608042 [00:09<03:34, 2734.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 22919/608042 [00:09<03:38, 2678.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23190/608042 [00:09<03:49, 2544.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23507/608042 [00:09<03:41, 2643.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23856/608042 [00:09<03:29, 2786.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24212/608042 [00:10<03:15, 2982.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24517/608042 [00:10<03:22, 2876.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24812/608042 [00:10<03:26, 2827.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25100/608042 [00:10<03:56, 2469.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25377/608042 [00:10<03:54, 2488.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25681/608042 [00:10<03:43, 2608.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25989/608042 [00:10<03:35, 2705.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26272/608042 [00:10<03:57, 2444.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26538/608042 [00:10<03:53, 2493.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26801/608042 [00:11<04:07, 2348.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 27057/608042 [00:11<04:02, 2392.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 27343/608042 [00:11<03:50, 2518.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 27645/608042 [00:11<03:40, 2636.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 27959/608042 [00:11<03:29, 2767.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28247/608042 [00:11<03:33, 2719.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28526/608042 [00:11<03:43, 2597.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28804/608042 [00:11<03:47, 2542.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29139/608042 [00:11<03:31, 2734.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29453/608042 [00:12<03:23, 2848.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29755/608042 [00:12<03:36, 2676.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 30032/608042 [00:12<03:34, 2696.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 30311/608042 [00:12<03:35, 2678.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 30584/608042 [00:12<03:46, 2544.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 30845/608042 [00:12<03:46, 2548.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31102/608042 [00:12<03:51, 2490.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31362/608042 [00:12<03:59, 2403.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31641/608042 [00:12<03:55, 2442.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31902/608042 [00:13<04:03, 2364.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32181/608042 [00:13<03:59, 2404.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32433/608042 [00:13<03:59, 2400.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32695/608042 [00:13<04:00, 2395.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32956/608042 [00:13<03:57, 2425.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 33212/608042 [00:13<04:00, 2392.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 33454/608042 [00:13<04:04, 2346.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 33764/608042 [00:13<03:44, 2556.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34039/608042 [00:13<03:40, 2607.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34315/608042 [00:14<03:46, 2537.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34588/608042 [00:14<03:45, 2538.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34848/608042 [00:14<03:58, 2403.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35125/608042 [00:14<03:55, 2436.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35378/608042 [00:14<04:04, 2345.21 examples/s]
2024-08-03T04:45:03.832268751Z Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35627/608042 [00:14<04:17, 2223.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35903/608042 [00:14<04:04, 2335.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36162/608042 [00:14<04:00, 2381.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36427/608042 [00:14<03:53, 2451.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36674/608042 [00:15<03:55, 2429.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36919/608042 [00:15<03:56, 2417.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37228/608042 [00:15<03:39, 2606.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37499/608042 [00:15<03:39, 2602.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37814/608042 [00:15<03:26, 2757.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38104/608042 [00:15<03:33, 2670.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38380/608042 [00:15<03:37, 2613.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38671/608042 [00:15<03:31, 2691.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38943/608042 [00:15<03:42, 2554.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 39209/608042 [00:15<03:48, 2491.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 39468/608042 [00:16<03:51, 2457.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 39737/608042 [00:16<03:53, 2433.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40065/608042 [00:16<03:38, 2601.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40454/608042 [00:16<03:12, 2955.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40768/608042 [00:16<03:27, 2737.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41048/608042 [00:16<03:26, 2741.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41329/608042 [00:16<03:43, 2536.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41589/608042 [00:16<03:54, 2414.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41923/608042 [00:17<03:37, 2605.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42195/608042 [00:17<03:51, 2444.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42470/608042 [00:17<03:44, 2518.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42772/608042 [00:17<03:33, 2649.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43089/608042 [00:17<03:26, 2731.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43370/608042 [00:17<03:37, 2595.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43635/608042 [00:17<03:40, 2562.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43905/608042 [00:17<03:40, 2562.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44203/608042 [00:17<03:31, 2660.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44525/608042 [00:17<03:20, 2810.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44815/608042 [00:18<03:20, 2805.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 45101/608042 [00:18<03:33, 2642.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 45373/608042 [00:18<03:37, 2592.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 45719/608042 [00:18<03:27, 2707.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 45992/608042 [00:18<03:30, 2669.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46269/608042 [00:18<03:37, 2585.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46610/608042 [00:18<03:20, 2798.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46897/608042 [00:18<03:24, 2746.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47184/608042 [00:19<03:44, 2499.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47445/608042 [00:19<03:42, 2521.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47701/608042 [00:19<03:47, 2465.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47951/608042 [00:19<03:47, 2462.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48200/608042 [00:19<03:51, 2421.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48445/608042 [00:19<04:01, 2317.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48706/608042 [00:19<03:57, 2358.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49009/608042 [00:19<03:41, 2525.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49277/608042 [00:19<03:52, 2398.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49532/608042 [00:20<04:11, 2219.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49791/608042 [00:20<04:01, 2307.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50071/608042 [00:20<03:52, 2403.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50347/608042 [00:20<03:43, 2496.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50610/608042 [00:20<03:50, 2418.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50866/608042 [00:20<03:56, 2351.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51114/608042 [00:20<03:55, 2363.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51368/608042 [00:20<03:57, 2343.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51683/608042 [00:20<03:37, 2561.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 51958/608042 [00:21<03:49, 2422.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52218/608042 [00:21<03:52, 2387.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52481/608042 [00:21<03:53, 2378.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52827/608042 [00:21<03:30, 2636.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 53129/608042 [00:21<03:24, 2716.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53439/608042 [00:21<03:17, 2814.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53731/608042 [00:21<03:52, 2384.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53986/608042 [00:21<03:50, 2401.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54246/608042 [00:21<03:55, 2352.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54498/608042 [00:22<04:05, 2253.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54785/608042 [00:22<03:49, 2406.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55039/608042 [00:22<03:51, 2390.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55285/608042 [00:22<03:58, 2322.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55521/608042 [00:22<04:06, 2238.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55830/608042 [00:22<03:45, 2447.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56097/608042 [00:22<03:43, 2473.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56412/608042 [00:22<03:28, 2642.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56682/608042 [00:22<03:34, 2575.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56965/608042 [00:23<03:33, 2580.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 57239/608042 [00:23<03:38, 2523.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 57518/608042 [00:23<03:34, 2563.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 57782/608042 [00:23<03:45, 2436.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58143/608042 [00:23<03:20, 2745.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58427/608042 [00:23<03:30, 2616.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58702/608042 [00:23<03:29, 2620.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58994/608042 [00:23<03:23, 2702.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59311/608042 [00:23<03:15, 2804.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59610/608042 [00:24<03:14, 2824.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59910/608042 [00:24<03:28, 2630.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60186/608042 [00:24<03:36, 2532.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60448/608042 [00:24<03:34, 2555.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60755/608042 [00:24<03:24, 2671.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61030/608042 [00:24<03:26, 2648.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61359/608042 [00:24<03:17, 2771.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61658/608042 [00:24<03:15, 2795.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61941/608042 [00:24<03:19, 2735.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62216/608042 [00:25<03:37, 2512.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62516/608042 [00:25<03:30, 2586.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62779/608042 [00:25<03:46, 2402.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63044/608042 [00:25<03:41, 2465.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63306/608042 [00:25<03:51, 2350.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63554/608042 [00:25<03:51, 2350.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63826/608042 [00:25<03:44, 2428.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64079/608042 [00:25<03:52, 2334.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64333/608042 [00:25<03:48, 2383.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64602/608042 [00:26<03:41, 2457.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64882/608042 [00:26<03:33, 2538.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65161/608042 [00:26<03:28, 2609.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65428/608042 [00:26<03:33, 2545.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65687/608042 [00:26<03:41, 2451.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66008/608042 [00:26<03:24, 2656.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66293/608042 [00:26<03:23, 2667.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66562/608042 [00:26<03:35, 2508.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66832/608042 [00:26<03:41, 2446.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67131/608042 [00:27<03:37, 2492.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67393/608042 [00:27<03:36, 2491.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67672/608042 [00:27<03:34, 2514.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67927/608042 [00:27<03:40, 2450.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 68218/608042 [00:27<03:31, 2551.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 68539/608042 [00:27<03:18, 2720.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 68837/608042 [00:27<03:12, 2794.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69140/608042 [00:27<03:22, 2666.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69410/608042 [00:27<03:28, 2583.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69676/608042 [00:27<03:37, 2475.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 69939/608042 [00:28<03:39, 2446.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70187/608042 [00:28<03:44, 2399.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70435/608042 [00:28<03:43, 2407.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70686/608042 [00:28<04:01, 2225.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70953/608042 [00:28<03:51, 2315.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71242/608042 [00:28<03:37, 2465.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71496/608042 [00:28<03:38, 2450.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71763/608042 [00:28<03:36, 2479.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72013/608042 [00:28<03:40, 2429.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72262/608042 [00:29<03:44, 2389.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72521/608042 [00:29<03:41, 2422.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72807/608042 [00:29<03:30, 2544.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73074/608042 [00:29<03:33, 2504.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73344/608042 [00:29<03:30, 2538.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73608/608042 [00:29<03:40, 2418.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73920/608042 [00:29<03:26, 2580.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 74198/608042 [00:29<03:22, 2633.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 74483/608042 [00:29<03:18, 2694.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 74769/608042 [00:30<03:17, 2704.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75044/608042 [00:30<03:28, 2558.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75311/608042 [00:30<03:42, 2389.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75555/608042 [00:30<03:52, 2293.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75808/608042 [00:30<04:04, 2176.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76096/608042 [00:30<03:45, 2360.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76377/608042 [00:30<03:36, 2453.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76662/608042 [00:30<03:28, 2547.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76927/608042 [00:30<03:53, 2276.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 77176/608042 [00:31<04:03, 2183.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 77521/608042 [00:31<03:33, 2490.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 77778/608042 [00:31<03:54, 2260.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78013/608042 [00:31<03:59, 2215.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78378/608042 [00:31<03:32, 2489.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78637/608042 [00:31<03:45, 2343.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78889/608042 [00:31<03:46, 2337.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79134/608042 [00:31<03:54, 2252.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79373/608042 [00:32<03:56, 2237.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79623/608042 [00:32<03:51, 2283.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79860/608042 [00:32<03:56, 2229.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80114/608042 [00:32<03:55, 2241.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80359/608042 [00:32<03:52, 2266.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80597/608042 [00:32<03:52, 2265.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80826/608042 [00:32<04:00, 2188.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81084/608042 [00:32<04:02, 2175.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81366/608042 [00:32<03:47, 2315.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81657/608042 [00:33<03:33, 2462.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81913/608042 [00:33<03:39, 2398.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82163/608042 [00:33<03:37, 2416.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82446/608042 [00:33<03:27, 2532.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82719/608042 [00:33<03:24, 2568.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82978/608042 [00:33<03:35, 2433.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 83233/608042 [00:33<03:54, 2236.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 83511/608042 [00:33<03:40, 2374.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 83771/608042 [00:33<03:35, 2436.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84019/608042 [00:34<03:57, 2209.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84259/608042 [00:34<03:54, 2232.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84594/608042 [00:34<03:26, 2528.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84857/608042 [00:34<03:29, 2500.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85119/608042 [00:34<03:41, 2356.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85428/608042 [00:34<03:25, 2537.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85696/608042 [00:34<03:32, 2459.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85952/608042 [00:34<03:31, 2473.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 86230/608042 [00:34<03:28, 2508.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 86503/608042 [00:35<03:27, 2508.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 86829/608042 [00:35<03:11, 2719.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87133/608042 [00:35<03:25, 2536.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87405/608042 [00:35<03:28, 2501.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87662/608042 [00:35<03:35, 2416.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87914/608042 [00:35<03:32, 2444.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 88163/608042 [00:35<03:38, 2383.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 88409/608042 [00:35<03:39, 2371.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 88651/608042 [00:35<04:02, 2141.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 88882/608042 [00:36<04:01, 2146.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 89162/608042 [00:36<03:49, 2264.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 89468/608042 [00:36<03:31, 2452.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 89733/608042 [00:36<03:29, 2475.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90047/608042 [00:36<03:17, 2622.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90323/608042 [00:36<03:18, 2610.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90587/608042 [00:36<03:22, 2550.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90845/608042 [00:36<03:31, 2447.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 91171/608042 [00:36<03:14, 2662.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 91440/608042 [00:37<03:37, 2377.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 91696/608042 [00:37<03:45, 2289.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92010/608042 [00:37<03:27, 2488.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92280/608042 [00:37<03:26, 2493.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92557/608042 [00:37<03:27, 2488.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92867/608042 [00:37<03:19, 2577.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93131/608042 [00:37<03:34, 2398.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93394/608042 [00:37<03:31, 2428.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93646/608042 [00:37<03:38, 2354.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93895/608042 [00:38<03:37, 2365.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 94136/608042 [00:38<03:37, 2364.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 94389/608042 [00:38<03:39, 2342.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 94689/608042 [00:38<03:23, 2523.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 94968/608042 [00:38<03:28, 2459.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95220/608042 [00:38<03:34, 2395.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95465/608042 [00:38<03:47, 2256.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95703/608042 [00:38<03:44, 2284.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95948/608042 [00:38<03:55, 2173.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96171/608042 [00:39<03:57, 2151.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96438/608042 [00:39<03:43, 2290.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96681/608042 [00:39<03:48, 2234.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96919/608042 [00:39<03:51, 2208.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 97230/608042 [00:39<03:35, 2373.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 97513/608042 [00:39<03:25, 2484.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 97768/608042 [00:39<03:38, 2338.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 98014/608042 [00:39<04:01, 2115.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 98290/608042 [00:39<03:46, 2251.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 98534/608042 [00:40<03:42, 2291.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 98888/608042 [00:40<03:15, 2609.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 99160/608042 [00:40<03:13, 2623.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 99438/608042 [00:40<03:16, 2584.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 99704/608042 [00:40<03:19, 2546.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 100004/608042 [00:40<03:15, 2598.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 100366/608042 [00:40<02:56, 2874.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 100665/608042 [00:40<03:07, 2711.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 100941/608042 [00:40<03:17, 2573.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 101248/608042 [00:41<03:08, 2691.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 101636/608042 [00:41<02:50, 2971.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 101943/608042 [00:41<02:53, 2922.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 102247/608042 [00:41<03:09, 2672.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 102545/608042 [00:41<03:03, 2752.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 102865/608042 [00:41<02:57, 2846.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103163/608042 [00:41<03:06, 2701.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103439/608042 [00:41<03:10, 2643.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103708/608042 [00:41<03:26, 2440.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103972/608042 [00:42<03:33, 2363.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 104230/608042 [00:42<03:30, 2393.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 104541/608042 [00:42<03:15, 2573.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 104837/608042 [00:42<03:12, 2620.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105113/608042 [00:42<03:24, 2457.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105367/608042 [00:42<03:28, 2413.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105611/608042 [00:42<03:35, 2327.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105873/608042 [00:42<03:33, 2348.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 106120/608042 [00:43<03:34, 2335.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 106374/608042 [00:43<03:46, 2215.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 10
2024-08-03T04:45:03.832268751Z Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 0/608042 [00:00<?, ? examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 6/608042 [00:00<4:50:40, 34.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 45/608042 [00:00<53:35, 189.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 211/608042 [00:00<13:27, 752.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 398/608042 [00:00<08:51, 1143.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 693/608042 [00:00<05:56, 1703.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 909/608042 [00:00<05:32, 1824.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1238/608042 [00:00<04:47, 2112.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1585/608042 [00:00<04:04, 2481.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1875/608042 [00:01<03:55, 2576.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2204/608042 [00:01<03:38, 2778.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2497/608042 [00:01<03:52, 2599.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2766/608042 [00:01<04:21, 2311.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 3022/608042 [00:01<04:16, 2355.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3287/608042 [00:01<04:08, 2432.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3545/608042 [00:01<04:06, 2454.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3823/608042 [00:01<03:57, 2541.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4089/608042 [00:01<04:06, 2453.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4355/608042 [00:02<04:07, 2441.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4745/608042 [00:02<03:32, 2844.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5043/608042 [00:02<03:35, 2793.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5331/608042 [00:02<03:47, 2651.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5602/608042 [00:02<03:51, 2607.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5872/608042 [00:02<03:53, 2583.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6133/608042 [00:02<04:04, 2465.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6390/608042 [00:02<04:05, 2447.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6642/608042 [00:02<04:10, 2397.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6883/608042 [00:03<04:25, 2267.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 7169/608042 [00:03<04:09, 2410.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 7423/608042 [00:03<04:13, 2372.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 7687/608042 [00:03<04:05, 2446.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 7939/608042 [00:03<04:07, 2423.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8234/608042 [00:03<03:53, 2572.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8526/608042 [00:03<03:45, 2658.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8795/608042 [00:03<04:17, 2324.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 9047/608042 [00:03<04:18, 2317.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9373/608042 [00:04<03:53, 2568.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9644/608042 [00:04<03:55, 2536.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9919/608042 [00:04<03:55, 2537.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10176/608042 [00:04<03:56, 2528.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10436/608042 [00:04<04:02, 2463.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10691/608042 [00:04<04:26, 2239.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10944/608042 [00:04<04:21, 2285.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11269/608042 [00:04<03:58, 2502.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11528/608042 [00:04<04:06, 2417.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11790/608042 [00:05<04:05, 2431.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12036/608042 [00:05<04:15, 2329.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12327/608042 [00:05<04:00, 2481.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12580/608042 [00:05<04:13, 2349.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12846/608042 [00:05<04:07, 2404.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13100/608042 [00:05<04:04, 2436.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13378/608042 [00:05<03:55, 2527.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13636/608042 [00:05<04:06, 2412.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13884/608042 [00:05<04:05, 2418.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14163/608042 [00:06<03:56, 2506.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14425/608042 [00:06<04:06, 2412.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14673/608042 [00:06<04:11, 2359.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14912/608042 [00:06<04:17, 2305.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 15168/608042 [00:06<04:10, 2370.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15423/608042 [00:06<04:08, 2382.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15666/608042 [00:06<04:14, 2331.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15916/608042 [00:06<04:38, 2122.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16159/608042 [00:06<04:34, 2159.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16391/608042 [00:07<04:29, 2192.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16658/608042 [00:07<04:16, 2302.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16932/608042 [00:07<04:07, 2392.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17215/608042 [00:07<03:56, 2502.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17548/608042 [00:07<03:36, 2726.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17833/608042 [00:07<03:54, 2519.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18095/608042 [00:07<04:00, 2457.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18363/608042 [00:07<03:57, 2487.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18648/608042 [00:07<03:51, 2549.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18963/608042 [00:07<03:37, 2705.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19240/608042 [00:08<03:48, 2571.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19503/608042 [00:08<04:02, 2424.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19750/608042 [00:08<04:18, 2272.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19990/608042 [00:08<04:22, 2238.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20300/608042 [00:08<03:59, 2454.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20567/608042 [00:08<04:11, 2331.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20825/608042 [00:08<04:06, 2378.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 21154/608042 [00:08<03:46, 2589.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 21438/608042 [00:09<04:00, 2437.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 21747/608042 [00:09<03:51, 2527.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22038/608042 [00:09<03:46, 2592.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22352/608042 [00:09<03:36, 2700.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22640/608042 [00:09<03:34, 2734.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 22919/608042 [00:09<03:38, 2678.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23190/608042 [00:09<03:49, 2544.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23507/608042 [00:09<03:41, 2643.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23856/608042 [00:09<03:29, 2786.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24212/608042 [00:10<03:15, 2982.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24517/608042 [00:10<03:22, 2876.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24812/608042 [00:10<03:26, 2827.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25100/608042 [00:10<03:56, 2469.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25377/608042 [00:10<03:54, 2488.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25681/608042 [00:10<03:43, 2608.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25989/608042 [00:10<03:35, 2705.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26272/608042 [00:10<03:57, 2444.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26538/608042 [00:10<03:53, 2493.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26801/608042 [00:11<04:07, 2348.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 27057/608042 [00:11<04:02, 2392.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 27343/608042 [00:11<03:50, 2518.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 27645/608042 [00:11<03:40, 2636.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 27959/608042 [00:11<03:29, 2767.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28247/608042 [00:11<03:33, 2719.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28526/608042 [00:11<03:43, 2597.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28804/608042 [00:11<03:47, 2542.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29139/608042 [00:11<03:31, 2734.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29453/608042 [00:12<03:23, 2848.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29755/608042 [00:12<03:36, 2676.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 30032/608042 [00:12<03:34, 2696.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 30311/608042 [00:12<03:35, 2678.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 30584/608042 [00:12<03:46, 2544.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 30845/608042 [00:12<03:46, 2548.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31102/608042 [00:12<03:51, 2490.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31362/608042 [00:12<03:59, 2403.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31641/608042 [00:12<03:55, 2442.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31902/608042 [00:13<04:03, 2364.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32181/608042 [00:13<03:59, 2404.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32433/608042 [00:13<03:59, 2400.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32695/608042 [00:13<04:00, 2395.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32956/608042 [00:13<03:57, 2425.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 33212/608042 [00:13<04:00, 2392.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 33454/608042 [00:13<04:04, 2346.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 33764/608042 [00:13<03:44, 2556.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34039/608042 [00:13<03:40, 2607.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34315/608042 [00:14<03:46, 2537.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34588/608042 [00:14<03:45, 2538.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34848/608042 [00:14<03:58, 2403.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35125/608042 [00:14<03:55, 2436.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35378/608042 [00:14<04:04, 2345.21 examples/s]
2024-08-03T04:45:03.832268751Z Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35627/608042 [00:14<04:17, 2223.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35903/608042 [00:14<04:04, 2335.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36162/608042 [00:14<04:00, 2381.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36427/608042 [00:14<03:53, 2451.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36674/608042 [00:15<03:55, 2429.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36919/608042 [00:15<03:56, 2417.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37228/608042 [00:15<03:39, 2606.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37499/608042 [00:15<03:39, 2602.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37814/608042 [00:15<03:26, 2757.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38104/608042 [00:15<03:33, 2670.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38380/608042 [00:15<03:37, 2613.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38671/608042 [00:15<03:31, 2691.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38943/608042 [00:15<03:42, 2554.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 39209/608042 [00:15<03:48, 2491.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 39468/608042 [00:16<03:51, 2457.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 39737/608042 [00:16<03:53, 2433.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40065/608042 [00:16<03:38, 2601.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40454/608042 [00:16<03:12, 2955.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40768/608042 [00:16<03:27, 2737.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41048/608042 [00:16<03:26, 2741.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41329/608042 [00:16<03:43, 2536.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41589/608042 [00:16<03:54, 2414.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41923/608042 [00:17<03:37, 2605.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42195/608042 [00:17<03:51, 2444.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42470/608042 [00:17<03:44, 2518.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42772/608042 [00:17<03:33, 2649.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43089/608042 [00:17<03:26, 2731.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43370/608042 [00:17<03:37, 2595.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43635/608042 [00:17<03:40, 2562.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43905/608042 [00:17<03:40, 2562.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44203/608042 [00:17<03:31, 2660.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44525/608042 [00:17<03:20, 2810.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44815/608042 [00:18<03:20, 2805.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 45101/608042 [00:18<03:33, 2642.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 45373/608042 [00:18<03:37, 2592.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 45719/608042 [00:18<03:27, 2707.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 45992/608042 [00:18<03:30, 2669.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46269/608042 [00:18<03:37, 2585.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46610/608042 [00:18<03:20, 2798.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46897/608042 [00:18<03:24, 2746.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47184/608042 [00:19<03:44, 2499.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47445/608042 [00:19<03:42, 2521.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47701/608042 [00:19<03:47, 2465.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47951/608042 [00:19<03:47, 2462.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48200/608042 [00:19<03:51, 2421.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48445/608042 [00:19<04:01, 2317.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48706/608042 [00:19<03:57, 2358.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49009/608042 [00:19<03:41, 2525.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49277/608042 [00:19<03:52, 2398.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49532/608042 [00:20<04:11, 2219.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49791/608042 [00:20<04:01, 2307.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50071/608042 [00:20<03:52, 2403.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50347/608042 [00:20<03:43, 2496.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50610/608042 [00:20<03:50, 2418.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50866/608042 [00:20<03:56, 2351.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51114/608042 [00:20<03:55, 2363.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51368/608042 [00:20<03:57, 2343.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51683/608042 [00:20<03:37, 2561.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 51958/608042 [00:21<03:49, 2422.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52218/608042 [00:21<03:52, 2387.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52481/608042 [00:21<03:53, 2378.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52827/608042 [00:21<03:30, 2636.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 53129/608042 [00:21<03:24, 2716.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53439/608042 [00:21<03:17, 2814.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53731/608042 [00:21<03:52, 2384.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53986/608042 [00:21<03:50, 2401.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54246/608042 [00:21<03:55, 2352.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54498/608042 [00:22<04:05, 2253.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54785/608042 [00:22<03:49, 2406.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55039/608042 [00:22<03:51, 2390.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55285/608042 [00:22<03:58, 2322.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55521/608042 [00:22<04:06, 2238.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55830/608042 [00:22<03:45, 2447.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56097/608042 [00:22<03:43, 2473.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56412/608042 [00:22<03:28, 2642.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56682/608042 [00:22<03:34, 2575.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56965/608042 [00:23<03:33, 2580.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 57239/608042 [00:23<03:38, 2523.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 57518/608042 [00:23<03:34, 2563.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 57782/608042 [00:23<03:45, 2436.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58143/608042 [00:23<03:20, 2745.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58427/608042 [00:23<03:30, 2616.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58702/608042 [00:23<03:29, 2620.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58994/608042 [00:23<03:23, 2702.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59311/608042 [00:23<03:15, 2804.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59610/608042 [00:24<03:14, 2824.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59910/608042 [00:24<03:28, 2630.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60186/608042 [00:24<03:36, 2532.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60448/608042 [00:24<03:34, 2555.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60755/608042 [00:24<03:24, 2671.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61030/608042 [00:24<03:26, 2648.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61359/608042 [00:24<03:17, 2771.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61658/608042 [00:24<03:15, 2795.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61941/608042 [00:24<03:19, 2735.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62216/608042 [00:25<03:37, 2512.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62516/608042 [00:25<03:30, 2586.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62779/608042 [00:25<03:46, 2402.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63044/608042 [00:25<03:41, 2465.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63306/608042 [00:25<03:51, 2350.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63554/608042 [00:25<03:51, 2350.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63826/608042 [00:25<03:44, 2428.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64079/608042 [00:25<03:52, 2334.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64333/608042 [00:25<03:48, 2383.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64602/608042 [00:26<03:41, 2457.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64882/608042 [00:26<03:33, 2538.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65161/608042 [00:26<03:28, 2609.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65428/608042 [00:26<03:33, 2545.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65687/608042 [00:26<03:41, 2451.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66008/608042 [00:26<03:24, 2656.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66293/608042 [00:26<03:23, 2667.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66562/608042 [00:26<03:35, 2508.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66832/608042 [00:26<03:41, 2446.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67131/608042 [00:27<03:37, 2492.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67393/608042 [00:27<03:36, 2491.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67672/608042 [00:27<03:34, 2514.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67927/608042 [00:27<03:40, 2450.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 68218/608042 [00:27<03:31, 2551.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 68539/608042 [00:27<03:18, 2720.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 68837/608042 [00:27<03:12, 2794.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69140/608042 [00:27<03:22, 2666.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69410/608042 [00:27<03:28, 2583.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69676/608042 [00:27<03:37, 2475.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 69939/608042 [00:28<03:39, 2446.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70187/608042 [00:28<03:44, 2399.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70435/608042 [00:28<03:43, 2407.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70686/608042 [00:28<04:01, 2225.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70953/608042 [00:28<03:51, 2315.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71242/608042 [00:28<03:37, 2465.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71496/608042 [00:28<03:38, 2450.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71763/608042 [00:28<03:36, 2479.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72013/608042 [00:28<03:40, 2429.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72262/608042 [00:29<03:44, 2389.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72521/608042 [00:29<03:41, 2422.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72807/608042 [00:29<03:30, 2544.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73074/608042 [00:29<03:33, 2504.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73344/608042 [00:29<03:30, 2538.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73608/608042 [00:29<03:40, 2418.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73920/608042 [00:29<03:26, 2580.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 74198/608042 [00:29<03:22, 2633.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 74483/608042 [00:29<03:18, 2694.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 74769/608042 [00:30<03:17, 2704.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75044/608042 [00:30<03:28, 2558.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75311/608042 [00:30<03:42, 2389.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75555/608042 [00:30<03:52, 2293.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75808/608042 [00:30<04:04, 2176.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76096/608042 [00:30<03:45, 2360.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76377/608042 [00:30<03:36, 2453.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76662/608042 [00:30<03:28, 2547.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76927/608042 [00:30<03:53, 2276.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 77176/608042 [00:31<04:03, 2183.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 77521/608042 [00:31<03:33, 2490.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 77778/608042 [00:31<03:54, 2260.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78013/608042 [00:31<03:59, 2215.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78378/608042 [00:31<03:32, 2489.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78637/608042 [00:31<03:45, 2343.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78889/608042 [00:31<03:46, 2337.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79134/608042 [00:31<03:54, 2252.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79373/608042 [00:32<03:56, 2237.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79623/608042 [00:32<03:51, 2283.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79860/608042 [00:32<03:56, 2229.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80114/608042 [00:32<03:55, 2241.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80359/608042 [00:32<03:52, 2266.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80597/608042 [00:32<03:52, 2265.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80826/608042 [00:32<04:00, 2188.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81084/608042 [00:32<04:02, 2175.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81366/608042 [00:32<03:47, 2315.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81657/608042 [00:33<03:33, 2462.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81913/608042 [00:33<03:39, 2398.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82163/608042 [00:33<03:37, 2416.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82446/608042 [00:33<03:27, 2532.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82719/608042 [00:33<03:24, 2568.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82978/608042 [00:33<03:35, 2433.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 83233/608042 [00:33<03:54, 2236.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 83511/608042 [00:33<03:40, 2374.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 83771/608042 [00:33<03:35, 2436.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84019/608042 [00:34<03:57, 2209.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84259/608042 [00:34<03:54, 2232.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84594/608042 [00:34<03:26, 2528.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84857/608042 [00:34<03:29, 2500.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85119/608042 [00:34<03:41, 2356.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85428/608042 [00:34<03:25, 2537.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85696/608042 [00:34<03:32, 2459.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85952/608042 [00:34<03:31, 2473.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 86230/608042 [00:34<03:28, 2508.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 86503/608042 [00:35<03:27, 2508.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 86829/608042 [00:35<03:11, 2719.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87133/608042 [00:35<03:25, 2536.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87405/608042 [00:35<03:28, 2501.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87662/608042 [00:35<03:35, 2416.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87914/608042 [00:35<03:32, 2444.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 88163/608042 [00:35<03:38, 2383.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 88409/608042 [00:35<03:39, 2371.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 88651/608042 [00:35<04:02, 2141.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 88882/608042 [00:36<04:01, 2146.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 89162/608042 [00:36<03:49, 2264.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 89468/608042 [00:36<03:31, 2452.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 89733/608042 [00:36<03:29, 2475.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90047/608042 [00:36<03:17, 2622.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90323/608042 [00:36<03:18, 2610.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90587/608042 [00:36<03:22, 2550.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90845/608042 [00:36<03:31, 2447.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 91171/608042 [00:36<03:14, 2662.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 91440/608042 [00:37<03:37, 2377.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 91696/608042 [00:37<03:45, 2289.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92010/608042 [00:37<03:27, 2488.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92280/608042 [00:37<03:26, 2493.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92557/608042 [00:37<03:27, 2488.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92867/608042 [00:37<03:19, 2577.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93131/608042 [00:37<03:34, 2398.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93394/608042 [00:37<03:31, 2428.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93646/608042 [00:37<03:38, 2354.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93895/608042 [00:38<03:37, 2365.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 94136/608042 [00:38<03:37, 2364.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 94389/608042 [00:38<03:39, 2342.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 94689/608042 [00:38<03:23, 2523.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 94968/608042 [00:38<03:28, 2459.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95220/608042 [00:38<03:34, 2395.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95465/608042 [00:38<03:47, 2256.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95703/608042 [00:38<03:44, 2284.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95948/608042 [00:38<03:55, 2173.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96171/608042 [00:39<03:57, 2151.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96438/608042 [00:39<03:43, 2290.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96681/608042 [00:39<03:48, 2234.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96919/608042 [00:39<03:51, 2208.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 97230/608042 [00:39<03:35, 2373.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 97513/608042 [00:39<03:25, 2484.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 97768/608042 [00:39<03:38, 2338.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 98014/608042 [00:39<04:01, 2115.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 98290/608042 [00:39<03:46, 2251.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 98534/608042 [00:40<03:42, 2291.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 98888/608042 [00:40<03:15, 2609.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 99160/608042 [00:40<03:13, 2623.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 99438/608042 [00:40<03:16, 2584.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 99704/608042 [00:40<03:19, 2546.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 100004/608042 [00:40<03:15, 2598.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 100366/608042 [00:40<02:56, 2874.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 100665/608042 [00:40<03:07, 2711.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 100941/608042 [00:40<03:17, 2573.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 101248/608042 [00:41<03:08, 2691.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 101636/608042 [00:41<02:50, 2971.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 101943/608042 [00:41<02:53, 2922.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 102247/608042 [00:41<03:09, 2672.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 102545/608042 [00:41<03:03, 2752.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 102865/608042 [00:41<02:57, 2846.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103163/608042 [00:41<03:06, 2701.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103439/608042 [00:41<03:10, 2643.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103708/608042 [00:41<03:26, 2440.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103972/608042 [00:42<03:33, 2363.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 104230/608042 [00:42<03:30, 2393.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 104541/608042 [00:42<03:15, 2573.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 104837/608042 [00:42<03:12, 2620.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105113/608042 [00:42<03:24, 2457.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105367/608042 [00:42<03:28, 2413.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105611/608042 [00:42<03:35, 2327.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105873/608042 [00:42<03:33, 2348.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 106120/608042 [00:43<03:34, 2335.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 106374/608042 [00:43<03:46, 2215.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 10
2024-08-03T04:45:03.832268751Z 
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 0/608042 [00:00<?, ? examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 6/608042 [00:00<4:50:40, 34.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 45/608042 [00:00<53:35, 189.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 211/608042 [00:00<13:27, 752.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 398/608042 [00:00<08:51, 1143.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 693/608042 [00:00<05:56, 1703.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 909/608042 [00:00<05:32, 1824.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1238/608042 [00:00<04:47, 2112.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1585/608042 [00:00<04:04, 2481.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1875/608042 [00:01<03:55, 2576.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2204/608042 [00:01<03:38, 2778.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2497/608042 [00:01<03:52, 2599.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2766/608042 [00:01<04:21, 2311.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 3022/608042 [00:01<04:16, 2355.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3287/608042 [00:01<04:08, 2432.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3545/608042 [00:01<04:06, 2454.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3823/608042 [00:01<03:57, 2541.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4089/608042 [00:01<04:06, 2453.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4355/608042 [00:02<04:07, 2441.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4745/608042 [00:02<03:32, 2844.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5043/608042 [00:02<03:35, 2793.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5331/608042 [00:02<03:47, 2651.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5602/608042 [00:02<03:51, 2607.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5872/608042 [00:02<03:53, 2583.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6133/608042 [00:02<04:04, 2465.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6390/608042 [00:02<04:05, 2447.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6642/608042 [00:02<04:10, 2397.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6883/608042 [00:03<04:25, 2267.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 7169/608042 [00:03<04:09, 2410.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 7423/608042 [00:03<04:13, 2372.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 7687/608042 [00:03<04:05, 2446.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 7939/608042 [00:03<04:07, 2423.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8234/608042 [00:03<03:53, 2572.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8526/608042 [00:03<03:45, 2658.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8795/608042 [00:03<04:17, 2324.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 9047/608042 [00:03<04:18, 2317.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9373/608042 [00:04<03:53, 2568.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9644/608042 [00:04<03:55, 2536.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9919/608042 [00:04<03:55, 2537.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10176/608042 [00:04<03:56, 2528.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10436/608042 [00:04<04:02, 2463.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10691/608042 [00:04<04:26, 2239.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10944/608042 [00:04<04:21, 2285.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11269/608042 [00:04<03:58, 2502.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11528/608042 [00:04<04:06, 2417.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11790/608042 [00:05<04:05, 2431.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12036/608042 [00:05<04:15, 2329.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12327/608042 [00:05<04:00, 2481.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12580/608042 [00:05<04:13, 2349.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12846/608042 [00:05<04:07, 2404.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13100/608042 [00:05<04:04, 2436.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13378/608042 [00:05<03:55, 2527.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13636/608042 [00:05<04:06, 2412.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13884/608042 [00:05<04:05, 2418.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14163/608042 [00:06<03:56, 2506.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14425/608042 [00:06<04:06, 2412.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14673/608042 [00:06<04:11, 2359.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14912/608042 [00:06<04:17, 2305.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 15168/608042 [00:06<04:10, 2370.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15423/608042 [00:06<04:08, 2382.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15666/608042 [00:06<04:14, 2331.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15916/608042 [00:06<04:38, 2122.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16159/608042 [00:06<04:34, 2159.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16391/608042 [00:07<04:29, 2192.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16658/608042 [00:07<04:16, 2302.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16932/608042 [00:07<04:07, 2392.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17215/608042 [00:07<03:56, 2502.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17548/608042 [00:07<03:36, 2726.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17833/608042 [00:07<03:54, 2519.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18095/608042 [00:07<04:00, 2457.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18363/608042 [00:07<03:57, 2487.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18648/608042 [00:07<03:51, 2549.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18963/608042 [00:07<03:37, 2705.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19240/608042 [00:08<03:48, 2571.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19503/608042 [00:08<04:02, 2424.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19750/608042 [00:08<04:18, 2272.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19990/608042 [00:08<04:22, 2238.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20300/608042 [00:08<03:59, 2454.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20567/608042 [00:08<04:11, 2331.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20825/608042 [00:08<04:06, 2378.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 21154/608042 [00:08<03:46, 2589.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 21438/608042 [00:09<04:00, 2437.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 21747/608042 [00:09<03:51, 2527.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22038/608042 [00:09<03:46, 2592.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22352/608042 [00:09<03:36, 2700.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22640/608042 [00:09<03:34, 2734.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 22919/608042 [00:09<03:38, 2678.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23190/608042 [00:09<03:49, 2544.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23507/608042 [00:09<03:41, 2643.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23856/608042 [00:09<03:29, 2786.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24212/608042 [00:10<03:15, 2982.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24517/608042 [00:10<03:22, 2876.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24812/608042 [00:10<03:26, 2827.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25100/608042 [00:10<03:56, 2469.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25377/608042 [00:10<03:54, 2488.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25681/608042 [00:10<03:43, 2608.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25989/608042 [00:10<03:35, 2705.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26272/608042 [00:10<03:57, 2444.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26538/608042 [00:10<03:53, 2493.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26801/608042 [00:11<04:07, 2348.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 27057/608042 [00:11<04:02, 2392.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 27343/608042 [00:11<03:50, 2518.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 27645/608042 [00:11<03:40, 2636.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 27959/608042 [00:11<03:29, 2767.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28247/608042 [00:11<03:33, 2719.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28526/608042 [00:11<03:43, 2597.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28804/608042 [00:11<03:47, 2542.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29139/608042 [00:11<03:31, 2734.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29453/608042 [00:12<03:23, 2848.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29755/608042 [00:12<03:36, 2676.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 30032/608042 [00:12<03:34, 2696.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 30311/608042 [00:12<03:35, 2678.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 30584/608042 [00:12<03:46, 2544.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 30845/608042 [00:12<03:46, 2548.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31102/608042 [00:12<03:51, 2490.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31362/608042 [00:12<03:59, 2403.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31641/608042 [00:12<03:55, 2442.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31902/608042 [00:13<04:03, 2364.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32181/608042 [00:13<03:59, 2404.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32433/608042 [00:13<03:59, 2400.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32695/608042 [00:13<04:00, 2395.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32956/608042 [00:13<03:57, 2425.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 33212/608042 [00:13<04:00, 2392.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 33454/608042 [00:13<04:04, 2346.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 33764/608042 [00:13<03:44, 2556.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34039/608042 [00:13<03:40, 2607.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34315/608042 [00:14<03:46, 2537.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34588/608042 [00:14<03:45, 2538.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34848/608042 [00:14<03:58, 2403.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35125/608042 [00:14<03:55, 2436.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35378/608042 [00:14<04:04, 2345.21 examples/s]
2024-08-03T04:45:03.832268751Z Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35627/608042 [00:14<04:17, 2223.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35903/608042 [00:14<04:04, 2335.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36162/608042 [00:14<04:00, 2381.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36427/608042 [00:14<03:53, 2451.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36674/608042 [00:15<03:55, 2429.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36919/608042 [00:15<03:56, 2417.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37228/608042 [00:15<03:39, 2606.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37499/608042 [00:15<03:39, 2602.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37814/608042 [00:15<03:26, 2757.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38104/608042 [00:15<03:33, 2670.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38380/608042 [00:15<03:37, 2613.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38671/608042 [00:15<03:31, 2691.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38943/608042 [00:15<03:42, 2554.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 39209/608042 [00:15<03:48, 2491.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 39468/608042 [00:16<03:51, 2457.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 39737/608042 [00:16<03:53, 2433.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40065/608042 [00:16<03:38, 2601.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40454/608042 [00:16<03:12, 2955.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40768/608042 [00:16<03:27, 2737.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41048/608042 [00:16<03:26, 2741.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41329/608042 [00:16<03:43, 2536.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41589/608042 [00:16<03:54, 2414.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41923/608042 [00:17<03:37, 2605.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42195/608042 [00:17<03:51, 2444.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42470/608042 [00:17<03:44, 2518.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42772/608042 [00:17<03:33, 2649.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43089/608042 [00:17<03:26, 2731.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43370/608042 [00:17<03:37, 2595.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43635/608042 [00:17<03:40, 2562.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43905/608042 [00:17<03:40, 2562.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44203/608042 [00:17<03:31, 2660.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44525/608042 [00:17<03:20, 2810.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44815/608042 [00:18<03:20, 2805.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 45101/608042 [00:18<03:33, 2642.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 45373/608042 [00:18<03:37, 2592.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 45719/608042 [00:18<03:27, 2707.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 45992/608042 [00:18<03:30, 2669.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46269/608042 [00:18<03:37, 2585.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46610/608042 [00:18<03:20, 2798.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46897/608042 [00:18<03:24, 2746.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47184/608042 [00:19<03:44, 2499.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47445/608042 [00:19<03:42, 2521.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47701/608042 [00:19<03:47, 2465.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47951/608042 [00:19<03:47, 2462.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48200/608042 [00:19<03:51, 2421.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48445/608042 [00:19<04:01, 2317.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48706/608042 [00:19<03:57, 2358.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49009/608042 [00:19<03:41, 2525.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49277/608042 [00:19<03:52, 2398.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49532/608042 [00:20<04:11, 2219.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49791/608042 [00:20<04:01, 2307.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50071/608042 [00:20<03:52, 2403.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50347/608042 [00:20<03:43, 2496.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50610/608042 [00:20<03:50, 2418.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50866/608042 [00:20<03:56, 2351.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51114/608042 [00:20<03:55, 2363.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51368/608042 [00:20<03:57, 2343.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51683/608042 [00:20<03:37, 2561.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 51958/608042 [00:21<03:49, 2422.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52218/608042 [00:21<03:52, 2387.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52481/608042 [00:21<03:53, 2378.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52827/608042 [00:21<03:30, 2636.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 53129/608042 [00:21<03:24, 2716.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53439/608042 [00:21<03:17, 2814.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53731/608042 [00:21<03:52, 2384.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53986/608042 [00:21<03:50, 2401.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54246/608042 [00:21<03:55, 2352.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54498/608042 [00:22<04:05, 2253.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54785/608042 [00:22<03:49, 2406.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55039/608042 [00:22<03:51, 2390.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55285/608042 [00:22<03:58, 2322.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55521/608042 [00:22<04:06, 2238.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55830/608042 [00:22<03:45, 2447.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56097/608042 [00:22<03:43, 2473.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56412/608042 [00:22<03:28, 2642.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56682/608042 [00:22<03:34, 2575.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56965/608042 [00:23<03:33, 2580.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 57239/608042 [00:23<03:38, 2523.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 57518/608042 [00:23<03:34, 2563.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 57782/608042 [00:23<03:45, 2436.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58143/608042 [00:23<03:20, 2745.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58427/608042 [00:23<03:30, 2616.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58702/608042 [00:23<03:29, 2620.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58994/608042 [00:23<03:23, 2702.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59311/608042 [00:23<03:15, 2804.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59610/608042 [00:24<03:14, 2824.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59910/608042 [00:24<03:28, 2630.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60186/608042 [00:24<03:36, 2532.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60448/608042 [00:24<03:34, 2555.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60755/608042 [00:24<03:24, 2671.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61030/608042 [00:24<03:26, 2648.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61359/608042 [00:24<03:17, 2771.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61658/608042 [00:24<03:15, 2795.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61941/608042 [00:24<03:19, 2735.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62216/608042 [00:25<03:37, 2512.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62516/608042 [00:25<03:30, 2586.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62779/608042 [00:25<03:46, 2402.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63044/608042 [00:25<03:41, 2465.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63306/608042 [00:25<03:51, 2350.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63554/608042 [00:25<03:51, 2350.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63826/608042 [00:25<03:44, 2428.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64079/608042 [00:25<03:52, 2334.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64333/608042 [00:25<03:48, 2383.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64602/608042 [00:26<03:41, 2457.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64882/608042 [00:26<03:33, 2538.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65161/608042 [00:26<03:28, 2609.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65428/608042 [00:26<03:33, 2545.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65687/608042 [00:26<03:41, 2451.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66008/608042 [00:26<03:24, 2656.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66293/608042 [00:26<03:23, 2667.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66562/608042 [00:26<03:35, 2508.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66832/608042 [00:26<03:41, 2446.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67131/608042 [00:27<03:37, 2492.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67393/608042 [00:27<03:36, 2491.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67672/608042 [00:27<03:34, 2514.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67927/608042 [00:27<03:40, 2450.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 68218/608042 [00:27<03:31, 2551.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 68539/608042 [00:27<03:18, 2720.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 68837/608042 [00:27<03:12, 2794.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69140/608042 [00:27<03:22, 2666.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69410/608042 [00:27<03:28, 2583.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69676/608042 [00:27<03:37, 2475.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 69939/608042 [00:28<03:39, 2446.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70187/608042 [00:28<03:44, 2399.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70435/608042 [00:28<03:43, 2407.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70686/608042 [00:28<04:01, 2225.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70953/608042 [00:28<03:51, 2315.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71242/608042 [00:28<03:37, 2465.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71496/608042 [00:28<03:38, 2450.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71763/608042 [00:28<03:36, 2479.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72013/608042 [00:28<03:40, 2429.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72262/608042 [00:29<03:44, 2389.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72521/608042 [00:29<03:41, 2422.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72807/608042 [00:29<03:30, 2544.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73074/608042 [00:29<03:33, 2504.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73344/608042 [00:29<03:30, 2538.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73608/608042 [00:29<03:40, 2418.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73920/608042 [00:29<03:26, 2580.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 74198/608042 [00:29<03:22, 2633.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 74483/608042 [00:29<03:18, 2694.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 74769/608042 [00:30<03:17, 2704.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75044/608042 [00:30<03:28, 2558.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75311/608042 [00:30<03:42, 2389.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75555/608042 [00:30<03:52, 2293.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75808/608042 [00:30<04:04, 2176.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76096/608042 [00:30<03:45, 2360.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76377/608042 [00:30<03:36, 2453.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76662/608042 [00:30<03:28, 2547.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76927/608042 [00:30<03:53, 2276.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 77176/608042 [00:31<04:03, 2183.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 77521/608042 [00:31<03:33, 2490.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 77778/608042 [00:31<03:54, 2260.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78013/608042 [00:31<03:59, 2215.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78378/608042 [00:31<03:32, 2489.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78637/608042 [00:31<03:45, 2343.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78889/608042 [00:31<03:46, 2337.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79134/608042 [00:31<03:54, 2252.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79373/608042 [00:32<03:56, 2237.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79623/608042 [00:32<03:51, 2283.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79860/608042 [00:32<03:56, 2229.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80114/608042 [00:32<03:55, 2241.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80359/608042 [00:32<03:52, 2266.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80597/608042 [00:32<03:52, 2265.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80826/608042 [00:32<04:00, 2188.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81084/608042 [00:32<04:02, 2175.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81366/608042 [00:32<03:47, 2315.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81657/608042 [00:33<03:33, 2462.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81913/608042 [00:33<03:39, 2398.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82163/608042 [00:33<03:37, 2416.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82446/608042 [00:33<03:27, 2532.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82719/608042 [00:33<03:24, 2568.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82978/608042 [00:33<03:35, 2433.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 83233/608042 [00:33<03:54, 2236.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 83511/608042 [00:33<03:40, 2374.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 83771/608042 [00:33<03:35, 2436.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84019/608042 [00:34<03:57, 2209.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84259/608042 [00:34<03:54, 2232.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84594/608042 [00:34<03:26, 2528.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84857/608042 [00:34<03:29, 2500.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85119/608042 [00:34<03:41, 2356.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85428/608042 [00:34<03:25, 2537.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85696/608042 [00:34<03:32, 2459.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85952/608042 [00:34<03:31, 2473.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 86230/608042 [00:34<03:28, 2508.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 86503/608042 [00:35<03:27, 2508.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 86829/608042 [00:35<03:11, 2719.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87133/608042 [00:35<03:25, 2536.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87405/608042 [00:35<03:28, 2501.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87662/608042 [00:35<03:35, 2416.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87914/608042 [00:35<03:32, 2444.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 88163/608042 [00:35<03:38, 2383.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 88409/608042 [00:35<03:39, 2371.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 88651/608042 [00:35<04:02, 2141.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 88882/608042 [00:36<04:01, 2146.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 89162/608042 [00:36<03:49, 2264.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 89468/608042 [00:36<03:31, 2452.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 89733/608042 [00:36<03:29, 2475.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90047/608042 [00:36<03:17, 2622.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90323/608042 [00:36<03:18, 2610.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90587/608042 [00:36<03:22, 2550.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90845/608042 [00:36<03:31, 2447.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 91171/608042 [00:36<03:14, 2662.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 91440/608042 [00:37<03:37, 2377.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 91696/608042 [00:37<03:45, 2289.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92010/608042 [00:37<03:27, 2488.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92280/608042 [00:37<03:26, 2493.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92557/608042 [00:37<03:27, 2488.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92867/608042 [00:37<03:19, 2577.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93131/608042 [00:37<03:34, 2398.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93394/608042 [00:37<03:31, 2428.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93646/608042 [00:37<03:38, 2354.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93895/608042 [00:38<03:37, 2365.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 94136/608042 [00:38<03:37, 2364.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 94389/608042 [00:38<03:39, 2342.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 94689/608042 [00:38<03:23, 2523.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 94968/608042 [00:38<03:28, 2459.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95220/608042 [00:38<03:34, 2395.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95465/608042 [00:38<03:47, 2256.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95703/608042 [00:38<03:44, 2284.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95948/608042 [00:38<03:55, 2173.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96171/608042 [00:39<03:57, 2151.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96438/608042 [00:39<03:43, 2290.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96681/608042 [00:39<03:48, 2234.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96919/608042 [00:39<03:51, 2208.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 97230/608042 [00:39<03:35, 2373.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 97513/608042 [00:39<03:25, 2484.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 97768/608042 [00:39<03:38, 2338.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 98014/608042 [00:39<04:01, 2115.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 98290/608042 [00:39<03:46, 2251.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 98534/608042 [00:40<03:42, 2291.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 98888/608042 [00:40<03:15, 2609.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 99160/608042 [00:40<03:13, 2623.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 99438/608042 [00:40<03:16, 2584.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 99704/608042 [00:40<03:19, 2546.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 100004/608042 [00:40<03:15, 2598.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 100366/608042 [00:40<02:56, 2874.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 100665/608042 [00:40<03:07, 2711.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 100941/608042 [00:40<03:17, 2573.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 101248/608042 [00:41<03:08, 2691.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 101636/608042 [00:41<02:50, 2971.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 101943/608042 [00:41<02:53, 2922.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 102247/608042 [00:41<03:09, 2672.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 102545/608042 [00:41<03:03, 2752.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 102865/608042 [00:41<02:57, 2846.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103163/608042 [00:41<03:06, 2701.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103439/608042 [00:41<03:10, 2643.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103708/608042 [00:41<03:26, 2440.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103972/608042 [00:42<03:33, 2363.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 104230/608042 [00:42<03:30, 2393.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 104541/608042 [00:42<03:15, 2573.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 104837/608042 [00:42<03:12, 2620.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105113/608042 [00:42<03:24, 2457.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105367/608042 [00:42<03:28, 2413.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105611/608042 [00:42<03:35, 2327.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105873/608042 [00:42<03:33, 2348.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 106120/608042 [00:43<03:34, 2335.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 106374/608042 [00:43<03:46, 2215.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 106656/608042 [00:43<03:33, 2349.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 106933/608042 [00:43<03:27, 2412.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 107212/608042 [00:43<03:20, 2502.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 107508/608042 [00:43<03:11, 2608.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 107773/608042 [00:43<03:19, 2508.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 108030/608042 [00:43<03:28, 2394.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 108336/608042 [00:43<03:14, 2563.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 108659/608042 [00:43<03:02, 2738.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 108983/608042 [00:44<02:56, 2831.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 109284/608042 [00:44<03:02, 2729.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 109580/608042 [00:44<03:12, 2588.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 109859/608042 [00:44<03:16, 2538.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 110148/608042 [00:44<03:11, 2600.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 110425/608042 [00:44<03:09, 2619.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 110700/608042 [00:44<03:18, 2503.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 110962/608042 [00:44<03:28, 2382.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 111215/608042 [00:45<03:30, 2361.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 111467/608042 [00:45<03:29, 2371.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 111728/608042 [00:45<03:31, 2348.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 111970/608042 [00:45<03:45, 2203.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 112196/608042 [00:45<04:03, 2040.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 112451/608042 [00:45<03:48, 2167.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▊        | 112702/608042 [00:45<03:41, 2236.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▊        | 112955/608042 [00:45<03:38, 2270.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▊        | 113255/608042 [00:45<03:20, 2461.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▊        | 113583/608042 [00:46<03:04, 2685.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▊        | 113865/608042 [00:46<03:08, 2622.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 114135/608042 [00:46<03:08, 2623.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 114419/608042 [00:46<03:09, 2605.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 114758/608042 [00:46<03:00, 2737.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 115050/608042 [00:46<02:59, 2754.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 115340/608042 [00:46<03:02, 2702.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 115692/608042 [00:46<02:49, 2909.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 115997/608042 [00:46<02:49, 2904.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 116343/608042 [00:46<02:43, 3011.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 116654/608042 [00:47<02:57, 2761.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 117034/608042 [00:47<02:42, 3028.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 117354/608042 [00:47<02:40, 3050.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 117677/608042 [00:47<02:49, 2890.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 118046/608042 [00:47<02:44, 2977.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 118358/608042 [00:47<02:43, 2998.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 118673/608042 [00:47<02:54, 2806.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 118985/608042 [00:47<02:52, 2828.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 119271/608042 [00:48<03:03, 2666.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 119557/608042 [00:48<03:07, 2608.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 119848/608042 [00:48<03:02, 2672.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 120138/608042 [00:48<02:58, 2731.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 120421/608042 [00:48<02:59, 2709.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 120695/608042 [00:48<03:26, 2360.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 120964/608042 [00:48<03:20, 2434.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 121294/608042 [00:48<03:04, 2641.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 121577/608042 [00:48<03:03, 2649.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 121855/608042 [00:49<03:01, 2673.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 122135/608042 [00:49<03:02, 2662.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 122409/608042 [00:49<03:01, 2670.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 122686/608042 [00:49<03:01, 2678.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 123003/608042 [00:49<02:52, 2816.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 123291/608042 [00:49<03:14, 2489.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 123567/608042 [00:49<03:23, 2378.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 123817/608042 [00:49<03:24, 2368.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 124071/608042 [00:49<03:30, 2296.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 124365/608042 [00:50<03:18, 2435.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 124644/608042 [00:50<03:19, 2428.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 124924/608042 [00:50<03:12, 2505.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 125210/608042 [00:50<03:06, 2593.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 125511/608042 [00:50<03:00, 2679.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 125812/608042 [00:50<02:55, 2743.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 126090/608042 [00:50<03:03, 2629.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 126395/608042 [00:50<02:56, 2723.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 126672/608042 [00:50<03:08, 2559.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 126962/608042 [00:51<03:04, 2601.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 127241/608042 [00:51<03:06, 2576.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 127506/608042 [00:51<03:12, 2492.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 127762/608042 [00:51<03:17, 2432.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 128014/608042 [00:51<03:23, 2362.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 128304/608042 [00:51<03:15, 2454.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 128569/608042 [00:51<03:15, 2454.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 128815/608042 [00:51<03:18, 2418.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 129059/608042 [00:51<03:17, 2421.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██▏       | 129349/608042 [00:51<03:07, 2558.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██▏       | 129710/608042 [00:52<02:49, 2821.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██▏       | 130012/608042 [00:52<03:03, 2603.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██▏       | 130280/608042 [00:52<03:18, 2403.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██▏       | 130527/608042 [00:52<03:22, 2352.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 130769/608042 [00:52<03:22, 2359.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 131058/608042 [00:52<03:11, 2490.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 131315/608042 [00:52<03:32, 2248.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 131575/608042 [00:52<03:25, 2321.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 131892/608042 [00:53<03:10, 2493.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 132181/608042 [00:53<03:04, 2580.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 132447/608042 [00:53<03:06, 2546.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 132706/608042 [00:53<03:22, 2342.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 133024/608042 [00:53<03:06, 2551.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 133284/608042 [00:53<03:06, 2541.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 133555/608042 [00:53<03:04, 2567.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 133895/608042 [00:53<02:51, 2766.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 134176/608042 [00:53<03:00, 2623.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 134540/608042 [00:54<02:43, 2891.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 134855/608042 [00:54<02:41, 2932.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 135154/608042 [00:54<02:59, 2631.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 135434/608042 [00:54<03:00, 2618.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 135711/608042 [00:54<03:09, 2496.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 135988/608042 [00:54<03:06, 2529.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 136251/608042 [00:54<03:16, 2395.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 136581/608042 [00:54<02:59, 2631.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 136868/608042 [00:54<03:09, 2484.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 137131/608042 [00:55<03:17, 2387.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 137429/608042 [00:55<03:05, 2542.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 137749/608042 [00:55<02:53, 2713.74 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 138033/608042 [00:55<02:55, 2684.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 138355/608042 [00:55<02:46, 2825.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 138658/608042 [00:55<02:43, 2863.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 138974/608042 [00:55<02:42, 2885.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 139275/608042 [00:55<02:51, 2727.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 139566/608042 [00:55<02:52, 2720.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 139988/608042 [00:56<02:30, 3107.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 140311/608042 [00:56<02:35, 3001.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 140656/608042 [00:56<02:31, 3080.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 140968/608042 [00:56<02:32, 3069.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 141293/608042 [00:56<02:34, 3030.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 141610/608042 [00:56<02:36, 2977.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 141957/608042 [00:56<02:31, 3071.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 142274/608042 [00:56<02:40, 2893.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 142613/608042 [00:56<02:34, 3014.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▎       | 142921/608042 [00:57<02:49, 2751.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▎       | 143253/608042 [00:57<02:42, 2857.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▎       | 143548/608042 [00:57<02:48, 2757.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 0/608042 [00:00<?, ? examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 6/608042 [00:00<4:50:40, 34.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 45/608042 [00:00<53:35, 189.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 211/608042 [00:00<13:27, 752.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 398/608042 [00:00<08:51, 1143.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 693/608042 [00:00<05:56, 1703.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 909/608042 [00:00<05:32, 1824.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1238/608042 [00:00<04:47, 2112.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1585/608042 [00:00<04:04, 2481.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1875/608042 [00:01<03:55, 2576.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2204/608042 [00:01<03:38, 2778.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2497/608042 [00:01<03:52, 2599.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2766/608042 [00:01<04:21, 2311.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 3022/608042 [00:01<04:16, 2355.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3287/608042 [00:01<04:08, 2432.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3545/608042 [00:01<04:06, 2454.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3823/608042 [00:01<03:57, 2541.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4089/608042 [00:01<04:06, 2453.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4355/608042 [00:02<04:07, 2441.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4745/608042 [00:02<03:32, 2844.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5043/608042 [00:02<03:35, 2793.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5331/608042 [00:02<03:47, 2651.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5602/608042 [00:02<03:51, 2607.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5872/608042 [00:02<03:53, 2583.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6133/608042 [00:02<04:04, 2465.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6390/608042 [00:02<04:05, 2447.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6642/608042 [00:02<04:10, 2397.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6883/608042 [00:03<04:25, 2267.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 7169/608042 [00:03<04:09, 2410.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 7423/608042 [00:03<04:13, 2372.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 7687/608042 [00:03<04:05, 2446.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 7939/608042 [00:03<04:07, 2423.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8234/608042 [00:03<03:53, 2572.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8526/608042 [00:03<03:45, 2658.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8795/608042 [00:03<04:17, 2324.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 9047/608042 [00:03<04:18, 2317.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9373/608042 [00:04<03:53, 2568.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9644/608042 [00:04<03:55, 2536.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9919/608042 [00:04<03:55, 2537.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10176/608042 [00:04<03:56, 2528.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10436/608042 [00:04<04:02, 2463.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10691/608042 [00:04<04:26, 2239.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10944/608042 [00:04<04:21, 2285.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11269/608042 [00:04<03:58, 2502.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11528/608042 [00:04<04:06, 2417.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11790/608042 [00:05<04:05, 2431.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12036/608042 [00:05<04:15, 2329.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12327/608042 [00:05<04:00, 2481.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12580/608042 [00:05<04:13, 2349.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12846/608042 [00:05<04:07, 2404.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13100/608042 [00:05<04:04, 2436.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13378/608042 [00:05<03:55, 2527.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13636/608042 [00:05<04:06, 2412.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13884/608042 [00:05<04:05, 2418.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14163/608042 [00:06<03:56, 2506.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14425/608042 [00:06<04:06, 2412.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14673/608042 [00:06<04:11, 2359.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14912/608042 [00:06<04:17, 2305.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 15168/608042 [00:06<04:10, 2370.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15423/608042 [00:06<04:08, 2382.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15666/608042 [00:06<04:14, 2331.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15916/608042 [00:06<04:38, 2122.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16159/608042 [00:06<04:34, 2159.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16391/608042 [00:07<04:29, 2192.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16658/608042 [00:07<04:16, 2302.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16932/608042 [00:07<04:07, 2392.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17215/608042 [00:07<03:56, 2502.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17548/608042 [00:07<03:36, 2726.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17833/608042 [00:07<03:54, 2519.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18095/608042 [00:07<04:00, 2457.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18363/608042 [00:07<03:57, 2487.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18648/608042 [00:07<03:51, 2549.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18963/608042 [00:07<03:37, 2705.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19240/608042 [00:08<03:48, 2571.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19503/608042 [00:08<04:02, 2424.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19750/608042 [00:08<04:18, 2272.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19990/608042 [00:08<04:22, 2238.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20300/608042 [00:08<03:59, 2454.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20567/608042 [00:08<04:11, 2331.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20825/608042 [00:08<04:06, 2378.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 21154/608042 [00:08<03:46, 2589.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 21438/608042 [00:09<04:00, 2437.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 21747/608042 [00:09<03:51, 2527.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22038/608042 [00:09<03:46, 2592.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22352/608042 [00:09<03:36, 2700.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22640/608042 [00:09<03:34, 2734.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 22919/608042 [00:09<03:38, 2678.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23190/608042 [00:09<03:49, 2544.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23507/608042 [00:09<03:41, 2643.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23856/608042 [00:09<03:29, 2786.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24212/608042 [00:10<03:15, 2982.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24517/608042 [00:10<03:22, 2876.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24812/608042 [00:10<03:26, 2827.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25100/608042 [00:10<03:56, 2469.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25377/608042 [00:10<03:54, 2488.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25681/608042 [00:10<03:43, 2608.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25989/608042 [00:10<03:35, 2705.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26272/608042 [00:10<03:57, 2444.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26538/608042 [00:10<03:53, 2493.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26801/608042 [00:11<04:07, 2348.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 27057/608042 [00:11<04:02, 2392.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 27343/608042 [00:11<03:50, 2518.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 27645/608042 [00:11<03:40, 2636.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 27959/608042 [00:11<03:29, 2767.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28247/608042 [00:11<03:33, 2719.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28526/608042 [00:11<03:43, 2597.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28804/608042 [00:11<03:47, 2542.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29139/608042 [00:11<03:31, 2734.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29453/608042 [00:12<03:23, 2848.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29755/608042 [00:12<03:36, 2676.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 30032/608042 [00:12<03:34, 2696.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 30311/608042 [00:12<03:35, 2678.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 30584/608042 [00:12<03:46, 2544.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 30845/608042 [00:12<03:46, 2548.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31102/608042 [00:12<03:51, 2490.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31362/608042 [00:12<03:59, 2403.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31641/608042 [00:12<03:55, 2442.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31902/608042 [00:13<04:03, 2364.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32181/608042 [00:13<03:59, 2404.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32433/608042 [00:13<03:59, 2400.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32695/608042 [00:13<04:00, 2395.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32956/608042 [00:13<03:57, 2425.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 33212/608042 [00:13<04:00, 2392.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 33454/608042 [00:13<04:04, 2346.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 33764/608042 [00:13<03:44, 2556.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34039/608042 [00:13<03:40, 2607.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34315/608042 [00:14<03:46, 2537.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34588/608042 [00:14<03:45, 2538.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34848/608042 [00:14<03:58, 2403.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35125/608042 [00:14<03:55, 2436.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35378/608042 [00:14<04:04, 2345.21 examples/s]
2024-08-03T04:45:03.832268751Z Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35627/608042 [00:14<04:17, 2223.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35903/608042 [00:14<04:04, 2335.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36162/608042 [00:14<04:00, 2381.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36427/608042 [00:14<03:53, 2451.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36674/608042 [00:15<03:55, 2429.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36919/608042 [00:15<03:56, 2417.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37228/608042 [00:15<03:39, 2606.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37499/608042 [00:15<03:39, 2602.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37814/608042 [00:15<03:26, 2757.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38104/608042 [00:15<03:33, 2670.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38380/608042 [00:15<03:37, 2613.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38671/608042 [00:15<03:31, 2691.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38943/608042 [00:15<03:42, 2554.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 39209/608042 [00:15<03:48, 2491.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 39468/608042 [00:16<03:51, 2457.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 39737/608042 [00:16<03:53, 2433.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40065/608042 [00:16<03:38, 2601.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40454/608042 [00:16<03:12, 2955.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40768/608042 [00:16<03:27, 2737.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41048/608042 [00:16<03:26, 2741.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41329/608042 [00:16<03:43, 2536.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41589/608042 [00:16<03:54, 2414.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41923/608042 [00:17<03:37, 2605.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42195/608042 [00:17<03:51, 2444.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42470/608042 [00:17<03:44, 2518.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42772/608042 [00:17<03:33, 2649.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43089/608042 [00:17<03:26, 2731.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43370/608042 [00:17<03:37, 2595.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43635/608042 [00:17<03:40, 2562.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43905/608042 [00:17<03:40, 2562.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44203/608042 [00:17<03:31, 2660.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44525/608042 [00:17<03:20, 2810.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44815/608042 [00:18<03:20, 2805.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 45101/608042 [00:18<03:33, 2642.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 45373/608042 [00:18<03:37, 2592.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 45719/608042 [00:18<03:27, 2707.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 45992/608042 [00:18<03:30, 2669.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46269/608042 [00:18<03:37, 2585.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46610/608042 [00:18<03:20, 2798.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46897/608042 [00:18<03:24, 2746.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47184/608042 [00:19<03:44, 2499.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47445/608042 [00:19<03:42, 2521.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47701/608042 [00:19<03:47, 2465.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47951/608042 [00:19<03:47, 2462.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48200/608042 [00:19<03:51, 2421.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48445/608042 [00:19<04:01, 2317.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48706/608042 [00:19<03:57, 2358.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49009/608042 [00:19<03:41, 2525.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49277/608042 [00:19<03:52, 2398.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49532/608042 [00:20<04:11, 2219.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49791/608042 [00:20<04:01, 2307.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50071/608042 [00:20<03:52, 2403.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50347/608042 [00:20<03:43, 2496.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50610/608042 [00:20<03:50, 2418.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50866/608042 [00:20<03:56, 2351.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51114/608042 [00:20<03:55, 2363.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51368/608042 [00:20<03:57, 2343.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51683/608042 [00:20<03:37, 2561.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 51958/608042 [00:21<03:49, 2422.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52218/608042 [00:21<03:52, 2387.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52481/608042 [00:21<03:53, 2378.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52827/608042 [00:21<03:30, 2636.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 53129/608042 [00:21<03:24, 2716.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53439/608042 [00:21<03:17, 2814.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53731/608042 [00:21<03:52, 2384.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53986/608042 [00:21<03:50, 2401.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54246/608042 [00:21<03:55, 2352.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54498/608042 [00:22<04:05, 2253.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54785/608042 [00:22<03:49, 2406.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55039/608042 [00:22<03:51, 2390.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55285/608042 [00:22<03:58, 2322.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55521/608042 [00:22<04:06, 2238.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55830/608042 [00:22<03:45, 2447.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56097/608042 [00:22<03:43, 2473.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56412/608042 [00:22<03:28, 2642.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56682/608042 [00:22<03:34, 2575.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56965/608042 [00:23<03:33, 2580.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 57239/608042 [00:23<03:38, 2523.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 57518/608042 [00:23<03:34, 2563.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 57782/608042 [00:23<03:45, 2436.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58143/608042 [00:23<03:20, 2745.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58427/608042 [00:23<03:30, 2616.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58702/608042 [00:23<03:29, 2620.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58994/608042 [00:23<03:23, 2702.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59311/608042 [00:23<03:15, 2804.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59610/608042 [00:24<03:14, 2824.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59910/608042 [00:24<03:28, 2630.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60186/608042 [00:24<03:36, 2532.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60448/608042 [00:24<03:34, 2555.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60755/608042 [00:24<03:24, 2671.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61030/608042 [00:24<03:26, 2648.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61359/608042 [00:24<03:17, 2771.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61658/608042 [00:24<03:15, 2795.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61941/608042 [00:24<03:19, 2735.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62216/608042 [00:25<03:37, 2512.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62516/608042 [00:25<03:30, 2586.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62779/608042 [00:25<03:46, 2402.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63044/608042 [00:25<03:41, 2465.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63306/608042 [00:25<03:51, 2350.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63554/608042 [00:25<03:51, 2350.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63826/608042 [00:25<03:44, 2428.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64079/608042 [00:25<03:52, 2334.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64333/608042 [00:25<03:48, 2383.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64602/608042 [00:26<03:41, 2457.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64882/608042 [00:26<03:33, 2538.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65161/608042 [00:26<03:28, 2609.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65428/608042 [00:26<03:33, 2545.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65687/608042 [00:26<03:41, 2451.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66008/608042 [00:26<03:24, 2656.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66293/608042 [00:26<03:23, 2667.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66562/608042 [00:26<03:35, 2508.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66832/608042 [00:26<03:41, 2446.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67131/608042 [00:27<03:37, 2492.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67393/608042 [00:27<03:36, 2491.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67672/608042 [00:27<03:34, 2514.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67927/608042 [00:27<03:40, 2450.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 68218/608042 [00:27<03:31, 2551.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 68539/608042 [00:27<03:18, 2720.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 68837/608042 [00:27<03:12, 2794.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69140/608042 [00:27<03:22, 2666.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69410/608042 [00:27<03:28, 2583.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69676/608042 [00:27<03:37, 2475.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 69939/608042 [00:28<03:39, 2446.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70187/608042 [00:28<03:44, 2399.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70435/608042 [00:28<03:43, 2407.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70686/608042 [00:28<04:01, 2225.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70953/608042 [00:28<03:51, 2315.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71242/608042 [00:28<03:37, 2465.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71496/608042 [00:28<03:38, 2450.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71763/608042 [00:28<03:36, 2479.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72013/608042 [00:28<03:40, 2429.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72262/608042 [00:29<03:44, 2389.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72521/608042 [00:29<03:41, 2422.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72807/608042 [00:29<03:30, 2544.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73074/608042 [00:29<03:33, 2504.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73344/608042 [00:29<03:30, 2538.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73608/608042 [00:29<03:40, 2418.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73920/608042 [00:29<03:26, 2580.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 74198/608042 [00:29<03:22, 2633.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 74483/608042 [00:29<03:18, 2694.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 74769/608042 [00:30<03:17, 2704.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75044/608042 [00:30<03:28, 2558.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75311/608042 [00:30<03:42, 2389.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75555/608042 [00:30<03:52, 2293.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75808/608042 [00:30<04:04, 2176.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76096/608042 [00:30<03:45, 2360.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76377/608042 [00:30<03:36, 2453.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76662/608042 [00:30<03:28, 2547.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76927/608042 [00:30<03:53, 2276.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 77176/608042 [00:31<04:03, 2183.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 77521/608042 [00:31<03:33, 2490.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 77778/608042 [00:31<03:54, 2260.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78013/608042 [00:31<03:59, 2215.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78378/608042 [00:31<03:32, 2489.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78637/608042 [00:31<03:45, 2343.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78889/608042 [00:31<03:46, 2337.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79134/608042 [00:31<03:54, 2252.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79373/608042 [00:32<03:56, 2237.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79623/608042 [00:32<03:51, 2283.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79860/608042 [00:32<03:56, 2229.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80114/608042 [00:32<03:55, 2241.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80359/608042 [00:32<03:52, 2266.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80597/608042 [00:32<03:52, 2265.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80826/608042 [00:32<04:00, 2188.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81084/608042 [00:32<04:02, 2175.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81366/608042 [00:32<03:47, 2315.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81657/608042 [00:33<03:33, 2462.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81913/608042 [00:33<03:39, 2398.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82163/608042 [00:33<03:37, 2416.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82446/608042 [00:33<03:27, 2532.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82719/608042 [00:33<03:24, 2568.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82978/608042 [00:33<03:35, 2433.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 83233/608042 [00:33<03:54, 2236.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 83511/608042 [00:33<03:40, 2374.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 83771/608042 [00:33<03:35, 2436.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84019/608042 [00:34<03:57, 2209.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84259/608042 [00:34<03:54, 2232.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84594/608042 [00:34<03:26, 2528.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84857/608042 [00:34<03:29, 2500.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85119/608042 [00:34<03:41, 2356.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85428/608042 [00:34<03:25, 2537.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85696/608042 [00:34<03:32, 2459.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85952/608042 [00:34<03:31, 2473.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 86230/608042 [00:34<03:28, 2508.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 86503/608042 [00:35<03:27, 2508.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 86829/608042 [00:35<03:11, 2719.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87133/608042 [00:35<03:25, 2536.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87405/608042 [00:35<03:28, 2501.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87662/608042 [00:35<03:35, 2416.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87914/608042 [00:35<03:32, 2444.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 88163/608042 [00:35<03:38, 2383.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 88409/608042 [00:35<03:39, 2371.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 88651/608042 [00:35<04:02, 2141.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 88882/608042 [00:36<04:01, 2146.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 89162/608042 [00:36<03:49, 2264.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 89468/608042 [00:36<03:31, 2452.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 89733/608042 [00:36<03:29, 2475.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90047/608042 [00:36<03:17, 2622.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90323/608042 [00:36<03:18, 2610.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90587/608042 [00:36<03:22, 2550.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90845/608042 [00:36<03:31, 2447.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 91171/608042 [00:36<03:14, 2662.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 91440/608042 [00:37<03:37, 2377.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 91696/608042 [00:37<03:45, 2289.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92010/608042 [00:37<03:27, 2488.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92280/608042 [00:37<03:26, 2493.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92557/608042 [00:37<03:27, 2488.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92867/608042 [00:37<03:19, 2577.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93131/608042 [00:37<03:34, 2398.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93394/608042 [00:37<03:31, 2428.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93646/608042 [00:37<03:38, 2354.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93895/608042 [00:38<03:37, 2365.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 94136/608042 [00:38<03:37, 2364.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 94389/608042 [00:38<03:39, 2342.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 94689/608042 [00:38<03:23, 2523.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 94968/608042 [00:38<03:28, 2459.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95220/608042 [00:38<03:34, 2395.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95465/608042 [00:38<03:47, 2256.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95703/608042 [00:38<03:44, 2284.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95948/608042 [00:38<03:55, 2173.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96171/608042 [00:39<03:57, 2151.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96438/608042 [00:39<03:43, 2290.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96681/608042 [00:39<03:48, 2234.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96919/608042 [00:39<03:51, 2208.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 97230/608042 [00:39<03:35, 2373.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 97513/608042 [00:39<03:25, 2484.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 97768/608042 [00:39<03:38, 2338.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 98014/608042 [00:39<04:01, 2115.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 98290/608042 [00:39<03:46, 2251.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 98534/608042 [00:40<03:42, 2291.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 98888/608042 [00:40<03:15, 2609.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 99160/608042 [00:40<03:13, 2623.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 99438/608042 [00:40<03:16, 2584.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 99704/608042 [00:40<03:19, 2546.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 100004/608042 [00:40<03:15, 2598.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 100366/608042 [00:40<02:56, 2874.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 100665/608042 [00:40<03:07, 2711.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 100941/608042 [00:40<03:17, 2573.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 101248/608042 [00:41<03:08, 2691.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 101636/608042 [00:41<02:50, 2971.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 101943/608042 [00:41<02:53, 2922.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 102247/608042 [00:41<03:09, 2672.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 102545/608042 [00:41<03:03, 2752.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 102865/608042 [00:41<02:57, 2846.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103163/608042 [00:41<03:06, 2701.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103439/608042 [00:41<03:10, 2643.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103708/608042 [00:41<03:26, 2440.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103972/608042 [00:42<03:33, 2363.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 104230/608042 [00:42<03:30, 2393.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 104541/608042 [00:42<03:15, 2573.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 104837/608042 [00:42<03:12, 2620.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105113/608042 [00:42<03:24, 2457.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105367/608042 [00:42<03:28, 2413.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105611/608042 [00:42<03:35, 2327.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105873/608042 [00:42<03:33, 2348.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 106120/608042 [00:43<03:34, 2335.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 106374/608042 [00:43<03:46, 2215.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 106656/608042 [00:43<03:33, 2349.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 106933/608042 [00:43<03:27, 2412.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 107212/608042 [00:43<03:20, 2502.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 107508/608042 [00:43<03:11, 2608.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 107773/608042 [00:43<03:19, 2508.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 108030/608042 [00:43<03:28, 2394.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 108336/608042 [00:43<03:14, 2563.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 108659/608042 [00:43<03:02, 2738.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 108983/608042 [00:44<02:56, 2831.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 109284/608042 [00:44<03:02, 2729.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 109580/608042 [00:44<03:12, 2588.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 109859/608042 [00:44<03:16, 2538.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 110148/608042 [00:44<03:11, 2600.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 110425/608042 [00:44<03:09, 2619.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 110700/608042 [00:44<03:18, 2503.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 110962/608042 [00:44<03:28, 2382.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 111215/608042 [00:45<03:30, 2361.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 111467/608042 [00:45<03:29, 2371.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 111728/608042 [00:45<03:31, 2348.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 111970/608042 [00:45<03:45, 2203.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 112196/608042 [00:45<04:03, 2040.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 112451/608042 [00:45<03:48, 2167.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▊        | 112702/608042 [00:45<03:41, 2236.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▊        | 112955/608042 [00:45<03:38, 2270.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▊        | 113255/608042 [00:45<03:20, 2461.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▊        | 113583/608042 [00:46<03:04, 2685.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▊        | 113865/608042 [00:46<03:08, 2622.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 114135/608042 [00:46<03:08, 2623.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 114419/608042 [00:46<03:09, 2605.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 114758/608042 [00:46<03:00, 2737.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 115050/608042 [00:46<02:59, 2754.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 115340/608042 [00:46<03:02, 2702.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 115692/608042 [00:46<02:49, 2909.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 115997/608042 [00:46<02:49, 2904.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 116343/608042 [00:46<02:43, 3011.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 116654/608042 [00:47<02:57, 2761.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 117034/608042 [00:47<02:42, 3028.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 117354/608042 [00:47<02:40, 3050.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 117677/608042 [00:47<02:49, 2890.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 118046/608042 [00:47<02:44, 2977.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 118358/608042 [00:47<02:43, 2998.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 118673/608042 [00:47<02:54, 2806.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 118985/608042 [00:47<02:52, 2828.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 119271/608042 [00:48<03:03, 2666.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 119557/608042 [00:48<03:07, 2608.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 119848/608042 [00:48<03:02, 2672.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 120138/608042 [00:48<02:58, 2731.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 120421/608042 [00:48<02:59, 2709.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 120695/608042 [00:48<03:26, 2360.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 120964/608042 [00:48<03:20, 2434.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 121294/608042 [00:48<03:04, 2641.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 121577/608042 [00:48<03:03, 2649.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 121855/608042 [00:49<03:01, 2673.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 122135/608042 [00:49<03:02, 2662.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 122409/608042 [00:49<03:01, 2670.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 122686/608042 [00:49<03:01, 2678.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 123003/608042 [00:49<02:52, 2816.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 123291/608042 [00:49<03:14, 2489.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 123567/608042 [00:49<03:23, 2378.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 123817/608042 [00:49<03:24, 2368.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 124071/608042 [00:49<03:30, 2296.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 124365/608042 [00:50<03:18, 2435.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 124644/608042 [00:50<03:19, 2428.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 124924/608042 [00:50<03:12, 2505.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 125210/608042 [00:50<03:06, 2593.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 125511/608042 [00:50<03:00, 2679.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 125812/608042 [00:50<02:55, 2743.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 126090/608042 [00:50<03:03, 2629.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 126395/608042 [00:50<02:56, 2723.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 126672/608042 [00:50<03:08, 2559.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 126962/608042 [00:51<03:04, 2601.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 127241/608042 [00:51<03:06, 2576.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 127506/608042 [00:51<03:12, 2492.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 127762/608042 [00:51<03:17, 2432.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 128014/608042 [00:51<03:23, 2362.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 128304/608042 [00:51<03:15, 2454.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 128569/608042 [00:51<03:15, 2454.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 128815/608042 [00:51<03:18, 2418.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 129059/608042 [00:51<03:17, 2421.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██▏       | 129349/608042 [00:51<03:07, 2558.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██▏       | 129710/608042 [00:52<02:49, 2821.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██▏       | 130012/608042 [00:52<03:03, 2603.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██▏       | 130280/608042 [00:52<03:18, 2403.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██▏       | 130527/608042 [00:52<03:22, 2352.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 130769/608042 [00:52<03:22, 2359.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 131058/608042 [00:52<03:11, 2490.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 131315/608042 [00:52<03:32, 2248.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 131575/608042 [00:52<03:25, 2321.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 131892/608042 [00:53<03:10, 2493.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 132181/608042 [00:53<03:04, 2580.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 132447/608042 [00:53<03:06, 2546.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 132706/608042 [00:53<03:22, 2342.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 133024/608042 [00:53<03:06, 2551.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 133284/608042 [00:53<03:06, 2541.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 133555/608042 [00:53<03:04, 2567.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 133895/608042 [00:53<02:51, 2766.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 134176/608042 [00:53<03:00, 2623.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 134540/608042 [00:54<02:43, 2891.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 134855/608042 [00:54<02:41, 2932.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 135154/608042 [00:54<02:59, 2631.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 135434/608042 [00:54<03:00, 2618.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 135711/608042 [00:54<03:09, 2496.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 135988/608042 [00:54<03:06, 2529.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 136251/608042 [00:54<03:16, 2395.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 136581/608042 [00:54<02:59, 2631.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 136868/608042 [00:54<03:09, 2484.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 137131/608042 [00:55<03:17, 2387.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 137429/608042 [00:55<03:05, 2542.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 137749/608042 [00:55<02:53, 2713.74 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 138033/608042 [00:55<02:55, 2684.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 138355/608042 [00:55<02:46, 2825.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 138658/608042 [00:55<02:43, 2863.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 138974/608042 [00:55<02:42, 2885.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 139275/608042 [00:55<02:51, 2727.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 139566/608042 [00:55<02:52, 2720.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 139988/608042 [00:56<02:30, 3107.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 140311/608042 [00:56<02:35, 3001.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 140656/608042 [00:56<02:31, 3080.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 140968/608042 [00:56<02:32, 3069.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 141293/608042 [00:56<02:34, 3030.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 141610/608042 [00:56<02:36, 2977.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 141957/608042 [00:56<02:31, 3071.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 142274/608042 [00:56<02:40, 2893.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 142613/608042 [00:56<02:34, 3014.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▎       | 142921/608042 [00:57<02:49, 2751.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▎       | 143253/608042 [00:57<02:42, 2857.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▎       | 143548/608042 [00:57<02:48, 2757.27 examples/s]
2024-08-03T04:45:03.832268751Z Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 0/608042 [00:00<?, ? examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 6/608042 [00:00<4:50:40, 34.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 45/608042 [00:00<53:35, 189.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 211/608042 [00:00<13:27, 752.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 398/608042 [00:00<08:51, 1143.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 693/608042 [00:00<05:56, 1703.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 909/608042 [00:00<05:32, 1824.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1238/608042 [00:00<04:47, 2112.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1585/608042 [00:00<04:04, 2481.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1875/608042 [00:01<03:55, 2576.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2204/608042 [00:01<03:38, 2778.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2497/608042 [00:01<03:52, 2599.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2766/608042 [00:01<04:21, 2311.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 3022/608042 [00:01<04:16, 2355.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3287/608042 [00:01<04:08, 2432.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3545/608042 [00:01<04:06, 2454.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3823/608042 [00:01<03:57, 2541.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4089/608042 [00:01<04:06, 2453.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4355/608042 [00:02<04:07, 2441.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4745/608042 [00:02<03:32, 2844.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5043/608042 [00:02<03:35, 2793.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5331/608042 [00:02<03:47, 2651.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5602/608042 [00:02<03:51, 2607.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5872/608042 [00:02<03:53, 2583.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6133/608042 [00:02<04:04, 2465.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6390/608042 [00:02<04:05, 2447.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6642/608042 [00:02<04:10, 2397.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6883/608042 [00:03<04:25, 2267.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 7169/608042 [00:03<04:09, 2410.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 7423/608042 [00:03<04:13, 2372.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 7687/608042 [00:03<04:05, 2446.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 7939/608042 [00:03<04:07, 2423.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8234/608042 [00:03<03:53, 2572.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8526/608042 [00:03<03:45, 2658.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8795/608042 [00:03<04:17, 2324.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 9047/608042 [00:03<04:18, 2317.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9373/608042 [00:04<03:53, 2568.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9644/608042 [00:04<03:55, 2536.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9919/608042 [00:04<03:55, 2537.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10176/608042 [00:04<03:56, 2528.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10436/608042 [00:04<04:02, 2463.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10691/608042 [00:04<04:26, 2239.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10944/608042 [00:04<04:21, 2285.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11269/608042 [00:04<03:58, 2502.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11528/608042 [00:04<04:06, 2417.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11790/608042 [00:05<04:05, 2431.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12036/608042 [00:05<04:15, 2329.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12327/608042 [00:05<04:00, 2481.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12580/608042 [00:05<04:13, 2349.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12846/608042 [00:05<04:07, 2404.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13100/608042 [00:05<04:04, 2436.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13378/608042 [00:05<03:55, 2527.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13636/608042 [00:05<04:06, 2412.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13884/608042 [00:05<04:05, 2418.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14163/608042 [00:06<03:56, 2506.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14425/608042 [00:06<04:06, 2412.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14673/608042 [00:06<04:11, 2359.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14912/608042 [00:06<04:17, 2305.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 15168/608042 [00:06<04:10, 2370.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15423/608042 [00:06<04:08, 2382.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15666/608042 [00:06<04:14, 2331.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15916/608042 [00:06<04:38, 2122.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16159/608042 [00:06<04:34, 2159.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16391/608042 [00:07<04:29, 2192.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16658/608042 [00:07<04:16, 2302.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16932/608042 [00:07<04:07, 2392.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17215/608042 [00:07<03:56, 2502.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17548/608042 [00:07<03:36, 2726.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17833/608042 [00:07<03:54, 2519.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18095/608042 [00:07<04:00, 2457.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18363/608042 [00:07<03:57, 2487.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18648/608042 [00:07<03:51, 2549.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18963/608042 [00:07<03:37, 2705.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19240/608042 [00:08<03:48, 2571.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19503/608042 [00:08<04:02, 2424.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19750/608042 [00:08<04:18, 2272.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19990/608042 [00:08<04:22, 2238.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20300/608042 [00:08<03:59, 2454.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20567/608042 [00:08<04:11, 2331.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20825/608042 [00:08<04:06, 2378.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 21154/608042 [00:08<03:46, 2589.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 21438/608042 [00:09<04:00, 2437.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 21747/608042 [00:09<03:51, 2527.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22038/608042 [00:09<03:46, 2592.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22352/608042 [00:09<03:36, 2700.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22640/608042 [00:09<03:34, 2734.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 22919/608042 [00:09<03:38, 2678.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23190/608042 [00:09<03:49, 2544.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23507/608042 [00:09<03:41, 2643.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23856/608042 [00:09<03:29, 2786.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24212/608042 [00:10<03:15, 2982.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24517/608042 [00:10<03:22, 2876.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24812/608042 [00:10<03:26, 2827.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25100/608042 [00:10<03:56, 2469.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25377/608042 [00:10<03:54, 2488.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25681/608042 [00:10<03:43, 2608.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25989/608042 [00:10<03:35, 2705.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26272/608042 [00:10<03:57, 2444.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26538/608042 [00:10<03:53, 2493.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26801/608042 [00:11<04:07, 2348.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 27057/608042 [00:11<04:02, 2392.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 27343/608042 [00:11<03:50, 2518.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 27645/608042 [00:11<03:40, 2636.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 27959/608042 [00:11<03:29, 2767.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28247/608042 [00:11<03:33, 2719.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28526/608042 [00:11<03:43, 2597.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28804/608042 [00:11<03:47, 2542.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29139/608042 [00:11<03:31, 2734.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29453/608042 [00:12<03:23, 2848.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29755/608042 [00:12<03:36, 2676.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 30032/608042 [00:12<03:34, 2696.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 30311/608042 [00:12<03:35, 2678.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 30584/608042 [00:12<03:46, 2544.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 30845/608042 [00:12<03:46, 2548.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31102/608042 [00:12<03:51, 2490.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31362/608042 [00:12<03:59, 2403.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31641/608042 [00:12<03:55, 2442.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31902/608042 [00:13<04:03, 2364.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32181/608042 [00:13<03:59, 2404.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32433/608042 [00:13<03:59, 2400.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32695/608042 [00:13<04:00, 2395.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32956/608042 [00:13<03:57, 2425.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 33212/608042 [00:13<04:00, 2392.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 33454/608042 [00:13<04:04, 2346.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 33764/608042 [00:13<03:44, 2556.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34039/608042 [00:13<03:40, 2607.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34315/608042 [00:14<03:46, 2537.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34588/608042 [00:14<03:45, 2538.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34848/608042 [00:14<03:58, 2403.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35125/608042 [00:14<03:55, 2436.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35378/608042 [00:14<04:04, 2345.21 examples/s]
2024-08-03T04:45:03.832268751Z Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35627/608042 [00:14<04:17, 2223.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35903/608042 [00:14<04:04, 2335.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36162/608042 [00:14<04:00, 2381.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36427/608042 [00:14<03:53, 2451.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36674/608042 [00:15<03:55, 2429.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36919/608042 [00:15<03:56, 2417.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37228/608042 [00:15<03:39, 2606.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37499/608042 [00:15<03:39, 2602.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37814/608042 [00:15<03:26, 2757.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38104/608042 [00:15<03:33, 2670.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38380/608042 [00:15<03:37, 2613.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38671/608042 [00:15<03:31, 2691.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38943/608042 [00:15<03:42, 2554.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 39209/608042 [00:15<03:48, 2491.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 39468/608042 [00:16<03:51, 2457.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 39737/608042 [00:16<03:53, 2433.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40065/608042 [00:16<03:38, 2601.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40454/608042 [00:16<03:12, 2955.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40768/608042 [00:16<03:27, 2737.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41048/608042 [00:16<03:26, 2741.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41329/608042 [00:16<03:43, 2536.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41589/608042 [00:16<03:54, 2414.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41923/608042 [00:17<03:37, 2605.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42195/608042 [00:17<03:51, 2444.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42470/608042 [00:17<03:44, 2518.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42772/608042 [00:17<03:33, 2649.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43089/608042 [00:17<03:26, 2731.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43370/608042 [00:17<03:37, 2595.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43635/608042 [00:17<03:40, 2562.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43905/608042 [00:17<03:40, 2562.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44203/608042 [00:17<03:31, 2660.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44525/608042 [00:17<03:20, 2810.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44815/608042 [00:18<03:20, 2805.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 45101/608042 [00:18<03:33, 2642.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 45373/608042 [00:18<03:37, 2592.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 45719/608042 [00:18<03:27, 2707.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 45992/608042 [00:18<03:30, 2669.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46269/608042 [00:18<03:37, 2585.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46610/608042 [00:18<03:20, 2798.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46897/608042 [00:18<03:24, 2746.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47184/608042 [00:19<03:44, 2499.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47445/608042 [00:19<03:42, 2521.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47701/608042 [00:19<03:47, 2465.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47951/608042 [00:19<03:47, 2462.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48200/608042 [00:19<03:51, 2421.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48445/608042 [00:19<04:01, 2317.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48706/608042 [00:19<03:57, 2358.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49009/608042 [00:19<03:41, 2525.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49277/608042 [00:19<03:52, 2398.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49532/608042 [00:20<04:11, 2219.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49791/608042 [00:20<04:01, 2307.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50071/608042 [00:20<03:52, 2403.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50347/608042 [00:20<03:43, 2496.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50610/608042 [00:20<03:50, 2418.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50866/608042 [00:20<03:56, 2351.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51114/608042 [00:20<03:55, 2363.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51368/608042 [00:20<03:57, 2343.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51683/608042 [00:20<03:37, 2561.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 51958/608042 [00:21<03:49, 2422.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52218/608042 [00:21<03:52, 2387.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52481/608042 [00:21<03:53, 2378.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52827/608042 [00:21<03:30, 2636.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 53129/608042 [00:21<03:24, 2716.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53439/608042 [00:21<03:17, 2814.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53731/608042 [00:21<03:52, 2384.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53986/608042 [00:21<03:50, 2401.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54246/608042 [00:21<03:55, 2352.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54498/608042 [00:22<04:05, 2253.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54785/608042 [00:22<03:49, 2406.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55039/608042 [00:22<03:51, 2390.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55285/608042 [00:22<03:58, 2322.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55521/608042 [00:22<04:06, 2238.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55830/608042 [00:22<03:45, 2447.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56097/608042 [00:22<03:43, 2473.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56412/608042 [00:22<03:28, 2642.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56682/608042 [00:22<03:34, 2575.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56965/608042 [00:23<03:33, 2580.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 57239/608042 [00:23<03:38, 2523.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 57518/608042 [00:23<03:34, 2563.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 57782/608042 [00:23<03:45, 2436.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58143/608042 [00:23<03:20, 2745.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58427/608042 [00:23<03:30, 2616.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58702/608042 [00:23<03:29, 2620.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58994/608042 [00:23<03:23, 2702.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59311/608042 [00:23<03:15, 2804.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59610/608042 [00:24<03:14, 2824.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59910/608042 [00:24<03:28, 2630.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60186/608042 [00:24<03:36, 2532.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60448/608042 [00:24<03:34, 2555.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60755/608042 [00:24<03:24, 2671.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61030/608042 [00:24<03:26, 2648.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61359/608042 [00:24<03:17, 2771.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61658/608042 [00:24<03:15, 2795.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61941/608042 [00:24<03:19, 2735.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62216/608042 [00:25<03:37, 2512.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62516/608042 [00:25<03:30, 2586.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62779/608042 [00:25<03:46, 2402.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63044/608042 [00:25<03:41, 2465.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63306/608042 [00:25<03:51, 2350.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63554/608042 [00:25<03:51, 2350.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63826/608042 [00:25<03:44, 2428.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64079/608042 [00:25<03:52, 2334.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64333/608042 [00:25<03:48, 2383.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64602/608042 [00:26<03:41, 2457.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64882/608042 [00:26<03:33, 2538.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65161/608042 [00:26<03:28, 2609.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65428/608042 [00:26<03:33, 2545.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65687/608042 [00:26<03:41, 2451.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66008/608042 [00:26<03:24, 2656.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66293/608042 [00:26<03:23, 2667.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66562/608042 [00:26<03:35, 2508.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66832/608042 [00:26<03:41, 2446.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67131/608042 [00:27<03:37, 2492.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67393/608042 [00:27<03:36, 2491.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67672/608042 [00:27<03:34, 2514.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67927/608042 [00:27<03:40, 2450.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 68218/608042 [00:27<03:31, 2551.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 68539/608042 [00:27<03:18, 2720.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 68837/608042 [00:27<03:12, 2794.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69140/608042 [00:27<03:22, 2666.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69410/608042 [00:27<03:28, 2583.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69676/608042 [00:27<03:37, 2475.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 69939/608042 [00:28<03:39, 2446.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70187/608042 [00:28<03:44, 2399.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70435/608042 [00:28<03:43, 2407.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70686/608042 [00:28<04:01, 2225.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70953/608042 [00:28<03:51, 2315.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71242/608042 [00:28<03:37, 2465.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71496/608042 [00:28<03:38, 2450.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71763/608042 [00:28<03:36, 2479.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72013/608042 [00:28<03:40, 2429.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72262/608042 [00:29<03:44, 2389.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72521/608042 [00:29<03:41, 2422.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72807/608042 [00:29<03:30, 2544.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73074/608042 [00:29<03:33, 2504.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73344/608042 [00:29<03:30, 2538.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73608/608042 [00:29<03:40, 2418.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73920/608042 [00:29<03:26, 2580.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 74198/608042 [00:29<03:22, 2633.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 74483/608042 [00:29<03:18, 2694.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 74769/608042 [00:30<03:17, 2704.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75044/608042 [00:30<03:28, 2558.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75311/608042 [00:30<03:42, 2389.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75555/608042 [00:30<03:52, 2293.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75808/608042 [00:30<04:04, 2176.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76096/608042 [00:30<03:45, 2360.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76377/608042 [00:30<03:36, 2453.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76662/608042 [00:30<03:28, 2547.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76927/608042 [00:30<03:53, 2276.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 77176/608042 [00:31<04:03, 2183.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 77521/608042 [00:31<03:33, 2490.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 77778/608042 [00:31<03:54, 2260.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78013/608042 [00:31<03:59, 2215.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78378/608042 [00:31<03:32, 2489.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78637/608042 [00:31<03:45, 2343.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78889/608042 [00:31<03:46, 2337.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79134/608042 [00:31<03:54, 2252.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79373/608042 [00:32<03:56, 2237.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79623/608042 [00:32<03:51, 2283.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79860/608042 [00:32<03:56, 2229.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80114/608042 [00:32<03:55, 2241.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80359/608042 [00:32<03:52, 2266.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80597/608042 [00:32<03:52, 2265.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80826/608042 [00:32<04:00, 2188.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81084/608042 [00:32<04:02, 2175.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81366/608042 [00:32<03:47, 2315.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81657/608042 [00:33<03:33, 2462.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81913/608042 [00:33<03:39, 2398.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82163/608042 [00:33<03:37, 2416.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82446/608042 [00:33<03:27, 2532.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82719/608042 [00:33<03:24, 2568.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82978/608042 [00:33<03:35, 2433.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 83233/608042 [00:33<03:54, 2236.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 83511/608042 [00:33<03:40, 2374.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 83771/608042 [00:33<03:35, 2436.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84019/608042 [00:34<03:57, 2209.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84259/608042 [00:34<03:54, 2232.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84594/608042 [00:34<03:26, 2528.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84857/608042 [00:34<03:29, 2500.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85119/608042 [00:34<03:41, 2356.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85428/608042 [00:34<03:25, 2537.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85696/608042 [00:34<03:32, 2459.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85952/608042 [00:34<03:31, 2473.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 86230/608042 [00:34<03:28, 2508.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 86503/608042 [00:35<03:27, 2508.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 86829/608042 [00:35<03:11, 2719.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87133/608042 [00:35<03:25, 2536.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87405/608042 [00:35<03:28, 2501.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87662/608042 [00:35<03:35, 2416.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87914/608042 [00:35<03:32, 2444.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 88163/608042 [00:35<03:38, 2383.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 88409/608042 [00:35<03:39, 2371.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 88651/608042 [00:35<04:02, 2141.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 88882/608042 [00:36<04:01, 2146.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 89162/608042 [00:36<03:49, 2264.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 89468/608042 [00:36<03:31, 2452.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 89733/608042 [00:36<03:29, 2475.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90047/608042 [00:36<03:17, 2622.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90323/608042 [00:36<03:18, 2610.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90587/608042 [00:36<03:22, 2550.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90845/608042 [00:36<03:31, 2447.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 91171/608042 [00:36<03:14, 2662.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 91440/608042 [00:37<03:37, 2377.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 91696/608042 [00:37<03:45, 2289.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92010/608042 [00:37<03:27, 2488.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92280/608042 [00:37<03:26, 2493.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92557/608042 [00:37<03:27, 2488.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92867/608042 [00:37<03:19, 2577.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93131/608042 [00:37<03:34, 2398.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93394/608042 [00:37<03:31, 2428.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93646/608042 [00:37<03:38, 2354.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93895/608042 [00:38<03:37, 2365.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 94136/608042 [00:38<03:37, 2364.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 94389/608042 [00:38<03:39, 2342.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 94689/608042 [00:38<03:23, 2523.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 94968/608042 [00:38<03:28, 2459.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95220/608042 [00:38<03:34, 2395.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95465/608042 [00:38<03:47, 2256.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95703/608042 [00:38<03:44, 2284.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95948/608042 [00:38<03:55, 2173.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96171/608042 [00:39<03:57, 2151.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96438/608042 [00:39<03:43, 2290.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96681/608042 [00:39<03:48, 2234.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96919/608042 [00:39<03:51, 2208.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 97230/608042 [00:39<03:35, 2373.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 97513/608042 [00:39<03:25, 2484.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 97768/608042 [00:39<03:38, 2338.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 98014/608042 [00:39<04:01, 2115.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 98290/608042 [00:39<03:46, 2251.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 98534/608042 [00:40<03:42, 2291.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 98888/608042 [00:40<03:15, 2609.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 99160/608042 [00:40<03:13, 2623.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 99438/608042 [00:40<03:16, 2584.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 99704/608042 [00:40<03:19, 2546.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 100004/608042 [00:40<03:15, 2598.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 100366/608042 [00:40<02:56, 2874.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 100665/608042 [00:40<03:07, 2711.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 100941/608042 [00:40<03:17, 2573.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 101248/608042 [00:41<03:08, 2691.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 101636/608042 [00:41<02:50, 2971.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 101943/608042 [00:41<02:53, 2922.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 102247/608042 [00:41<03:09, 2672.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 102545/608042 [00:41<03:03, 2752.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 102865/608042 [00:41<02:57, 2846.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103163/608042 [00:41<03:06, 2701.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103439/608042 [00:41<03:10, 2643.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103708/608042 [00:41<03:26, 2440.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103972/608042 [00:42<03:33, 2363.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 104230/608042 [00:42<03:30, 2393.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 104541/608042 [00:42<03:15, 2573.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 104837/608042 [00:42<03:12, 2620.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105113/608042 [00:42<03:24, 2457.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105367/608042 [00:42<03:28, 2413.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105611/608042 [00:42<03:35, 2327.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105873/608042 [00:42<03:33, 2348.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 106120/608042 [00:43<03:34, 2335.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 106374/608042 [00:43<03:46, 2215.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 102024-08-03T04:45:03.832268751Z s]
Tokenizing and reformatting instruction data (num_proc=16):  29%|██▉       | 178963/608042 [01:11<02:59, 2387.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  29%|██▉       | 179325/608042 [01:11<02:39, 2691.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|██▉       | 179606/608042 [01:11<02:42, 2640.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|██▉       | 179877/608042 [01:11<02:51, 2497.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|██▉       | 180181/608042 [01:11<02:42, 2636.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|██▉       | 180453/608042 [01:11<02:50, 2514.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|██▉       | 180709/608042 [01:12<03:08, 2271.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|██▉       | 181027/608042 [01:12<02:50, 2504.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|██▉       | 181288/608042 [01:12<02:50, 2507.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|██▉       | 181550/608042 [01:12<02:52, 2477.74 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|██▉       | 181805/608042 [01:12<02:51, 2480.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|██▉       | 182110/608042 [01:12<02:41, 2640.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|██▉       | 182378/608042 [01:12<02:50, 2492.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|███       | 182635/608042 [01:12<03:00, 2360.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|███       | 182877/608042 [01:12<03:00, 2358.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|███       | 183222/608042 [01:13<02:39, 2659.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|███       | 183511/608042 [01:13<02:44, 2586.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|███       | 183801/608042 [01:13<02:40, 2641.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|███       | 184075/608042 [01:13<02:43, 2590.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|███       | 184346/608042 [01:13<02:48, 2519.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|███       | 184600/608042 [01:13<02:51, 2473.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|███       | 184855/608042 [01:13<02:51, 2461.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|███       | 185118/608042 [01:13<02:50, 2483.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|███       | 185410/608042 [01:13<02:42, 2598.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 185738/608042 [01:14<02:31, 2794.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 186065/608042 [01:14<02:23, 2931.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 186369/608042 [01:14<02:29, 2816.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 186655/608042 [01:14<02:51, 2450.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 186911/608042 [01:14<02:58, 2359.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 187179/608042 [01:14<02:52, 2437.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 187430/608042 [01:14<02:52, 2436.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 187680/608042 [01:14<02:55, 2399.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 187966/608042 [01:14<02:47, 2505.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 188231/608042 [01:15<02:45, 2539.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 188492/608042 [01:15<02:44, 2549.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 188764/608042 [01:15<02:43, 2566.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 189029/608042 [01:15<02:52, 2429.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 189306/608042 [01:15<02:50, 2452.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 189591/608042 [01:15<02:44, 2544.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 189847/608042 [01:15<02:54, 2403.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███▏      | 190156/608042 [01:15<02:43, 2560.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███▏      | 190473/608042 [01:15<02:35, 2684.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███▏      | 190748/608042 [01:16<02:36, 2671.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███▏      | 191044/608042 [01:16<02:35, 2686.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███▏      | 191321/608042 [01:16<02:38, 2635.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 191603/608042 [01:16<02:44, 2535.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 191905/608042 [01:16<02:39, 2605.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 192188/608042 [01:16<02:37, 2647.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 192455/608042 [01:16<02:45, 2514.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 192859/608042 [01:16<02:23, 2902.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 193158/608042 [01:16<02:37, 2636.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 193436/608042 [01:17<02:51, 2416.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 193731/608042 [01:17<02:43, 2535.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 194010/608042 [01:17<02:45, 2500.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 194300/608042 [01:17<02:38, 2603.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 194570/608042 [01:17<02:56, 2342.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 194891/608042 [01:17<02:43, 2532.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 195227/608042 [01:17<02:30, 2750.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 195517/608042 [01:17<02:37, 2618.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 195786/608042 [01:18<02:46, 2476.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 196092/608042 [01:18<02:36, 2630.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 196379/608042 [01:18<02:32, 2693.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 196662/608042 [01:18<02:30, 2726.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 197019/608042 [01:18<02:21, 2912.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 197313/608042 [01:18<02:51, 2394.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 197619/608042 [01:18<02:41, 2546.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 197907/608042 [01:18<02:44, 2487.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 198197/608042 [01:18<02:38, 2584.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 198471/608042 [01:19<02:36, 2609.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 198856/608042 [01:19<02:25, 2813.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 199171/608042 [01:19<02:21, 2893.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 199469/608042 [01:19<02:23, 2850.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 199758/608042 [01:19<02:27, 2760.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 200043/608042 [01:19<02:29, 2728.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 200325/608042 [01:19<02:33, 2651.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 200627/608042 [01:19<02:33, 2660.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 200940/608042 [01:19<02:26, 2773.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 201230/608042 [01:20<02:35, 2622.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 201496/608042 [01:20<02:46, 2443.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 201766/608042 [01:20<02:47, 2430.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 202108/608042 [01:20<02:34, 2631.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 202390/608042 [01:20<02:31, 2673.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 202666/608042 [01:20<02:33, 2637.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 202932/608042 [01:20<02:49, 2388.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 203178/608042 [01:20<02:58, 2272.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 203435/608042 [01:20<02:53, 2336.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 203691/608042 [01:21<02:48, 2393.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▎      | 203948/608042 [01:21<02:51, 2362.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▎      | 204195/608042 [01:21<02:50, 2362.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▎      | 204484/608042 [01:21<02:42, 2488.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▎      | 204748/608042 [01:21<02:42, 2483.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▎      | 205003/608042 [01:21<02:51, 2345.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 205252/608042 [01:21<02:50, 2368.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 205507/608042 [01:21<02:52, 2334.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 205749/608042 [01:21<03:00, 2228.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 205986/608042 [01:22<02:59, 2234.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 206264/608042 [01:22<02:49, 2371.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 206524/608042 [01:22<02:52, 2330.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 206883/608042 [01:22<02:30, 2657.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 207166/608042 [01:22<02:37, 2543.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 207478/608042 [01:22<02:30, 2666.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 207761/608042 [01:22<02:30, 2666.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 208036/608042 [01:22<02:34, 2581.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 208380/608042 [01:22<02:23, 2778.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 208665/608042 [01:23<02:34, 2589.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 208928/608042 [01:23<02:35, 2574.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 209200/608042 [01:23<02:53, 2297.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 209449/608042 [01:23<02:58, 2230.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 209760/608042 [01:23<02:45, 2411.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▍      | 210029/608042 [01:23<02:41, 2469.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▍      | 210346/608042 [01:23<02:29, 2655.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▍      | 210616/608042 [01:23<02:44, 2416.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▍      | 211004/608042 [01:23<02:21, 2805.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▍      | 211297/608042 [01:24<02:35, 2552.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▍      | 211613/608042 [01:24<02:26, 2701.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▍      | 211899/608042 [01:24<02:32, 2600.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▍      | 212182/608042 [01:24<02:32, 2591.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▍      | 212451/608042 [01:24<02:42, 2430.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▍      | 212766/608042 [01:24<02:30, 2618.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▌      | 213065/608042 [01:24<02:25, 2709.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▌      | 213377/608042 [01:24<02:21, 2789.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▌      | 213680/608042 [01:24<02:24, 2726.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▌      | 213957/608042 [01:25<02:24, 2727.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▌      | 214248/608042 [01:25<02:26, 2691.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▌      | 214535/608042 [01:25<02:26, 2690.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▌      | 214849/608042 [01:25<02:20, 2795.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▌      | 215143/608042 [01:25<02:27, 2663.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▌      | 215427/608042 [01:25<02:32, 2576.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▌      | 215696/608042 [01:25<02:30, 2603.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 215963/608042 [01:25<02:46, 2360.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 216229/608042 [01:25<02:41, 2424.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 216484/608042 [01:26<02:46, 2352.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 216726/608042 [01:26<02:48, 2320.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 216990/608042 [01:26<02:43, 2392.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 217233/608042 [01:26<02:50, 2290.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 217531/608042 [01:26<02:39, 2453.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 217789/608042 [01:26<02:36, 2486.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 218088/608042 [01:26<02:34, 2522.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 218348/608042 [01:26<02:44, 2373.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 218652/608042 [01:26<02:33, 2545.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 218911/608042 [01:27<02:42, 2394.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 219172/608042 [01:27<02:40, 2416.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 219423/608042 [01:27<02:40, 2428.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 219734/608042 [01:27<02:29, 2604.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 220012/608042 [01:27<02:27, 2631.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 220296/608042 [01:27<02:28, 2615.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▋      | 220561/608042 [01:27<02:32, 2542.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▋      | 220849/608042 [01:27<02:27, 2632.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▋      | 221130/608042 [01:27<02:37, 2453.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▋      | 221401/608042 [01:28<02:37, 2455.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▋      | 221692/608042 [01:28<02:31, 2542.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 221988/608042 [01:28<02:25, 2644.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 222319/608042 [01:28<02:16, 2819.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 222615/608042 [01:28<02:32, 2533.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 222889/608042 [01:28<02:31, 2546.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 223160/608042 [01:28<02:29, 2566.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 223426/608042 [01:28<02:35, 2473.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 223680/608042 [01:28<02:47, 2294.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 223926/608042 [01:29<02:45, 2316.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 224247/608042 [01:29<02:30, 2549.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 224526/608042 [01:29<02:30, 2550.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 224789/608042 [01:29<02:30, 2554.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 225144/608042 [01:29<02:17, 2783.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 225470/608042 [01:29<02:17, 2790.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 225754/608042 [01:29<02:21, 2701.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 226118/608042 [01:29<02:11, 2901.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 226423/608042 [01:30<02:32, 2496.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 226711/608042 [01:30<02:27, 2581.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 226980/608042 [01:30<02:26, 2594.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 227257/608042 [01:30<02:35, 2454.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 227509/608042 [01:30<02:46, 2288.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 227774/608042 [01:30<02:40, 2376.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 228048/608042 [01:30<02:35, 2441.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 228316/608042 [01:30<02:31, 2506.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 228595/608042 [01:30<02:30, 2523.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 228910/608042 [01:31<02:27, 2568.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 229175/608042 [01:31<02:31, 2502.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 229437/608042 [01:31<02:34, 2442.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 229685/608042 [01:31<02:39, 2374.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 229925/608042 [01:31<02:39, 2366.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 230167/608042 [01:31<02:45, 2285.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 230446/608042 [01:31<02:40, 2356.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 230752/608042 [01:31<02:31, 2489.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 231093/608042 [01:31<02:18, 2714.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 231385/608042 [01:32<02:28, 2540.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 231676/608042 [01:32<02:29, 2520.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 232029/608042 [01:32<02:15, 2775.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 232311/608042 [01:32<02:16, 2758.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 232592/608042 [01:32<02:23, 2614.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 232868/608042 [01:32<02:25, 2584.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 233185/608042 [01:32<02:17, 2727.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 233466/608042 [01:32<02:19, 2683.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 233771/608042 [01:32<02:15, 2755.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 234052/608042 [01:33<02:26, 2545.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▊      | 234318/608042 [01:33<02:34, 2423.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▊      | 234607/608042 [01:33<02:33, 2433.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▊      | 234876/608042 [01:33<02:33, 2424.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▊      | 235195/608042 [01:33<02:22, 2622.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▊      | 235542/608042 [01:33<02:12, 2809.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 235836/608042 [01:33<02:14, 2769.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 236125/608042 [01:33<02:13, 2786.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 236412/608042 [01:33<02:12, 2800.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 236694/608042 [01:34<02:20, 2645.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 236973/608042 [01:34<02:29, 2483.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 237226/608042 [01:34<02:33, 2411.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 237478/608042 [01:34<02:33, 2411.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 237757/608042 [01:34<02:31, 2446.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 238003/608042 [01:34<02:40, 2299.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 238255/608042 [01:34<02:38, 2338.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 238500/608042 [01:34<02:39, 2315.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 238791/608042 [01:34<02:30, 2460.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 239100/608042 [01:35<02:20, 2622.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 239381/608042 [01:35<02:22, 2587.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 239642/608042 [01:35<02:25, 2536.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 239905/608042 [01:35<02:35, 2366.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 240170/608042 [01:35<02:35, 2371.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|███▉      | 240451/608042 [01:35<02:27, 2489.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|███▉      | 240751/608042 [01:35<02:20, 2611.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|███▉      | 241048/608042 [01:35<02:15, 2707.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|███▉      | 241324/608042 [01:35<02:17, 2666.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|███▉      | 241602/608042 [01:35<02:18, 2653.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|███▉      | 241900/608042 [01:36<02:18, 2644.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|███▉      | 242170/608042 [01:36<02:21, 2590.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|███▉      | 242456/608042 [01:36<02:18, 2643.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|███▉      | 242725/608042 [01:36<02:24, 2522.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|███▉      | 243020/608042 [01:36<02:23, 2543.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|████      | 243301/608042 [01:36<02:20, 2589.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|████      | 243716/608042 [01:36<02:00, 3012.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|████      | 244030/608042 [01:36<02:18, 2625.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|████      | 244422/608042 [01:37<02:03, 2941.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|████      | 244733/608042 [01:37<02:19, 2599.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|████      | 245015/608042 [01:37<02:16, 2650.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|████      | 245311/608042 [01:37<02:15, 2678.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|████      | 245605/608042 [01:37<02:12, 2745.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|████      | 245888/608042 [01:37<02:18, 2606.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|████      | 246187/608042 [01:37<02:14, 2693.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 246461/608042 [01:37<02:23, 2527.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 246721/608042 [01:37<02:23, 2525.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 246991/608042 [01:38<02:25, 2478.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 247300/608042 [01:38<02:17, 2626.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 247566/608042 [01:38<02:18, 2601.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 247842/608042 [01:38<02:22, 2531.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 248153/608042 [01:38<02:13, 2686.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 248428/608042 [01:38<02:19, 2573.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 248701/608042 [01:38<02:24, 2493.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 249029/608042 [01:38<02:12, 2702.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 249309/608042 [01:38<02:21, 2531.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 249582/608042 [01:39<02:20, 2553.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 249864/608042 [01:39<02:23, 2503.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 250118/608042 [01:39<02:34, 2310.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 250405/608042 [01:39<02:26, 2443.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 250662/608042 [01:39<02:27, 2423.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████▏     | 250937/608042 [01:39<02:24, 2475.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████▏     | 251229/608042 [01:39<02:18, 2568.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████▏     | 251498/608042 [01:39<02:23, 2483.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████▏     | 251791/608042 [01:39<02:16, 2605.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████▏     | 252072/608042 [01:40<02:19, 2549.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 252368/608042 [01:40<02:15, 2624.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 252657/608042 [01:40<02:18, 2559.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 252924/608042 [01:40<02:20, 2523.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 253185/608042 [01:40<02:25, 2431.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 253438/608042 [01:40<02:30, 2355.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 253782/608042 [01:40<02:16, 2599.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 254056/608042 [01:40<02:19, 2542.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 254313/608042 [01:40<02:27, 2394.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 254621/608042 [01:41<02:23, 2468.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 254911/608042 [01:41<02:17, 2574.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 255184/608042 [01:41<02:16, 2583.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 255463/608042 [01:41<02:14, 2612.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 255742/608042 [01:41<02:17, 2555.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 256063/608042 [01:41<02:08, 2737.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 256362/608042 [01:41<02:05, 2803.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 256649/608042 [01:41<02:08, 2725.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 256947/608042 [01:41<02:15, 2597.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 257214/608042 [01:42<02:24, 2428.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 257536/608042 [01:42<02:13, 2629.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 257836/608042 [01:42<02:13, 2631.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 258128/608042 [01:42<02:12, 2642.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 258412/608042 [01:42<02:19, 2514.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 258668/608042 [01:42<02:29, 2341.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 258926/608042 [01:42<02:25, 2395.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 259177/608042 [01:42<02:26, 2373.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 259434/608042 [01:42<02:25, 2403.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 259756/608042 [01:43<02:15, 2579.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 260028/608042 [01:43<02:18, 2506.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 260302/608042 [01:43<02:16, 2554.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 260577/608042 [01:43<02:15, 2562.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 260940/608042 [01:43<02:01, 2862.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 261235/608042 [01:43<02:10, 2651.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 261520/608042 [01:43<02:14, 2570.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 261781/608042 [01:43<02:19, 2488.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 262092/608042 [01:43<02:13, 2585.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 262414/608042 [01:44<02:09, 2667.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 262770/608042 [01:44<02:00, 2874.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 263084/608042 [01:44<02:02, 2812.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 263372/608042 [01:44<02:10, 2634.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 263642/608042 [01:44<02:16, 2519.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 263912/608042 [01:44<02:15, 2545.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 264186/608042 [01:44<02:13, 2566.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 264445/608042 [01:44<02:28, 2307.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▎     | 264682/608042 [01:45<02:32, 2244.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▎     | 264957/608042 [01:45<02:25, 2361.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▎     | 265200/608042 [01:45<02:26, 2345.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▎     | 265439/608042 [01:45<02:42, 2105.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▎     | 265730/608042 [01:45<02:29, 2289.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▎     | 266001/608042 [01:45<02:23, 2378.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 266245/608042 [01:45<02:37, 2174.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 266518/608042 [01:45<02:27, 2319.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 266764/608042 [01:45<02:30, 2267.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 266998/608042 [01:46<02:36, 2181.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 267407/608042 [01:46<02:06, 2698.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 267690/608042 [01:46<02:13, 2551.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 267953/608042 [01:46<02:15, 2510.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 268228/608042 [01:46<02:12, 2557.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 268559/608042 [01:46<02:03, 2756.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 268885/608042 [01:46<01:57, 2884.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 269200/608042 [01:46<01:54, 2957.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 269503/608042 [01:46<02:09, 2616.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 269800/608042 [01:47<02:05, 2700.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 270080/608042 [01:47<02:10, 2585.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 270351/608042 [01:47<02:16, 2482.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▍     | 270680/608042 [01:47<02:06, 2674.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▍     | 270955/608042 [01:47<02:09, 2601.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▍     | 271244/608042 [01:47<02:08, 2623.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▍     | 271519/608042 [01:47<02:10, 2587.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▍     | 271805/608042 [01:47<02:06, 2651.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▍     | 272121/608042 [01:47<02:02, 2745.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▍     | 272402/608042 [01:48<02:15, 2481.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▍     | 272659/608042 [01:48<02:20, 2393.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▍     | 273006/608042 [01:48<02:07, 2637.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▍     | 273280/608042 [01:48<02:08, 2605.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▍     | 273553/608042 [01:48<02:20, 2379.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▌     | 273847/608042 [01:48<02:12, 2524.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▌     | 274118/608042 [01:48<02:10, 2567.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▌     | 274391/608042 [01:48<02:18, 2402.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▌     | 274667/608042 [01:48<02:13, 2491.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▌     | 274939/608042 [01:49<02:11, 2540.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▌     | 275198/608042 [01:49<02:13, 2498.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▌     | 275454/608042 [01:49<02:19, 2385.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▌     | 275710/608042 [01:49<02:19, 2390.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▌     | 275976/608042 [01:49<02:17, 2412.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▌     | 276222/608042 [01:49<02:28, 2239.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▌     | 276479/608042 [01:49<02:22, 2320.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 276784/608042 [01:49<02:12, 2504.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 277043/608042 [01:49<02:15, 2442.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 277364/608042 [01:50<02:05, 2632.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 277672/608042 [01:50<02:00, 2740.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 277950/608042 [01:50<02:15, 2430.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 278273/608042 [01:50<02:05, 2637.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 278554/608042 [01:50<02:05, 2617.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 278826/608042 [01:50<02:09, 2542.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 279085/608042 [01:50<02:14, 2441.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 279340/608042 [01:50<02:21, 2329.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 279579/608042 [01:50<02:22, 2303.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 279836/608042 [01:51<02:25, 2260.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 280069/608042 [01:51<02:37, 2076.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 280400/608042 [01:51<02:17, 2376.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 280653/608042 [01:51<02:19, 2348.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 280915/608042 [01:51<02:16, 2394.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▋     | 281234/608042 [01:51<02:07, 2563.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▋     | 281506/608042 [01:51<02:14, 2431.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▋     | 281772/608042 [01:51<02:27, 2215.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▋     | 282001/608042 [01:52<02:30, 2172.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▋     | 282318/608042 [01:52<02:14, 2422.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▋     | 282566/608042 [01:52<02:16, 2377.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 282847/608042 [01:52<02:11, 2468.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 283100/608042 [01:52<02:11, 2475.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 283353/608042 [01:52<02:10, 2483.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 283658/608042 [01:52<02:02, 2638.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 283925/608042 [01:52<02:08, 2531.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 284222/608042 [01:52<02:01, 2655.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 284500/608042 [01:52<02:04, 2597.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 284824/608042 [01:53<01:56, 2767.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 285105/608042 [01:53<02:06, 2559.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 285392/608042 [01:53<02:02, 2637.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 285695/608042 [01:53<01:58, 2719.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 285988/608042 [01:53<02:03, 2597.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 286277/608042 [01:53<02:01, 2650.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 286603/608042 [01:53<01:56, 2749.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 286880/608042 [01:53<01:56, 2745.74 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 287159/608042 [01:53<01:57, 2741.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 287471/608042 [01:54<01:53, 2822.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 287765/608042 [01:54<01:56, 2755.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 288046/608042 [01:54<01:55, 2761.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 288328/608042 [01:54<01:59, 2680.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 288600/608042 [01:54<02:02, 2605.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 288868/608042 [01:54<02:09, 2469.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 289172/608042 [01:54<02:01, 2614.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 289437/608042 [01:54<02:03, 2583.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 289697/608042 [01:54<02:12, 2404.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 289964/608042 [01:55<02:08, 2470.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 290216/608042 [01:55<02:12, 2393.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 290522/608042 [01:55<02:06, 2518.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 290799/608042 [01:55<02:02, 2587.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 291073/608042 [01:55<02:02, 2589.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 291338/608042 [01:55<02:08, 2461.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 291618/608042 [01:55<02:04, 2541.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 291883/608042 [01:55<02:04, 2543.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 292156/608042 [01:55<02:05, 2515.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 292457/608042 [01:56<01:59, 2647.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 292726/608042 [01:56<01:58, 2653.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 293007/608042 [01:56<02:08, 2456.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 293261/608042 [01:56<02:14, 2348.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 293506/608042 [01:56<02:14, 2342.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 293774/608042 [01:56<02:10, 2402.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 294133/608042 [01:56<01:55, 2722.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 294409/608042 [01:56<01:56, 2698.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 294686/608042 [01:56<02:02, 2562.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▊     | 294949/608042 [01:57<02:11, 2375.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▊     | 295200/608042 [01:57<02:09, 2410.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▊     | 295542/608042 [01:57<01:56, 2677.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▊     | 295821/608042 [01:57<01:59, 2608.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▊     | 296090/608042 [01:57<02:01, 2577.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▊     | 296371/608042 [01:57<01:58, 2631.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 296664/608042 [01:57<01:55, 2685.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 296962/608042 [01:57<01:53, 2731.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 297253/608042 [01:57<01:55, 2680.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 297522/608042 [01:58<02:01, 2552.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 297783/608042 [01:58<02:06, 2449.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 298092/608042 [01:58<01:58, 2620.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 298362/608042 [01:58<02:00, 2576.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 298682/608042 [01:58<01:54, 2706.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 298955/608042 [01:58<01:56, 2652.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 299228/608042 [01:58<01:57, 2624.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 299492/608042 [01:58<02:00, 2555.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 299761/608042 [01:58<01:58, 2591.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 300027/608042 [01:59<02:14, 2282.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 300275/608042 [01:59<02:12, 2329.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 300519/608042 [01:59<02:16, 2259.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 300777/608042 [01:59<02:11, 2330.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|████▉     | 301076/608042 [01:59<02:02, 2506.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|████▉     | 301361/608042 [01:59<01:57, 2602.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|████▉     | 301683/608042 [01:59<01:52, 2713.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|████▉     | 301957/608042 [01:59<01:53, 2689.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|████▉     | 302240/608042 [01:59<01:53, 2697.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|████▉     | 302534/608042 [01:59<01:53, 2691.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|████▉     | 302842/608042 [02:00<01:49, 2787.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|████▉     | 303154/608042 [02:00<01:46, 2859.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|████▉     | 303443/608042 [02:00<01:49, 2785.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|████▉     | 303765/608042 [02:00<01:47, 2819.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|█████     | 304054/608042 [02:00<01:50, 2749.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|█████     | 304388/608042 [02:00<01:44, 2905.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|█████     | 304713/608042 [02:00<01:42, 2962.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|█████     | 305018/608042 [02:00<01:46, 2832.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|█████     | 305309/608042 [02:01<02:05, 2418.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|█████     | 305568/608042 [02:01<02:03, 2439.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|█████     | 305820/608042 [02:01<02:05, 2404.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|█████     | 306070/608042 [02:01<02:05, 2409.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|█████     | 306430/608042 [02:01<01:50, 2734.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|█████     | 306750/608042 [02:01<01:47, 2790.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|█████     | 307042/608042 [02:01<01:48, 2784.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 307360/608042 [02:01<01:43, 2891.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 307672/608042 [02:01<01:42, 2918.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 307975/608042 [02:01<01:48, 2757.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 308256/608042 [02:02<01:50, 2723.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 308537/608042 [02:02<01:52, 2663.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 308819/608042 [02:02<01:56, 2566.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 309172/608042 [02:02<01:47, 2772.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 309459/608042 [02:02<01:56, 2564.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 309796/608042 [02:02<01:49, 2721.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 310078/608042 [02:02<01:52, 2639.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 310371/608042 [02:02<01:50, 2687.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 310709/608042 [02:02<01:43, 2872.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 311031/608042 [02:03<01:42, 2897.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 311331/608042 [02:03<01:46, 2783.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 311620/608042 [02:03<01:54, 2581.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████▏    | 311897/608042 [02:03<01:53, 2612.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████▏    | 312166/608042 [02:03<02:00, 2464.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████▏    | 312433/608042 [02:03<02:05, 2357.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████▏    | 312805/608042 [02:03<01:52, 2616.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████▏    | 313081/608042 [02:03<01:57, 2515.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 313356/608042 [02:04<02:00, 2446.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 313649/608042 [02:04<01:58, 2491.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 313913/608042 [02:04<01:58, 2492.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 314174/608042 [02:04<02:00, 2438.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 314438/608042 [02:04<01:58, 2477.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 314811/608042 [02:04<01:46, 2742.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 315089/608042 [02:04<01:52, 2613.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 315352/608042 [02:04<01:56, 2505.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 315620/608042 [02:04<01:54, 2544.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 315876/608042 [02:05<01:59, 2436.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 316143/608042 [02:05<01:58, 2462.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 316392/608042 [02:05<01:58, 2460.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 316654/608042 [02:05<02:03, 2364.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 316901/608042 [02:05<02:09, 2245.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 317219/608042 [02:05<01:57, 2476.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 317481/608042 [02:05<01:57, 2471.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 317765/608042 [02:05<01:52, 2571.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 318039/608042 [02:05<01:50, 2618.37 examples/s]
2024-08-03T04:45:03.832268751Z Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 318308/608042 [02:06<01:53, 2542.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 318566/608042 [02:06<01:53, 2540.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 318844/608042 [02:06<01:53, 2557.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 319152/608042 [02:06<01:50, 2606.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 319453/608042 [02:06<01:46, 2702.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 319737/608042 [02:06<01:55, 2499.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 319998/608042 [02:06<01:54, 2514.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 320267/608042 [02:06<01:57, 2442.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 320530/608042 [02:06<01:58, 2428.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 320778/608042 [02:07<01:58, 2429.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 321077/608042 [02:07<01:52, 2548.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 321339/608042 [02:07<01:52, 2558.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 321623/608042 [02:07<01:50, 2600.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 321885/608042 [02:07<01:59, 2385.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 322136/608042 [02:07<02:02, 2326.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 322493/608042 [02:07<01:47, 2662.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 322766/608042 [02:07<01:55, 2477.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 323027/608042 [02:07<01:56, 2440.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 323369/608042 [02:08<01:45, 2696.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 323651/608042 [02:08<01:49, 2602.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 323952/608042 [02:08<01:46, 2655.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 324324/608042 [02:08<01:39, 2865.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 324616/608042 [02:08<01:45, 2693.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 324946/608042 [02:08<01:40, 2803.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 325244/608042 [02:08<01:39, 2829.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▎    | 325537/608042 [02:08<01:45, 2686.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▎    | 325827/608042 [02:08<01:44, 2710.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▎    | 326107/608042 [02:09<01:52, 2501.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▎    | 326390/608042 [02:09<01:51, 2520.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▎    | 326682/608042 [02:09<01:47, 2617.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 326951/608042 [02:09<01:49, 2557.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 327222/608042 [02:09<01:48, 2587.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 327497/608042 [02:09<01:59, 2348.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 327747/608042 [02:09<01:59, 2340.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 328011/608042 [02:09<01:56, 2410.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 328281/608042 [02:09<01:56, 2397.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 328596/608042 [02:10<01:47, 2588.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 328906/608042 [02:10<01:43, 2696.74 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 329182/608042 [02:10<01:49, 2549.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 329442/608042 [02:10<01:54, 2441.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 329751/608042 [02:10<01:47, 2592.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 330035/608042 [02:10<01:46, 2617.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 330346/608042 [02:10<01:41, 2735.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 330627/608042 [02:10<01:43, 2691.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 330918/608042 [02:10<01:47, 2588.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 331193/608042 [02:11<01:50, 2494.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▍    | 331450/608042 [02:11<01:54, 2422.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▍    | 331710/608042 [02:11<01:52, 2446.74 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▍    | 331967/608042 [02:11<01:54, 2409.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▍    | 332232/608042 [02:11<01:53, 2437.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▍    | 332544/608042 [02:11<01:44, 2628.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▍    | 332813/608042 [02:11<01:47, 2565.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▍    | 333077/608042 [02:11<01:47, 2562.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▍    | 333336/608042 [02:11<01:57, 2333.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▍    | 333611/608042 [02:12<01:53, 2409.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▍    | 333931/608042 [02:12<01:45, 2588.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▍    | 334234/608042 [02:12<01:44, 2616.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▌    | 334498/608042 [02:12<01:49, 2489.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▌    | 334758/608042 [02:12<01:51, 2448.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▌    | 335007/608042 [02:12<01:53, 2409.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▌    | 335257/608042 [02:12<01:59, 2281.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▌    | 335542/608042 [02:12<01:52, 2424.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▌    | 335802/608042 [02:12<01:54, 2387.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▌    | 336079/608042 [02:13<01:53, 2398.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▌    | 336320/608042 [02:13<01:53, 2391.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▌    | 336575/608042 [02:13<01:54, 2363.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▌    | 336813/608042 [02:13<01:57, 2304.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▌    | 337090/608042 [02:13<01:53, 2388.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▌    | 337415/608042 [02:13<01:44, 2593.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 337678/608042 [02:13<01:47, 2523.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 337937/608042 [02:13<01:47, 2504.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 338194/608042 [02:13<01:52, 2393.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 338441/608042 [02:14<01:52, 2394.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 338703/608042 [02:14<01:49, 2451.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 339019/608042 [02:14<01:41, 2645.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 339318/608042 [02:14<01:38, 2729.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 339700/608042 [02:14<01:28, 3016.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 340013/608042 [02:14<01:41, 2640.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 340342/608042 [02:14<01:36, 2784.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 340634/608042 [02:14<01:34, 2814.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 340935/608042 [02:14<01:42, 2602.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 341212/608042 [02:15<01:44, 2554.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 341530/608042 [02:15<01:38, 2715.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 341816/608042 [02:15<01:44, 2557.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▋    | 342077/608042 [02:15<01:44, 2552.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▋    | 342338/608042 [02:15<01:49, 2432.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▋    | 342588/608042 [02:15<01:49, 2429.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▋    | 342843/608042 [02:15<01:47, 2461.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▋    | 343194/608042 [02:15<01:36, 2734.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▋    | 343478/608042 [02:15<01:41, 2619.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 343761/608042 [02:16<01:52, 2352.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 344013/608042 [02:16<01:50, 2388.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 344299/608042 [02:16<01:45, 2508.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 344627/608042 [02:16<01:43, 2556.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 344886/608042 [02:16<01:45, 2486.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 345193/608042 [02:16<01:40, 2602.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 345489/608042 [02:16<01:37, 2689.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 345775/608042 [02:16<01:37, 2695.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 346055/608042 [02:16<01:40, 2608.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 346328/608042 [02:17<01:39, 2636.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 346605/608042 [02:17<01:47, 2431.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 346914/608042 [02:17<01:43, 2530.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 347202/608042 [02:17<01:40, 2604.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 347506/608042 [02:17<01:36, 2710.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 347782/608042 [02:17<01:42, 2528.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 348042/608042 [02:17<01:44, 2482.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 348302/608042 [02:17<01:47, 2405.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 348546/608042 [02:17<01:51, 2329.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 348836/608042 [02:18<01:44, 2469.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 349211/608042 [02:18<01:33, 2780.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 349504/608042 [02:18<01:42, 2533.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 349768/608042 [02:18<01:41, 2551.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 350036/608042 [02:18<01:42, 2505.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 350301/608042 [02:18<01:48, 2370.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 350553/608042 [02:18<01:49, 2341.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 350808/608042 [02:18<01:50, 2317.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 351044/608042 [02:18<01:50, 2324.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 351280/608042 [02:19<01:54, 2249.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 351523/608042 [02:19<01:53, 2265.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 351809/608042 [02:19<01:45, 2428.83 examples/s]
2024-08-03T04:45:03.832268751Z Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 352177/608042 [02:19<01:32, 2760.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 352463/608042 [02:19<01:39, 2561.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 352792/608042 [02:19<01:34, 2689.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 353067/608042 [02:19<01:36, 2633.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 353335/608042 [02:19<01:38, 2577.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 353676/608042 [02:19<01:30, 2801.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 353968/608042 [02:20<01:41, 2507.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 354240/608042 [02:20<01:40, 2519.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 354542/608042 [02:20<01:35, 2652.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 354817/608042 [02:20<01:36, 2632.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 355112/608042 [02:20<01:36, 2627.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 355410/608042 [02:20<01:33, 2709.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 355687/608042 [02:20<01:32, 2722.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▊    | 355961/608042 [02:20<01:38, 2557.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▊    | 356234/608042 [02:20<01:39, 2540.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▊    | 356493/608042 [02:21<01:39, 2516.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▊    | 356748/608042 [02:21<01:40, 2510.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▊    | 357051/608042 [02:21<01:34, 2657.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 357318/608042 [02:21<01:36, 2605.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 357594/608042 [02:21<01:35, 2628.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 357867/608042 [02:21<01:37, 2575.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 358172/608042 [02:21<01:32, 2687.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 358457/608042 [02:21<01:32, 2695.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 358729/608042 [02:21<01:36, 2586.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 359087/608042 [02:22<01:26, 2864.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 359383/608042 [02:22<01:31, 2725.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 359675/608042 [02:22<01:30, 2758.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 359973/608042 [02:22<01:31, 2715.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 360247/608042 [02:22<01:31, 2719.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 360526/608042 [02:22<01:37, 2536.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 360789/608042 [02:22<01:36, 2557.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 361054/608042 [02:22<01:41, 2444.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 361307/608042 [02:22<01:43, 2377.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 361557/608042 [02:23<01:42, 2394.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|█████▉    | 361846/608042 [02:23<01:39, 2485.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|█████▉    | 362118/608042 [02:23<01:41, 2419.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|█████▉    | 362443/608042 [02:23<01:33, 2614.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|█████▉    | 362721/608042 [02:23<01:34, 2589.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|█████▉    | 362985/608042 [02:23<01:42, 2385.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|█████▉    | 363275/608042 [02:23<01:43, 2372.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|█████▉    | 363528/608042 [02:23<01:46, 2298.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|█████▉    | 363773/608042 [02:23<01:49, 2230.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|█████▉    | 364026/608042 [02:24<01:51, 2193.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|█████▉    | 364317/608042 [02:24<01:44, 2325.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|█████▉    | 364602/608042 [02:24<01:42, 2364.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|██████    | 364914/608042 [02:24<01:35, 2548.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|██████    | 365178/608042 [02:24<01:34, 2563.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|██████    | 365442/608042 [02:24<01:37, 2486.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|██████    | 365792/608042 [02:24<01:28, 2728.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|██████    | 366073/608042 [02:24<01:35, 2542.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|██████    | 366336/608042 [02:24<01:37, 2478.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|██████    | 366617/608042 [02:25<01:34, 2564.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|██████    | 366941/608042 [02:25<01:29, 2698.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|██████    | 367233/608042 [02:25<01:27, 2752.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|██████    | 367512/608042 [02:25<01:27, 2750.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|██████    | 367799/608042 [02:25<01:31, 2631.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 368072/608042 [02:25<01:30, 2655.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 368359/608042 [02:25<01:28, 2701.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 368640/608042 [02:25<01:28, 2712.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 368921/608042 [02:25<01:28, 2699.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 369224/608042 [02:25<01:26, 2776.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 369524/608042 [02:26<01:30, 2633.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 369796/608042 [02:26<01:30, 2630.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 370067/608042 [02:26<01:39, 2384.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 370361/608042 [02:26<01:33, 2530.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 370687/608042 [02:26<01:26, 2728.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 370981/608042 [02:26<01:26, 2737.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 371261/608042 [02:26<01:28, 2678.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 371533/608042 [02:26<01:30, 2603.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 371856/608042 [02:26<01:25, 2770.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 372137/608042 [02:27<01:32, 2545.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████▏   | 372468/608042 [02:27<01:25, 2745.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████▏   | 372844/608042 [02:27<01:20, 2916.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████▏   | 373164/608042 [02:27<01:24, 2793.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████▏   | 373563/608042 [02:27<01:18, 3004.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████▏   | 373877/608042 [02:27<01:21, 2879.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 374172/608042 [02:27<01:22, 2832.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 374477/608042 [02:27<01:23, 2804.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 374779/608042 [02:28<01:21, 2859.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 375069/608042 [02:28<01:24, 2755.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 375434/608042 [02:28<01:18, 2975.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 375742/608042 [02:28<01:19, 2925.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 376042/608042 [02:28<01:24, 2759.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 376331/608042 [02:28<01:25, 2700.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 376629/608042 [02:28<01:25, 2719.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 376905/608042 [02:28<01:25, 2703.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 377180/608042 [02:28<01:35, 2424.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 377431/608042 [02:29<01:39, 2326.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 377714/608042 [02:29<01:34, 2446.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 377964/608042 [02:29<01:35, 2416.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 378223/608042 [02:29<01:34, 2442.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 378498/608042 [02:29<01:31, 2515.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 378762/608042 [02:29<01:32, 2483.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 379066/608042 [02:29<01:26, 2636.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 379437/608042 [02:29<01:19, 2882.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 379746/608042 [02:29<01:19, 2884.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 380051/608042 [02:30<01:20, 2844.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 380356/608042 [02:30<01:23, 2731.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 380634/608042 [02:30<01:28, 2557.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 380964/608042 [02:30<01:24, 2702.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 381238/608042 [02:30<01:26, 2613.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 381510/608042 [02:30<01:26, 2617.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 381776/608042 [02:30<01:32, 2456.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 382028/608042 [02:30<01:35, 2364.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 382297/608042 [02:30<01:34, 2401.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 382598/608042 [02:31<01:28, 2545.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 382861/608042 [02:31<01:33, 2408.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 383162/608042 [02:31<01:29, 2522.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 383507/608042 [02:31<01:21, 2760.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 383857/608042 [02:31<01:15, 2952.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 384158/608042 [02:31<01:15, 2952.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 384493/608042 [02:31<01:13, 3024.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 384805/608042 [02:31<01:19, 2798.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 385096/608042 [02:31<01:23, 2680.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 385408/608042 [02:32<01:19, 2797.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 385698/608042 [02:32<01:21, 2724.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 386040/608042 [02:32<01:17, 2851.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▎   | 386338/608042 [02:32<01:19, 2800.44 examples/s]
2024-08-03T04:45:03.832268751Z Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▎   | 386640/608042 [02:32<01:17, 2859.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▎   | 386929/608042 [02:32<01:25, 2579.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▎   | 387214/608042 [02:32<01:24, 2624.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▎   | 387520/608042 [02:32<01:21, 2707.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 387796/608042 [02:32<01:24, 2619.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 388106/608042 [02:33<01:20, 2734.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 388464/608042 [02:33<01:13, 2971.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 388771/608042 [02:33<01:14, 2926.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 389107/608042 [02:33<01:12, 3016.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 389413/608042 [02:33<01:17, 2815.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 389702/608042 [02:33<01:24, 2596.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 389973/608042 [02:33<01:23, 2612.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 390244/608042 [02:33<01:24, 2588.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 390541/608042 [02:33<01:21, 2661.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 390819/608042 [02:34<01:26, 2508.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 391073/608042 [02:34<01:27, 2467.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 391328/608042 [02:34<01:28, 2459.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 391612/608042 [02:34<01:28, 2442.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 391902/608042 [02:34<01:25, 2528.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 392169/608042 [02:34<01:31, 2360.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▍   | 392465/608042 [02:34<01:27, 2450.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▍   | 392716/608042 [02:34<01:27, 2458.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▍   | 393019/608042 [02:34<01:22, 2617.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▍   | 393302/608042 [02:35<01:26, 2493.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▍   | 393591/608042 [02:35<01:23, 2581.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▍   | 393866/608042 [02:35<01:21, 2621.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▍   | 394235/608042 [02:35<01:13, 2922.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▍   | 394573/608042 [02:35<01:10, 3007.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▍   | 394877/608042 [02:35<01:14, 2877.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▍   | 395182/608042 [02:35<01:13, 2892.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▌   | 395476/608042 [02:35<01:20, 2631.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▌   | 395763/608042 [02:35<01:28, 2388.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▌   | 396058/608042 [02:36<01:24, 2513.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▌   | 396340/608042 [02:36<01:21, 2590.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▌   | 396624/608042 [02:36<01:20, 2620.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▌   | 396896/608042 [02:36<01:26, 2450.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▌   | 397197/608042 [02:36<01:21, 2572.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▌   | 397463/608042 [02:36<01:27, 2397.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▌   | 397763/608042 [02:36<01:23, 2515.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▌   | 398068/608042 [02:36<01:21, 2592.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 398348/608042 [02:36<01:21, 2561.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 398618/608042 [02:37<01:29, 2336.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 398871/608042 [02:37<01:27, 2386.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 399156/608042 [02:37<01:24, 2478.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 399426/608042 [02:37<01:24, 2454.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 399745/608042 [02:37<01:19, 2615.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 400023/608042 [02:37<01:18, 2648.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 400295/608042 [02:37<01:24, 2462.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 400565/608042 [02:37<01:22, 2508.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 400828/608042 [02:37<01:23, 2493.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 401089/608042 [02:38<01:23, 2475.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 401348/608042 [02:38<01:36, 2141.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 401571/608042 [02:38<01:41, 2028.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 401788/608042 [02:38<01:40, 2055.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 402008/608042 [02:38<01:40, 2047.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 402311/608042 [02:38<01:29, 2303.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 402546/608042 [02:38<01:28, 2314.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▋   | 402837/608042 [02:38<01:24, 2422.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▋   | 403102/608042 [02:38<01:24, 2439.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▋   | 403348/608042 [02:39<01:24, 2427.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▋   | 403595/608042 [02:39<01:26, 2363.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▋   | 403867/608042 [02:39<01:25, 2392.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▋   | 404118/608042 [02:39<01:24, 2418.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 404414/608042 [02:39<01:19, 2561.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 404679/608042 [02:39<01:27, 2332.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 404981/608042 [02:39<01:21, 2484.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 405268/608042 [02:39<01:20, 2527.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 405585/608042 [02:39<01:15, 2694.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 405864/608042 [02:40<01:17, 2609.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 406142/608042 [02:40<01:16, 2649.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 406412/608042 [02:40<01:18, 2563.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 406693/608042 [02:40<01:17, 2594.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 407001/608042 [02:40<01:14, 2715.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 407277/608042 [02:40<01:14, 2680.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 407548/608042 [02:40<01:24, 2375.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 407888/608042 [02:40<01:17, 2583.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 408163/608042 [02:40<01:16, 2611.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 408452/608042 [02:41<01:16, 2602.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 408746/608042 [02:41<01:14, 2691.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 409074/608042 [02:41<01:11, 2767.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 409362/608042 [02:41<01:13, 2710.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 409640/608042 [02:41<01:16, 2607.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 409911/608042 [02:41<01:22, 2404.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 410156/608042 [02:41<01:22, 2410.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 410465/608042 [02:41<01:17, 2538.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 410722/608042 [02:41<01:21, 2411.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 411059/608042 [02:42<01:14, 2650.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 411333/608042 [02:42<01:17, 2525.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 411594/608042 [02:42<01:18, 2513.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 411894/608042 [02:42<01:15, 2609.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 412165/608042 [02:42<01:15, 2577.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 412517/608042 [02:42<01:09, 2814.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 412850/608042 [02:42<01:06, 2949.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 413148/608042 [02:42<01:07, 2893.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 413445/608042 [02:42<01:08, 2823.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 413741/608042 [02:43<01:08, 2823.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 414032/608042 [02:43<01:14, 2603.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 414325/608042 [02:43<01:12, 2669.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 414603/608042 [02:43<01:15, 2571.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 414981/608042 [02:43<01:07, 2869.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 415307/608042 [02:43<01:07, 2850.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 415634/608042 [02:43<01:05, 2953.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 415932/608042 [02:43<01:12, 2649.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 416204/608042 [02:43<01:11, 2665.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 416487/608042 [02:44<01:17, 2481.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▊   | 416794/608042 [02:44<01:12, 2628.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▊   | 417170/608042 [02:44<01:05, 2900.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▊   | 417484/608042 [02:44<01:04, 2955.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▊   | 417824/608042 [02:44<01:01, 3070.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 418145/608042 [02:44<01:09, 2735.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 418428/608042 [02:44<01:12, 2602.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 418718/608042 [02:44<01:10, 2673.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 419002/608042 [02:45<01:12, 2598.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 419276/608042 [02:45<01:13, 2564.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 419543/608042 [02:45<01:18, 2411.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 419791/608042 [02:45<01:17, 2424.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 420039/608042 [02:45<01:22, 2270.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 420269/608042 [02:45<01:22, 2276.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 420598/608042 [02:45<01:17, 2415.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 420857/608042 [02:45<01:17, 2412.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 421105/608042 [02:45<01:19, 2349.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 421433/608042 [02:46<01:12, 2559.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 421754/608042 [02:46<01:08, 2712.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 422051/608042 [02:46<01:10, 2625.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 422333/608042 [02:46<01:09, 2673.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|██████▉   | 422625/608042 [02:46<01:10, 2618.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|██████▉   | 422901/608042 [02:46<01:13, 2522.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|██████▉   | 423169/608042 [02:46<01:12, 2559.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|██████▉   | 423484/608042 [02:46<01:09, 2656.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|██████▉   | 423760/608042 [02:46<01:12, 2535.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|██████▉   | 424018/608042 [02:47<01:13, 2509.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|██████▉   | 424362/608042 [02:47<01:06, 2762.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|██████▉   | 424687/608042 [02:47<01:04, 2842.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|██████▉   | 424998/608042 [02:47<01:04, 2844.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|██████▉   | 425331/608042 [02:47<01:01, 2981.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|███████   | 425636/608042 [02:47<01:02, 2924.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|███████   | 425931/608042 [02:47<01:04, 2803.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|███████   | 426214/608042 [02:47<01:05, 2765.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|███████   | 426507/608042 [02:47<01:10, 2567.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|███████   | 426778/608042 [02:48<01:13, 2480.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|███████   | 427105/608042 [02:48<01:08, 2644.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|███████   | 427399/608042 [02:48<01:07, 2686.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|███████   | 427670/608042 [02:48<01:12, 2503.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|███████   | 427933/608042 [02:48<01:11, 2521.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|███████   | 428210/608042 [02:48<01:09, 2580.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|███████   | 428490/608042 [02:48<01:07, 2642.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 428876/608042 [02:48<01:01, 2896.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 429239/608042 [02:48<00:58, 3068.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 429548/608042 [02:48<00:58, 3028.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 429871/608042 [02:49<01:02, 2851.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 430161/608042 [02:49<01:03, 2817.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 430460/608042 [02:49<01:02, 2860.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 430755/608042 [02:49<01:02, 2841.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 431046/608042 [02:49<01:08, 2600.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 431337/608042 [02:49<01:06, 2652.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 431612/608042 [02:49<01:08, 2581.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 431876/608042 [02:49<01:10, 2485.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 432152/608042 [02:50<01:09, 2513.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 432411/608042 [02:50<01:11, 2444.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 432796/608042 [02:50<01:01, 2831.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 433107/608042 [02:50<01:00, 2895.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████▏  | 433403/608042 [02:50<01:01, 2859.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████▏  | 433700/608042 [02:50<01:06, 2614.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████▏  | 433979/608042 [02:50<01:06, 2636.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████▏  | 434256/608042 [02:50<01:06, 2599.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████▏  | 434520/608042 [02:50<01:10, 2445.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 434781/608042 [02:51<01:12, 2397.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 435125/608042 [02:51<01:04, 2668.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 435431/608042 [02:51<01:03, 2734.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 435720/608042 [02:51<01:03, 2727.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 436011/608042 [02:51<01:04, 2682.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 436296/608042 [02:51<01:06, 2578.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 436574/608042 [02:51<01:05, 2616.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 436842/608042 [02:51<01:07, 2537.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 437170/608042 [02:51<01:02, 2718.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 437448/608042 [02:51<01:02, 2712.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 437729/608042 [02:52<01:06, 2546.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 438006/608042 [02:52<01:05, 2607.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 438277/608042 [02:52<01:09, 2455.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 438681/608042 [02:52<00:59, 2843.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 438973/608042 [02:52<01:06, 2530.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 439240/608042 [02:52<01:07, 2516.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 439509/608042 [02:52<01:06, 2535.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 439790/608042 [02:52<01:05, 2585.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 440103/608042 [02:53<01:01, 2727.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 440451/608042 [02:53<00:57, 2938.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 440776/608042 [02:53<00:56, 2969.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 441081/608042 [02:53<00:56, 2957.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 441382/608042 [02:53<01:04, 2596.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 441659/608042 [02:53<01:04, 2595.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 441930/608042 [02:53<01:03, 2613.74 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 442199/608042 [02:53<01:04, 2564.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 442497/608042 [02:53<01:01, 2678.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 442777/608042 [02:54<01:02, 2627.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 443044/608042 [02:54<01:05, 2519.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 443301/608042 [02:54<01:11, 2303.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 443550/608042 [02:54<01:11, 2304.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 443824/608042 [02:54<01:09, 2377.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 444069/608042 [02:54<01:12, 2249.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 444299/608042 [02:54<01:16, 2146.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 444591/608042 [02:54<01:09, 2346.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 444874/608042 [02:54<01:06, 2453.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 445135/608042 [02:55<01:05, 2494.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 445439/608042 [02:55<01:01, 2648.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 445716/608042 [02:55<01:01, 2618.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 446008/608042 [02:55<01:00, 2683.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 446283/608042 [02:55<01:04, 2491.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 446539/608042 [02:55<01:06, 2426.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 446878/608042 [02:55<01:00, 2663.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▎  | 447200/608042 [02:55<00:57, 2811.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▎  | 447497/608042 [02:55<00:56, 2833.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▎  | 447783/608042 [02:56<01:01, 2601.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▎  | 448050/608042 [02:56<01:01, 2608.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▎  | 448320/608042 [02:56<01:01, 2588.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 448604/608042 [02:56<01:00, 2628.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 448893/608042 [02:56<00:59, 2658.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 449222/608042 [02:56<00:56, 2787.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 449502/608042 [02:56<00:59, 2659.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 449784/608042 [02:56<00:59, 2662.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 450060/608042 [02:56<00:59, 2661.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 450340/608042 [02:56<00:59, 2663.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 450614/608042 [02:57<00:58, 2680.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 450889/608042 [02:57<01:03, 2458.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 451156/608042 [02:57<01:02, 2504.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 451417/608042 [02:57<01:06, 2351.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 451689/608042 [02:57<01:03, 2447.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 451945/608042 [02:57<01:04, 2403.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 452233/608042 [02:57<01:02, 2495.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 452499/608042 [02:57<01:01, 2534.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 452756/608042 [02:57<01:03, 2458.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▍  | 453012/608042 [02:58<01:04, 2418.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▍  | 453256/608042 [02:58<01:04, 2387.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▍  | 453509/608042 [02:58<01:04, 2404.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▍  | 453757/608042 [02:58<01:09, 2222.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▍  | 454017/608042 [02:58<01:07, 2276.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▍  | 454306/608042 [02:58<01:04, 2391.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▍  | 454610/608042 [02:58<01:01, 2484.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▍  | 454860/608042 [02:58<01:04, 2377.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▍  | 455109/608042 [02:58<01:03, 2390.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▍  | 455351/608042 [02:59<01:11, 2150.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▍  | 455576/608042 [02:59<01:11, 2141.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▍  | 455822/608042 [02:59<01:08, 2213.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▌  | 456133/608042 [02:59<01:04, 2368.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▌  | 456392/608042 [02:59<01:02, 2420.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▌  | 456687/608042 [02:59<00:59, 2546.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▌  | 456948/608042 [02:59<01:06, 2277.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▌  | 457249/608042 [02:59<01:01, 2457.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▌  | 457516/608042 [02:59<01:00, 2474.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▌  | 457849/608042 [03:00<00:56, 2660.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▌  | 458127/608042 [03:00<00:58, 2572.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▌  | 458464/608042 [03:00<00:55, 2696.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▌  | 458743/608042 [03:00<00:55, 2673.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 459094/608042 [03:00<00:51, 2897.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 459387/608042 [03:00<00:51, 2878.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 459697/608042 [03:00<00:50, 2918.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 459999/608042 [03:00<00:50, 2912.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 460312/608042 [03:00<00:51, 2872.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 460608/608042 [03:01<00:51, 2846.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 460898/608042 [03:01<00:54, 2701.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 461176/608042 [03:01<00:55, 2635.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 461452/608042 [03:01<00:57, 2536.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 461709/608042 [03:01<00:57, 2528.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 462029/608042 [03:01<00:54, 2695.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 462313/608042 [03:01<00:55, 2636.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 462583/608042 [03:01<00:54, 2652.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 462862/608042 [03:01<00:56, 2564.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 463157/608042 [03:02<00:55, 2613.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 463439/608042 [03:02<00:56, 2553.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▋  | 463711/608042 [03:02<00:56, 2572.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▋  | 463983/608042 [03:02<00:56, 2564.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▋  | 464245/608042 [03:02<00:55, 2579.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▋  | 464509/608042 [03:02<00:56, 2530.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▋  | 464777/608042 [03:02<00:56, 2515.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▋  | 465037/608042 [03:02<01:04, 2212.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 465336/608042 [03:02<00:59, 2379.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 465635/608042 [03:03<00:58, 2453.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 465889/608042 [03:03<01:02, 2291.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 466140/608042 [03:03<01:00, 2342.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 466459/608042 [03:03<00:57, 2449.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 466753/608042 [03:03<00:55, 2526.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 467011/608042 [03:03<00:56, 2516.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 467275/608042 [03:03<00:58, 2422.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 467619/608042 [03:03<00:52, 2665.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 467945/608042 [03:03<00:49, 2807.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 468243/608042 [03:04<00:49, 2801.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 468533/608042 [03:04<00:51, 2701.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 468805/608042 [03:04<00:52, 2654.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 469086/608042 [03:04<00:51, 2696.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 469367/608042 [03:04<00:51, 2690.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 469639/608042 [03:04<00:52, 2620.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 469909/608042 [03:04<00:53, 2579.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 470205/608042 [03:04<00:53, 2598.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 470479/608042 [03:04<00:53, 2570.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 470742/608042 [03:05<00:57, 2386.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 470985/608042 [03:05<00:59, 2314.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 471257/608042 [03:05<00:56, 2421.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 471571/608042 [03:05<00:53, 2529.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 471873/608042 [03:05<00:51, 2663.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 472144/608042 [03:05<00:54, 2489.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 472464/608042 [03:05<00:50, 2665.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 472740/608042 [03:05<00:51, 2626.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 473006/608042 [03:05<00:55, 2417.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 473309/608042 [03:06<00:52, 2568.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 473579/608042 [03:06<00:54, 2475.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 473835/608042 [03:06<00:54, 2451.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 474086/608042 [03:06<00:55, 2408.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 474335/608042 [03:06<00:56, 2383.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 474584/608042 [03:06<00:57, 2319.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 474828/608042 [03:06<00:58, 2286.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 475087/608042 [03:06<00:56, 2363.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 475366/608042 [03:06<00:53, 2474.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 475687/608042 [03:07<00:49, 2676.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 475967/608042 [03:07<00:51, 2544.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 476320/608042 [03:07<00:47, 2792.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 476605/608042 [03:07<00:47, 2748.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 476892/608042 [03:07<00:50, 2601.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 477155/608042 [03:07<00:52, 2509.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▊  | 477441/608042 [03:07<00:51, 2557.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▊  | 477761/608042 [03:07<00:47, 2728.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▊  | 478047/608042 [03:07<00:47, 2709.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▊  | 478359/608042 [03:08<00:46, 2810.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▊  | 478643/608042 [03:08<00:45, 2819.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 478941/608042 [03:08<00:50, 2546.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 479261/608042 [03:08<00:48, 2658.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 479559/608042 [03:08<00:46, 2745.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 479838/608042 [03:08<00:47, 2674.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 480136/608042 [03:08<00:47, 2692.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 480458/608042 [03:08<00:45, 2829.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 480746/608042 [03:08<00:47, 2700.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 481025/608042 [03:09<00:48, 2598.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 481317/608042 [03:09<00:47, 2671.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 481620/608042 [03:09<00:45, 2757.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 481901/608042 [03:09<00:46, 2733.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 482178/608042 [03:09<00:46, 2682.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 482468/608042 [03:09<00:46, 2689.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 482748/608042 [03:09<00:49, 2533.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 483005/608042 [03:09<00:52, 2383.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 483263/608042 [03:09<00:51, 2432.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|███████▉  | 483513/608042 [03:10<00:51, 2439.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|███████▉  | 483797/608042 [03:10<00:49, 2500.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|███████▉  | 484098/608042 [03:10<00:47, 2599.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|███████▉  | 484380/608042 [03:10<00:46, 2656.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|███████▉  | 484672/608042 [03:10<00:45, 2715.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|███████▉  | 484966/608042 [03:10<00:52, 2340.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|███████▉  | 485222/608042 [03:10<00:51, 2396.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|███████▉  | 485534/608042 [03:10<00:47, 2555.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|███████▉  | 485814/608042 [03:10<00:48, 2526.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|███████▉  | 486117/608042 [03:11<00:48, 2533.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|███████▉  | 486375/608042 [03:11<00:48, 2510.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|████████  | 486637/608042 [03:11<00:47, 2540.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|████████  | 486899/608042 [03:11<00:49, 2427.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|████████  | 487183/608042 [03:11<00:47, 2531.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|████████  | 487448/608042 [03:11<00:48, 2463.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|████████  | 487704/608042 [03:11<00:49, 2448.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|████████  | 487958/608042 [03:11<00:50, 2396.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|████████  | 488203/608042 [03:11<00:52, 2273.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|████████  | 488517/608042 [03:12<00:48, 2484.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|████████  | 488770/608042 [03:12<00:50, 2348.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|████████  | 489023/608042 [03:12<00:50, 2370.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|████████  | 489372/608042 [03:12<00:46, 2558.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 489639/608042 [03:12<00:46, 2553.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 489935/608042 [03:12<00:45, 2596.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 490225/608042 [03:12<00:44, 2643.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 490505/608042 [03:12<00:44, 2668.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 490787/608042 [03:12<00:44, 2653.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 491053/608042 [03:12<00:44, 2615.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 491333/608042 [03:13<00:44, 2632.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 491598/608042 [03:13<00:44, 2629.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 491901/608042 [03:13<00:43, 2685.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 492171/608042 [03:13<00:45, 2562.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 492435/608042 [03:13<00:46, 2480.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 492685/608042 [03:13<00:46, 2467.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 492933/608042 [03:13<00:47, 2442.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 493193/608042 [03:13<00:49, 2338.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 493477/608042 [03:13<00:46, 2448.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 493757/608042 [03:14<00:45, 2509.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 494020/608042 [03:14<00:45, 2531.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████▏ | 494317/608042 [03:14<00:44, 2582.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████▏ | 494580/608042 [03:14<00:45, 2476.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████▏ | 494844/608042 [03:14<00:45, 2496.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████▏ | 495107/608042 [03:14<00:45, 2507.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████▏ | 495361/608042 [03:14<00:47, 2380.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 495635/608042 [03:14<00:45, 2480.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 495903/608042 [03:14<00:44, 2500.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 496165/608042 [03:15<00:44, 2531.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 496524/608042 [03:15<00:39, 2813.74 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 496815/608042 [03:15<00:42, 2620.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 497085/608042 [03:15<00:43, 2562.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 497396/608042 [03:15<00:40, 2708.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 497735/608042 [03:15<00:38, 2881.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 498040/608042 [03:15<00:42, 2581.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 498342/608042 [03:15<00:40, 2678.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 498628/608042 [03:15<00:40, 2703.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 498909/608042 [03:16<00:41, 2602.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 499220/608042 [03:16<00:40, 2654.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 499491/608042 [03:16<00:40, 2664.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 499800/608042 [03:16<00:39, 2772.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 500083/608042 [03:16<00:39, 2718.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 500405/608042 [03:16<00:37, 2853.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 500725/608042 [03:16<00:36, 2915.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 501024/608042 [03:16<00:40, 2655.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 501313/608042 [03:16<00:39, 2709.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 501598/608042 [03:17<00:40, 2613.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 501956/608042 [03:17<00:37, 2838.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 502245/608042 [03:17<00:40, 2620.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 502514/608042 [03:17<00:46, 2268.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 502837/608042 [03:17<00:42, 2452.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 503162/608042 [03:17<00:39, 2653.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 503541/608042 [03:17<00:35, 2925.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 503914/608042 [03:17<00:34, 3056.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 504239/608042 [03:17<00:35, 2890.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 504546/608042 [03:18<00:37, 2755.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 504830/608042 [03:18<00:37, 2750.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 505125/608042 [03:18<00:40, 2515.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 505398/608042 [03:18<00:40, 2540.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 505663/608042 [03:18<00:40, 2509.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 506005/608042 [03:18<00:37, 2738.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 506298/608042 [03:18<00:38, 2673.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 506592/608042 [03:18<00:37, 2689.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 506874/608042 [03:19<00:37, 2686.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 507158/608042 [03:19<00:38, 2651.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 507461/608042 [03:19<00:36, 2742.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▎ | 507745/608042 [03:19<00:38, 2615.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▎ | 508074/608042 [03:19<00:36, 2754.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▎ | 508353/608042 [03:19<00:38, 2598.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▎ | 508643/608042 [03:19<00:37, 2627.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▎ | 508998/608042 [03:19<00:34, 2863.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 509308/608042 [03:19<00:36, 2714.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 509586/608042 [03:20<00:37, 2609.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 509862/608042 [03:20<00:37, 2615.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 510139/608042 [03:20<00:37, 2600.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 510422/608042 [03:20<00:37, 2625.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 510756/608042 [03:20<00:34, 2783.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 511036/608042 [03:20<00:37, 2600.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 511326/608042 [03:20<00:36, 2667.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 511606/608042 [03:20<00:38, 2520.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 511891/608042 [03:20<00:37, 2548.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 512148/608042 [03:21<00:38, 2475.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 512449/608042 [03:21<00:36, 2597.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 512723/608042 [03:21<00:38, 2456.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 512973/608042 [03:21<00:40, 2323.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 513226/608042 [03:21<00:41, 2271.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 513468/608042 [03:21<00:41, 2303.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 513763/608042 [03:21<00:38, 2423.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▍ | 514008/608042 [03:21<00:39, 2410.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▍ | 514251/608042 [03:21<00:40, 2294.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▍ | 514504/608042 [03:22<00:42, 2195.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▍ | 514745/608042 [03:22<00:42, 2206.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▍ | 515021/608042 [03:22<00:40, 2318.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▍ | 515337/608042 [03:22<00:36, 2525.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▍ | 515594/608042 [03:22<00:37, 2491.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▍ | 515876/608042 [03:22<00:35, 2569.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▍ | 516137/608042 [03:22<00:38, 2386.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▍ | 516456/608042 [03:22<00:36, 2525.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▍ | 516772/608042 [03:22<00:33, 2695.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▌ | 517052/608042 [03:23<00:34, 2618.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▌ | 517322/608042 [03:23<00:34, 2611.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▌ | 517596/608042 [03:23<00:36, 2483.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▌ | 517884/608042 [03:23<00:34, 2584.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▌ | 518165/608042 [03:23<00:37, 2400.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▌ | 518421/608042 [03:23<00:36, 2442.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▌ | 518674/608042 [03:23<00:40, 2233.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▌ | 518940/608042 [03:23<00:38, 2339.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▌ | 519230/608042 [03:23<00:36, 2454.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▌ | 519499/608042 [03:24<00:35, 2501.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▌ | 519766/608042 [03:24<00:37, 2324.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 520025/608042 [03:24<00:38, 2290.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 520305/608042 [03:24<00:36, 2417.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 520708/608042 [03:24<00:30, 2826.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 521003/608042 [03:24<00:33, 2621.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 521376/608042 [03:24<00:29, 2905.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 521675/608042 [03:24<00:31, 2714.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 522055/608042 [03:24<00:29, 2936.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 522358/608042 [03:25<00:30, 2834.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 522734/608042 [03:25<00:27, 3054.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 523047/608042 [03:25<00:30, 2747.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 523340/608042 [03:25<00:32, 2634.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 523612/608042 [03:25<00:34, 2482.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 523948/608042 [03:25<00:31, 2662.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 524225/608042 [03:25<00:31, 2658.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▋ | 524495/608042 [03:25<00:31, 2665.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▋ | 524808/608042 [03:26<00:29, 2789.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▋ | 525101/608042 [03:26<00:31, 2646.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▋ | 525372/608042 [03:26<00:31, 2593.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▋ | 525646/608042 [03:26<00:33, 2473.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▋ | 525944/608042 [03:26<00:31, 2591.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 526243/608042 [03:26<00:30, 2671.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 526517/608042 [03:26<00:31, 2559.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 526776/608042 [03:26<00:32, 2530.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 527041/608042 [03:26<00:33, 2422.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 527289/608042 [03:27<00:34, 2323.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 527545/608042 [03:27<00:36, 2181.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 527860/608042 [03:27<00:34, 2342.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 528117/608042 [03:27<00:33, 2396.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 528452/608042 [03:27<00:30, 2643.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 528721/608042 [03:27<00:30, 2607.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 528989/608042 [03:27<00:30, 2613.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 529254/608042 [03:27<00:30, 2587.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 529515/608042 [03:27<00:35, 2213.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 529790/608042 [03:28<00:33, 2351.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 530064/608042 [03:28<00:31, 2451.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 530326/608042 [03:28<00:32, 2385.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 530570/608042 [03:28<00:32, 2389.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 530864/608042 [03:28<00:30, 2532.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 531196/608042 [03:28<00:28, 2718.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 531471/608042 [03:28<00:29, 2577.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 531786/608042 [03:28<00:28, 2711.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 532109/608042 [03:28<00:26, 2837.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 532398/608042 [03:29<00:27, 2759.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 532679/608042 [03:29<00:28, 2682.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 532963/608042 [03:29<00:28, 2661.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 533256/608042 [03:29<00:27, 2690.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 533675/608042 [03:29<00:25, 2958.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 533970/608042 [03:29<00:25, 2895.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 534262/608042 [03:29<00:26, 2835.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 534567/608042 [03:29<00:29, 2463.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 534829/608042 [03:29<00:30, 2431.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 535087/608042 [03:30<00:30, 2402.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 535356/608042 [03:30<00:32, 2261.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 535639/608042 [03:30<00:30, 2405.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 535885/608042 [03:30<00:30, 2382.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 536158/608042 [03:30<00:29, 2414.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 536449/608042 [03:30<00:28, 2519.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 536705/608042 [03:30<00:28, 2479.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 537014/608042 [03:30<00:26, 2649.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 537284/608042 [03:30<00:29, 2406.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 537535/608042 [03:31<00:29, 2383.74 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 537805/608042 [03:31<00:28, 2468.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 538096/608042 [03:31<00:27, 2582.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▊ | 538364/608042 [03:31<00:27, 2563.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▊ | 538627/608042 [03:31<00:28, 2477.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▊ | 538880/608042 [03:31<00:27, 2483.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▊ | 539175/608042 [03:31<00:26, 2614.74 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▊ | 539446/608042 [03:31<00:26, 2578.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 539711/608042 [03:31<00:26, 2589.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 539983/608042 [03:32<00:26, 2584.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 540246/608042 [03:32<00:28, 2358.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 540488/608042 [03:32<00:29, 2318.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 540727/608042 [03:32<00:29, 2288.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 541032/608042 [03:32<00:27, 2475.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 541424/608042 [03:32<00:23, 2820.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 541804/608042 [03:32<00:22, 2978.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 542104/608042 [03:32<00:22, 2935.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 542403/608042 [03:32<00:22, 2935.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 542705/608042 [03:33<00:23, 2797.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 542991/608042 [03:33<00:23, 2750.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 543279/608042 [03:33<00:23, 2777.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 543644/608042 [03:33<00:22, 2904.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 543936/608042 [03:33<00:22, 2839.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|████████▉ | 544223/608042 [03:33<00:23, 2669.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|████████▉ | 544528/608042 [03:33<00:23, 2750.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|████████▉ | 544830/608042 [03:33<00:22, 2788.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|████████▉ | 545118/608042 [03:33<00:22, 2745.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|████████▉ | 545397/608042 [03:34<00:24, 2546.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|████████▉ | 545665/608042 [03:34<00:26, 2361.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|████████▉ | 545921/608042 [03:34<00:27, 2292.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|████████▉ | 546177/608042 [03:34<00:26, 2338.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|████████▉ | 546418/608042 [03:34<00:27, 2239.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|████████▉ | 546657/608042 [03:34<00:27, 2255.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|████████▉ | 546967/608042 [03:34<00:24, 2471.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|████████▉ | 547217/608042 [03:34<00:26, 2338.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|█████████ | 547460/608042 [03:34<00:28, 2139.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|█████████ | 547771/608042 [03:35<00:25, 2367.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|█████████ | 548063/608042 [03:35<00:23, 2508.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|█████████ | 548472/608042 [03:35<00:20, 2870.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|█████████ | 548810/608042 [03:35<00:20, 2942.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|█████████ | 549155/608042 [03:35<00:19, 3068.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|█████████ | 549465/608042 [03:35<00:19, 2992.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|█████████ | 549780/608042 [03:35<00:20, 2890.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|█████████ | 550086/608042 [03:35<00:20, 2834.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 550390/608042 [03:35<00:20, 2854.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 550738/608042 [03:36<00:18, 3017.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 551053/608042 [03:36<00:20, 2771.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 551343/608042 [03:36<00:20, 2718.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 551623/608042 [03:36<00:21, 2634.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 551898/608042 [03:36<00:21, 2653.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 552191/608042 [03:36<00:21, 2654.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 552464/608042 [03:36<00:23, 2368.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 552708/608042 [03:36<00:23, 2321.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 553023/608042 [03:36<00:21, 2517.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 553281/608042 [03:37<00:21, 2530.06 examples/s]
2024-08-03T04:45:03.832268751Z Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 553586/608042 [03:37<00:21, 2585.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 553870/608042 [03:37<00:20, 2592.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 554181/608042 [03:37<00:19, 2720.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 554514/608042 [03:37<00:18, 2867.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 554806/608042 [03:37<00:19, 2698.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████▏| 555102/608042 [03:37<00:19, 2765.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████▏| 555397/608042 [03:37<00:19, 2749.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████▏| 555679/608042 [03:37<00:21, 2468.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████▏| 555932/608042 [03:38<00:23, 2220.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████▏| 556198/608042 [03:38<00:22, 2283.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 556441/608042 [03:38<00:22, 2310.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 556764/608042 [03:38<00:20, 2557.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 557036/608042 [03:38<00:20, 2480.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 557292/608042 [03:38<00:20, 2423.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 557548/608042 [03:38<00:21, 2361.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 557807/608042 [03:38<00:20, 2399.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 558101/608042 [03:38<00:19, 2534.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 558393/608042 [03:39<00:18, 2638.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 558666/608042 [03:39<00:20, 2352.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 559030/608042 [03:39<00:18, 2679.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 559310/608042 [03:39<00:19, 2454.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 559582/608042 [03:39<00:20, 2348.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 559938/608042 [03:39<00:18, 2558.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 560284/608042 [03:39<00:17, 2740.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 560569/608042 [03:39<00:17, 2664.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 560848/608042 [03:40<00:17, 2695.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 561151/608042 [03:40<00:16, 2787.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 561443/608042 [03:40<00:16, 2790.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 561731/608042 [03:40<00:17, 2640.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 562000/608042 [03:40<00:19, 2395.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 562261/608042 [03:40<00:18, 2444.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 562550/608042 [03:40<00:17, 2560.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 562819/608042 [03:40<00:18, 2475.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 563073/608042 [03:40<00:18, 2439.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 563327/608042 [03:41<00:18, 2430.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 563576/608042 [03:41<00:19, 2330.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 563895/608042 [03:41<00:17, 2522.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 564149/608042 [03:41<00:17, 2521.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 564413/608042 [03:41<00:17, 2527.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 564717/608042 [03:41<00:16, 2659.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 564990/608042 [03:41<00:16, 2586.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 565307/608042 [03:41<00:15, 2722.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 565625/608042 [03:41<00:14, 2852.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 565914/608042 [03:41<00:15, 2769.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 566194/608042 [03:42<00:15, 2647.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 566472/608042 [03:42<00:15, 2619.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 566755/608042 [03:42<00:16, 2501.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 567010/608042 [03:42<00:16, 2485.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 567264/608042 [03:42<00:17, 2346.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 567550/608042 [03:42<00:16, 2476.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 567819/608042 [03:42<00:16, 2403.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 568066/608042 [03:42<00:16, 2420.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 568350/608042 [03:43<00:15, 2497.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▎| 568618/608042 [03:43<00:16, 2434.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▎| 568886/608042 [03:43<00:15, 2481.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▎| 569151/608042 [03:43<00:15, 2517.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▎| 569415/608042 [03:43<00:15, 2447.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▎| 569744/608042 [03:43<00:14, 2600.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▎| 570010/608042 [03:43<00:14, 2608.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 570310/608042 [03:43<00:13, 2697.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 570583/608042 [03:43<00:15, 2455.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 570846/608042 [03:44<00:15, 2393.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 571124/608042 [03:44<00:15, 2451.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 571470/608042 [03:44<00:13, 2707.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 571751/608042 [03:44<00:14, 2495.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 572027/608042 [03:44<00:14, 2535.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 572354/608042 [03:44<00:13, 2612.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 572679/608042 [03:44<00:13, 2719.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 572962/608042 [03:44<00:13, 2685.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 573238/608042 [03:44<00:13, 2658.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 573518/608042 [03:45<00:13, 2580.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 573843/608042 [03:45<00:12, 2730.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 574120/608042 [03:45<00:13, 2606.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 574405/608042 [03:45<00:13, 2461.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▍| 574711/608042 [03:45<00:12, 2619.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▍| 574994/608042 [03:45<00:12, 2659.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▍| 575312/608042 [03:45<00:11, 2754.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▍| 575605/608042 [03:45<00:13, 2482.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▍| 575942/608042 [03:45<00:12, 2673.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▍| 576334/608042 [03:46<00:10, 2974.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▍| 576649/608042 [03:46<00:12, 2530.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▍| 576947/608042 [03:46<00:11, 2604.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▍| 577224/608042 [03:46<00:12, 2548.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▍| 577595/608042 [03:46<00:10, 2824.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▌| 577895/608042 [03:46<00:10, 2755.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▌| 578180/608042 [03:46<00:11, 2646.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▌| 578456/608042 [03:46<00:11, 2660.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▌| 578731/608042 [03:46<00:11, 2635.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▌| 579058/608042 [03:47<00:10, 2780.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▌| 579353/608042 [03:47<00:10, 2787.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▌| 579649/608042 [03:47<00:10, 2705.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▌| 579982/608042 [03:47<00:09, 2860.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▌| 580274/608042 [03:47<00:10, 2775.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▌| 580570/608042 [03:47<00:09, 2825.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 580861/608042 [03:47<00:09, 2740.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 581138/608042 [03:47<00:09, 2744.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 581464/608042 [03:47<00:09, 2863.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 581755/608042 [03:48<00:09, 2673.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 582074/608042 [03:48<00:09, 2633.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 582356/608042 [03:48<00:09, 2581.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 582623/608042 [03:48<00:10, 2489.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 582882/608042 [03:48<00:10, 2411.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 583166/608042 [03:48<00:09, 2521.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 583487/608042 [03:48<00:09, 2705.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 583767/608042 [03:48<00:09, 2592.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 584036/608042 [03:48<00:09, 2506.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 584302/608042 [03:49<00:09, 2494.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 584560/608042 [03:49<00:09, 2470.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 584826/608042 [03:49<00:09, 2434.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 585083/608042 [03:49<00:09, 2464.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▋| 585350/608042 [03:49<00:10, 2185.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▋| 585743/608042 [03:49<00:08, 2568.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▋| 586070/608042 [03:49<00:08, 2725.42 examples/s]
2024-08-03T04:45:03.832268751Z Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▋| 586366/608042 [03:49<00:08, 2601.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▋| 586635/608042 [03:50<00:08, 2383.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 586886/608042 [03:50<00:09, 2313.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 587262/608042 [03:50<00:07, 2685.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 587604/608042 [03:50<00:07, 2859.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 587902/608042 [03:50<00:07, 2856.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 588194/608042 [03:50<00:07, 2632.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 588472/608042 [03:50<00:07, 2529.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 588742/608042 [03:50<00:07, 2571.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 589046/608042 [03:50<00:07, 2645.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 589324/608042 [03:51<00:07, 2669.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 589597/608042 [03:51<00:07, 2516.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 589891/608042 [03:51<00:06, 2614.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 590159/608042 [03:51<00:06, 2564.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 590442/608042 [03:51<00:06, 2626.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 590745/608042 [03:51<00:06, 2725.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 591031/608042 [03:51<00:06, 2634.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 591345/608042 [03:51<00:06, 2766.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 591625/608042 [03:51<00:06, 2594.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 591936/608042 [03:52<00:05, 2711.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 592216/608042 [03:52<00:06, 2531.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 592474/608042 [03:52<00:06, 2345.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 592727/608042 [03:52<00:06, 2332.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 592971/608042 [03:52<00:06, 2248.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 593226/608042 [03:52<00:06, 2328.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 593466/608042 [03:52<00:06, 2209.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 593759/608042 [03:52<00:05, 2402.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 594012/608042 [03:52<00:06, 2290.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 594256/608042 [03:53<00:06, 2194.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 594505/608042 [03:53<00:05, 2268.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 594769/608042 [03:53<00:05, 2225.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 594994/608042 [03:53<00:06, 2151.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 595219/608042 [03:53<00:06, 2035.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 595543/608042 [03:53<00:05, 2348.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 595783/608042 [03:53<00:05, 2311.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 596055/608042 [03:53<00:05, 2357.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 596311/608042 [03:53<00:05, 2297.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 596557/608042 [03:54<00:04, 2315.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 596814/608042 [03:54<00:04, 2331.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 597065/608042 [03:54<00:04, 2232.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 597362/608042 [03:54<00:04, 2395.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 597649/608042 [03:54<00:04, 2521.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 597926/608042 [03:54<00:04, 2184.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 598155/608042 [03:54<00:04, 2195.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 598388/608042 [03:54<00:04, 2108.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 598656/608042 [03:55<00:04, 2206.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 598899/608042 [03:55<00:04, 2006.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▊| 599126/608042 [03:55<00:04, 1886.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▊| 599341/608042 [03:55<00:04, 1897.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▊| 599542/608042 [03:55<00:04, 1830.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▊| 599728/608042 [03:55<00:04, 1673.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▊| 599912/608042 [03:55<00:04, 1706.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▊| 600116/608042 [03:55<00:04, 1677.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▊| 600286/608042 [03:56<00:04, 1554.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 600450/608042 [03:56<00:05, 1514.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 600603/608042 [03:56<00:05, 1477.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 600824/608042 [03:56<00:04, 1626.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 601012/608042 [03:56<00:04, 1452.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 601181/608042 [03:56<00:04, 1423.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 601340/608042 [03:56<00:04, 1397.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 601486/608042 [03:56<00:05, 1261.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 601620/608042 [03:57<00:05, 1193.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 601747/608042 [03:57<00:05, 1177.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 601910/608042 [03:57<00:04, 1283.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 602064/608042 [03:57<00:04, 1312.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 602225/608042 [03:57<00:04, 1291.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 602437/608042 [03:57<00:03, 1478.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 602592/608042 [03:57<00:04, 1307.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 602730/608042 [03:57<00:04, 1209.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 602859/608042 [03:58<00:04, 1137.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 603000/608042 [03:58<00:04, 1181.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 603163/608042 [03:58<00:03, 1282.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 603309/608042 [03:58<00:03, 1316.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 603446/608042 [03:58<00:03, 1178.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 603591/608042 [03:58<00:03, 1242.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 603723/608042 [03:58<00:04, 1055.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 603843/608042 [03:58<00:04, 915.97 examples/s] 
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 603982/608042 [03:59<00:04, 976.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 604086/608042 [03:59<00:04, 979.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 604206/608042 [03:59<00:04, 851.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 604321/608042 [03:59<00:04, 886.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 604422/608042 [03:59<00:04, 724.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 604503/608042 [03:59<00:05, 632.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 604579/608042 [03:59<00:05, 656.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 604657/608042 [04:00<00:05, 595.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 604723/608042 [04:00<00:06, 517.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 604780/608042 [04:00<00:06, 523.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 604840/608042 [04:00<00:07, 409.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 604935/608042 [04:00<00:06, 498.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 604996/608042 [04:00<00:05, 519.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605059/608042 [04:00<00:05, 538.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605142/608042 [04:01<00:05, 538.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605200/608042 [04:01<00:05, 530.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605280/608042 [04:01<00:04, 554.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605339/608042 [04:01<00:04, 551.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605414/608042 [04:01<00:04, 536.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605493/608042 [04:01<00:04, 549.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605583/608042 [04:01<00:04, 611.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605661/608042 [04:02<00:03, 614.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605729/608042 [04:02<00:04, 475.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605787/608042 [04:02<00:06, 348.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605842/608042 [04:02<00:07, 310.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605897/608042 [04:02<00:06, 326.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605940/608042 [04:03<00:09, 216.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605977/608042 [04:03<00:08, 237.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606015/608042 [04:03<00:09, 210.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606051/608042 [04:04<00:12, 156.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606078/608042 [04:04<00:11, 170.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606112/608042 [04:04<00:09, 196.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606160/608042 [04:04<00:09, 199.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606190/608042 [04:04<00:09, 198.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606220/608042 [04:04<00:10, 172.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606243/608042 [04:05<00:10, 174.56 examples/s]
2024-08-03T04:45:03.832268751Z Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606290/608042 [04:05<00:09, 177.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606309/608042 [04:05<00:10, 171.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606332/608042 [04:05<00:13, 125.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606347/608042 [04:05<00:13, 121.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606368/608042 [04:06<00:13, 125.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606384/608042 [04:06<00:14, 117.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606397/608042 [04:06<00:18, 87.20 examples/s] 
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606407/608042 [04:06<00:19, 85.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606438/608042 [04:06<00:12, 123.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606457/608042 [04:06<00:12, 122.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606491/608042 [04:07<00:09, 162.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606513/608042 [04:07<00:11, 133.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606533/608042 [04:07<00:13, 112.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606556/608042 [04:07<00:12, 121.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606595/608042 [04:07<00:09, 157.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606630/608042 [04:08<00:08, 169.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606657/608042 [04:08<00:07, 173.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606679/608042 [04:08<00:09, 140.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606709/608042 [04:08<00:07, 169.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606740/608042 [04:08<00:09, 132.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606773/608042 [04:09<00:08, 154.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606805/608042 [04:09<00:06, 177.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606832/608042 [04:09<00:07, 159.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606862/608042 [04:09<00:06, 177.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606903/608042 [04:09<00:07, 142.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606933/608042 [04:10<00:07, 152.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606953/608042 [04:10<00:08, 128.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606989/608042 [04:10<00:07, 148.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607017/608042 [04:10<00:06, 150.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607039/608042 [04:10<00:07, 133.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607058/608042 [04:10<00:07, 137.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607078/608042 [04:11<00:07, 125.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607120/608042 [04:11<00:06, 133.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607136/608042 [04:11<00:09, 99.69 examples/s] 
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607150/608042 [04:11<00:09, 96.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607168/608042 [04:12<00:08, 103.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607197/608042 [04:12<00:08, 105.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607217/608042 [04:12<00:07, 113.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607239/608042 [04:12<00:06, 124.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607264/608042 [04:12<00:06, 127.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607285/608042 [04:12<00:05, 134.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607327/608042 [04:13<00:03, 185.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607354/608042 [04:13<00:03, 188.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607399/608042 [04:13<00:02, 243.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607434/608042 [04:13<00:03, 194.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607465/608042 [04:13<00:03, 190.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607496/608042 [04:14<00:03, 167.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607516/608042 [04:14<00:03, 152.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607545/608042 [04:14<00:02, 167.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607575/608042 [04:14<00:02, 168.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607607/608042 [04:14<00:02, 155.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607666/608042 [04:14<00:01, 233.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607710/608042 [04:14<00:01, 242.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607745/608042 [04:15<00:01, 193.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607775/608042 [04:15<00:01, 206.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607815/608042 [04:15<00:00, 243.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607865/608042 [04:15<00:00, 220.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607902/608042 [04:15<00:00, 225.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607929/608042 [04:16<00:00, 198.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607955/608042 [04:16<00:00, 199.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 607992/608042 [04:16<00:00, 185.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 608013/608042 [04:16<00:00, 150.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 608039/608042 [04:16<00:00, 164.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|██████████| 608042/608042 [04:16<00:00, 2366.67 examples/s]
2024-08-03T04:45:03.832268751Z 
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 0/608042 [00:00<?, ? examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 6/608042 [00:00<4:50:40, 34.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 45/608042 [00:00<53:35, 189.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 211/608042 [00:00<13:27, 752.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 398/608042 [00:00<08:51, 1143.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 693/608042 [00:00<05:56, 1703.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 909/608042 [00:00<05:32, 1824.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1238/608042 [00:00<04:47, 2112.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1585/608042 [00:00<04:04, 2481.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 1875/608042 [00:01<03:55, 2576.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2204/608042 [00:01<03:38, 2778.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2497/608042 [00:01<03:52, 2599.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 2766/608042 [00:01<04:21, 2311.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   0%|          | 3022/608042 [00:01<04:16, 2355.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3287/608042 [00:01<04:08, 2432.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3545/608042 [00:01<04:06, 2454.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 3823/608042 [00:01<03:57, 2541.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4089/608042 [00:01<04:06, 2453.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4355/608042 [00:02<04:07, 2441.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 4745/608042 [00:02<03:32, 2844.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5043/608042 [00:02<03:35, 2793.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5331/608042 [00:02<03:47, 2651.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5602/608042 [00:02<03:51, 2607.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 5872/608042 [00:02<03:53, 2583.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6133/608042 [00:02<04:04, 2465.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6390/608042 [00:02<04:05, 2447.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6642/608042 [00:02<04:10, 2397.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 6883/608042 [00:03<04:25, 2267.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 7169/608042 [00:03<04:09, 2410.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|          | 7423/608042 [00:03<04:13, 2372.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 7687/608042 [00:03<04:05, 2446.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 7939/608042 [00:03<04:07, 2423.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8234/608042 [00:03<03:53, 2572.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8526/608042 [00:03<03:45, 2658.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 8795/608042 [00:03<04:17, 2324.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   1%|▏         | 9047/608042 [00:03<04:18, 2317.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9373/608042 [00:04<03:53, 2568.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9644/608042 [00:04<03:55, 2536.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 9919/608042 [00:04<03:55, 2537.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10176/608042 [00:04<03:56, 2528.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10436/608042 [00:04<04:02, 2463.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10691/608042 [00:04<04:26, 2239.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 10944/608042 [00:04<04:21, 2285.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11269/608042 [00:04<03:58, 2502.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11528/608042 [00:04<04:06, 2417.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 11790/608042 [00:05<04:05, 2431.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12036/608042 [00:05<04:15, 2329.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12327/608042 [00:05<04:00, 2481.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12580/608042 [00:05<04:13, 2349.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 12846/608042 [00:05<04:07, 2404.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13100/608042 [00:05<04:04, 2436.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13378/608042 [00:05<03:55, 2527.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13636/608042 [00:05<04:06, 2412.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 13884/608042 [00:05<04:05, 2418.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14163/608042 [00:06<03:56, 2506.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14425/608042 [00:06<04:06, 2412.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14673/608042 [00:06<04:11, 2359.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 14912/608042 [00:06<04:17, 2305.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   2%|▏         | 15168/608042 [00:06<04:10, 2370.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15423/608042 [00:06<04:08, 2382.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15666/608042 [00:06<04:14, 2331.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 15916/608042 [00:06<04:38, 2122.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16159/608042 [00:06<04:34, 2159.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16391/608042 [00:07<04:29, 2192.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16658/608042 [00:07<04:16, 2302.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 16932/608042 [00:07<04:07, 2392.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17215/608042 [00:07<03:56, 2502.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17548/608042 [00:07<03:36, 2726.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 17833/608042 [00:07<03:54, 2519.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18095/608042 [00:07<04:00, 2457.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18363/608042 [00:07<03:57, 2487.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18648/608042 [00:07<03:51, 2549.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 18963/608042 [00:07<03:37, 2705.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19240/608042 [00:08<03:48, 2571.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19503/608042 [00:08<04:02, 2424.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19750/608042 [00:08<04:18, 2272.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 19990/608042 [00:08<04:22, 2238.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20300/608042 [00:08<03:59, 2454.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20567/608042 [00:08<04:11, 2331.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 20825/608042 [00:08<04:06, 2378.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   3%|▎         | 21154/608042 [00:08<03:46, 2589.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 21438/608042 [00:09<04:00, 2437.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 21747/608042 [00:09<03:51, 2527.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22038/608042 [00:09<03:46, 2592.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22352/608042 [00:09<03:36, 2700.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▎         | 22640/608042 [00:09<03:34, 2734.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 22919/608042 [00:09<03:38, 2678.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23190/608042 [00:09<03:49, 2544.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23507/608042 [00:09<03:41, 2643.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 23856/608042 [00:09<03:29, 2786.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24212/608042 [00:10<03:15, 2982.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24517/608042 [00:10<03:22, 2876.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 24812/608042 [00:10<03:26, 2827.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25100/608042 [00:10<03:56, 2469.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25377/608042 [00:10<03:54, 2488.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25681/608042 [00:10<03:43, 2608.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 25989/608042 [00:10<03:35, 2705.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26272/608042 [00:10<03:57, 2444.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26538/608042 [00:10<03:53, 2493.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 26801/608042 [00:11<04:07, 2348.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 27057/608042 [00:11<04:02, 2392.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   4%|▍         | 27343/608042 [00:11<03:50, 2518.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 27645/608042 [00:11<03:40, 2636.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 27959/608042 [00:11<03:29, 2767.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28247/608042 [00:11<03:33, 2719.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28526/608042 [00:11<03:43, 2597.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 28804/608042 [00:11<03:47, 2542.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29139/608042 [00:11<03:31, 2734.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29453/608042 [00:12<03:23, 2848.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 29755/608042 [00:12<03:36, 2676.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 30032/608042 [00:12<03:34, 2696.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▍         | 30311/608042 [00:12<03:35, 2678.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 30584/608042 [00:12<03:46, 2544.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 30845/608042 [00:12<03:46, 2548.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31102/608042 [00:12<03:51, 2490.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31362/608042 [00:12<03:59, 2403.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31641/608042 [00:12<03:55, 2442.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 31902/608042 [00:13<04:03, 2364.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32181/608042 [00:13<03:59, 2404.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32433/608042 [00:13<03:59, 2400.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32695/608042 [00:13<04:00, 2395.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 32956/608042 [00:13<03:57, 2425.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   5%|▌         | 33212/608042 [00:13<04:00, 2392.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 33454/608042 [00:13<04:04, 2346.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 33764/608042 [00:13<03:44, 2556.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34039/608042 [00:13<03:40, 2607.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34315/608042 [00:14<03:46, 2537.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34588/608042 [00:14<03:45, 2538.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 34848/608042 [00:14<03:58, 2403.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35125/608042 [00:14<03:55, 2436.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35378/608042 [00:14<04:04, 2345.21 examples/s]
2024-08-03T04:45:03.832268751Z Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35627/608042 [00:14<04:17, 2223.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 35903/608042 [00:14<04:04, 2335.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36162/608042 [00:14<04:00, 2381.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36427/608042 [00:14<03:53, 2451.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36674/608042 [00:15<03:55, 2429.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 36919/608042 [00:15<03:56, 2417.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37228/608042 [00:15<03:39, 2606.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37499/608042 [00:15<03:39, 2602.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▌         | 37814/608042 [00:15<03:26, 2757.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38104/608042 [00:15<03:33, 2670.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38380/608042 [00:15<03:37, 2613.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38671/608042 [00:15<03:31, 2691.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 38943/608042 [00:15<03:42, 2554.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 39209/608042 [00:15<03:48, 2491.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   6%|▋         | 39468/608042 [00:16<03:51, 2457.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 39737/608042 [00:16<03:53, 2433.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40065/608042 [00:16<03:38, 2601.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40454/608042 [00:16<03:12, 2955.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 40768/608042 [00:16<03:27, 2737.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41048/608042 [00:16<03:26, 2741.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41329/608042 [00:16<03:43, 2536.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41589/608042 [00:16<03:54, 2414.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 41923/608042 [00:17<03:37, 2605.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42195/608042 [00:17<03:51, 2444.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42470/608042 [00:17<03:44, 2518.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 42772/608042 [00:17<03:33, 2649.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43089/608042 [00:17<03:26, 2731.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43370/608042 [00:17<03:37, 2595.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43635/608042 [00:17<03:40, 2562.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 43905/608042 [00:17<03:40, 2562.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44203/608042 [00:17<03:31, 2660.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44525/608042 [00:17<03:20, 2810.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 44815/608042 [00:18<03:20, 2805.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 45101/608042 [00:18<03:33, 2642.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   7%|▋         | 45373/608042 [00:18<03:37, 2592.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 45719/608042 [00:18<03:27, 2707.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 45992/608042 [00:18<03:30, 2669.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46269/608042 [00:18<03:37, 2585.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46610/608042 [00:18<03:20, 2798.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 46897/608042 [00:18<03:24, 2746.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47184/608042 [00:19<03:44, 2499.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47445/608042 [00:19<03:42, 2521.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47701/608042 [00:19<03:47, 2465.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 47951/608042 [00:19<03:47, 2462.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48200/608042 [00:19<03:51, 2421.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48445/608042 [00:19<04:01, 2317.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 48706/608042 [00:19<03:57, 2358.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49009/608042 [00:19<03:41, 2525.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49277/608042 [00:19<03:52, 2398.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49532/608042 [00:20<04:11, 2219.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 49791/608042 [00:20<04:01, 2307.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50071/608042 [00:20<03:52, 2403.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50347/608042 [00:20<03:43, 2496.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50610/608042 [00:20<03:50, 2418.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 50866/608042 [00:20<03:56, 2351.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51114/608042 [00:20<03:55, 2363.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51368/608042 [00:20<03:57, 2343.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   8%|▊         | 51683/608042 [00:20<03:37, 2561.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 51958/608042 [00:21<03:49, 2422.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52218/608042 [00:21<03:52, 2387.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52481/608042 [00:21<03:53, 2378.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 52827/608042 [00:21<03:30, 2636.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▊         | 53129/608042 [00:21<03:24, 2716.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53439/608042 [00:21<03:17, 2814.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53731/608042 [00:21<03:52, 2384.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 53986/608042 [00:21<03:50, 2401.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54246/608042 [00:21<03:55, 2352.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54498/608042 [00:22<04:05, 2253.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 54785/608042 [00:22<03:49, 2406.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55039/608042 [00:22<03:51, 2390.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55285/608042 [00:22<03:58, 2322.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55521/608042 [00:22<04:06, 2238.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 55830/608042 [00:22<03:45, 2447.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56097/608042 [00:22<03:43, 2473.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56412/608042 [00:22<03:28, 2642.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56682/608042 [00:22<03:34, 2575.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 56965/608042 [00:23<03:33, 2580.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 57239/608042 [00:23<03:38, 2523.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):   9%|▉         | 57518/608042 [00:23<03:34, 2563.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 57782/608042 [00:23<03:45, 2436.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58143/608042 [00:23<03:20, 2745.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58427/608042 [00:23<03:30, 2616.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58702/608042 [00:23<03:29, 2620.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 58994/608042 [00:23<03:23, 2702.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59311/608042 [00:23<03:15, 2804.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59610/608042 [00:24<03:14, 2824.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 59910/608042 [00:24<03:28, 2630.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60186/608042 [00:24<03:36, 2532.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60448/608042 [00:24<03:34, 2555.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|▉         | 60755/608042 [00:24<03:24, 2671.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61030/608042 [00:24<03:26, 2648.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61359/608042 [00:24<03:17, 2771.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61658/608042 [00:24<03:15, 2795.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 61941/608042 [00:24<03:19, 2735.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62216/608042 [00:25<03:37, 2512.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62516/608042 [00:25<03:30, 2586.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 62779/608042 [00:25<03:46, 2402.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63044/608042 [00:25<03:41, 2465.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63306/608042 [00:25<03:51, 2350.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63554/608042 [00:25<03:51, 2350.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  10%|█         | 63826/608042 [00:25<03:44, 2428.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64079/608042 [00:25<03:52, 2334.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64333/608042 [00:25<03:48, 2383.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64602/608042 [00:26<03:41, 2457.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 64882/608042 [00:26<03:33, 2538.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65161/608042 [00:26<03:28, 2609.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65428/608042 [00:26<03:33, 2545.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 65687/608042 [00:26<03:41, 2451.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66008/608042 [00:26<03:24, 2656.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66293/608042 [00:26<03:23, 2667.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66562/608042 [00:26<03:35, 2508.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 66832/608042 [00:26<03:41, 2446.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67131/608042 [00:27<03:37, 2492.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67393/608042 [00:27<03:36, 2491.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67672/608042 [00:27<03:34, 2514.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 67927/608042 [00:27<03:40, 2450.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█         | 68218/608042 [00:27<03:31, 2551.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 68539/608042 [00:27<03:18, 2720.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 68837/608042 [00:27<03:12, 2794.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69140/608042 [00:27<03:22, 2666.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69410/608042 [00:27<03:28, 2583.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  11%|█▏        | 69676/608042 [00:27<03:37, 2475.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 69939/608042 [00:28<03:39, 2446.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70187/608042 [00:28<03:44, 2399.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70435/608042 [00:28<03:43, 2407.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70686/608042 [00:28<04:01, 2225.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 70953/608042 [00:28<03:51, 2315.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71242/608042 [00:28<03:37, 2465.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71496/608042 [00:28<03:38, 2450.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 71763/608042 [00:28<03:36, 2479.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72013/608042 [00:28<03:40, 2429.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72262/608042 [00:29<03:44, 2389.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72521/608042 [00:29<03:41, 2422.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 72807/608042 [00:29<03:30, 2544.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73074/608042 [00:29<03:33, 2504.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73344/608042 [00:29<03:30, 2538.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73608/608042 [00:29<03:40, 2418.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 73920/608042 [00:29<03:26, 2580.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 74198/608042 [00:29<03:22, 2633.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 74483/608042 [00:29<03:18, 2694.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 74769/608042 [00:30<03:17, 2704.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75044/608042 [00:30<03:28, 2558.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75311/608042 [00:30<03:42, 2389.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75555/608042 [00:30<03:52, 2293.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  12%|█▏        | 75808/608042 [00:30<04:04, 2176.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76096/608042 [00:30<03:45, 2360.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76377/608042 [00:30<03:36, 2453.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76662/608042 [00:30<03:28, 2547.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 76927/608042 [00:30<03:53, 2276.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 77176/608042 [00:31<04:03, 2183.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 77521/608042 [00:31<03:33, 2490.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 77778/608042 [00:31<03:54, 2260.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78013/608042 [00:31<03:59, 2215.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78378/608042 [00:31<03:32, 2489.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78637/608042 [00:31<03:45, 2343.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 78889/608042 [00:31<03:46, 2337.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79134/608042 [00:31<03:54, 2252.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79373/608042 [00:32<03:56, 2237.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79623/608042 [00:32<03:51, 2283.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 79860/608042 [00:32<03:56, 2229.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80114/608042 [00:32<03:55, 2241.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80359/608042 [00:32<03:52, 2266.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80597/608042 [00:32<03:52, 2265.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 80826/608042 [00:32<04:00, 2188.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81084/608042 [00:32<04:02, 2175.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81366/608042 [00:32<03:47, 2315.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81657/608042 [00:33<03:33, 2462.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  13%|█▎        | 81913/608042 [00:33<03:39, 2398.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82163/608042 [00:33<03:37, 2416.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82446/608042 [00:33<03:27, 2532.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82719/608042 [00:33<03:24, 2568.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 82978/608042 [00:33<03:35, 2433.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 83233/608042 [00:33<03:54, 2236.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▎        | 83511/608042 [00:33<03:40, 2374.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 83771/608042 [00:33<03:35, 2436.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84019/608042 [00:34<03:57, 2209.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84259/608042 [00:34<03:54, 2232.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84594/608042 [00:34<03:26, 2528.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 84857/608042 [00:34<03:29, 2500.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85119/608042 [00:34<03:41, 2356.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85428/608042 [00:34<03:25, 2537.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85696/608042 [00:34<03:32, 2459.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 85952/608042 [00:34<03:31, 2473.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 86230/608042 [00:34<03:28, 2508.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 86503/608042 [00:35<03:27, 2508.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 86829/608042 [00:35<03:11, 2719.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87133/608042 [00:35<03:25, 2536.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87405/608042 [00:35<03:28, 2501.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87662/608042 [00:35<03:35, 2416.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 87914/608042 [00:35<03:32, 2444.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  14%|█▍        | 88163/608042 [00:35<03:38, 2383.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 88409/608042 [00:35<03:39, 2371.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 88651/608042 [00:35<04:02, 2141.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 88882/608042 [00:36<04:01, 2146.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 89162/608042 [00:36<03:49, 2264.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 89468/608042 [00:36<03:31, 2452.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 89733/608042 [00:36<03:29, 2475.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90047/608042 [00:36<03:17, 2622.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90323/608042 [00:36<03:18, 2610.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90587/608042 [00:36<03:22, 2550.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 90845/608042 [00:36<03:31, 2447.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▍        | 91171/608042 [00:36<03:14, 2662.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 91440/608042 [00:37<03:37, 2377.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 91696/608042 [00:37<03:45, 2289.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92010/608042 [00:37<03:27, 2488.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92280/608042 [00:37<03:26, 2493.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92557/608042 [00:37<03:27, 2488.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 92867/608042 [00:37<03:19, 2577.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93131/608042 [00:37<03:34, 2398.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93394/608042 [00:37<03:31, 2428.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93646/608042 [00:37<03:38, 2354.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 93895/608042 [00:38<03:37, 2365.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  15%|█▌        | 94136/608042 [00:38<03:37, 2364.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 94389/608042 [00:38<03:39, 2342.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 94689/608042 [00:38<03:23, 2523.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 94968/608042 [00:38<03:28, 2459.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95220/608042 [00:38<03:34, 2395.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95465/608042 [00:38<03:47, 2256.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95703/608042 [00:38<03:44, 2284.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 95948/608042 [00:38<03:55, 2173.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96171/608042 [00:39<03:57, 2151.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96438/608042 [00:39<03:43, 2290.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96681/608042 [00:39<03:48, 2234.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 96919/608042 [00:39<03:51, 2208.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 97230/608042 [00:39<03:35, 2373.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 97513/608042 [00:39<03:25, 2484.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 97768/608042 [00:39<03:38, 2338.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 98014/608042 [00:39<04:01, 2115.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 98290/608042 [00:39<03:46, 2251.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▌        | 98534/608042 [00:40<03:42, 2291.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 98888/608042 [00:40<03:15, 2609.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 99160/608042 [00:40<03:13, 2623.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 99438/608042 [00:40<03:16, 2584.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 99704/608042 [00:40<03:19, 2546.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  16%|█▋        | 100004/608042 [00:40<03:15, 2598.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 100366/608042 [00:40<02:56, 2874.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 100665/608042 [00:40<03:07, 2711.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 100941/608042 [00:40<03:17, 2573.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 101248/608042 [00:41<03:08, 2691.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 101636/608042 [00:41<02:50, 2971.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 101943/608042 [00:41<02:53, 2922.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 102247/608042 [00:41<03:09, 2672.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 102545/608042 [00:41<03:03, 2752.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 102865/608042 [00:41<02:57, 2846.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103163/608042 [00:41<03:06, 2701.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103439/608042 [00:41<03:10, 2643.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103708/608042 [00:41<03:26, 2440.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 103972/608042 [00:42<03:33, 2363.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 104230/608042 [00:42<03:30, 2393.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 104541/608042 [00:42<03:15, 2573.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 104837/608042 [00:42<03:12, 2620.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105113/608042 [00:42<03:24, 2457.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105367/608042 [00:42<03:28, 2413.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105611/608042 [00:42<03:35, 2327.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 105873/608042 [00:42<03:33, 2348.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 106120/608042 [00:43<03:34, 2335.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  17%|█▋        | 106374/608042 [00:43<03:46, 2215.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 106656/608042 [00:43<03:33, 2349.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 106933/608042 [00:43<03:27, 2412.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 107212/608042 [00:43<03:20, 2502.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 107508/608042 [00:43<03:11, 2608.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 107773/608042 [00:43<03:19, 2508.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 108030/608042 [00:43<03:28, 2394.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 108336/608042 [00:43<03:14, 2563.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 108659/608042 [00:43<03:02, 2738.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 108983/608042 [00:44<02:56, 2831.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 109284/608042 [00:44<03:02, 2729.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 109580/608042 [00:44<03:12, 2588.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 109859/608042 [00:44<03:16, 2538.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 110148/608042 [00:44<03:11, 2600.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 110425/608042 [00:44<03:09, 2619.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 110700/608042 [00:44<03:18, 2503.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 110962/608042 [00:44<03:28, 2382.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 111215/608042 [00:45<03:30, 2361.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 111467/608042 [00:45<03:29, 2371.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 111728/608042 [00:45<03:31, 2348.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 111970/608042 [00:45<03:45, 2203.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 112196/608042 [00:45<04:03, 2040.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  18%|█▊        | 112451/608042 [00:45<03:48, 2167.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▊        | 112702/608042 [00:45<03:41, 2236.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▊        | 112955/608042 [00:45<03:38, 2270.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▊        | 113255/608042 [00:45<03:20, 2461.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▊        | 113583/608042 [00:46<03:04, 2685.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▊        | 113865/608042 [00:46<03:08, 2622.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 114135/608042 [00:46<03:08, 2623.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 114419/608042 [00:46<03:09, 2605.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 114758/608042 [00:46<03:00, 2737.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 115050/608042 [00:46<02:59, 2754.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 115340/608042 [00:46<03:02, 2702.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 115692/608042 [00:46<02:49, 2909.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 115997/608042 [00:46<02:49, 2904.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 116343/608042 [00:46<02:43, 3011.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 116654/608042 [00:47<02:57, 2761.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 117034/608042 [00:47<02:42, 3028.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 117354/608042 [00:47<02:40, 3050.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 117677/608042 [00:47<02:49, 2890.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 118046/608042 [00:47<02:44, 2977.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  19%|█▉        | 118358/608042 [00:47<02:43, 2998.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 118673/608042 [00:47<02:54, 2806.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 118985/608042 [00:47<02:52, 2828.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 119271/608042 [00:48<03:03, 2666.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 119557/608042 [00:48<03:07, 2608.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 119848/608042 [00:48<03:02, 2672.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 120138/608042 [00:48<02:58, 2731.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 120421/608042 [00:48<02:59, 2709.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 120695/608042 [00:48<03:26, 2360.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 120964/608042 [00:48<03:20, 2434.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 121294/608042 [00:48<03:04, 2641.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|█▉        | 121577/608042 [00:48<03:03, 2649.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 121855/608042 [00:49<03:01, 2673.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 122135/608042 [00:49<03:02, 2662.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 122409/608042 [00:49<03:01, 2670.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 122686/608042 [00:49<03:01, 2678.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 123003/608042 [00:49<02:52, 2816.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 123291/608042 [00:49<03:14, 2489.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 123567/608042 [00:49<03:23, 2378.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 123817/608042 [00:49<03:24, 2368.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 124071/608042 [00:49<03:30, 2296.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 124365/608042 [00:50<03:18, 2435.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  20%|██        | 124644/608042 [00:50<03:19, 2428.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 124924/608042 [00:50<03:12, 2505.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 125210/608042 [00:50<03:06, 2593.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 125511/608042 [00:50<03:00, 2679.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 125812/608042 [00:50<02:55, 2743.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 126090/608042 [00:50<03:03, 2629.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 126395/608042 [00:50<02:56, 2723.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 126672/608042 [00:50<03:08, 2559.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 126962/608042 [00:51<03:04, 2601.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 127241/608042 [00:51<03:06, 2576.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 127506/608042 [00:51<03:12, 2492.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 127762/608042 [00:51<03:17, 2432.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 128014/608042 [00:51<03:23, 2362.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 128304/608042 [00:51<03:15, 2454.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 128569/608042 [00:51<03:15, 2454.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 128815/608042 [00:51<03:18, 2418.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██        | 129059/608042 [00:51<03:17, 2421.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██▏       | 129349/608042 [00:51<03:07, 2558.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██▏       | 129710/608042 [00:52<02:49, 2821.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██▏       | 130012/608042 [00:52<03:03, 2603.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██▏       | 130280/608042 [00:52<03:18, 2403.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  21%|██▏       | 130527/608042 [00:52<03:22, 2352.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 130769/608042 [00:52<03:22, 2359.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 131058/608042 [00:52<03:11, 2490.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 131315/608042 [00:52<03:32, 2248.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 131575/608042 [00:52<03:25, 2321.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 131892/608042 [00:53<03:10, 2493.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 132181/608042 [00:53<03:04, 2580.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 132447/608042 [00:53<03:06, 2546.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 132706/608042 [00:53<03:22, 2342.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 133024/608042 [00:53<03:06, 2551.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 133284/608042 [00:53<03:06, 2541.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 133555/608042 [00:53<03:04, 2567.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 133895/608042 [00:53<02:51, 2766.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 134176/608042 [00:53<03:00, 2623.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 134540/608042 [00:54<02:43, 2891.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 134855/608042 [00:54<02:41, 2932.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 135154/608042 [00:54<02:59, 2631.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 135434/608042 [00:54<03:00, 2618.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 135711/608042 [00:54<03:09, 2496.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 135988/608042 [00:54<03:06, 2529.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 136251/608042 [00:54<03:16, 2395.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  22%|██▏       | 136581/608042 [00:54<02:59, 2631.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 136868/608042 [00:54<03:09, 2484.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 137131/608042 [00:55<03:17, 2387.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 137429/608042 [00:55<03:05, 2542.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 137749/608042 [00:55<02:53, 2713.74 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 138033/608042 [00:55<02:55, 2684.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 138355/608042 [00:55<02:46, 2825.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 138658/608042 [00:55<02:43, 2863.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 138974/608042 [00:55<02:42, 2885.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 139275/608042 [00:55<02:51, 2727.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 139566/608042 [00:55<02:52, 2720.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 139988/608042 [00:56<02:30, 3107.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 140311/608042 [00:56<02:35, 3001.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 140656/608042 [00:56<02:31, 3080.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 140968/608042 [00:56<02:32, 3069.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 141293/608042 [00:56<02:34, 3030.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 141610/608042 [00:56<02:36, 2977.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 141957/608042 [00:56<02:31, 3071.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 142274/608042 [00:56<02:40, 2893.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  23%|██▎       | 142613/608042 [00:56<02:34, 3014.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▎       | 142921/608042 [00:57<02:49, 2751.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▎       | 143253/608042 [00:57<02:42, 2857.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▎       | 143548/608042 [00:57<02:48, 2757.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▎       | 143836/608042 [00:57<03:05, 2504.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▎       | 144094/608042 [00:57<03:12, 2404.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▎       | 144363/608042 [00:57<03:08, 2454.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▍       | 144612/608042 [00:57<03:15, 2368.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▍       | 144853/608042 [00:57<03:16, 2355.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▍       | 145130/608042 [00:57<03:09, 2447.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▍       | 145503/608042 [00:58<02:52, 2679.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▍       | 145778/608042 [00:58<02:54, 2652.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▍       | 146049/608042 [00:58<03:09, 2438.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▍       | 146307/608042 [00:58<03:15, 2364.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▍       | 146584/608042 [00:58<03:06, 2471.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▍       | 146834/608042 [00:58<03:21, 2288.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▍       | 147165/608042 [00:58<03:06, 2477.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▍       | 147440/608042 [00:58<03:09, 2434.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▍       | 147708/608042 [00:59<03:09, 2434.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▍       | 147970/608042 [00:59<03:05, 2478.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▍       | 148230/608042 [00:59<03:19, 2308.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▍       | 148491/608042 [00:59<03:13, 2379.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  24%|██▍       | 148756/608042 [00:59<03:11, 2395.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  25%|██▍       | 149042/608042 [00:59<03:03, 2501.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  25%|██▍       | 149297/608042 [00:59<03:06, 2457.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  25%|██▍       | 149560/608042 [00:59<03:03, 2500.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  25%|██▍       | 149819/608042 [00:59<03:07, 2449.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  25%|██▍       | 150145/608042 [00:59<02:53, 2638.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  25%|██▍       | 150425/608042 [01:00<02:57, 2583.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  25%|██▍       | 150744/608042 [01:00<02:51, 2673.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  25%|██▍       | 151017/608042 [01:00<03:02, 2504.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  25%|██▍       | 151320/608042 [01:00<02:55, 2606.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  25%|██▍       | 151613/608042 [01:00<02:51, 2663.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  25%|██▍       | 151913/608042 [01:00<02:46, 2731.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  25%|██▌       | 152301/608042 [01:00<02:36, 2918.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  25%|██▌       | 152645/608042 [01:00<02:29, 3039.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  25%|██▌       | 152954/608042 [01:01<02:50, 2670.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  25%|██▌       | 153236/608042 [01:01<03:04, 2465.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  25%|██▌       | 153512/608042 [01:01<03:06, 2438.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  25%|██▌       | 153805/608042 [01:01<02:58, 2547.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  25%|██▌       | 154095/608042 [01:01<02:54, 2599.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  25%|██▌       | 154452/608042 [01:01<02:43, 2772.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  25%|██▌       | 154748/608042 [01:01<02:42, 2793.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  25%|██▌       | 155037/608042 [01:01<03:01, 2497.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  26%|██▌       | 155308/608042 [01:01<03:09, 2390.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  26%|██▌       | 155608/608042 [01:02<02:59, 2522.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  26%|██▌       | 155881/608042 [01:02<02:57, 2547.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  26%|██▌       | 156147/608042 [01:02<03:03, 2457.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  26%|██▌       | 156442/608042 [01:02<02:54, 2590.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  26%|██▌       | 156718/608042 [01:02<03:08, 2398.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  26%|██▌       | 156983/608042 [01:02<03:10, 2373.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  26%|██▌       | 157327/608042 [01:02<02:51, 2629.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  26%|██▌       | 157599/608042 [01:02<03:09, 2375.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  26%|██▌       | 157845/608042 [01:03<03:20, 2242.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  26%|██▌       | 158080/608042 [01:03<03:24, 2200.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  26%|██▌       | 158314/608042 [01:03<03:21, 2229.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  26%|██▌       | 158569/608042 [01:03<03:15, 2300.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  26%|██▌       | 158811/608042 [01:03<03:18, 2267.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  26%|██▌       | 159152/608042 [01:03<02:53, 2582.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  26%|██▌       | 159418/608042 [01:03<02:55, 2559.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  26%|██▋       | 159702/608042 [01:03<02:52, 2601.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  26%|██▋       | 159969/608042 [01:03<03:01, 2469.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  26%|██▋       | 160263/608042 [01:03<02:57, 2525.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  26%|██▋       | 160526/608042 [01:04<02:58, 2502.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  26%|██▋       | 160789/608042 [01:04<03:04, 2424.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  26%|██▋       | 161047/608042 [01:04<03:02, 2451.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  27%|██▋       | 161304/608042 [01:04<03:17, 2263.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  27%|██▋       | 161554/608042 [01:04<03:13, 2311.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  27%|██▋       | 161795/608042 [01:04<03:13, 2301.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  27%|██▋       | 162062/608042 [01:04<03:05, 2401.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  27%|██▋       | 162324/608042 [01:04<03:05, 2406.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  27%|██▋       | 162599/608042 [01:04<02:58, 2493.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  27%|██▋       | 162851/608042 [01:05<03:02, 2444.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  27%|██▋       | 163141/608042 [01:05<02:56, 2527.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  27%|██▋       | 163397/608042 [01:05<02:58, 2487.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  27%|██▋       | 163744/608042 [01:05<02:40, 2764.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  27%|██▋       | 164030/608042 [01:05<02:48, 2637.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  27%|██▋       | 164299/608042 [01:05<02:50, 2604.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  27%|██▋       | 164581/608042 [01:05<02:46, 2658.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  27%|██▋       | 164852/608042 [01:05<02:57, 2493.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  27%|██▋       | 165111/608042 [01:05<03:03, 2416.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  27%|██▋       | 165424/608042 [01:06<02:50, 2599.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  27%|██▋       | 165701/608042 [01:06<02:55, 2525.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  27%|██▋       | 165970/608042 [01:06<02:59, 2459.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  27%|██▋       | 166226/608042 [01:06<02:58, 2475.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  27%|██▋       | 166476/608042 [01:06<02:58, 2477.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  27%|██▋       | 166736/608042 [01:06<02:56, 2501.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  27%|██▋       | 167017/608042 [01:06<02:54, 2527.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  28%|██▊       | 167291/608042 [01:06<03:01, 2427.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  28%|██▊       | 167604/608042 [01:06<02:49, 2597.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  28%|██▊       | 167931/608042 [01:07<02:41, 2725.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  28%|██▊       | 168232/608042 [01:07<02:38, 2767.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  28%|██▊       | 168514/608042 [01:07<02:45, 2658.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  28%|██▊       | 168788/608042 [01:07<02:48, 2599.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  28%|██▊       | 169057/608042 [01:07<02:47, 2624.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  28%|██▊       | 169324/608042 [01:07<02:51, 2557.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  28%|██▊       | 169597/608042 [01:07<02:53, 2524.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  28%|██▊       | 169897/608042 [01:07<02:49, 2588.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  28%|██▊       | 170196/608042 [01:07<02:43, 2672.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  28%|██▊       | 170539/608042 [01:08<02:33, 2849.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  28%|██▊       | 170846/608042 [01:08<02:39, 2743.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  28%|██▊       | 171123/608042 [01:08<02:43, 2676.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  28%|██▊       | 171396/608042 [01:08<02:52, 2534.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  28%|██▊       | 171729/608042 [01:08<02:38, 2750.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  28%|██▊       | 172018/608042 [01:08<02:45, 2633.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  28%|██▊       | 172302/608042 [01:08<02:43, 2661.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  28%|██▊       | 172575/608042 [01:08<02:50, 2560.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  28%|██▊       | 172839/608042 [01:08<02:48, 2580.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  28%|██▊       | 173102/608042 [01:09<02:48, 2586.74 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  29%|██▊       | 173362/608042 [01:09<02:48, 2582.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  29%|██▊       | 173635/608042 [01:09<02:54, 2483.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  29%|██▊       | 173970/608042 [01:09<02:39, 2723.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  29%|██▊       | 174257/608042 [01:09<02:39, 2711.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  29%|██▊       | 174537/608042 [01:09<02:57, 2443.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  29%|██▊       | 174796/608042 [01:09<02:56, 2458.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  29%|██▉       | 175051/608042 [01:09<02:56, 2447.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  29%|██▉       | 175302/608042 [01:09<02:58, 2428.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  29%|██▉       | 175581/608042 [01:10<02:55, 2458.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  29%|██▉       | 175916/608042 [01:10<02:42, 2665.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  29%|██▉       | 176228/608042 [01:10<02:36, 2758.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  29%|██▉       | 176515/608042 [01:10<02:36, 2750.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  29%|██▉       | 176797/608042 [01:10<02:44, 2623.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  29%|██▉       | 177063/608042 [01:10<03:05, 2321.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  29%|██▉       | 177304/608042 [01:10<03:31, 2040.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  29%|██▉       | 177607/608042 [01:10<03:11, 2243.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  29%|██▉       | 177863/608042 [01:10<03:05, 2313.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  29%|██▉       | 178128/608042 [01:11<03:09, 2274.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  29%|██▉       | 178388/608042 [01:11<03:05, 2321.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  29%|██▉       | 178693/608042 [01:11<02:57, 2425.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  29%|██▉       | 178963/608042 [01:11<02:59, 2387.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  29%|██▉       | 179325/608042 [01:11<02:39, 2691.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|██▉       | 179606/608042 [01:11<02:42, 2640.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|██▉       | 179877/608042 [01:11<02:51, 2497.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|██▉       | 180181/608042 [01:11<02:42, 2636.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|██▉       | 180453/608042 [01:11<02:50, 2514.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|██▉       | 180709/608042 [01:12<03:08, 2271.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|██▉       | 181027/608042 [01:12<02:50, 2504.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|██▉       | 181288/608042 [01:12<02:50, 2507.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|██▉       | 181550/608042 [01:12<02:52, 2477.74 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|██▉       | 181805/608042 [01:12<02:51, 2480.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|██▉       | 182110/608042 [01:12<02:41, 2640.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|██▉       | 182378/608042 [01:12<02:50, 2492.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|███       | 182635/608042 [01:12<03:00, 2360.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|███       | 182877/608042 [01:12<03:00, 2358.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|███       | 183222/608042 [01:13<02:39, 2659.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|███       | 183511/608042 [01:13<02:44, 2586.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|███       | 183801/608042 [01:13<02:40, 2641.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|███       | 184075/608042 [01:13<02:43, 2590.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|███       | 184346/608042 [01:13<02:48, 2519.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|███       | 184600/608042 [01:13<02:51, 2473.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|███       | 184855/608042 [01:13<02:51, 2461.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|███       | 185118/608042 [01:13<02:50, 2483.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  30%|███       | 185410/608042 [01:13<02:42, 2598.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 185738/608042 [01:14<02:31, 2794.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 186065/608042 [01:14<02:23, 2931.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 186369/608042 [01:14<02:29, 2816.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 186655/608042 [01:14<02:51, 2450.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 186911/608042 [01:14<02:58, 2359.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 187179/608042 [01:14<02:52, 2437.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 187430/608042 [01:14<02:52, 2436.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 187680/608042 [01:14<02:55, 2399.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 187966/608042 [01:14<02:47, 2505.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 188231/608042 [01:15<02:45, 2539.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 188492/608042 [01:15<02:44, 2549.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 188764/608042 [01:15<02:43, 2566.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 189029/608042 [01:15<02:52, 2429.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 189306/608042 [01:15<02:50, 2452.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 189591/608042 [01:15<02:44, 2544.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███       | 189847/608042 [01:15<02:54, 2403.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███▏      | 190156/608042 [01:15<02:43, 2560.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███▏      | 190473/608042 [01:15<02:35, 2684.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███▏      | 190748/608042 [01:16<02:36, 2671.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███▏      | 191044/608042 [01:16<02:35, 2686.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  31%|███▏      | 191321/608042 [01:16<02:38, 2635.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 191603/608042 [01:16<02:44, 2535.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 191905/608042 [01:16<02:39, 2605.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 192188/608042 [01:16<02:37, 2647.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 192455/608042 [01:16<02:45, 2514.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 192859/608042 [01:16<02:23, 2902.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 193158/608042 [01:16<02:37, 2636.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 193436/608042 [01:17<02:51, 2416.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 193731/608042 [01:17<02:43, 2535.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 194010/608042 [01:17<02:45, 2500.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 194300/608042 [01:17<02:38, 2603.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 194570/608042 [01:17<02:56, 2342.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 194891/608042 [01:17<02:43, 2532.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 195227/608042 [01:17<02:30, 2750.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 195517/608042 [01:17<02:37, 2618.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 195786/608042 [01:18<02:46, 2476.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 196092/608042 [01:18<02:36, 2630.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 196379/608042 [01:18<02:32, 2693.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 196662/608042 [01:18<02:30, 2726.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 197019/608042 [01:18<02:21, 2912.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  32%|███▏      | 197313/608042 [01:18<02:51, 2394.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 197619/608042 [01:18<02:41, 2546.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 197907/608042 [01:18<02:44, 2487.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 198197/608042 [01:18<02:38, 2584.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 198471/608042 [01:19<02:36, 2609.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 198856/608042 [01:19<02:25, 2813.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 199171/608042 [01:19<02:21, 2893.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 199469/608042 [01:19<02:23, 2850.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 199758/608042 [01:19<02:27, 2760.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 200043/608042 [01:19<02:29, 2728.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 200325/608042 [01:19<02:33, 2651.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 200627/608042 [01:19<02:33, 2660.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 200940/608042 [01:19<02:26, 2773.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 201230/608042 [01:20<02:35, 2622.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 201496/608042 [01:20<02:46, 2443.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 201766/608042 [01:20<02:47, 2430.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 202108/608042 [01:20<02:34, 2631.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 202390/608042 [01:20<02:31, 2673.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 202666/608042 [01:20<02:33, 2637.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 202932/608042 [01:20<02:49, 2388.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 203178/608042 [01:20<02:58, 2272.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 203435/608042 [01:20<02:53, 2336.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  33%|███▎      | 203691/608042 [01:21<02:48, 2393.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▎      | 203948/608042 [01:21<02:51, 2362.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▎      | 204195/608042 [01:21<02:50, 2362.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▎      | 204484/608042 [01:21<02:42, 2488.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▎      | 204748/608042 [01:21<02:42, 2483.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▎      | 205003/608042 [01:21<02:51, 2345.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 205252/608042 [01:21<02:50, 2368.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 205507/608042 [01:21<02:52, 2334.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 205749/608042 [01:21<03:00, 2228.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 205986/608042 [01:22<02:59, 2234.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 206264/608042 [01:22<02:49, 2371.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 206524/608042 [01:22<02:52, 2330.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 206883/608042 [01:22<02:30, 2657.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 207166/608042 [01:22<02:37, 2543.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 207478/608042 [01:22<02:30, 2666.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 207761/608042 [01:22<02:30, 2666.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 208036/608042 [01:22<02:34, 2581.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 208380/608042 [01:22<02:23, 2778.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 208665/608042 [01:23<02:34, 2589.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 208928/608042 [01:23<02:35, 2574.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 209200/608042 [01:23<02:53, 2297.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 209449/608042 [01:23<02:58, 2230.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  34%|███▍      | 209760/608042 [01:23<02:45, 2411.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▍      | 210029/608042 [01:23<02:41, 2469.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▍      | 210346/608042 [01:23<02:29, 2655.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▍      | 210616/608042 [01:23<02:44, 2416.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▍      | 211004/608042 [01:23<02:21, 2805.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▍      | 211297/608042 [01:24<02:35, 2552.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▍      | 211613/608042 [01:24<02:26, 2701.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▍      | 211899/608042 [01:24<02:32, 2600.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▍      | 212182/608042 [01:24<02:32, 2591.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▍      | 212451/608042 [01:24<02:42, 2430.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▍      | 212766/608042 [01:24<02:30, 2618.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▌      | 213065/608042 [01:24<02:25, 2709.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▌      | 213377/608042 [01:24<02:21, 2789.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▌      | 213680/608042 [01:24<02:24, 2726.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▌      | 213957/608042 [01:25<02:24, 2727.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▌      | 214248/608042 [01:25<02:26, 2691.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▌      | 214535/608042 [01:25<02:26, 2690.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▌      | 214849/608042 [01:25<02:20, 2795.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▌      | 215143/608042 [01:25<02:27, 2663.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▌      | 215427/608042 [01:25<02:32, 2576.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  35%|███▌      | 215696/608042 [01:25<02:30, 2603.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 215963/608042 [01:25<02:46, 2360.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 216229/608042 [01:25<02:41, 2424.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 216484/608042 [01:26<02:46, 2352.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 216726/608042 [01:26<02:48, 2320.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 216990/608042 [01:26<02:43, 2392.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 217233/608042 [01:26<02:50, 2290.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 217531/608042 [01:26<02:39, 2453.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 217789/608042 [01:26<02:36, 2486.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 218088/608042 [01:26<02:34, 2522.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 218348/608042 [01:26<02:44, 2373.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 218652/608042 [01:26<02:33, 2545.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 218911/608042 [01:27<02:42, 2394.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 219172/608042 [01:27<02:40, 2416.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 219423/608042 [01:27<02:40, 2428.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 219734/608042 [01:27<02:29, 2604.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 220012/608042 [01:27<02:27, 2631.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▌      | 220296/608042 [01:27<02:28, 2615.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▋      | 220561/608042 [01:27<02:32, 2542.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▋      | 220849/608042 [01:27<02:27, 2632.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▋      | 221130/608042 [01:27<02:37, 2453.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▋      | 221401/608042 [01:28<02:37, 2455.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  36%|███▋      | 221692/608042 [01:28<02:31, 2542.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 221988/608042 [01:28<02:25, 2644.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 222319/608042 [01:28<02:16, 2819.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 222615/608042 [01:28<02:32, 2533.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 222889/608042 [01:28<02:31, 2546.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 223160/608042 [01:28<02:29, 2566.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 223426/608042 [01:28<02:35, 2473.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 223680/608042 [01:28<02:47, 2294.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 223926/608042 [01:29<02:45, 2316.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 224247/608042 [01:29<02:30, 2549.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 224526/608042 [01:29<02:30, 2550.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 224789/608042 [01:29<02:30, 2554.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 225144/608042 [01:29<02:17, 2783.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 225470/608042 [01:29<02:17, 2790.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 225754/608042 [01:29<02:21, 2701.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 226118/608042 [01:29<02:11, 2901.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 226423/608042 [01:30<02:32, 2496.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 226711/608042 [01:30<02:27, 2581.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 226980/608042 [01:30<02:26, 2594.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 227257/608042 [01:30<02:35, 2454.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 227509/608042 [01:30<02:46, 2288.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  37%|███▋      | 227774/608042 [01:30<02:40, 2376.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 228048/608042 [01:30<02:35, 2441.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 228316/608042 [01:30<02:31, 2506.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 228595/608042 [01:30<02:30, 2523.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 228910/608042 [01:31<02:27, 2568.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 229175/608042 [01:31<02:31, 2502.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 229437/608042 [01:31<02:34, 2442.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 229685/608042 [01:31<02:39, 2374.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 229925/608042 [01:31<02:39, 2366.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 230167/608042 [01:31<02:45, 2285.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 230446/608042 [01:31<02:40, 2356.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 230752/608042 [01:31<02:31, 2489.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 231093/608042 [01:31<02:18, 2714.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 231385/608042 [01:32<02:28, 2540.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 231676/608042 [01:32<02:29, 2520.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 232029/608042 [01:32<02:15, 2775.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 232311/608042 [01:32<02:16, 2758.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 232592/608042 [01:32<02:23, 2614.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 232868/608042 [01:32<02:25, 2584.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 233185/608042 [01:32<02:17, 2727.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 233466/608042 [01:32<02:19, 2683.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 233771/608042 [01:32<02:15, 2755.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  38%|███▊      | 234052/608042 [01:33<02:26, 2545.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▊      | 234318/608042 [01:33<02:34, 2423.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▊      | 234607/608042 [01:33<02:33, 2433.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▊      | 234876/608042 [01:33<02:33, 2424.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▊      | 235195/608042 [01:33<02:22, 2622.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▊      | 235542/608042 [01:33<02:12, 2809.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 235836/608042 [01:33<02:14, 2769.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 236125/608042 [01:33<02:13, 2786.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 236412/608042 [01:33<02:12, 2800.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 236694/608042 [01:34<02:20, 2645.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 236973/608042 [01:34<02:29, 2483.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 237226/608042 [01:34<02:33, 2411.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 237478/608042 [01:34<02:33, 2411.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 237757/608042 [01:34<02:31, 2446.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 238003/608042 [01:34<02:40, 2299.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 238255/608042 [01:34<02:38, 2338.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 238500/608042 [01:34<02:39, 2315.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 238791/608042 [01:34<02:30, 2460.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 239100/608042 [01:35<02:20, 2622.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 239381/608042 [01:35<02:22, 2587.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 239642/608042 [01:35<02:25, 2536.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 239905/608042 [01:35<02:35, 2366.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  39%|███▉      | 240170/608042 [01:35<02:35, 2371.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|███▉      | 240451/608042 [01:35<02:27, 2489.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|███▉      | 240751/608042 [01:35<02:20, 2611.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|███▉      | 241048/608042 [01:35<02:15, 2707.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|███▉      | 241324/608042 [01:35<02:17, 2666.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|███▉      | 241602/608042 [01:35<02:18, 2653.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|███▉      | 241900/608042 [01:36<02:18, 2644.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|███▉      | 242170/608042 [01:36<02:21, 2590.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|███▉      | 242456/608042 [01:36<02:18, 2643.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|███▉      | 242725/608042 [01:36<02:24, 2522.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|███▉      | 243020/608042 [01:36<02:23, 2543.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|████      | 243301/608042 [01:36<02:20, 2589.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|████      | 243716/608042 [01:36<02:00, 3012.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|████      | 244030/608042 [01:36<02:18, 2625.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|████      | 244422/608042 [01:37<02:03, 2941.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|████      | 244733/608042 [01:37<02:19, 2599.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|████      | 245015/608042 [01:37<02:16, 2650.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|████      | 245311/608042 [01:37<02:15, 2678.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|████      | 245605/608042 [01:37<02:12, 2745.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|████      | 245888/608042 [01:37<02:18, 2606.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  40%|████      | 246187/608042 [01:37<02:14, 2693.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 246461/608042 [01:37<02:23, 2527.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 246721/608042 [01:37<02:23, 2525.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 246991/608042 [01:38<02:25, 2478.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 247300/608042 [01:38<02:17, 2626.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 247566/608042 [01:38<02:18, 2601.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 247842/608042 [01:38<02:22, 2531.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 248153/608042 [01:38<02:13, 2686.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 248428/608042 [01:38<02:19, 2573.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 248701/608042 [01:38<02:24, 2493.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 249029/608042 [01:38<02:12, 2702.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 249309/608042 [01:38<02:21, 2531.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 249582/608042 [01:39<02:20, 2553.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 249864/608042 [01:39<02:23, 2503.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 250118/608042 [01:39<02:34, 2310.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 250405/608042 [01:39<02:26, 2443.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████      | 250662/608042 [01:39<02:27, 2423.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████▏     | 250937/608042 [01:39<02:24, 2475.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████▏     | 251229/608042 [01:39<02:18, 2568.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████▏     | 251498/608042 [01:39<02:23, 2483.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████▏     | 251791/608042 [01:39<02:16, 2605.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  41%|████▏     | 252072/608042 [01:40<02:19, 2549.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 252368/608042 [01:40<02:15, 2624.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 252657/608042 [01:40<02:18, 2559.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 252924/608042 [01:40<02:20, 2523.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 253185/608042 [01:40<02:25, 2431.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 253438/608042 [01:40<02:30, 2355.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 253782/608042 [01:40<02:16, 2599.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 254056/608042 [01:40<02:19, 2542.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 254313/608042 [01:40<02:27, 2394.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 254621/608042 [01:41<02:23, 2468.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 254911/608042 [01:41<02:17, 2574.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 255184/608042 [01:41<02:16, 2583.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 255463/608042 [01:41<02:14, 2612.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 255742/608042 [01:41<02:17, 2555.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 256063/608042 [01:41<02:08, 2737.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 256362/608042 [01:41<02:05, 2803.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 256649/608042 [01:41<02:08, 2725.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 256947/608042 [01:41<02:15, 2597.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 257214/608042 [01:42<02:24, 2428.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 257536/608042 [01:42<02:13, 2629.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 257836/608042 [01:42<02:13, 2631.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 258128/608042 [01:42<02:12, 2642.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  42%|████▏     | 258412/608042 [01:42<02:19, 2514.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 258668/608042 [01:42<02:29, 2341.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 258926/608042 [01:42<02:25, 2395.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 259177/608042 [01:42<02:26, 2373.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 259434/608042 [01:42<02:25, 2403.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 259756/608042 [01:43<02:15, 2579.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 260028/608042 [01:43<02:18, 2506.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 260302/608042 [01:43<02:16, 2554.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 260577/608042 [01:43<02:15, 2562.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 260940/608042 [01:43<02:01, 2862.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 261235/608042 [01:43<02:10, 2651.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 261520/608042 [01:43<02:14, 2570.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 261781/608042 [01:43<02:19, 2488.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 262092/608042 [01:43<02:13, 2585.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 262414/608042 [01:44<02:09, 2667.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 262770/608042 [01:44<02:00, 2874.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 263084/608042 [01:44<02:02, 2812.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 263372/608042 [01:44<02:10, 2634.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 263642/608042 [01:44<02:16, 2519.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 263912/608042 [01:44<02:15, 2545.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 264186/608042 [01:44<02:13, 2566.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  43%|████▎     | 264445/608042 [01:44<02:28, 2307.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▎     | 264682/608042 [01:45<02:32, 2244.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▎     | 264957/608042 [01:45<02:25, 2361.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▎     | 265200/608042 [01:45<02:26, 2345.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▎     | 265439/608042 [01:45<02:42, 2105.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▎     | 265730/608042 [01:45<02:29, 2289.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▎     | 266001/608042 [01:45<02:23, 2378.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 266245/608042 [01:45<02:37, 2174.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 266518/608042 [01:45<02:27, 2319.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 266764/608042 [01:45<02:30, 2267.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 266998/608042 [01:46<02:36, 2181.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 267407/608042 [01:46<02:06, 2698.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 267690/608042 [01:46<02:13, 2551.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 267953/608042 [01:46<02:15, 2510.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 268228/608042 [01:46<02:12, 2557.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 268559/608042 [01:46<02:03, 2756.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 268885/608042 [01:46<01:57, 2884.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 269200/608042 [01:46<01:54, 2957.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 269503/608042 [01:46<02:09, 2616.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 269800/608042 [01:47<02:05, 2700.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 270080/608042 [01:47<02:10, 2585.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  44%|████▍     | 270351/608042 [01:47<02:16, 2482.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▍     | 270680/608042 [01:47<02:06, 2674.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▍     | 270955/608042 [01:47<02:09, 2601.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▍     | 271244/608042 [01:47<02:08, 2623.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▍     | 271519/608042 [01:47<02:10, 2587.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▍     | 271805/608042 [01:47<02:06, 2651.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▍     | 272121/608042 [01:47<02:02, 2745.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▍     | 272402/608042 [01:48<02:15, 2481.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▍     | 272659/608042 [01:48<02:20, 2393.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▍     | 273006/608042 [01:48<02:07, 2637.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▍     | 273280/608042 [01:48<02:08, 2605.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▍     | 273553/608042 [01:48<02:20, 2379.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▌     | 273847/608042 [01:48<02:12, 2524.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▌     | 274118/608042 [01:48<02:10, 2567.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▌     | 274391/608042 [01:48<02:18, 2402.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▌     | 274667/608042 [01:48<02:13, 2491.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▌     | 274939/608042 [01:49<02:11, 2540.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▌     | 275198/608042 [01:49<02:13, 2498.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▌     | 275454/608042 [01:49<02:19, 2385.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▌     | 275710/608042 [01:49<02:19, 2390.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▌     | 275976/608042 [01:49<02:17, 2412.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▌     | 276222/608042 [01:49<02:28, 2239.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  45%|████▌     | 276479/608042 [01:49<02:22, 2320.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 276784/608042 [01:49<02:12, 2504.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 277043/608042 [01:49<02:15, 2442.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 277364/608042 [01:50<02:05, 2632.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 277672/608042 [01:50<02:00, 2740.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 277950/608042 [01:50<02:15, 2430.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 278273/608042 [01:50<02:05, 2637.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 278554/608042 [01:50<02:05, 2617.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 278826/608042 [01:50<02:09, 2542.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 279085/608042 [01:50<02:14, 2441.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 279340/608042 [01:50<02:21, 2329.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 279579/608042 [01:50<02:22, 2303.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 279836/608042 [01:51<02:25, 2260.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 280069/608042 [01:51<02:37, 2076.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 280400/608042 [01:51<02:17, 2376.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 280653/608042 [01:51<02:19, 2348.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▌     | 280915/608042 [01:51<02:16, 2394.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▋     | 281234/608042 [01:51<02:07, 2563.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▋     | 281506/608042 [01:51<02:14, 2431.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▋     | 281772/608042 [01:51<02:27, 2215.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▋     | 282001/608042 [01:52<02:30, 2172.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▋     | 282318/608042 [01:52<02:14, 2422.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  46%|████▋     | 282566/608042 [01:52<02:16, 2377.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 282847/608042 [01:52<02:11, 2468.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 283100/608042 [01:52<02:11, 2475.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 283353/608042 [01:52<02:10, 2483.42 examples/s]
2024-08-03T04:45:03.832268751Z Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 283658/608042 [01:52<02:02, 2638.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 283925/608042 [01:52<02:08, 2531.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 284222/608042 [01:52<02:01, 2655.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 284500/608042 [01:52<02:04, 2597.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 284824/608042 [01:53<01:56, 2767.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 285105/608042 [01:53<02:06, 2559.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 285392/608042 [01:53<02:02, 2637.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 285695/608042 [01:53<01:58, 2719.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 285988/608042 [01:53<02:03, 2597.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 286277/608042 [01:53<02:01, 2650.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 286603/608042 [01:53<01:56, 2749.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 286880/608042 [01:53<01:56, 2745.74 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 287159/608042 [01:53<01:57, 2741.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 287471/608042 [01:54<01:53, 2822.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 287765/608042 [01:54<01:56, 2755.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 288046/608042 [01:54<01:55, 2761.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 288328/608042 [01:54<01:59, 2680.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  47%|████▋     | 288600/608042 [01:54<02:02, 2605.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 288868/608042 [01:54<02:09, 2469.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 289172/608042 [01:54<02:01, 2614.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 289437/608042 [01:54<02:03, 2583.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 289697/608042 [01:54<02:12, 2404.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 289964/608042 [01:55<02:08, 2470.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 290216/608042 [01:55<02:12, 2393.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 290522/608042 [01:55<02:06, 2518.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 290799/608042 [01:55<02:02, 2587.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 291073/608042 [01:55<02:02, 2589.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 291338/608042 [01:55<02:08, 2461.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 291618/608042 [01:55<02:04, 2541.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 291883/608042 [01:55<02:04, 2543.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 292156/608042 [01:55<02:05, 2515.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 292457/608042 [01:56<01:59, 2647.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 292726/608042 [01:56<01:58, 2653.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 293007/608042 [01:56<02:08, 2456.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 293261/608042 [01:56<02:14, 2348.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 293506/608042 [01:56<02:14, 2342.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 293774/608042 [01:56<02:10, 2402.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 294133/608042 [01:56<01:55, 2722.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 294409/608042 [01:56<01:56, 2698.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  48%|████▊     | 294686/608042 [01:56<02:02, 2562.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▊     | 294949/608042 [01:57<02:11, 2375.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▊     | 295200/608042 [01:57<02:09, 2410.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▊     | 295542/608042 [01:57<01:56, 2677.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▊     | 295821/608042 [01:57<01:59, 2608.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▊     | 296090/608042 [01:57<02:01, 2577.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▊     | 296371/608042 [01:57<01:58, 2631.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 296664/608042 [01:57<01:55, 2685.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 296962/608042 [01:57<01:53, 2731.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 297253/608042 [01:57<01:55, 2680.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 297522/608042 [01:58<02:01, 2552.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 297783/608042 [01:58<02:06, 2449.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 298092/608042 [01:58<01:58, 2620.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 298362/608042 [01:58<02:00, 2576.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 298682/608042 [01:58<01:54, 2706.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 298955/608042 [01:58<01:56, 2652.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 299228/608042 [01:58<01:57, 2624.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 299492/608042 [01:58<02:00, 2555.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 299761/608042 [01:58<01:58, 2591.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 300027/608042 [01:59<02:14, 2282.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 300275/608042 [01:59<02:12, 2329.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 300519/608042 [01:59<02:16, 2259.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  49%|████▉     | 300777/608042 [01:59<02:11, 2330.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|████▉     | 301076/608042 [01:59<02:02, 2506.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|████▉     | 301361/608042 [01:59<01:57, 2602.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|████▉     | 301683/608042 [01:59<01:52, 2713.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|████▉     | 301957/608042 [01:59<01:53, 2689.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|████▉     | 302240/608042 [01:59<01:53, 2697.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|████▉     | 302534/608042 [01:59<01:53, 2691.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|████▉     | 302842/608042 [02:00<01:49, 2787.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|████▉     | 303154/608042 [02:00<01:46, 2859.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|████▉     | 303443/608042 [02:00<01:49, 2785.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|████▉     | 303765/608042 [02:00<01:47, 2819.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|█████     | 304054/608042 [02:00<01:50, 2749.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|█████     | 304388/608042 [02:00<01:44, 2905.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|█████     | 304713/608042 [02:00<01:42, 2962.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|█████     | 305018/608042 [02:00<01:46, 2832.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|█████     | 305309/608042 [02:01<02:05, 2418.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|█████     | 305568/608042 [02:01<02:03, 2439.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|█████     | 305820/608042 [02:01<02:05, 2404.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|█████     | 306070/608042 [02:01<02:05, 2409.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|█████     | 306430/608042 [02:01<01:50, 2734.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|█████     | 306750/608042 [02:01<01:47, 2790.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  50%|█████     | 307042/608042 [02:01<01:48, 2784.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 307360/608042 [02:01<01:43, 2891.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 307672/608042 [02:01<01:42, 2918.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 307975/608042 [02:01<01:48, 2757.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 308256/608042 [02:02<01:50, 2723.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 308537/608042 [02:02<01:52, 2663.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 308819/608042 [02:02<01:56, 2566.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 309172/608042 [02:02<01:47, 2772.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 309459/608042 [02:02<01:56, 2564.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 309796/608042 [02:02<01:49, 2721.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 310078/608042 [02:02<01:52, 2639.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 310371/608042 [02:02<01:50, 2687.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 310709/608042 [02:02<01:43, 2872.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 311031/608042 [02:03<01:42, 2897.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 311331/608042 [02:03<01:46, 2783.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████     | 311620/608042 [02:03<01:54, 2581.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████▏    | 311897/608042 [02:03<01:53, 2612.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████▏    | 312166/608042 [02:03<02:00, 2464.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████▏    | 312433/608042 [02:03<02:05, 2357.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████▏    | 312805/608042 [02:03<01:52, 2616.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  51%|█████▏    | 313081/608042 [02:03<01:57, 2515.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 313356/608042 [02:04<02:00, 2446.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 313649/608042 [02:04<01:58, 2491.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 313913/608042 [02:04<01:58, 2492.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 314174/608042 [02:04<02:00, 2438.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 314438/608042 [02:04<01:58, 2477.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 314811/608042 [02:04<01:46, 2742.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 315089/608042 [02:04<01:52, 2613.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 315352/608042 [02:04<01:56, 2505.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 315620/608042 [02:04<01:54, 2544.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 315876/608042 [02:05<01:59, 2436.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 316143/608042 [02:05<01:58, 2462.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 316392/608042 [02:05<01:58, 2460.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 316654/608042 [02:05<02:03, 2364.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 316901/608042 [02:05<02:09, 2245.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 317219/608042 [02:05<01:57, 2476.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 317481/608042 [02:05<01:57, 2471.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 317765/608042 [02:05<01:52, 2571.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 318039/608042 [02:05<01:50, 2618.37 examples/s]
2024-08-03T04:45:03.832268751Z Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 318308/608042 [02:06<01:53, 2542.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 318566/608042 [02:06<01:53, 2540.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 318844/608042 [02:06<01:53, 2557.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  52%|█████▏    | 319152/608042 [02:06<01:50, 2606.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 319453/608042 [02:06<01:46, 2702.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 319737/608042 [02:06<01:55, 2499.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 319998/608042 [02:06<01:54, 2514.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 320267/608042 [02:06<01:57, 2442.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 320530/608042 [02:06<01:58, 2428.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 320778/608042 [02:07<01:58, 2429.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 321077/608042 [02:07<01:52, 2548.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 321339/608042 [02:07<01:52, 2558.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 321623/608042 [02:07<01:50, 2600.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 321885/608042 [02:07<01:59, 2385.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 322136/608042 [02:07<02:02, 2326.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 322493/608042 [02:07<01:47, 2662.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 322766/608042 [02:07<01:55, 2477.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 323027/608042 [02:07<01:56, 2440.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 323369/608042 [02:08<01:45, 2696.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 323651/608042 [02:08<01:49, 2602.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 323952/608042 [02:08<01:46, 2655.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 324324/608042 [02:08<01:39, 2865.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 324616/608042 [02:08<01:45, 2693.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 324946/608042 [02:08<01:40, 2803.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  53%|█████▎    | 325244/608042 [02:08<01:39, 2829.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▎    | 325537/608042 [02:08<01:45, 2686.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▎    | 325827/608042 [02:08<01:44, 2710.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▎    | 326107/608042 [02:09<01:52, 2501.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▎    | 326390/608042 [02:09<01:51, 2520.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▎    | 326682/608042 [02:09<01:47, 2617.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 326951/608042 [02:09<01:49, 2557.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 327222/608042 [02:09<01:48, 2587.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 327497/608042 [02:09<01:59, 2348.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 327747/608042 [02:09<01:59, 2340.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 328011/608042 [02:09<01:56, 2410.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 328281/608042 [02:09<01:56, 2397.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 328596/608042 [02:10<01:47, 2588.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 328906/608042 [02:10<01:43, 2696.74 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 329182/608042 [02:10<01:49, 2549.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 329442/608042 [02:10<01:54, 2441.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 329751/608042 [02:10<01:47, 2592.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 330035/608042 [02:10<01:46, 2617.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 330346/608042 [02:10<01:41, 2735.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 330627/608042 [02:10<01:43, 2691.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 330918/608042 [02:10<01:47, 2588.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  54%|█████▍    | 331193/608042 [02:11<01:50, 2494.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▍    | 331450/608042 [02:11<01:54, 2422.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▍    | 331710/608042 [02:11<01:52, 2446.74 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▍    | 331967/608042 [02:11<01:54, 2409.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▍    | 332232/608042 [02:11<01:53, 2437.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▍    | 332544/608042 [02:11<01:44, 2628.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▍    | 332813/608042 [02:11<01:47, 2565.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▍    | 333077/608042 [02:11<01:47, 2562.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▍    | 333336/608042 [02:11<01:57, 2333.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▍    | 333611/608042 [02:12<01:53, 2409.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▍    | 333931/608042 [02:12<01:45, 2588.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▍    | 334234/608042 [02:12<01:44, 2616.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▌    | 334498/608042 [02:12<01:49, 2489.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▌    | 334758/608042 [02:12<01:51, 2448.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▌    | 335007/608042 [02:12<01:53, 2409.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▌    | 335257/608042 [02:12<01:59, 2281.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▌    | 335542/608042 [02:12<01:52, 2424.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▌    | 335802/608042 [02:12<01:54, 2387.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▌    | 336079/608042 [02:13<01:53, 2398.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▌    | 336320/608042 [02:13<01:53, 2391.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▌    | 336575/608042 [02:13<01:54, 2363.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▌    | 336813/608042 [02:13<01:57, 2304.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▌    | 337090/608042 [02:13<01:53, 2388.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  55%|█████▌    | 337415/608042 [02:13<01:44, 2593.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 337678/608042 [02:13<01:47, 2523.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 337937/608042 [02:13<01:47, 2504.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 338194/608042 [02:13<01:52, 2393.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 338441/608042 [02:14<01:52, 2394.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 338703/608042 [02:14<01:49, 2451.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 339019/608042 [02:14<01:41, 2645.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 339318/608042 [02:14<01:38, 2729.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 339700/608042 [02:14<01:28, 3016.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 340013/608042 [02:14<01:41, 2640.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 340342/608042 [02:14<01:36, 2784.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 340634/608042 [02:14<01:34, 2814.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 340935/608042 [02:14<01:42, 2602.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 341212/608042 [02:15<01:44, 2554.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 341530/608042 [02:15<01:38, 2715.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▌    | 341816/608042 [02:15<01:44, 2557.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▋    | 342077/608042 [02:15<01:44, 2552.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▋    | 342338/608042 [02:15<01:49, 2432.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▋    | 342588/608042 [02:15<01:49, 2429.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▋    | 342843/608042 [02:15<01:47, 2461.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▋    | 343194/608042 [02:15<01:36, 2734.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  56%|█████▋    | 343478/608042 [02:15<01:41, 2619.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 343761/608042 [02:16<01:52, 2352.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 344013/608042 [02:16<01:50, 2388.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 344299/608042 [02:16<01:45, 2508.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 344627/608042 [02:16<01:43, 2556.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 344886/608042 [02:16<01:45, 2486.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 345193/608042 [02:16<01:40, 2602.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 345489/608042 [02:16<01:37, 2689.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 345775/608042 [02:16<01:37, 2695.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 346055/608042 [02:16<01:40, 2608.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 346328/608042 [02:17<01:39, 2636.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 346605/608042 [02:17<01:47, 2431.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 346914/608042 [02:17<01:43, 2530.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 347202/608042 [02:17<01:40, 2604.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 347506/608042 [02:17<01:36, 2710.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 347782/608042 [02:17<01:42, 2528.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 348042/608042 [02:17<01:44, 2482.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 348302/608042 [02:17<01:47, 2405.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 348546/608042 [02:17<01:51, 2329.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 348836/608042 [02:18<01:44, 2469.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 349211/608042 [02:18<01:33, 2780.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  57%|█████▋    | 349504/608042 [02:18<01:42, 2533.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 349768/608042 [02:18<01:41, 2551.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 350036/608042 [02:18<01:42, 2505.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 350301/608042 [02:18<01:48, 2370.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 350553/608042 [02:18<01:49, 2341.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 350808/608042 [02:18<01:50, 2317.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 351044/608042 [02:18<01:50, 2324.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 351280/608042 [02:19<01:54, 2249.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 351523/608042 [02:19<01:53, 2265.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 351809/608042 [02:19<01:45, 2428.83 examples/s]
2024-08-03T04:45:03.832268751Z Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 352177/608042 [02:19<01:32, 2760.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 352463/608042 [02:19<01:39, 2561.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 352792/608042 [02:19<01:34, 2689.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 353067/608042 [02:19<01:36, 2633.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 353335/608042 [02:19<01:38, 2577.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 353676/608042 [02:19<01:30, 2801.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 353968/608042 [02:20<01:41, 2507.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 354240/608042 [02:20<01:40, 2519.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 354542/608042 [02:20<01:35, 2652.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 354817/608042 [02:20<01:36, 2632.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 355112/608042 [02:20<01:36, 2627.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 355410/608042 [02:20<01:33, 2709.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  58%|█████▊    | 355687/608042 [02:20<01:32, 2722.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▊    | 355961/608042 [02:20<01:38, 2557.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▊    | 356234/608042 [02:20<01:39, 2540.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▊    | 356493/608042 [02:21<01:39, 2516.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▊    | 356748/608042 [02:21<01:40, 2510.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▊    | 357051/608042 [02:21<01:34, 2657.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 357318/608042 [02:21<01:36, 2605.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 357594/608042 [02:21<01:35, 2628.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 357867/608042 [02:21<01:37, 2575.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 358172/608042 [02:21<01:32, 2687.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 358457/608042 [02:21<01:32, 2695.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 358729/608042 [02:21<01:36, 2586.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 359087/608042 [02:22<01:26, 2864.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 359383/608042 [02:22<01:31, 2725.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 359675/608042 [02:22<01:30, 2758.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 359973/608042 [02:22<01:31, 2715.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 360247/608042 [02:22<01:31, 2719.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 360526/608042 [02:22<01:37, 2536.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 360789/608042 [02:22<01:36, 2557.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 361054/608042 [02:22<01:41, 2444.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 361307/608042 [02:22<01:43, 2377.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  59%|█████▉    | 361557/608042 [02:23<01:42, 2394.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|█████▉    | 361846/608042 [02:23<01:39, 2485.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|█████▉    | 362118/608042 [02:23<01:41, 2419.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|█████▉    | 362443/608042 [02:23<01:33, 2614.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|█████▉    | 362721/608042 [02:23<01:34, 2589.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|█████▉    | 362985/608042 [02:23<01:42, 2385.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|█████▉    | 363275/608042 [02:23<01:43, 2372.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|█████▉    | 363528/608042 [02:23<01:46, 2298.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|█████▉    | 363773/608042 [02:23<01:49, 2230.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|█████▉    | 364026/608042 [02:24<01:51, 2193.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|█████▉    | 364317/608042 [02:24<01:44, 2325.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|█████▉    | 364602/608042 [02:24<01:42, 2364.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|██████    | 364914/608042 [02:24<01:35, 2548.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|██████    | 365178/608042 [02:24<01:34, 2563.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|██████    | 365442/608042 [02:24<01:37, 2486.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|██████    | 365792/608042 [02:24<01:28, 2728.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|██████    | 366073/608042 [02:24<01:35, 2542.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|██████    | 366336/608042 [02:24<01:37, 2478.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|██████    | 366617/608042 [02:25<01:34, 2564.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|██████    | 366941/608042 [02:25<01:29, 2698.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|██████    | 367233/608042 [02:25<01:27, 2752.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|██████    | 367512/608042 [02:25<01:27, 2750.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  60%|██████    | 367799/608042 [02:25<01:31, 2631.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 368072/608042 [02:25<01:30, 2655.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 368359/608042 [02:25<01:28, 2701.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 368640/608042 [02:25<01:28, 2712.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 368921/608042 [02:25<01:28, 2699.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 369224/608042 [02:25<01:26, 2776.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 369524/608042 [02:26<01:30, 2633.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 369796/608042 [02:26<01:30, 2630.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 370067/608042 [02:26<01:39, 2384.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 370361/608042 [02:26<01:33, 2530.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 370687/608042 [02:26<01:26, 2728.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 370981/608042 [02:26<01:26, 2737.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 371261/608042 [02:26<01:28, 2678.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 371533/608042 [02:26<01:30, 2603.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 371856/608042 [02:26<01:25, 2770.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████    | 372137/608042 [02:27<01:32, 2545.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████▏   | 372468/608042 [02:27<01:25, 2745.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████▏   | 372844/608042 [02:27<01:20, 2916.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████▏   | 373164/608042 [02:27<01:24, 2793.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████▏   | 373563/608042 [02:27<01:18, 3004.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  61%|██████▏   | 373877/608042 [02:27<01:21, 2879.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 374172/608042 [02:27<01:22, 2832.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 374477/608042 [02:27<01:23, 2804.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 374779/608042 [02:28<01:21, 2859.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 375069/608042 [02:28<01:24, 2755.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 375434/608042 [02:28<01:18, 2975.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 375742/608042 [02:28<01:19, 2925.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 376042/608042 [02:28<01:24, 2759.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 376331/608042 [02:28<01:25, 2700.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 376629/608042 [02:28<01:25, 2719.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 376905/608042 [02:28<01:25, 2703.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 377180/608042 [02:28<01:35, 2424.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 377431/608042 [02:29<01:39, 2326.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 377714/608042 [02:29<01:34, 2446.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 377964/608042 [02:29<01:35, 2416.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 378223/608042 [02:29<01:34, 2442.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 378498/608042 [02:29<01:31, 2515.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 378762/608042 [02:29<01:32, 2483.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 379066/608042 [02:29<01:26, 2636.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 379437/608042 [02:29<01:19, 2882.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  62%|██████▏   | 379746/608042 [02:29<01:19, 2884.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 380051/608042 [02:30<01:20, 2844.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 380356/608042 [02:30<01:23, 2731.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 380634/608042 [02:30<01:28, 2557.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 380964/608042 [02:30<01:24, 2702.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 381238/608042 [02:30<01:26, 2613.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 381510/608042 [02:30<01:26, 2617.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 381776/608042 [02:30<01:32, 2456.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 382028/608042 [02:30<01:35, 2364.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 382297/608042 [02:30<01:34, 2401.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 382598/608042 [02:31<01:28, 2545.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 382861/608042 [02:31<01:33, 2408.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 383162/608042 [02:31<01:29, 2522.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 383507/608042 [02:31<01:21, 2760.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 383857/608042 [02:31<01:15, 2952.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 384158/608042 [02:31<01:15, 2952.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 384493/608042 [02:31<01:13, 3024.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 384805/608042 [02:31<01:19, 2798.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 385096/608042 [02:31<01:23, 2680.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 385408/608042 [02:32<01:19, 2797.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 385698/608042 [02:32<01:21, 2724.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  63%|██████▎   | 386040/608042 [02:32<01:17, 2851.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▎   | 386338/608042 [02:32<01:19, 2800.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▎   | 386640/608042 [02:32<01:17, 2859.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▎   | 386929/608042 [02:32<01:25, 2579.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▎   | 387214/608042 [02:32<01:24, 2624.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▎   | 387520/608042 [02:32<01:21, 2707.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 387796/608042 [02:32<01:24, 2619.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 388106/608042 [02:33<01:20, 2734.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 388464/608042 [02:33<01:13, 2971.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 388771/608042 [02:33<01:14, 2926.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 389107/608042 [02:33<01:12, 3016.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 389413/608042 [02:33<01:17, 2815.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 389702/608042 [02:33<01:24, 2596.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 389973/608042 [02:33<01:23, 2612.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 390244/608042 [02:33<01:24, 2588.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 390541/608042 [02:33<01:21, 2661.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 390819/608042 [02:34<01:26, 2508.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 391073/608042 [02:34<01:27, 2467.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 391328/608042 [02:34<01:28, 2459.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 391612/608042 [02:34<01:28, 2442.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 391902/608042 [02:34<01:25, 2528.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  64%|██████▍   | 392169/608042 [02:34<01:31, 2360.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▍   | 392465/608042 [02:34<01:27, 2450.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▍   | 392716/608042 [02:34<01:27, 2458.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▍   | 393019/608042 [02:34<01:22, 2617.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▍   | 393302/608042 [02:35<01:26, 2493.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▍   | 393591/608042 [02:35<01:23, 2581.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▍   | 393866/608042 [02:35<01:21, 2621.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▍   | 394235/608042 [02:35<01:13, 2922.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▍   | 394573/608042 [02:35<01:10, 3007.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▍   | 394877/608042 [02:35<01:14, 2877.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▍   | 395182/608042 [02:35<01:13, 2892.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▌   | 395476/608042 [02:35<01:20, 2631.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▌   | 395763/608042 [02:35<01:28, 2388.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▌   | 396058/608042 [02:36<01:24, 2513.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▌   | 396340/608042 [02:36<01:21, 2590.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▌   | 396624/608042 [02:36<01:20, 2620.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▌   | 396896/608042 [02:36<01:26, 2450.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▌   | 397197/608042 [02:36<01:21, 2572.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▌   | 397463/608042 [02:36<01:27, 2397.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▌   | 397763/608042 [02:36<01:23, 2515.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  65%|██████▌   | 398068/608042 [02:36<01:21, 2592.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 398348/608042 [02:36<01:21, 2561.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 398618/608042 [02:37<01:29, 2336.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 398871/608042 [02:37<01:27, 2386.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 399156/608042 [02:37<01:24, 2478.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 399426/608042 [02:37<01:24, 2454.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 399745/608042 [02:37<01:19, 2615.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 400023/608042 [02:37<01:18, 2648.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 400295/608042 [02:37<01:24, 2462.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 400565/608042 [02:37<01:22, 2508.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 400828/608042 [02:37<01:23, 2493.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 401089/608042 [02:38<01:23, 2475.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 401348/608042 [02:38<01:36, 2141.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 401571/608042 [02:38<01:41, 2028.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 401788/608042 [02:38<01:40, 2055.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 402008/608042 [02:38<01:40, 2047.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 402311/608042 [02:38<01:29, 2303.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▌   | 402546/608042 [02:38<01:28, 2314.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▋   | 402837/608042 [02:38<01:24, 2422.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▋   | 403102/608042 [02:38<01:24, 2439.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▋   | 403348/608042 [02:39<01:24, 2427.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▋   | 403595/608042 [02:39<01:26, 2363.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▋   | 403867/608042 [02:39<01:25, 2392.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  66%|██████▋   | 404118/608042 [02:39<01:24, 2418.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 404414/608042 [02:39<01:19, 2561.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 404679/608042 [02:39<01:27, 2332.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 404981/608042 [02:39<01:21, 2484.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 405268/608042 [02:39<01:20, 2527.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 405585/608042 [02:39<01:15, 2694.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 405864/608042 [02:40<01:17, 2609.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 406142/608042 [02:40<01:16, 2649.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 406412/608042 [02:40<01:18, 2563.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 406693/608042 [02:40<01:17, 2594.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 407001/608042 [02:40<01:14, 2715.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 407277/608042 [02:40<01:14, 2680.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 407548/608042 [02:40<01:24, 2375.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 407888/608042 [02:40<01:17, 2583.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 408163/608042 [02:40<01:16, 2611.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 408452/608042 [02:41<01:16, 2602.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 408746/608042 [02:41<01:14, 2691.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 409074/608042 [02:41<01:11, 2767.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 409362/608042 [02:41<01:13, 2710.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 409640/608042 [02:41<01:16, 2607.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 409911/608042 [02:41<01:22, 2404.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  67%|██████▋   | 410156/608042 [02:41<01:22, 2410.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 410465/608042 [02:41<01:17, 2538.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 410722/608042 [02:41<01:21, 2411.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 411059/608042 [02:42<01:14, 2650.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 411333/608042 [02:42<01:17, 2525.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 411594/608042 [02:42<01:18, 2513.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 411894/608042 [02:42<01:15, 2609.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 412165/608042 [02:42<01:15, 2577.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 412517/608042 [02:42<01:09, 2814.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 412850/608042 [02:42<01:06, 2949.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 413148/608042 [02:42<01:07, 2893.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 413445/608042 [02:42<01:08, 2823.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 413741/608042 [02:43<01:08, 2823.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 414032/608042 [02:43<01:14, 2603.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 414325/608042 [02:43<01:12, 2669.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 414603/608042 [02:43<01:15, 2571.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 414981/608042 [02:43<01:07, 2869.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 415307/608042 [02:43<01:07, 2850.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 415634/608042 [02:43<01:05, 2953.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 415932/608042 [02:43<01:12, 2649.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 416204/608042 [02:43<01:11, 2665.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  68%|██████▊   | 416487/608042 [02:44<01:17, 2481.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▊   | 416794/608042 [02:44<01:12, 2628.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▊   | 417170/608042 [02:44<01:05, 2900.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▊   | 417484/608042 [02:44<01:04, 2955.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▊   | 417824/608042 [02:44<01:01, 3070.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 418145/608042 [02:44<01:09, 2735.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 418428/608042 [02:44<01:12, 2602.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 418718/608042 [02:44<01:10, 2673.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 419002/608042 [02:45<01:12, 2598.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 419276/608042 [02:45<01:13, 2564.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 419543/608042 [02:45<01:18, 2411.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 419791/608042 [02:45<01:17, 2424.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 420039/608042 [02:45<01:22, 2270.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 420269/608042 [02:45<01:22, 2276.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 420598/608042 [02:45<01:17, 2415.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 420857/608042 [02:45<01:17, 2412.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 421105/608042 [02:45<01:19, 2349.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 421433/608042 [02:46<01:12, 2559.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 421754/608042 [02:46<01:08, 2712.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 422051/608042 [02:46<01:10, 2625.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  69%|██████▉   | 422333/608042 [02:46<01:09, 2673.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|██████▉   | 422625/608042 [02:46<01:10, 2618.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|██████▉   | 422901/608042 [02:46<01:13, 2522.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|██████▉   | 423169/608042 [02:46<01:12, 2559.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|██████▉   | 423484/608042 [02:46<01:09, 2656.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|██████▉   | 423760/608042 [02:46<01:12, 2535.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|██████▉   | 424018/608042 [02:47<01:13, 2509.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|██████▉   | 424362/608042 [02:47<01:06, 2762.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|██████▉   | 424687/608042 [02:47<01:04, 2842.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|██████▉   | 424998/608042 [02:47<01:04, 2844.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|██████▉   | 425331/608042 [02:47<01:01, 2981.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|███████   | 425636/608042 [02:47<01:02, 2924.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|███████   | 425931/608042 [02:47<01:04, 2803.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|███████   | 426214/608042 [02:47<01:05, 2765.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|███████   | 426507/608042 [02:47<01:10, 2567.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|███████   | 426778/608042 [02:48<01:13, 2480.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|███████   | 427105/608042 [02:48<01:08, 2644.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|███████   | 427399/608042 [02:48<01:07, 2686.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|███████   | 427670/608042 [02:48<01:12, 2503.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|███████   | 427933/608042 [02:48<01:11, 2521.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|███████   | 428210/608042 [02:48<01:09, 2580.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  70%|███████   | 428490/608042 [02:48<01:07, 2642.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 428876/608042 [02:48<01:01, 2896.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 429239/608042 [02:48<00:58, 3068.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 429548/608042 [02:48<00:58, 3028.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 429871/608042 [02:49<01:02, 2851.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 430161/608042 [02:49<01:03, 2817.65 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 430460/608042 [02:49<01:02, 2860.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 430755/608042 [02:49<01:02, 2841.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 431046/608042 [02:49<01:08, 2600.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 431337/608042 [02:49<01:06, 2652.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 431612/608042 [02:49<01:08, 2581.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 431876/608042 [02:49<01:10, 2485.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 432152/608042 [02:50<01:09, 2513.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 432411/608042 [02:50<01:11, 2444.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 432796/608042 [02:50<01:01, 2831.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████   | 433107/608042 [02:50<01:00, 2895.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████▏  | 433403/608042 [02:50<01:01, 2859.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████▏  | 433700/608042 [02:50<01:06, 2614.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████▏  | 433979/608042 [02:50<01:06, 2636.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████▏  | 434256/608042 [02:50<01:06, 2599.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  71%|███████▏  | 434520/608042 [02:50<01:10, 2445.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 434781/608042 [02:51<01:12, 2397.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 435125/608042 [02:51<01:04, 2668.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 435431/608042 [02:51<01:03, 2734.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 435720/608042 [02:51<01:03, 2727.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 436011/608042 [02:51<01:04, 2682.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 436296/608042 [02:51<01:06, 2578.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 436574/608042 [02:51<01:05, 2616.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 436842/608042 [02:51<01:07, 2537.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 437170/608042 [02:51<01:02, 2718.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 437448/608042 [02:51<01:02, 2712.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 437729/608042 [02:52<01:06, 2546.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 438006/608042 [02:52<01:05, 2607.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 438277/608042 [02:52<01:09, 2455.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 438681/608042 [02:52<00:59, 2843.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 438973/608042 [02:52<01:06, 2530.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 439240/608042 [02:52<01:07, 2516.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 439509/608042 [02:52<01:06, 2535.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 439790/608042 [02:52<01:05, 2585.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 440103/608042 [02:53<01:01, 2727.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 440451/608042 [02:53<00:57, 2938.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  72%|███████▏  | 440776/608042 [02:53<00:56, 2969.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 441081/608042 [02:53<00:56, 2957.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 441382/608042 [02:53<01:04, 2596.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 441659/608042 [02:53<01:04, 2595.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 441930/608042 [02:53<01:03, 2613.74 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 442199/608042 [02:53<01:04, 2564.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 442497/608042 [02:53<01:01, 2678.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 442777/608042 [02:54<01:02, 2627.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 443044/608042 [02:54<01:05, 2519.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 443301/608042 [02:54<01:11, 2303.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 443550/608042 [02:54<01:11, 2304.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 443824/608042 [02:54<01:09, 2377.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 444069/608042 [02:54<01:12, 2249.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 444299/608042 [02:54<01:16, 2146.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 444591/608042 [02:54<01:09, 2346.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 444874/608042 [02:54<01:06, 2453.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 445135/608042 [02:55<01:05, 2494.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 445439/608042 [02:55<01:01, 2648.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 445716/608042 [02:55<01:01, 2618.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 446008/608042 [02:55<01:00, 2683.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 446283/608042 [02:55<01:04, 2491.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 446539/608042 [02:55<01:06, 2426.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  73%|███████▎  | 446878/608042 [02:55<01:00, 2663.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▎  | 447200/608042 [02:55<00:57, 2811.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▎  | 447497/608042 [02:55<00:56, 2833.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▎  | 447783/608042 [02:56<01:01, 2601.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▎  | 448050/608042 [02:56<01:01, 2608.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▎  | 448320/608042 [02:56<01:01, 2588.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 448604/608042 [02:56<01:00, 2628.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 448893/608042 [02:56<00:59, 2658.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 449222/608042 [02:56<00:56, 2787.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 449502/608042 [02:56<00:59, 2659.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 449784/608042 [02:56<00:59, 2662.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 450060/608042 [02:56<00:59, 2661.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 450340/608042 [02:56<00:59, 2663.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 450614/608042 [02:57<00:58, 2680.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 450889/608042 [02:57<01:03, 2458.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 451156/608042 [02:57<01:02, 2504.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 451417/608042 [02:57<01:06, 2351.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 451689/608042 [02:57<01:03, 2447.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 451945/608042 [02:57<01:04, 2403.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 452233/608042 [02:57<01:02, 2495.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 452499/608042 [02:57<01:01, 2534.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  74%|███████▍  | 452756/608042 [02:57<01:03, 2458.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▍  | 453012/608042 [02:58<01:04, 2418.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▍  | 453256/608042 [02:58<01:04, 2387.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▍  | 453509/608042 [02:58<01:04, 2404.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▍  | 453757/608042 [02:58<01:09, 2222.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▍  | 454017/608042 [02:58<01:07, 2276.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▍  | 454306/608042 [02:58<01:04, 2391.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▍  | 454610/608042 [02:58<01:01, 2484.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▍  | 454860/608042 [02:58<01:04, 2377.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▍  | 455109/608042 [02:58<01:03, 2390.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▍  | 455351/608042 [02:59<01:11, 2150.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▍  | 455576/608042 [02:59<01:11, 2141.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▍  | 455822/608042 [02:59<01:08, 2213.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▌  | 456133/608042 [02:59<01:04, 2368.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▌  | 456392/608042 [02:59<01:02, 2420.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▌  | 456687/608042 [02:59<00:59, 2546.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▌  | 456948/608042 [02:59<01:06, 2277.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▌  | 457249/608042 [02:59<01:01, 2457.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▌  | 457516/608042 [02:59<01:00, 2474.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▌  | 457849/608042 [03:00<00:56, 2660.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▌  | 458127/608042 [03:00<00:58, 2572.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▌  | 458464/608042 [03:00<00:55, 2696.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  75%|███████▌  | 458743/608042 [03:00<00:55, 2673.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 459094/608042 [03:00<00:51, 2897.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 459387/608042 [03:00<00:51, 2878.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 459697/608042 [03:00<00:50, 2918.87 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 459999/608042 [03:00<00:50, 2912.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 460312/608042 [03:00<00:51, 2872.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 460608/608042 [03:01<00:51, 2846.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 460898/608042 [03:01<00:54, 2701.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 461176/608042 [03:01<00:55, 2635.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 461452/608042 [03:01<00:57, 2536.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 461709/608042 [03:01<00:57, 2528.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 462029/608042 [03:01<00:54, 2695.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 462313/608042 [03:01<00:55, 2636.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 462583/608042 [03:01<00:54, 2652.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 462862/608042 [03:01<00:56, 2564.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 463157/608042 [03:02<00:55, 2613.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▌  | 463439/608042 [03:02<00:56, 2553.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▋  | 463711/608042 [03:02<00:56, 2572.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▋  | 463983/608042 [03:02<00:56, 2564.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▋  | 464245/608042 [03:02<00:55, 2579.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▋  | 464509/608042 [03:02<00:56, 2530.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▋  | 464777/608042 [03:02<00:56, 2515.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  76%|███████▋  | 465037/608042 [03:02<01:04, 2212.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 465336/608042 [03:02<00:59, 2379.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 465635/608042 [03:03<00:58, 2453.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 465889/608042 [03:03<01:02, 2291.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 466140/608042 [03:03<01:00, 2342.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 466459/608042 [03:03<00:57, 2449.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 466753/608042 [03:03<00:55, 2526.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 467011/608042 [03:03<00:56, 2516.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 467275/608042 [03:03<00:58, 2422.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 467619/608042 [03:03<00:52, 2665.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 467945/608042 [03:03<00:49, 2807.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 468243/608042 [03:04<00:49, 2801.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 468533/608042 [03:04<00:51, 2701.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 468805/608042 [03:04<00:52, 2654.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 469086/608042 [03:04<00:51, 2696.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 469367/608042 [03:04<00:51, 2690.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 469639/608042 [03:04<00:52, 2620.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 469909/608042 [03:04<00:53, 2579.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 470205/608042 [03:04<00:53, 2598.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 470479/608042 [03:04<00:53, 2570.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 470742/608042 [03:05<00:57, 2386.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  77%|███████▋  | 470985/608042 [03:05<00:59, 2314.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 471257/608042 [03:05<00:56, 2421.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 471571/608042 [03:05<00:53, 2529.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 471873/608042 [03:05<00:51, 2663.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 472144/608042 [03:05<00:54, 2489.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 472464/608042 [03:05<00:50, 2665.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 472740/608042 [03:05<00:51, 2626.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 473006/608042 [03:05<00:55, 2417.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 473309/608042 [03:06<00:52, 2568.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 473579/608042 [03:06<00:54, 2475.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 473835/608042 [03:06<00:54, 2451.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 474086/608042 [03:06<00:55, 2408.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 474335/608042 [03:06<00:56, 2383.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 474584/608042 [03:06<00:57, 2319.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 474828/608042 [03:06<00:58, 2286.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 475087/608042 [03:06<00:56, 2363.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 475366/608042 [03:06<00:53, 2474.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 475687/608042 [03:07<00:49, 2676.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 475967/608042 [03:07<00:51, 2544.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 476320/608042 [03:07<00:47, 2792.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 476605/608042 [03:07<00:47, 2748.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 476892/608042 [03:07<00:50, 2601.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  78%|███████▊  | 477155/608042 [03:07<00:52, 2509.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▊  | 477441/608042 [03:07<00:51, 2557.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▊  | 477761/608042 [03:07<00:47, 2728.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▊  | 478047/608042 [03:07<00:47, 2709.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▊  | 478359/608042 [03:08<00:46, 2810.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▊  | 478643/608042 [03:08<00:45, 2819.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 478941/608042 [03:08<00:50, 2546.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 479261/608042 [03:08<00:48, 2658.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 479559/608042 [03:08<00:46, 2745.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 479838/608042 [03:08<00:47, 2674.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 480136/608042 [03:08<00:47, 2692.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 480458/608042 [03:08<00:45, 2829.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 480746/608042 [03:08<00:47, 2700.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 481025/608042 [03:09<00:48, 2598.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 481317/608042 [03:09<00:47, 2671.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 481620/608042 [03:09<00:45, 2757.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 481901/608042 [03:09<00:46, 2733.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 482178/608042 [03:09<00:46, 2682.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 482468/608042 [03:09<00:46, 2689.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 482748/608042 [03:09<00:49, 2533.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 483005/608042 [03:09<00:52, 2383.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  79%|███████▉  | 483263/608042 [03:09<00:51, 2432.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|███████▉  | 483513/608042 [03:10<00:51, 2439.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|███████▉  | 483797/608042 [03:10<00:49, 2500.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|███████▉  | 484098/608042 [03:10<00:47, 2599.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|███████▉  | 484380/608042 [03:10<00:46, 2656.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|███████▉  | 484672/608042 [03:10<00:45, 2715.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|███████▉  | 484966/608042 [03:10<00:52, 2340.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|███████▉  | 485222/608042 [03:10<00:51, 2396.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|███████▉  | 485534/608042 [03:10<00:47, 2555.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|███████▉  | 485814/608042 [03:10<00:48, 2526.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|███████▉  | 486117/608042 [03:11<00:48, 2533.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|███████▉  | 486375/608042 [03:11<00:48, 2510.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|████████  | 486637/608042 [03:11<00:47, 2540.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|████████  | 486899/608042 [03:11<00:49, 2427.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|████████  | 487183/608042 [03:11<00:47, 2531.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|████████  | 487448/608042 [03:11<00:48, 2463.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|████████  | 487704/608042 [03:11<00:49, 2448.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|████████  | 487958/608042 [03:11<00:50, 2396.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|████████  | 488203/608042 [03:11<00:52, 2273.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|████████  | 488517/608042 [03:12<00:48, 2484.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|████████  | 488770/608042 [03:12<00:50, 2348.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|████████  | 489023/608042 [03:12<00:50, 2370.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  80%|████████  | 489372/608042 [03:12<00:46, 2558.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 489639/608042 [03:12<00:46, 2553.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 489935/608042 [03:12<00:45, 2596.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 490225/608042 [03:12<00:44, 2643.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 490505/608042 [03:12<00:44, 2668.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 490787/608042 [03:12<00:44, 2653.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 491053/608042 [03:12<00:44, 2615.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 491333/608042 [03:13<00:44, 2632.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 491598/608042 [03:13<00:44, 2629.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 491901/608042 [03:13<00:43, 2685.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 492171/608042 [03:13<00:45, 2562.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 492435/608042 [03:13<00:46, 2480.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 492685/608042 [03:13<00:46, 2467.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 492933/608042 [03:13<00:47, 2442.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 493193/608042 [03:13<00:49, 2338.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 493477/608042 [03:13<00:46, 2448.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 493757/608042 [03:14<00:45, 2509.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████  | 494020/608042 [03:14<00:45, 2531.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████▏ | 494317/608042 [03:14<00:44, 2582.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████▏ | 494580/608042 [03:14<00:45, 2476.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████▏ | 494844/608042 [03:14<00:45, 2496.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████▏ | 495107/608042 [03:14<00:45, 2507.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  81%|████████▏ | 495361/608042 [03:14<00:47, 2380.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 495635/608042 [03:14<00:45, 2480.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 495903/608042 [03:14<00:44, 2500.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 496165/608042 [03:15<00:44, 2531.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 496524/608042 [03:15<00:39, 2813.74 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 496815/608042 [03:15<00:42, 2620.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 497085/608042 [03:15<00:43, 2562.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 497396/608042 [03:15<00:40, 2708.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 497735/608042 [03:15<00:38, 2881.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 498040/608042 [03:15<00:42, 2581.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 498342/608042 [03:15<00:40, 2678.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 498628/608042 [03:15<00:40, 2703.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 498909/608042 [03:16<00:41, 2602.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 499220/608042 [03:16<00:40, 2654.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 499491/608042 [03:16<00:40, 2664.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 499800/608042 [03:16<00:39, 2772.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 500083/608042 [03:16<00:39, 2718.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 500405/608042 [03:16<00:37, 2853.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 500725/608042 [03:16<00:36, 2915.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 501024/608042 [03:16<00:40, 2655.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 501313/608042 [03:16<00:39, 2709.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  82%|████████▏ | 501598/608042 [03:17<00:40, 2613.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 501956/608042 [03:17<00:37, 2838.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 502245/608042 [03:17<00:40, 2620.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 502514/608042 [03:17<00:46, 2268.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 502837/608042 [03:17<00:42, 2452.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 503162/608042 [03:17<00:39, 2653.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 503541/608042 [03:17<00:35, 2925.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 503914/608042 [03:17<00:34, 3056.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 504239/608042 [03:17<00:35, 2890.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 504546/608042 [03:18<00:37, 2755.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 504830/608042 [03:18<00:37, 2750.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 505125/608042 [03:18<00:40, 2515.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 505398/608042 [03:18<00:40, 2540.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 505663/608042 [03:18<00:40, 2509.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 506005/608042 [03:18<00:37, 2738.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 506298/608042 [03:18<00:38, 2673.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 506592/608042 [03:18<00:37, 2689.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 506874/608042 [03:19<00:37, 2686.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 507158/608042 [03:19<00:38, 2651.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  83%|████████▎ | 507461/608042 [03:19<00:36, 2742.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▎ | 507745/608042 [03:19<00:38, 2615.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▎ | 508074/608042 [03:19<00:36, 2754.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▎ | 508353/608042 [03:19<00:38, 2598.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▎ | 508643/608042 [03:19<00:37, 2627.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▎ | 508998/608042 [03:19<00:34, 2863.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 509308/608042 [03:19<00:36, 2714.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 509586/608042 [03:20<00:37, 2609.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 509862/608042 [03:20<00:37, 2615.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 510139/608042 [03:20<00:37, 2600.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 510422/608042 [03:20<00:37, 2625.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 510756/608042 [03:20<00:34, 2783.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 511036/608042 [03:20<00:37, 2600.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 511326/608042 [03:20<00:36, 2667.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 511606/608042 [03:20<00:38, 2520.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 511891/608042 [03:20<00:37, 2548.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 512148/608042 [03:21<00:38, 2475.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 512449/608042 [03:21<00:36, 2597.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 512723/608042 [03:21<00:38, 2456.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 512973/608042 [03:21<00:40, 2323.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 513226/608042 [03:21<00:41, 2271.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 513468/608042 [03:21<00:41, 2303.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  84%|████████▍ | 513763/608042 [03:21<00:38, 2423.38 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▍ | 514008/608042 [03:21<00:39, 2410.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▍ | 514251/608042 [03:21<00:40, 2294.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▍ | 514504/608042 [03:22<00:42, 2195.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▍ | 514745/608042 [03:22<00:42, 2206.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▍ | 515021/608042 [03:22<00:40, 2318.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▍ | 515337/608042 [03:22<00:36, 2525.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▍ | 515594/608042 [03:22<00:37, 2491.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▍ | 515876/608042 [03:22<00:35, 2569.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▍ | 516137/608042 [03:22<00:38, 2386.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▍ | 516456/608042 [03:22<00:36, 2525.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▍ | 516772/608042 [03:22<00:33, 2695.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▌ | 517052/608042 [03:23<00:34, 2618.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▌ | 517322/608042 [03:23<00:34, 2611.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▌ | 517596/608042 [03:23<00:36, 2483.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▌ | 517884/608042 [03:23<00:34, 2584.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▌ | 518165/608042 [03:23<00:37, 2400.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▌ | 518421/608042 [03:23<00:36, 2442.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▌ | 518674/608042 [03:23<00:40, 2233.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▌ | 518940/608042 [03:23<00:38, 2339.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▌ | 519230/608042 [03:23<00:36, 2454.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▌ | 519499/608042 [03:24<00:35, 2501.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  85%|████████▌ | 519766/608042 [03:24<00:37, 2324.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 520025/608042 [03:24<00:38, 2290.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 520305/608042 [03:24<00:36, 2417.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 520708/608042 [03:24<00:30, 2826.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 521003/608042 [03:24<00:33, 2621.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 521376/608042 [03:24<00:29, 2905.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 521675/608042 [03:24<00:31, 2714.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 522055/608042 [03:24<00:29, 2936.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 522358/608042 [03:25<00:30, 2834.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 522734/608042 [03:25<00:27, 3054.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 523047/608042 [03:25<00:30, 2747.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 523340/608042 [03:25<00:32, 2634.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 523612/608042 [03:25<00:34, 2482.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 523948/608042 [03:25<00:31, 2662.29 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▌ | 524225/608042 [03:25<00:31, 2658.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▋ | 524495/608042 [03:25<00:31, 2665.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▋ | 524808/608042 [03:26<00:29, 2789.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▋ | 525101/608042 [03:26<00:31, 2646.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▋ | 525372/608042 [03:26<00:31, 2593.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▋ | 525646/608042 [03:26<00:33, 2473.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  86%|████████▋ | 525944/608042 [03:26<00:31, 2591.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 526243/608042 [03:26<00:30, 2671.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 526517/608042 [03:26<00:31, 2559.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 526776/608042 [03:26<00:32, 2530.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 527041/608042 [03:26<00:33, 2422.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 527289/608042 [03:27<00:34, 2323.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 527545/608042 [03:27<00:36, 2181.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 527860/608042 [03:27<00:34, 2342.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 528117/608042 [03:27<00:33, 2396.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 528452/608042 [03:27<00:30, 2643.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 528721/608042 [03:27<00:30, 2607.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 528989/608042 [03:27<00:30, 2613.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 529254/608042 [03:27<00:30, 2587.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 529515/608042 [03:27<00:35, 2213.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 529790/608042 [03:28<00:33, 2351.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 530064/608042 [03:28<00:31, 2451.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 530326/608042 [03:28<00:32, 2385.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 530570/608042 [03:28<00:32, 2389.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 530864/608042 [03:28<00:30, 2532.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 531196/608042 [03:28<00:28, 2718.50 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 531471/608042 [03:28<00:29, 2577.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  87%|████████▋ | 531786/608042 [03:28<00:28, 2711.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 532109/608042 [03:28<00:26, 2837.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 532398/608042 [03:29<00:27, 2759.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 532679/608042 [03:29<00:28, 2682.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 532963/608042 [03:29<00:28, 2661.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 533256/608042 [03:29<00:27, 2690.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 533675/608042 [03:29<00:25, 2958.94 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 533970/608042 [03:29<00:25, 2895.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 534262/608042 [03:29<00:26, 2835.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 534567/608042 [03:29<00:29, 2463.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 534829/608042 [03:29<00:30, 2431.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 535087/608042 [03:30<00:30, 2402.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 535356/608042 [03:30<00:32, 2261.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 535639/608042 [03:30<00:30, 2405.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 535885/608042 [03:30<00:30, 2382.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 536158/608042 [03:30<00:29, 2414.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 536449/608042 [03:30<00:28, 2519.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 536705/608042 [03:30<00:28, 2479.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 537014/608042 [03:30<00:26, 2649.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 537284/608042 [03:30<00:29, 2406.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 537535/608042 [03:31<00:29, 2383.74 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 537805/608042 [03:31<00:28, 2468.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  88%|████████▊ | 538096/608042 [03:31<00:27, 2582.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▊ | 538364/608042 [03:31<00:27, 2563.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▊ | 538627/608042 [03:31<00:28, 2477.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▊ | 538880/608042 [03:31<00:27, 2483.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▊ | 539175/608042 [03:31<00:26, 2614.74 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▊ | 539446/608042 [03:31<00:26, 2578.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 539711/608042 [03:31<00:26, 2589.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 539983/608042 [03:32<00:26, 2584.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 540246/608042 [03:32<00:28, 2358.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 540488/608042 [03:32<00:29, 2318.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 540727/608042 [03:32<00:29, 2288.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 541032/608042 [03:32<00:27, 2475.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 541424/608042 [03:32<00:23, 2820.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 541804/608042 [03:32<00:22, 2978.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 542104/608042 [03:32<00:22, 2935.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 542403/608042 [03:32<00:22, 2935.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 542705/608042 [03:33<00:23, 2797.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 542991/608042 [03:33<00:23, 2750.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 543279/608042 [03:33<00:23, 2777.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 543644/608042 [03:33<00:22, 2904.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  89%|████████▉ | 543936/608042 [03:33<00:22, 2839.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|████████▉ | 544223/608042 [03:33<00:23, 2669.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|████████▉ | 544528/608042 [03:33<00:23, 2750.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|████████▉ | 544830/608042 [03:33<00:22, 2788.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|████████▉ | 545118/608042 [03:33<00:22, 2745.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|████████▉ | 545397/608042 [03:34<00:24, 2546.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|████████▉ | 545665/608042 [03:34<00:26, 2361.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|████████▉ | 545921/608042 [03:34<00:27, 2292.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|████████▉ | 546177/608042 [03:34<00:26, 2338.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|████████▉ | 546418/608042 [03:34<00:27, 2239.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|████████▉ | 546657/608042 [03:34<00:27, 2255.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|████████▉ | 546967/608042 [03:34<00:24, 2471.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|████████▉ | 547217/608042 [03:34<00:26, 2338.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|█████████ | 547460/608042 [03:34<00:28, 2139.03 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|█████████ | 547771/608042 [03:35<00:25, 2367.20 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|█████████ | 548063/608042 [03:35<00:23, 2508.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|█████████ | 548472/608042 [03:35<00:20, 2870.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|█████████ | 548810/608042 [03:35<00:20, 2942.30 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|█████████ | 549155/608042 [03:35<00:19, 3068.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|█████████ | 549465/608042 [03:35<00:19, 2992.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|█████████ | 549780/608042 [03:35<00:20, 2890.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  90%|█████████ | 550086/608042 [03:35<00:20, 2834.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 550390/608042 [03:35<00:20, 2854.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 550738/608042 [03:36<00:18, 3017.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 551053/608042 [03:36<00:20, 2771.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 551343/608042 [03:36<00:20, 2718.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 551623/608042 [03:36<00:21, 2634.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 551898/608042 [03:36<00:21, 2653.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 552191/608042 [03:36<00:21, 2654.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 552464/608042 [03:36<00:23, 2368.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 552708/608042 [03:36<00:23, 2321.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 553023/608042 [03:36<00:21, 2517.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 553281/608042 [03:37<00:21, 2530.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 553586/608042 [03:37<00:21, 2585.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 553870/608042 [03:37<00:20, 2592.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 554181/608042 [03:37<00:19, 2720.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 554514/608042 [03:37<00:18, 2867.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████ | 554806/608042 [03:37<00:19, 2698.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████▏| 555102/608042 [03:37<00:19, 2765.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████▏| 555397/608042 [03:37<00:19, 2749.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████▏| 555679/608042 [03:37<00:21, 2468.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████▏| 555932/608042 [03:38<00:23, 2220.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  91%|█████████▏| 556198/608042 [03:38<00:22, 2283.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 556441/608042 [03:38<00:22, 2310.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 556764/608042 [03:38<00:20, 2557.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 557036/608042 [03:38<00:20, 2480.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 557292/608042 [03:38<00:20, 2423.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 557548/608042 [03:38<00:21, 2361.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 557807/608042 [03:38<00:20, 2399.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 558101/608042 [03:38<00:19, 2534.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 558393/608042 [03:39<00:18, 2638.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 558666/608042 [03:39<00:20, 2352.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 559030/608042 [03:39<00:18, 2679.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 559310/608042 [03:39<00:19, 2454.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 559582/608042 [03:39<00:20, 2348.16 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 559938/608042 [03:39<00:18, 2558.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 560284/608042 [03:39<00:17, 2740.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 560569/608042 [03:39<00:17, 2664.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 560848/608042 [03:40<00:17, 2695.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 561151/608042 [03:40<00:16, 2787.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 561443/608042 [03:40<00:16, 2790.07 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 561731/608042 [03:40<00:17, 2640.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 562000/608042 [03:40<00:19, 2395.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  92%|█████████▏| 562261/608042 [03:40<00:18, 2444.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 562550/608042 [03:40<00:17, 2560.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 562819/608042 [03:40<00:18, 2475.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 563073/608042 [03:40<00:18, 2439.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 563327/608042 [03:41<00:18, 2430.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 563576/608042 [03:41<00:19, 2330.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 563895/608042 [03:41<00:17, 2522.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 564149/608042 [03:41<00:17, 2521.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 564413/608042 [03:41<00:17, 2527.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 564717/608042 [03:41<00:16, 2659.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 564990/608042 [03:41<00:16, 2586.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 565307/608042 [03:41<00:15, 2722.71 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 565625/608042 [03:41<00:14, 2852.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 565914/608042 [03:41<00:15, 2769.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 566194/608042 [03:42<00:15, 2647.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 566472/608042 [03:42<00:15, 2619.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 566755/608042 [03:42<00:16, 2501.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 567010/608042 [03:42<00:16, 2485.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 567264/608042 [03:42<00:17, 2346.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 567550/608042 [03:42<00:16, 2476.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 567819/608042 [03:42<00:16, 2403.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 568066/608042 [03:42<00:16, 2420.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  93%|█████████▎| 568350/608042 [03:43<00:15, 2497.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▎| 568618/608042 [03:43<00:16, 2434.23 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▎| 568886/608042 [03:43<00:15, 2481.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▎| 569151/608042 [03:43<00:15, 2517.70 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▎| 569415/608042 [03:43<00:15, 2447.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▎| 569744/608042 [03:43<00:14, 2600.79 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▎| 570010/608042 [03:43<00:14, 2608.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 570310/608042 [03:43<00:13, 2697.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 570583/608042 [03:43<00:15, 2455.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 570846/608042 [03:44<00:15, 2393.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 571124/608042 [03:44<00:15, 2451.78 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 571470/608042 [03:44<00:13, 2707.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 571751/608042 [03:44<00:14, 2495.45 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 572027/608042 [03:44<00:14, 2535.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 572354/608042 [03:44<00:13, 2612.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 572679/608042 [03:44<00:13, 2719.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 572962/608042 [03:44<00:13, 2685.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 573238/608042 [03:44<00:13, 2658.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 573518/608042 [03:45<00:13, 2580.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 573843/608042 [03:45<00:12, 2730.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 574120/608042 [03:45<00:13, 2606.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  94%|█████████▍| 574405/608042 [03:45<00:13, 2461.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▍| 574711/608042 [03:45<00:12, 2619.86 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▍| 574994/608042 [03:45<00:12, 2659.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▍| 575312/608042 [03:45<00:11, 2754.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▍| 575605/608042 [03:45<00:13, 2482.64 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▍| 575942/608042 [03:45<00:12, 2673.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▍| 576334/608042 [03:46<00:10, 2974.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▍| 576649/608042 [03:46<00:12, 2530.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▍| 576947/608042 [03:46<00:11, 2604.63 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▍| 577224/608042 [03:46<00:12, 2548.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▍| 577595/608042 [03:46<00:10, 2824.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▌| 577895/608042 [03:46<00:10, 2755.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▌| 578180/608042 [03:46<00:11, 2646.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▌| 578456/608042 [03:46<00:11, 2660.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▌| 578731/608042 [03:46<00:11, 2635.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▌| 579058/608042 [03:47<00:10, 2780.24 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▌| 579353/608042 [03:47<00:10, 2787.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▌| 579649/608042 [03:47<00:10, 2705.76 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▌| 579982/608042 [03:47<00:09, 2860.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▌| 580274/608042 [03:47<00:10, 2775.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  95%|█████████▌| 580570/608042 [03:47<00:09, 2825.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 580861/608042 [03:47<00:09, 2740.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 581138/608042 [03:47<00:09, 2744.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 581464/608042 [03:47<00:09, 2863.80 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 581755/608042 [03:48<00:09, 2673.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 582074/608042 [03:48<00:09, 2633.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 582356/608042 [03:48<00:09, 2581.35 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 582623/608042 [03:48<00:10, 2489.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 582882/608042 [03:48<00:10, 2411.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 583166/608042 [03:48<00:09, 2521.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 583487/608042 [03:48<00:09, 2705.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 583767/608042 [03:48<00:09, 2592.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 584036/608042 [03:48<00:09, 2506.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 584302/608042 [03:49<00:09, 2494.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 584560/608042 [03:49<00:09, 2470.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 584826/608042 [03:49<00:09, 2434.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▌| 585083/608042 [03:49<00:09, 2464.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▋| 585350/608042 [03:49<00:10, 2185.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▋| 585743/608042 [03:49<00:08, 2568.13 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▋| 586070/608042 [03:49<00:08, 2725.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▋| 586366/608042 [03:49<00:08, 2601.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  96%|█████████▋| 586635/608042 [03:50<00:08, 2383.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 586886/608042 [03:50<00:09, 2313.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 587262/608042 [03:50<00:07, 2685.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 587604/608042 [03:50<00:07, 2859.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 587902/608042 [03:50<00:07, 2856.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 588194/608042 [03:50<00:07, 2632.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 588472/608042 [03:50<00:07, 2529.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 588742/608042 [03:50<00:07, 2571.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 589046/608042 [03:50<00:07, 2645.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 589324/608042 [03:51<00:07, 2669.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 589597/608042 [03:51<00:07, 2516.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 589891/608042 [03:51<00:06, 2614.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 590159/608042 [03:51<00:06, 2564.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 590442/608042 [03:51<00:06, 2626.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 590745/608042 [03:51<00:06, 2725.49 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 591031/608042 [03:51<00:06, 2634.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 591345/608042 [03:51<00:06, 2766.18 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 591625/608042 [03:51<00:06, 2594.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 591936/608042 [03:52<00:05, 2711.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 592216/608042 [03:52<00:06, 2531.60 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 592474/608042 [03:52<00:06, 2345.02 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  97%|█████████▋| 592727/608042 [03:52<00:06, 2332.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 592971/608042 [03:52<00:06, 2248.11 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 593226/608042 [03:52<00:06, 2328.00 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 593466/608042 [03:52<00:06, 2209.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 593759/608042 [03:52<00:05, 2402.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 594012/608042 [03:52<00:06, 2290.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 594256/608042 [03:53<00:06, 2194.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 594505/608042 [03:53<00:05, 2268.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 594769/608042 [03:53<00:05, 2225.57 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 594994/608042 [03:53<00:06, 2151.17 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 595219/608042 [03:53<00:06, 2035.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 595543/608042 [03:53<00:05, 2348.31 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 595783/608042 [03:53<00:05, 2311.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 596055/608042 [03:53<00:05, 2357.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 596311/608042 [03:53<00:05, 2297.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 596557/608042 [03:54<00:04, 2315.69 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 596814/608042 [03:54<00:04, 2331.77 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 597065/608042 [03:54<00:04, 2232.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 597362/608042 [03:54<00:04, 2395.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 597649/608042 [03:54<00:04, 2521.09 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 597926/608042 [03:54<00:04, 2184.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 598155/608042 [03:54<00:04, 2195.27 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 598388/608042 [03:54<00:04, 2108.62 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 598656/608042 [03:55<00:04, 2206.12 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  98%|█████████▊| 598899/608042 [03:55<00:04, 2006.25 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▊| 599126/608042 [03:55<00:04, 1886.46 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▊| 599341/608042 [03:55<00:04, 1897.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▊| 599542/608042 [03:55<00:04, 1830.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▊| 599728/608042 [03:55<00:04, 1673.73 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▊| 599912/608042 [03:55<00:04, 1706.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▊| 600116/608042 [03:55<00:04, 1677.61 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▊| 600286/608042 [03:56<00:04, 1554.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 600450/608042 [03:56<00:05, 1514.91 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 600603/608042 [03:56<00:05, 1477.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 600824/608042 [03:56<00:04, 1626.51 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 601012/608042 [03:56<00:04, 1452.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 601181/608042 [03:56<00:04, 1423.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 601340/608042 [03:56<00:04, 1397.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 601486/608042 [03:56<00:05, 1261.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 601620/608042 [03:57<00:05, 1193.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 601747/608042 [03:57<00:05, 1177.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 601910/608042 [03:57<00:04, 1283.33 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 602064/608042 [03:57<00:04, 1312.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 602225/608042 [03:57<00:04, 1291.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 602437/608042 [03:57<00:03, 1478.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 602592/608042 [03:57<00:04, 1307.53 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 602730/608042 [03:57<00:04, 1209.06 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 602859/608042 [03:58<00:04, 1137.41 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 603000/608042 [03:58<00:04, 1181.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 603163/608042 [03:58<00:03, 1282.89 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 603309/608042 [03:58<00:03, 1316.54 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 603446/608042 [03:58<00:03, 1178.05 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 603591/608042 [03:58<00:03, 1242.95 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 603723/608042 [03:58<00:04, 1055.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 603843/608042 [03:58<00:04, 915.97 examples/s] 
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 603982/608042 [03:59<00:04, 976.58 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 604086/608042 [03:59<00:04, 979.84 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 604206/608042 [03:59<00:04, 851.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 604321/608042 [03:59<00:04, 886.34 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 604422/608042 [03:59<00:04, 724.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 604503/608042 [03:59<00:05, 632.37 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 604579/608042 [03:59<00:05, 656.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 604657/608042 [04:00<00:05, 595.19 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 604723/608042 [04:00<00:06, 517.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 604780/608042 [04:00<00:06, 523.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 604840/608042 [04:00<00:07, 409.22 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 604935/608042 [04:00<00:06, 498.21 examples/s]
Tokenizing and reformatting instruction data (num_proc=16):  99%|█████████▉| 604996/608042 [04:00<00:05, 519.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605059/608042 [04:00<00:05, 538.92 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605142/608042 [04:01<00:05, 538.55 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605200/608042 [04:01<00:05, 530.93 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605280/608042 [04:01<00:04, 554.01 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605339/608042 [04:01<00:04, 551.40 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605414/608042 [04:01<00:04, 536.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605493/608042 [04:01<00:04, 549.98 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605583/608042 [04:01<00:04, 611.81 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605661/608042 [04:02<00:03, 614.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605729/608042 [04:02<00:04, 475.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605787/608042 [04:02<00:06, 348.26 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605842/608042 [04:02<00:07, 310.66 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605897/608042 [04:02<00:06, 326.96 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605940/608042 [04:03<00:09, 216.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 605977/608042 [04:03<00:08, 237.99 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606015/608042 [04:03<00:09, 210.39 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606051/608042 [04:04<00:12, 156.43 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606078/608042 [04:04<00:11, 170.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606112/608042 [04:04<00:09, 196.08 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606160/608042 [04:04<00:09, 199.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606190/608042 [04:04<00:09, 198.44 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606220/608042 [04:04<00:10, 172.15 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606243/608042 [04:05<00:10, 174.56 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606290/608042 [04:05<00:09, 177.88 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606309/608042 [04:05<00:10, 171.04 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606332/608042 [04:05<00:13, 125.59 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606347/608042 [04:05<00:13, 121.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606368/608042 [04:06<00:13, 125.48 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606384/608042 [04:06<00:14, 117.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606397/608042 [04:06<00:18, 87.20 examples/s] 
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606407/608042 [04:06<00:19, 85.68 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606438/608042 [04:06<00:12, 123.52 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606457/608042 [04:06<00:12, 122.28 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606491/608042 [04:07<00:09, 162.83 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606513/608042 [04:07<00:11, 133.36 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606533/608042 [04:07<00:13, 112.97 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606556/608042 [04:07<00:12, 121.10 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606595/608042 [04:07<00:09, 157.85 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606630/608042 [04:08<00:08, 169.75 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606657/608042 [04:08<00:07, 173.32 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606679/608042 [04:08<00:09, 140.67 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606709/608042 [04:08<00:07, 169.47 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606740/608042 [04:08<00:09, 132.82 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606773/608042 [04:09<00:08, 154.14 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606805/608042 [04:09<00:06, 177.72 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|█████████▉| 606832/608042 [04:09<00:07, 159.90 examples/s]
Tokenizing and reformatting instruction data (num_proc=16): 100%|██████████| 608042/608042 [04:16<00:00, 2366.67 examples/s]
2024-08-03T04:49:25.890481407Z 
Filter:   0%|          | 0/608042 [00:00<?, ? examples/s]
Filter: 100%|██████████| 608042/608042 [00:19<00:00, 30923.52 examples/s]
2024-08-03T04:49:25.895413512Z 08/02/2024 21:49:25 - INFO - __main__ - Sample 116739 of the training set: {'input_ids': tensor([50279,    29,    93,  ...,  1083,    15, 50279]), 'labels': tensor([ -100,  -100,  -100,  ...,  1083,    15, 50279]), 'attention_mask': tensor([1, 1, 1,  ..., 1, 1, 1])}.
2024-08-03T04:49:25.897421340Z 08/02/2024 21:49:25 - INFO - __main__ - Sample 26225 of the training set: {'input_ids': tensor([50279,    29,    93,  4537, 49651,   187,   688,   436,  4836,    13,
2024-08-03T04:49:25.897423273Z           368,   403,  1677,   767, 25491,    27, 12318,   285,   308,   647,
2024-08-03T04:49:25.897424765Z            13,  9070,   342,   654, 33032, 13208,   380, 12318,   285,   253,
2024-08-03T04:49:25.897426082Z           308,   647,  3394,   403,  2159, 25491,  6830,  7668,  5014,    15,
2024-08-03T04:49:25.897427264Z           380,  4454,   273,  2173,   952,   452,   644,  7932,   407, 12314,
2024-08-03T04:49:25.897428435Z          3000,   313,    70,    15,    72,   904, 15694,    57,    13, 15694,
2024-08-03T04:49:25.897430044Z            58,    13, 15694,    59,   481, 15694,    57,   310,  1900,   253,
2024-08-03T04:49:25.897433042Z          2256,   273,   253,  2362,    15,  1422,   452,   281,  3653,  1880,
2024-08-03T04:49:25.897434266Z           253, 12318,   310,   908,   323,   253,   308,   647,   390,   417,
2024-08-03T04:49:25.897435443Z            15,   380, 10393,  8631, 15363,  7848,  1972,   390,  4648,   273,
2024-08-03T04:49:25.897437030Z          5113,   285,  3797,  1097,  6867,   285, 34162,  4648,    15,  1198,
2024-08-03T04:49:25.897438226Z          1650,    13,   247,  1684, 30736, 22205,   476,  5431,   320,   908,
2024-08-03T04:49:25.897439401Z           281,  2186,  1684, 30736,    13,   533,   352,   812,   671,  5752,
2024-08-03T04:49:25.897440648Z           347,   247,  7856,   275, 34162,  9534,    15,  6550,  1419,   634,
2024-08-03T04:49:25.897441824Z          9172,   715,   346,  4374,     3,   285,   346,  2302,  3446,   380,
2024-08-03T04:49:25.897443003Z         12616,   778,   671,  3831,   346, 15362,   995,   247, 30300,   326,
2024-08-03T04:49:25.897444181Z           476,   320,   271,  1789,    13,   247,  1436,    13,   285,    16,
2024-08-03T04:49:25.897445355Z           263,   271,  2250,    15,   187,   187, 16698,  3280,    27, 12318,
2024-08-03T04:49:25.897446538Z            27, 11281,    29, 33032,    31,    53,   647,    27,  4459, 29023,
2024-08-03T04:49:25.897447745Z           187, 16698,  3453,    27,  6279,   187, 16698,  8813,    27,   831,
2024-08-03T04:49:25.897448924Z           310,   247,  1175,  1650,    15,   380, 11281,   310,   908,   323,
2024-08-03T04:49:25.897450099Z          8785,   253, 29023,    15,   187,    50,    27, 12318,    27, 10831,
2024-08-03T04:49:25.897451285Z          4421,    29, 33032,    31,    53,   647,    27,  1978,   634,  4707,
2024-08-03T04:49:25.897452491Z          6079,   275,   253, 16846,   187,    34,    27,   187,    29,    93,
2024-08-03T04:49:25.897454078Z           515,  5567, 49651,   187,  4374, 50279]), 'labels': tensor([ -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.897455373Z          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.897456559Z          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.897457763Z          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.897458934Z          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.897460132Z          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.897461305Z          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.897462473Z          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.897463659Z          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.897464838Z          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.897466043Z          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.897468788Z          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.897470725Z          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.897471914Z          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.897473097Z          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.897474282Z          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.897475453Z          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.897476650Z          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.897477853Z          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.897479292Z          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.897480458Z          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.897481663Z          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.897482835Z          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.897484143Z          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.897485373Z          -100,  -100,  -100,   187,  4374, 50279]), 'attention_mask': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2024-08-03T04:49:25.897486840Z         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2024-08-03T04:49:25.897488090Z         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2024-08-03T04:49:25.897489330Z         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2024-08-03T04:49:25.897490531Z         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2024-08-03T04:49:25.897491723Z         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2024-08-03T04:49:25.897492917Z         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2024-08-03T04:49:25.897494094Z         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2024-08-03T04:49:25.897495294Z         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2024-08-03T04:49:25.897496473Z         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2024-08-03T04:49:25.897497671Z         1, 1, 1, 1, 1, 1])}.
2024-08-03T04:49:25.898197305Z 08/02/2024 21:49:25 - INFO - __main__ - Sample 288389 of the training set: {'input_ids': tensor([50279,    29,    93,  4537, 49651,   187, 24408,    70,   253, 28380,
2024-08-03T04:49:25.898213207Z           273,  1249,    16,    22,   970,   247,   418,  5738,  1159,    15,
2024-08-03T04:49:25.898215098Z           187,    29,    93,   515,  5567, 49651,   187,  3701, 28380,     9,
2024-08-03T04:49:25.898225517Z            69,  1741,   423,    13, 11812,    10,   187, 50276,  2309, 43690,
2024-08-03T04:49:25.898227048Z          2462, 11812,   187,   423,   187,   187,  2307, 19901,     9,   805,
2024-08-03T04:49:25.898228394Z            13,   608,    10, 50279]), 'labels': tensor([ -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.898229655Z          -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
2024-08-03T04:49:25.898231360Z          -100,  -100,  -100,  -100,  -100,  -100,   187,  3701, 28380,     9,
2024-08-03T04:49:25.898232597Z            69,  1741,   423,    13, 11812,    10,   187, 50276,  2309, 43690,
2024-08-03T04:49:25.898233931Z          2462, 11812,   187,   423,   187,   187,  2307, 19901,     9,   805,
2024-08-03T04:49:25.898235174Z            13,   608,    10, 50279]), 'attention_mask': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2024-08-03T04:49:25.898236580Z         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2024-08-03T04:49:25.898237848Z         1, 1, 1, 1, 1, 1])}.
2024-08-03T04:49:25.947394051Z [2024-08-02 21:49:25,947] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.14.4, git-hash=unknown, git-branch=unknown
2024-08-03T04:49:26.027498164Z [2024-08-02 21:49:26,027] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
2024-08-03T04:49:26.036280330Z [2024-08-02 21:49:26,036] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
2024-08-03T04:49:26.036306962Z [2024-08-02 21:49:26,036] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
2024-08-03T04:49:27.207521111Z [2024-08-02 21:49:27,207] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW
2024-08-03T04:49:27.207542936Z [2024-08-02 21:49:27,207] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=<class 'torch.optim.adamw.AdamW'>
2024-08-03T04:49:27.207545737Z [2024-08-02 21:49:27,207] [INFO] [logging.py:96:log_dist] [Rank 0] Creating fp16 ZeRO stage 3 optimizer, MiCS is enabled False, Hierarchical params gather False
2024-08-03T04:49:27.207547333Z [2024-08-02 21:49:27,207] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 3 optimizer
2024-08-03T04:49:27.432920005Z [2024-08-02 21:49:27,432] [INFO] [utils.py:781:see_memory_usage] Stage 3 initialize beginning
2024-08-03T04:49:27.433352239Z [2024-08-02 21:49:27,433] [INFO] [utils.py:782:see_memory_usage] MA 0.4 GB         Max_MA 0.97 GB         CA 1.36 GB         Max_CA 1 GB 
2024-08-03T04:49:27.433524647Z [2024-08-02 21:49:27,433] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 42.03 GB, percent = 2.1%
2024-08-03T04:49:27.446755219Z [2024-08-02 21:49:27,446] [INFO] [stage3.py:130:__init__] Reduce bucket size 4194304
2024-08-03T04:49:27.446767013Z [2024-08-02 21:49:27,446] [INFO] [stage3.py:131:__init__] Prefetch bucket size 3774873
2024-08-03T04:49:27.636836316Z [2024-08-02 21:49:27,636] [INFO] [utils.py:781:see_memory_usage] DeepSpeedZeRoOffload initialize [begin]
2024-08-03T04:49:27.637097769Z [2024-08-02 21:49:27,637] [INFO] [utils.py:782:see_memory_usage] MA 0.4 GB         Max_MA 0.4 GB         CA 1.36 GB         Max_CA 1 GB 
2024-08-03T04:49:27.637258471Z [2024-08-02 21:49:27,637] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 42.02 GB, percent = 2.1%
2024-08-03T04:49:27.747635173Z Parameter Offload: Total persistent parameters: 133120 in 65 params
2024-08-03T04:49:28.167394033Z [2024-08-02 21:49:28,167] [INFO] [utils.py:781:see_memory_usage] DeepSpeedZeRoOffload initialize [end]
2024-08-03T04:49:28.167774611Z [2024-08-02 21:49:28,167] [INFO] [utils.py:782:see_memory_usage] MA 0.4 GB         Max_MA 0.4 GB         CA 1.36 GB         Max_CA 1 GB 
2024-08-03T04:49:28.167934785Z [2024-08-02 21:49:28,167] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 42.08 GB, percent = 2.1%
2024-08-03T04:49:28.317370442Z [2024-08-02 21:49:28,316] [INFO] [utils.py:781:see_memory_usage] Before creating fp16 partitions
2024-08-03T04:49:28.317660551Z [2024-08-02 21:49:28,317] [INFO] [utils.py:782:see_memory_usage] MA 0.4 GB         Max_MA 0.4 GB         CA 1.36 GB         Max_CA 1 GB 
2024-08-03T04:49:28.317814447Z [2024-08-02 21:49:28,317] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 42.08 GB, percent = 2.1%
2024-08-03T04:57:27.560934447Z [2024-08-02 21:57:27,560] [INFO] [utils.py:781:see_memory_usage] After creating fp16 partitions: 1
2024-08-03T04:57:27.561411865Z [2024-08-02 21:57:27,561] [INFO] [utils.py:782:see_memory_usage] MA 0.4 GB         Max_MA 0.4 GB         CA 0.41 GB         Max_CA 1 GB 
2024-08-03T04:57:27.561555360Z [2024-08-02 21:57:27,561] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 41.42 GB, percent = 2.1%
2024-08-03T04:57:27.719495238Z [2024-08-02 21:57:27,719] [INFO] [utils.py:781:see_memory_usage] Before creating fp32 partitions
2024-08-03T04:57:27.719857010Z [2024-08-02 21:57:27,719] [INFO] [utils.py:782:see_memory_usage] MA 0.4 GB         Max_MA 0.4 GB         CA 0.41 GB         Max_CA 0 GB 
2024-08-03T04:57:27.720001681Z [2024-08-02 21:57:27,719] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 41.42 GB, percent = 2.1%
2024-08-03T04:57:27.870929275Z [2024-08-02 21:57:27,870] [INFO] [utils.py:781:see_memory_usage] After creating fp32 partitions
2024-08-03T04:57:27.871169266Z [2024-08-02 21:57:27,871] [INFO] [utils.py:782:see_memory_usage] MA 1.21 GB         Max_MA 1.61 GB         CA 1.62 GB         Max_CA 2 GB 
2024-08-03T04:57:27.871314487Z [2024-08-02 21:57:27,871] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 41.42 GB, percent = 2.1%
2024-08-03T04:57:28.021792173Z [2024-08-02 21:57:28,021] [INFO] [utils.py:781:see_memory_usage] Before initializing optimizer states
2024-08-03T04:57:28.022086525Z [2024-08-02 21:57:28,022] [INFO] [utils.py:782:see_memory_usage] MA 1.21 GB         Max_MA 1.21 GB         CA 1.62 GB         Max_CA 2 GB 
2024-08-03T04:57:28.022240465Z [2024-08-02 21:57:28,022] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 41.42 GB, percent = 2.1%
2024-08-03T04:57:28.171208324Z [2024-08-02 21:57:28,170] [INFO] [utils.py:781:see_memory_usage] After initializing optimizer states
2024-08-03T04:57:28.171602594Z [2024-08-02 21:57:28,171] [INFO] [utils.py:782:see_memory_usage] MA 1.21 GB         Max_MA 2.01 GB         CA 2.43 GB         Max_CA 2 GB 
2024-08-03T04:57:28.171752038Z [2024-08-02 21:57:28,171] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 41.41 GB, percent = 2.1%
2024-08-03T04:57:28.172194323Z [2024-08-02 21:57:28,172] [INFO] [stage3.py:486:_setup_for_real_optimizer] optimizer state initialized
2024-08-03T04:57:30.007801320Z [2024-08-02 21:57:30,007] [INFO] [utils.py:781:see_memory_usage] After initializing ZeRO optimizer
2024-08-03T04:57:30.008232789Z [2024-08-02 21:57:30,008] [INFO] [utils.py:782:see_memory_usage] MA 1.62 GB         Max_MA 2.0 GB         CA 2.43 GB         Max_CA 2 GB 
2024-08-03T04:57:30.008387160Z [2024-08-02 21:57:30,008] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 41.42 GB, percent = 2.1%
2024-08-03T04:57:30.008495559Z [2024-08-02 21:57:30,008] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = DeepSpeedZeroOptimizer_Stage3
2024-08-03T04:57:30.008599235Z [2024-08-02 21:57:30,008] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
2024-08-03T04:57:30.008623982Z [2024-08-02 21:57:30,008] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
2024-08-03T04:57:30.008672225Z [2024-08-02 21:57:30,008] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0], mom=[(0.9, 0.999)]
2024-08-03T04:57:30.016705070Z [2024-08-02 21:57:30,016] [INFO] [config.py:997:print] DeepSpeedEngine configuration:
2024-08-03T04:57:30.017307332Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   activation_checkpointing_config  {
2024-08-03T04:57:30.017309325Z     "partition_activations": false, 
2024-08-03T04:57:30.017311019Z     "contiguous_memory_optimization": false, 
2024-08-03T04:57:30.017312552Z     "cpu_checkpointing": false, 
2024-08-03T04:57:30.017314248Z     "number_checkpoints": null, 
2024-08-03T04:57:30.017315622Z     "synchronize_checkpoint_boundary": false, 
2024-08-03T04:57:30.017317158Z     "profile": false
2024-08-03T04:57:30.017318571Z }
2024-08-03T04:57:30.017319946Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
2024-08-03T04:57:30.017322398Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   amp_enabled .................. False
2024-08-03T04:57:30.017323757Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   amp_params ................... False
2024-08-03T04:57:30.017498691Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   autotuning_config ............ {
2024-08-03T04:57:30.017500298Z     "enabled": false, 
2024-08-03T04:57:30.017501691Z     "start_step": null, 
2024-08-03T04:57:30.017502882Z     "end_step": null, 
2024-08-03T04:57:30.017516392Z     "metric_path": null, 
2024-08-03T04:57:30.017517790Z     "arg_mappings": null, 
2024-08-03T04:57:30.017519164Z     "metric": "throughput", 
2024-08-03T04:57:30.017520457Z     "model_info": null, 
2024-08-03T04:57:30.017521833Z     "results_dir": "autotuning_results", 
2024-08-03T04:57:30.017523163Z     "exps_dir": "autotuning_exps", 
2024-08-03T04:57:30.017524578Z     "overwrite": true, 
2024-08-03T04:57:30.017525730Z     "fast": true, 
2024-08-03T04:57:30.017527296Z     "start_profile_step": 3, 
2024-08-03T04:57:30.017528492Z     "end_profile_step": 5, 
2024-08-03T04:57:30.017529866Z     "tuner_type": "gridsearch", 
2024-08-03T04:57:30.017531074Z     "tuner_early_stopping": 5, 
2024-08-03T04:57:30.017532462Z     "tuner_num_trials": 50, 
2024-08-03T04:57:30.017533634Z     "model_info_path": null, 
2024-08-03T04:57:30.017535036Z     "mp_size": 1, 
2024-08-03T04:57:30.017536327Z     "max_train_batch_size": null, 
2024-08-03T04:57:30.017537698Z     "min_train_batch_size": 1, 
2024-08-03T04:57:30.017539717Z     "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
2024-08-03T04:57:30.017541227Z     "min_train_micro_batch_size_per_gpu": 1, 
2024-08-03T04:57:30.017542399Z     "num_tuning_micro_batch_sizes": 3
2024-08-03T04:57:30.017543749Z }
2024-08-03T04:57:30.017544850Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   bfloat16_enabled ............. True
2024-08-03T04:57:30.017550340Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   bfloat16_immediate_grad_update  False
2024-08-03T04:57:30.017551630Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   checkpoint_parallel_write_pipeline  False
2024-08-03T04:57:30.017562289Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   checkpoint_tag_validation_enabled  True
2024-08-03T04:57:30.017591813Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   checkpoint_tag_validation_fail  False
2024-08-03T04:57:30.017593916Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f89781a6bc0>
2024-08-03T04:57:30.017597972Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   communication_data_type ...... None
2024-08-03T04:57:30.017667132Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
2024-08-03T04:57:30.017684706Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   curriculum_enabled_legacy .... False
2024-08-03T04:57:30.017686322Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   curriculum_params_legacy ..... False
2024-08-03T04:57:30.017708821Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
2024-08-03T04:57:30.017715946Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   data_efficiency_enabled ...... False
2024-08-03T04:57:30.017729731Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   dataloader_drop_last ......... False
2024-08-03T04:57:30.017753845Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   disable_allgather ............ False
2024-08-03T04:57:30.017761449Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   dump_state ................... False
2024-08-03T04:57:30.017765753Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   dynamic_loss_scale_args ...... None
2024-08-03T04:57:30.017787470Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   eigenvalue_enabled ........... False
2024-08-03T04:57:30.017797667Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   eigenvalue_gas_boundary_resolution  1
2024-08-03T04:57:30.017814312Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   eigenvalue_layer_name ........ bert.encoder.layer
2024-08-03T04:57:30.017830372Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   eigenvalue_layer_num ......... 0
2024-08-03T04:57:30.017831664Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   eigenvalue_max_iter .......... 100
2024-08-03T04:57:30.017862000Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   eigenvalue_stability ......... 1e-06
2024-08-03T04:57:30.017866137Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   eigenvalue_tol ............... 0.01
2024-08-03T04:57:30.017892091Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   eigenvalue_verbose ........... False
2024-08-03T04:57:30.017896006Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   elasticity_enabled ........... False
2024-08-03T04:57:30.017957619Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   flops_profiler_config ........ {
2024-08-03T04:57:30.017959299Z     "enabled": false, 
2024-08-03T04:57:30.017960689Z     "recompute_fwd_factor": 0.0, 
2024-08-03T04:57:30.017962085Z     "profile_step": 1, 
2024-08-03T04:57:30.017967811Z     "module_depth": -1, 
2024-08-03T04:57:30.017969027Z     "top_modules": 1, 
2024-08-03T04:57:30.017970387Z     "detailed": true, 
2024-08-03T04:57:30.017971579Z     "output_file": null
2024-08-03T04:57:30.017972932Z }
2024-08-03T04:57:30.017992623Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   fp16_auto_cast ............... None
2024-08-03T04:57:30.018009621Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   fp16_enabled ................. False
2024-08-03T04:57:30.018016171Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   fp16_master_weights_and_gradients  False
2024-08-03T04:57:30.018018128Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   global_rank .................. 0
2024-08-03T04:57:30.018033132Z [2024-08-02 21:57:30,017] [INFO] [config.py:1001:print]   grad_accum_dtype ............. None
2024-08-03T04:57:30.018053203Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   gradient_accumulation_steps .. 2
2024-08-03T04:57:30.018056865Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   gradient_clipping ............ 1.0
2024-08-03T04:57:30.018107885Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   gradient_predivide_factor .... 1.0
2024-08-03T04:57:30.018132614Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   graph_harvesting ............. False
2024-08-03T04:57:30.018139458Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
2024-08-03T04:57:30.018142861Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   initial_dynamic_scale ........ 1
2024-08-03T04:57:30.018165857Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   load_universal_checkpoint .... False
2024-08-03T04:57:30.018168538Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   loss_scale ................... 1.0
2024-08-03T04:57:30.018196201Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   memory_breakdown ............. False
2024-08-03T04:57:30.018197994Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   mics_hierarchial_params_gather  False
2024-08-03T04:57:30.018220306Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   mics_shard_size .............. -1
2024-08-03T04:57:30.018283373Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') comet=CometConfig(enabled=False, samples_log_interval=100, project=None, workspace=None, api_key=None, experiment_name=None, experiment_key=None, online=None, mode=None) wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
2024-08-03T04:57:30.018326574Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   nebula_config ................ {
2024-08-03T04:57:30.018328120Z     "enabled": false, 
2024-08-03T04:57:30.018329533Z     "persistent_storage_path": null, 
2024-08-03T04:57:30.018342014Z     "persistent_time_interval": 100, 
2024-08-03T04:57:30.018344472Z     "num_of_version_in_retention": 2, 
2024-08-03T04:57:30.018346236Z     "enable_nebula_load": true, 
2024-08-03T04:57:30.018347845Z     "load_path": null
2024-08-03T04:57:30.018349045Z }
2024-08-03T04:57:30.018351902Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   optimizer_legacy_fusion ...... False
2024-08-03T04:57:30.018353561Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   optimizer_name ............... None
2024-08-03T04:57:30.018378818Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   optimizer_params ............. None
2024-08-03T04:57:30.018391684Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
2024-08-03T04:57:30.018408887Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   pld_enabled .................. False
2024-08-03T04:57:30.018425424Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   pld_params ................... False
2024-08-03T04:57:30.018435347Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   prescale_gradients ........... False
2024-08-03T04:57:30.018444059Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   scheduler_name ............... None
2024-08-03T04:57:30.018470966Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   scheduler_params ............. None
2024-08-03T04:57:30.018476691Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   seq_parallel_communication_data_type  torch.float32
2024-08-03T04:57:30.018498873Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   sparse_attention ............. None
2024-08-03T04:57:30.018516671Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   sparse_gradients_enabled ..... False
2024-08-03T04:57:30.018537077Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   steps_per_print .............. inf
2024-08-03T04:57:30.018571375Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   timers_config ................ enabled=True synchronized=True
2024-08-03T04:57:30.018573412Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   train_batch_size ............. 128
2024-08-03T04:57:30.018578337Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   train_micro_batch_size_per_gpu  2
2024-08-03T04:57:30.018613487Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   use_data_before_expert_parallel_  False
2024-08-03T04:57:30.018615237Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   use_node_local_storage ....... False
2024-08-03T04:57:30.018630505Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   wall_clock_breakdown ......... False
2024-08-03T04:57:30.018647889Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   weight_quantization_config ... None
2024-08-03T04:57:30.018655709Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   world_size ................... 32
2024-08-03T04:57:30.018682819Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   zero_allow_untested_optimizer  True
2024-08-03T04:57:30.018766654Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   zero_config .................. stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=4194304 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=3774873 param_persistence_threshold=20480 model_persistence_threshold=sys.maxsize max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=True use_all_reduce_for_fetch_params=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
2024-08-03T04:57:30.018772695Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   zero_enabled ................. True
2024-08-03T04:57:30.018774048Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   zero_force_ds_cpu_optimizer .. True
2024-08-03T04:57:30.018779893Z [2024-08-02 21:57:30,018] [INFO] [config.py:1001:print]   zero_optimization_stage ...... 3
2024-08-03T04:57:30.018875994Z [2024-08-02 21:57:30,018] [INFO] [config.py:987:print_user_config]   json = {
2024-08-03T04:57:30.018877524Z     "bf16": {
2024-08-03T04:57:30.018878769Z         "enabled": true
2024-08-03T04:57:30.018880164Z     }, 
2024-08-03T04:57:30.018881274Z     "zero_optimization": {
2024-08-03T04:57:30.018882658Z         "stage": 3, 
2024-08-03T04:57:30.018883826Z         "overlap_comm": true, 
2024-08-03T04:57:30.018885230Z         "contiguous_gradients": true, 
2024-08-03T04:57:30.018887373Z         "sub_group_size": 1.000000e+09, 
2024-08-03T04:57:30.018888614Z         "reduce_bucket_size": 4.194304e+06, 
2024-08-03T04:57:30.018889822Z         "stage3_prefetch_bucket_size": 3.774873e+06, 
2024-08-03T04:57:30.018891165Z         "stage3_param_persistence_threshold": 2.048000e+04, 
2024-08-03T04:57:30.018892398Z         "stage3_max_live_parameters": 1.000000e+09, 
2024-08-03T04:57:30.018893556Z         "stage3_max_reuse_distance": 1.000000e+09, 
2024-08-03T04:57:30.018894722Z         "stage3_gather_16bit_weights_on_model_save": true
2024-08-03T04:57:30.018896093Z     }, 
2024-08-03T04:57:30.018897188Z     "gradient_accumulation_steps": 2, 
2024-08-03T04:57:30.018898397Z     "gradient_clipping": 1.0, 
2024-08-03T04:57:30.018899564Z     "steps_per_print": inf, 
2024-08-03T04:57:30.018900748Z     "train_batch_size": 128, 
2024-08-03T04:57:30.018904397Z     "train_micro_batch_size_per_gpu": 2, 
2024-08-03T04:57:30.018905640Z     "wall_clock_breakdown": false, 
2024-08-03T04:57:30.018906829Z     "fp16": {
2024-08-03T04:57:30.018907959Z         "enabled": false
2024-08-03T04:57:30.018909166Z     }, 
2024-08-03T04:57:30.018910257Z     "zero_allow_untested_optimizer": true
2024-08-03T04:57:30.018911491Z }
2024-08-03T04:57:30.062095958Z 08/02/2024 21:57:30 - INFO - __main__ - ***** Running training *****
2024-08-03T04:57:30.062109297Z 08/02/2024 21:57:30 - INFO - __main__ -   Num examples = 607950
2024-08-03T04:57:30.062111359Z 08/02/2024 21:57:30 - INFO - __main__ -   Num Epochs = 2
2024-08-03T04:57:30.062112848Z 08/02/2024 21:57:30 - INFO - __main__ -   Instantaneous batch size per device = 2
2024-08-03T04:57:30.062114497Z 08/02/2024 21:57:30 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 128
2024-08-03T04:57:30.062116133Z 08/02/2024 21:57:30 - INFO - __main__ -   Gradient Accumulation steps = 2
2024-08-03T04:57:30.062117545Z 08/02/2024 21:57:30 - INFO - __main__ -   Total optimization steps = 9500
2024-08-03T04:57:30.105850899Z   0%|          | 0/9500 [00:00<?, ?it/s]
2024-08-03T04:57:30.105850899Z /opt/miniconda3/lib/python3.10/site-packages/transformers/data/data_collator.py:656: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:274.)
2024-08-03T04:57:30.105871642Z   batch["labels"] = torch.tensor(batch["labels"], dtype=torch.int64)
2024-08-03T04:57:30.107105607Z /opt/miniconda3/lib/python3.10/site-packages/transformers/data/data_collator.py:656: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:274.)
2024-08-03T04:57:30.107109931Z   batch["labels"] = torch.tensor(batch["labels"], dtype=torch.int64)
2024-08-03T04:57:30.107555424Z /opt/miniconda3/lib/python3.10/site-packages/transformers/data/data_collator.py:656: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:274.)
2024-08-03T04:57:30.107558048Z   batch["labels"] = torch.tensor(batch["labels"], dtype=torch.int64)
2024-08-03T04:57:30.107632267Z /opt/miniconda3/lib/python3.10/site-packages/transformers/data/data_collator.py:656: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:274.)
2024-08-03T04:57:30.107634309Z   batch["labels"] = torch.tensor(batch["labels"], dtype=torch.int64)
2024-08-03T04:57:30.107933912Z /opt/miniconda3/lib/python3.10/site-packages/transformers/data/data_collator.py:656: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:274.)
2024-08-03T04:57:30.107945688Z   batch["labels"] = torch.tensor(batch["labels"], dtype=torch.int64)
2024-08-03T04:57:30.108479619Z /opt/miniconda3/lib/python3.10/site-packages/transformers/data/data_collator.py:656: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:274.)
2024-08-03T04:57:30.108497024Z   batch["labels"] = torch.tensor(batch["labels"], dtype=torch.int64)
2024-08-03T04:57:30.109566117Z /opt/miniconda3/lib/python3.10/site-packages/transformers/data/data_collator.py:656: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:274.)
2024-08-03T04:57:30.109572934Z   batch["labels"] = torch.tensor(batch["labels"], dtype=torch.int64)
2024-08-03T04:57:30.110215259Z /opt/miniconda3/lib/python3.10/site-packages/transformers/data/data_collator.py:656: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:274.)
2024-08-03T04:57:30.110222520Z   batch["labels"] = torch.tensor(batch["labels"], dtype=torch.int64)
2024-08-03T04:57:36.085014087Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Using non-device net plugin version 0
2024-08-03T04:57:36.085042251Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Using non-device net plugin version 0
2024-08-03T04:57:36.085044210Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Using non-device net plugin version 0
2024-08-03T04:57:36.085045699Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Using non-device net plugin version 0
2024-08-03T04:57:36.085047399Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Using non-device net plugin version 0
2024-08-03T04:57:36.085048768Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Using network IB
2024-08-03T04:57:36.085050288Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Using non-device net plugin version 0
2024-08-03T04:57:36.085051643Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Using network IB
2024-08-03T04:57:36.085053106Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Using network IB
2024-08-03T04:57:36.085054429Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Using network IB
2024-08-03T04:57:36.085055891Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Using network IB
2024-08-03T04:57:36.085057214Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Using network IB
2024-08-03T04:57:36.085058716Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Using non-device net plugin version 0
2024-08-03T04:57:36.085060083Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Using non-device net plugin version 0
2024-08-03T04:57:36.085061604Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Using network IB
2024-08-03T04:57:36.085080838Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Using network IB
2024-08-03T04:57:36.087858719Z jupiter-cs-aus-207:102:685 [7] NCCL INFO bootstrapSplit: comm 0x7fb0fc08dce0 parent 0xa2679d0 rank 7 nranks 32 color 698429859 key 7 prev 6 next 8 - DONE
2024-08-03T04:57:36.087865219Z jupiter-cs-aus-207:102:685 [7] NCCL INFO comm 0x7fb0fc08dce0 rank 7 nranks 32 cudaDev 7 nvmlDev 7 busId e4000 commId 0xfb389571c16e2b31 - Init START
2024-08-03T04:57:36.087866943Z jupiter-cs-aus-207:95:683 [0] NCCL INFO bootstrapSplit: comm 0x7f860807d1e0 parent 0xa3860f0 rank 0 nranks 32 color 698429859 key 0 prev 31 next 1 - DONE
2024-08-03T04:57:36.087868441Z jupiter-cs-aus-207:96:684 [1] NCCL INFO bootstrapSplit: comm 0x7f68380a5750 parent 0xb44bbd0 rank 1 nranks 32 color 698429859 key 1 prev 0 next 2 - DONE
2024-08-03T04:57:36.087869995Z jupiter-cs-aus-207:96:684 [1] NCCL INFO comm 0x7f68380a5750 rank 1 nranks 32 cudaDev 1 nvmlDev 1 busId 2a000 commId 0xfb389571c16e2b31 - Init START
2024-08-03T04:57:36.087871431Z jupiter-cs-aus-207:95:683 [0] NCCL INFO comm 0x7f860807d1e0 rank 0 nranks 32 cudaDev 0 nvmlDev 0 busId 18000 commId 0xfb389571c16e2b31 - Init START
2024-08-03T04:57:36.087873003Z jupiter-cs-aus-207:97:689 [2] NCCL INFO bootstrapSplit: comm 0x7faef40900e0 parent 0xab5c5a0 rank 2 nranks 32 color 698429859 key 2 prev 1 next 3 - DONE
2024-08-03T04:57:36.087874422Z jupiter-cs-aus-207:97:689 [2] NCCL INFO comm 0x7faef40900e0 rank 2 nranks 32 cudaDev 2 nvmlDev 2 busId 3a000 commId 0xfb389571c16e2b31 - Init START
2024-08-03T04:57:36.087875942Z jupiter-cs-aus-207:101:686 [6] NCCL INFO bootstrapSplit: comm 0x7f86f008dc80 parent 0x9d1ef00 rank 6 nranks 32 color 698429859 key 6 prev 5 next 7 - DONE
2024-08-03T04:57:36.087877371Z jupiter-cs-aus-207:100:687 [5] NCCL INFO bootstrapSplit: comm 0x7fc68c07bc10 parent 0xa142060 rank 5 nranks 32 color 698429859 key 5 prev 4 next 6 - DONE
2024-08-03T04:57:36.087878933Z jupiter-cs-aus-207:98:690 [3] NCCL INFO bootstrapSplit: comm 0x7fa8e00b2640 parent 0xb892530 rank 3 nranks 32 color 698429859 key 3 prev 2 next 4 - DONE
2024-08-03T04:57:36.087880365Z jupiter-cs-aus-207:101:686 [6] NCCL INFO comm 0x7f86f008dc80 rank 6 nranks 32 cudaDev 6 nvmlDev 6 busId 91000 commId 0xfb389571c16e2b31 - Init START
2024-08-03T04:57:36.087881906Z jupiter-cs-aus-207:100:687 [5] NCCL INFO comm 0x7fc68c07bc10 rank 5 nranks 32 cudaDev 5 nvmlDev 5 busId 8b000 commId 0xfb389571c16e2b31 - Init START
2024-08-03T04:57:36.087883560Z jupiter-cs-aus-207:99:688 [4] NCCL INFO bootstrapSplit: comm 0x7fb03408cf40 parent 0xb068620 rank 4 nranks 32 color 698429859 key 4 prev 3 next 5 - DONE
2024-08-03T04:57:36.087885126Z jupiter-cs-aus-207:98:690 [3] NCCL INFO comm 0x7fa8e00b2640 rank 3 nranks 32 cudaDev 3 nvmlDev 3 busId 5d000 commId 0xfb389571c16e2b31 - Init START
2024-08-03T04:57:36.087886565Z jupiter-cs-aus-207:99:688 [4] NCCL INFO comm 0x7fb03408cf40 rank 4 nranks 32 cudaDev 4 nvmlDev 4 busId 84000 commId 0xfb389571c16e2b31 - Init START
2024-08-03T04:57:39.703883314Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Setting affinity for GPU 2 to ffff,fffffffd,00000000,0000ffff,fffffffd
2024-08-03T04:57:39.703898119Z jupiter-cs-aus-207:97:689 [2] NCCL INFO NVLS multicast support is not available on dev 2
2024-08-03T04:57:39.733227498Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Setting affinity for GPU 7 to ffffffff,fffe0000,00000000,ffffffff,fffe0000,00000000
2024-08-03T04:57:39.733238752Z jupiter-cs-aus-207:102:685 [7] NCCL INFO NVLS multicast support is not available on dev 7
2024-08-03T04:57:39.788920536Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Setting affinity for GPU 3 to ffff,fffffffd,00000000,0000ffff,fffffffd
2024-08-03T04:57:39.788931686Z jupiter-cs-aus-207:98:690 [3] NCCL INFO NVLS multicast support is not available on dev 3
2024-08-03T04:57:39.829514817Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Setting affinity for GPU 4 to ffffffff,fffe0000,00000000,ffffffff,fffe0000,00000000
2024-08-03T04:57:39.829526956Z jupiter-cs-aus-207:99:688 [4] NCCL INFO NVLS multicast support is not available on dev 4
2024-08-03T04:57:39.851208264Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Setting affinity for GPU 0 to ffff,fffffffd,00000000,0000ffff,fffffffd
2024-08-03T04:57:39.851224533Z jupiter-cs-aus-207:95:683 [0] NCCL INFO NVLS multicast support is not available on dev 0
2024-08-03T04:57:39.862762732Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Setting affinity for GPU 1 to ffff,fffffffd,00000000,0000ffff,fffffffd
2024-08-03T04:57:39.862774102Z jupiter-cs-aus-207:96:684 [1] NCCL INFO NVLS multicast support is not available on dev 1
2024-08-03T04:57:39.866999484Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Setting affinity for GPU 6 to ffffffff,fffe0000,00000000,ffffffff,fffe0000,00000000
2024-08-03T04:57:39.867010226Z jupiter-cs-aus-207:101:686 [6] NCCL INFO NVLS multicast support is not available on dev 6
2024-08-03T04:57:39.867684314Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Setting affinity for GPU 5 to ffffffff,fffe0000,00000000,ffffffff,fffe0000,00000000
2024-08-03T04:57:39.867696697Z jupiter-cs-aus-207:100:687 [5] NCCL INFO NVLS multicast support is not available on dev 5
2024-08-03T04:57:39.938541695Z jupiter-cs-aus-207:95:683 [0] NCCL INFO comm 0x7f860807d1e0 rank 0 nRanks 32 nNodes 4 localRanks 8 localRank 0 MNNVL 0
2024-08-03T04:57:39.938548606Z jupiter-cs-aus-207:97:689 [2] NCCL INFO comm 0x7faef40900e0 rank 2 nRanks 32 nNodes 4 localRanks 8 localRank 2 MNNVL 0
2024-08-03T04:57:39.938550991Z jupiter-cs-aus-207:96:684 [1] NCCL INFO comm 0x7f68380a5750 rank 1 nRanks 32 nNodes 4 localRanks 8 localRank 1 MNNVL 0
2024-08-03T04:57:39.938553099Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 00/16 :    0   7   6   5   4   3   2   1   9  10  11  12  13  14  15   8  16  23  22  21
2024-08-03T04:57:39.938555316Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 01/16 :    0   8  15  14  13  12  11  10   9  17  18  19  20  21  22  23  16  24  31  30
2024-08-03T04:57:39.938558072Z jupiter-cs-aus-207:102:685 [7] NCCL INFO comm 0x7fb0fc08dce0 rank 7 nRanks 32 nNodes 4 localRanks 8 localRank 7 MNNVL 0
2024-08-03T04:57:39.938566886Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 02/16 :    0   7   6   5   4   3  11  12  13  14  15   8   9  10  18  17  16  23  22  21
2024-08-03T04:57:39.938569295Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 03/16 :    0   1   2  10   9   8  15  14  13  12  11  19  20  21  22  23  16  17  18  26
2024-08-03T04:57:39.938571064Z jupiter-cs-aus-207:101:686 [6] NCCL INFO comm 0x7f86f008dc80 rank 6 nRanks 32 nNodes 4 localRanks 8 localRank 6 MNNVL 0
2024-08-03T04:57:39.938580357Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 04/16 :    0   7   6   5  13  14  15   8   9  10  11  12  20  19  18  17  16  23  22  21
2024-08-03T04:57:39.938582205Z jupiter-cs-aus-207:98:690 [3] NCCL INFO comm 0x7fa8e00b2640 rank 3 nRanks 32 nNodes 4 localRanks 8 localRank 3 MNNVL 0
2024-08-03T04:57:39.938584415Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/17/-1->1->-1 [2] -1/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->9 [10] -1/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0
2024-08-03T04:57:39.938589022Z jupiter-cs-aus-207:100:687 [5] NCCL INFO comm 0x7fc68c07bc10 rank 5 nRanks 32 nNodes 4 localRanks 8 localRank 5 MNNVL 0
2024-08-03T04:57:39.938590783Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 05/16 :    0   1   2   3   4  12  11  10   9   8  15  14  13  21  22  23  16  17  18  19
2024-08-03T04:57:39.938593176Z jupiter-cs-aus-207:96:684 [1] NCCL INFO P2P Chunksize set to 131072
2024-08-03T04:57:39.938595000Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/18/-1->2->-1 [3] -1/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->10 [11] -1/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1
2024-08-03T04:57:39.938598571Z jupiter-cs-aus-207:97:689 [2] NCCL INFO P2P Chunksize set to 131072
2024-08-03T04:57:39.938600760Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 06/16 :    0   7  15   8   9  10  11  12  13  14  22  21  20  19  18  17  16  23  31  24
2024-08-03T04:57:39.938602982Z jupiter-cs-aus-207:99:688 [4] NCCL INFO comm 0x7fb03408cf40 rank 4 nRanks 32 nNodes 4 localRanks 8 localRank 4 MNNVL 0
2024-08-03T04:57:39.938605233Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 07/16 :    0   1   2   3   4   5   6  14  13  12  11  10   9   8  15  23  16  17  18  19
2024-08-03T04:57:39.938607419Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 08/16 :    0   7   6   5   4   3   2   1   9  10  11  12  13  14  15   8  16  23  22  21
2024-08-03T04:57:39.938609339Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 09/16 :    0   8  15  14  13  12  11  10   9  17  18  19  20  21  22  23  16  24  31  30
2024-08-03T04:57:39.938611374Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 10/16 :    0   7   6   5   4   3  11  12  13  14  15   8   9  10  18  17  16  23  22  21
2024-08-03T04:57:39.938613534Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 11/16 :    0   1   2  10   9   8  15  14  13  12  11  19  20  21  22  23  16  17  18  26
2024-08-03T04:57:39.938615510Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 12/16 :    0   7   6   5  13  14  15   8   9  10  11  12  20  19  18  17  16  23  22  21
2024-08-03T04:57:39.938617989Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] 0/-1/-1->7->6 [2] 0/-1/-1->7->6 [3] 0/-1/-1->7->6 [4] 0/-1/-1->7->6 [5] 0/-1/-1->7->6 [6] 0/-1/-1->7->6 [7] 0/23/-1->7->-1 [8] -1/-1/-1->7->6 [9] 0/-1/-1->7->6 [10] 0/-1/-1->7->6 [11] 0/-1/-1->7->6 [12] 0/-1/-1->7->6 [13] 0/-1/-1->7->6 [14] 0/-1/-1->7->6 [15] 0/-1/-1->7->15
2024-08-03T04:57:39.938623172Z jupiter-cs-aus-207:102:685 [7] NCCL INFO P2P Chunksize set to 131072
2024-08-03T04:57:39.938624771Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 13/16 :    0   1   2   3   4  12  11  10   9   8  15  14  13  21  22  23  16  17  18  19
2024-08-03T04:57:39.938626077Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/22/-1->6->-1 [7] -1/-1/-1->6->5 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 7/-1/-1->6->5 [11] 7/-1/-1->6->5 [12] 7/-1/-1->6->5 [13] 7/-1/-1->6->5 [14] 7/-1/-1->6->14 [15] -1/-1/-1->6->5
2024-08-03T04:57:39.938628072Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/19/-1->3->-1 [4] -1/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->11 [12] -1/-1/-1->3->2 [13] 4/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2
2024-08-03T04:57:39.938630119Z jupiter-cs-aus-207:101:686 [6] NCCL INFO P2P Chunksize set to 131072
2024-08-03T04:57:39.938631311Z jupiter-cs-aus-207:98:690 [3] NCCL INFO P2P Chunksize set to 131072
2024-08-03T04:57:39.938632489Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/21/-1->5->-1 [6] -1/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13] 6/-1/-1->5->13 [14] -1/-1/-1->5->4 [15] 6/-1/-1->5->4
2024-08-03T04:57:39.938634613Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 14/16 :    0   7  15   8   9  10  11  12  13  14  22  21  20  19  18  17  16  23  31  24
2024-08-03T04:57:39.938636011Z jupiter-cs-aus-207:100:687 [5] NCCL INFO P2P Chunksize set to 131072
2024-08-03T04:57:39.938637183Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/20/-1->4->-1 [5] -1/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->12 [13] -1/-1/-1->4->3 [14] 5/-1/-1->4->3 [15] 5/-1/-1->4->3
2024-08-03T04:57:39.938639329Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 15/16 :    0   1   2   3   4   5   6  14  13  12  11  10   9   8  15  23  16  17  18  19
2024-08-03T04:57:39.938640593Z jupiter-cs-aus-207:99:688 [4] NCCL INFO P2P Chunksize set to 131072
2024-08-03T04:57:39.938641992Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Trees [0] 1/16/-1->0->-1 [1] -1/-1/-1->0->7 [2] 1/-1/-1->0->7 [3] 1/-1/-1->0->7 [4] 1/-1/-1->0->7 [5] 1/-1/-1->0->7 [6] 1/-1/-1->0->7 [7] 1/-1/-1->0->7 [8] 1/-1/-1->0->8 [9] -1/-1/-1->0->7 [10] 1/-1/-1->0->7 [11] 1/-1/-1->0->7 [12] 1/-1/-1->0->7 [13] 1/-1/-1->0->7 [14] 1/-1/-1->0->7 [15] 1/-1/-1->0->7
2024-08-03T04:57:39.938645706Z jupiter-cs-aus-207:95:683 [0] NCCL INFO P2P Chunksize set to 131072
2024-08-03T04:57:40.022615048Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.022628461Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.024104584Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.024195352Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 03/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.025728260Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.025823324Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.027451633Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.027563678Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.029082032Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 13/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.029165340Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.030727859Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.030769639Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.030788457Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.030806300Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 11/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.030851813Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.032482334Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.032504166Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 13/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.032535346Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.032813914Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.033813047Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.033841344Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 15/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.033866646Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.034135333Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.035158468Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.035213420Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 09/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.035236738Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.036301455Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 09/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.036322715Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 11/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.036352704Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.037313963Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.037316465Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 11/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.037350131Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 15/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.037372486Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.038343568Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.038365254Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 13/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.039105808Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.039119642Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 15/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.039635138Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.040066217Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.040526053Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.040999523Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.041459544Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.041924032Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 01/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.042001139Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 00/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.042486379Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.042503579Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 02/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.042932865Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 05/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.043218805Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 04/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.043362167Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 03/0 : 27[3] -> 3[3] [receive] via NET/IB/4/GDRDMA
2024-08-03T04:57:40.043374124Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 09/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.043413023Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 11/0 : 27[3] -> 3[3] [receive] via NET/IB/4/GDRDMA
2024-08-03T04:57:40.043452090Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 06/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.043454863Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 02/0 : 3[3] -> 11[3] [send] via NET/IB/4/GDRDMA
2024-08-03T04:57:40.043519286Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 10/0 : 3[3] -> 11[3] [send] via NET/IB/4/GDRDMA
2024-08-03T04:57:40.043522168Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 04/0 : 28[4] -> 4[4] [receive] via NET/IB/5/GDRDMA
2024-08-03T04:57:40.043585224Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 12/0 : 28[4] -> 4[4] [receive] via NET/IB/5/GDRDMA
2024-08-03T04:57:40.043638032Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 05/0 : 4[4] -> 12[4] [send] via NET/IB/5/GDRDMA
2024-08-03T04:57:40.043698045Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 13/0 : 4[4] -> 12[4] [send] via NET/IB/5/GDRDMA
2024-08-03T04:57:40.043846866Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 11/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.043866017Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 08/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.044557493Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 13/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.044579907Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 10/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.045242743Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 12/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.045645636Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 14/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.045652506Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 02/0 : 26[2] -> 2[2] [receive] via NET/IB/2/GDRDMA
2024-08-03T04:57:40.045745909Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 10/0 : 26[2] -> 2[2] [receive] via NET/IB/2/GDRDMA
2024-08-03T04:57:40.045798903Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 03/0 : 2[2] -> 10[2] [send] via NET/IB/2/GDRDMA
2024-08-03T04:57:40.045800903Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 01/0 : 25[1] -> 1[1] [receive] via NET/IB/1/GDRDMA
2024-08-03T04:57:40.045852614Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 11/0 : 2[2] -> 10[2] [send] via NET/IB/2/GDRDMA
2024-08-03T04:57:40.045854378Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 09/0 : 25[1] -> 1[1] [receive] via NET/IB/1/GDRDMA
2024-08-03T04:57:40.045910484Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 00/0 : 1[1] -> 9[1] [send] via NET/IB/1/GDRDMA
2024-08-03T04:57:40.045966948Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 08/0 : 1[1] -> 9[1] [send] via NET/IB/1/GDRDMA
2024-08-03T04:57:40.047697603Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 00/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.048378102Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 02/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.048813700Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.048957198Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 05/0 : 29[5] -> 5[5] [receive] via NET/IB/6/GDRDMA
2024-08-03T04:57:40.049020868Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 13/0 : 29[5] -> 5[5] [receive] via NET/IB/6/GDRDMA
2024-08-03T04:57:40.049074355Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 04/0 : 5[5] -> 13[5] [send] via NET/IB/6/GDRDMA
2024-08-03T04:57:40.049077184Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 04/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.049120131Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 06/0 : 30[6] -> 6[6] [receive] via NET/IB/7/GDRDMA
2024-08-03T04:57:40.049122604Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 12/0 : 5[5] -> 13[5] [send] via NET/IB/6/GDRDMA
2024-08-03T04:57:40.049203960Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 14/0 : 30[6] -> 6[6] [receive] via NET/IB/7/GDRDMA
2024-08-03T04:57:40.049257553Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 07/0 : 6[6] -> 14[6] [send] via NET/IB/7/GDRDMA
2024-08-03T04:57:40.049319956Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 15/0 : 6[6] -> 14[6] [send] via NET/IB/7/GDRDMA
2024-08-03T04:57:40.049542140Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.049901948Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 06/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.050627257Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 06/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.050971264Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 08/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.051096239Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.051784457Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 08/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.052020729Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 10/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.052131354Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.052910154Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 12/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.053118111Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 12/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.053227764Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.053425288Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.054042408Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 14/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.054196883Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 14/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.054353440Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.054530187Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.054729971Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 07/0 : 31[7] -> 7[7] [receive] via NET/IB/8/GDRDMA
2024-08-03T04:57:40.054765300Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 00/0 : 24[0] -> 0[0] [receive] via NET/IB/0/GDRDMA
2024-08-03T04:57:40.054796551Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 15/0 : 31[7] -> 7[7] [receive] via NET/IB/8/GDRDMA
2024-08-03T04:57:40.054839770Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 08/0 : 24[0] -> 0[0] [receive] via NET/IB/0/GDRDMA
2024-08-03T04:57:40.054842534Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 06/0 : 7[7] -> 15[7] [send] via NET/IB/8/GDRDMA
2024-08-03T04:57:40.054887938Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 01/0 : 0[0] -> 8[0] [send] via NET/IB/0/GDRDMA
2024-08-03T04:57:40.054890508Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 14/0 : 7[7] -> 15[7] [send] via NET/IB/8/GDRDMA
2024-08-03T04:57:40.054947449Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 09/0 : 0[0] -> 8[0] [send] via NET/IB/0/GDRDMA
2024-08-03T04:57:40.055536315Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 08/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.055639258Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.056290648Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 00/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.056678385Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 10/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.056740461Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 10/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.057177944Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 01/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.057353390Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 02/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.058042568Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 12/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.058104651Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 12/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.058694093Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.058810241Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 04/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.058880527Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 00/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.059594990Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 14/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.059613349Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 14/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.060240032Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 05/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.060270047Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 06/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.060385948Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 02/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.061386937Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 07/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.061415793Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 08/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.061481748Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 06/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.062530802Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 09/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.062607068Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 10/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.062652276Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 08/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.063749258Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 11/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.063792685Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 12/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.063809500Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 10/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.064820269Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 13/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.064842890Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 14/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.064865685Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 14/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.065634283Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 15/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.066303949Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 00/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.066814226Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 02/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.067381157Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 04/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.067961735Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 08/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.068763803Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 10/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.070703796Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 12/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.448473031Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Connected all rings
2024-08-03T04:57:40.448710590Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Connected all rings
2024-08-03T04:57:40.448763575Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Connected all rings
2024-08-03T04:57:40.451174922Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.451535471Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 02/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.451600052Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.451920744Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.452082072Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 02/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.452118296Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.452246497Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.452606013Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 06/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.452662299Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.452687893Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 08/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.453063963Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Connected all rings
2024-08-03T04:57:40.453096980Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.453159636Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.453188721Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 10/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.453402355Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Connected all rings
2024-08-03T04:57:40.453405406Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Connected all rings
2024-08-03T04:57:40.453408057Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Connected all rings
2024-08-03T04:57:40.453414623Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Connected all rings
2024-08-03T04:57:40.453418913Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.453895772Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 10/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.454228929Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.454276438Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 12/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.455135373Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.455290918Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 14/0 : 3[3] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.455730137Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 14/0 : 4[4] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.456303337Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.456399995Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 10/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.457356983Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.457446393Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 12/0 : 5[5] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.458341714Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.458589036Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 12/0 : 4[4] -> 12[4] [send] via NET/IB/5/GDRDMA
2024-08-03T04:57:40.459151031Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.459543204Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.460087286Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.460090271Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.460507503Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.461176341Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.461194533Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.461532040Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.461625902Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.462328844Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 04/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.462471143Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.463178675Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 06/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.463197249Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.463883628Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 08/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.463904698Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 10/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.464485682Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.464579684Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 10/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.464599435Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.465348270Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.465385454Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 12/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.465406696Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 14/0 : 2[2] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.466165403Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.466199166Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 14/0 : 6[6] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.466399082Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 11/0 : 3[3] -> 11[3] [send] via NET/IB/4/GDRDMA
2024-08-03T04:57:40.466645797Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.466819250Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 13/0 : 5[5] -> 13[5] [send] via NET/IB/6/GDRDMA
2024-08-03T04:57:40.466861233Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 14/0 : 6[6] -> 14[6] [send] via NET/IB/7/GDRDMA
2024-08-03T04:57:40.466862811Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.467272706Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 01/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.467383872Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 10/0 : 2[2] -> 10[2] [send] via NET/IB/2/GDRDMA
2024-08-03T04:57:40.467402021Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 09/0 : 1[1] -> 9[1] [send] via NET/IB/1/GDRDMA
2024-08-03T04:57:40.467570008Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 03/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.467889296Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 05/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.468241029Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 07/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.468894583Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 09/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.469213249Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 11/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.469473651Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 04/0 : 20[4] -> 4[4] [receive] via NET/IB/5/GDRDMA
2024-08-03T04:57:40.469523479Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 04/0 : 4[4] -> 20[4] [send] via NET/IB/5/GDRDMA
2024-08-03T04:57:40.469563411Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 13/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.469710920Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 12/0 : 12[4] -> 4[4] [receive] via NET/IB/5/GDRDMA
2024-08-03T04:57:40.469792974Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 15/0 : 0[0] -> 7[7] via P2P/CUMEM
2024-08-03T04:57:40.470143274Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 15/0 : 7[7] -> 15[7] [send] via NET/IB/8/GDRDMA
2024-08-03T04:57:40.470146008Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 08/0 : 0[0] -> 8[0] [send] via NET/IB/0/GDRDMA
2024-08-03T04:57:40.470369500Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 05/0 : 21[5] -> 5[5] [receive] via NET/IB/6/GDRDMA
2024-08-03T04:57:40.470413882Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 05/0 : 5[5] -> 21[5] [send] via NET/IB/6/GDRDMA
2024-08-03T04:57:40.470459663Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 06/0 : 22[6] -> 6[6] [receive] via NET/IB/7/GDRDMA
2024-08-03T04:57:40.470514476Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 06/0 : 6[6] -> 22[6] [send] via NET/IB/7/GDRDMA
2024-08-03T04:57:40.471607305Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 14/0 : 14[6] -> 6[6] [receive] via NET/IB/7/GDRDMA
2024-08-03T04:57:40.471655383Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 13/0 : 13[5] -> 5[5] [receive] via NET/IB/6/GDRDMA
2024-08-03T04:57:40.473043533Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 02/0 : 18[2] -> 2[2] [receive] via NET/IB/2/GDRDMA
2024-08-03T04:57:40.473096022Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 02/0 : 2[2] -> 18[2] [send] via NET/IB/2/GDRDMA
2024-08-03T04:57:40.473170653Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 03/0 : 19[3] -> 3[3] [receive] via NET/IB/4/GDRDMA
2024-08-03T04:57:40.473225984Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 03/0 : 3[3] -> 19[3] [send] via NET/IB/4/GDRDMA
2024-08-03T04:57:40.473278283Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 10/0 : 10[2] -> 2[2] [receive] via NET/IB/2/GDRDMA
2024-08-03T04:57:40.473403295Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 11/0 : 11[3] -> 3[3] [receive] via NET/IB/4/GDRDMA
2024-08-03T04:57:40.474205650Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 01/0 : 17[1] -> 1[1] [receive] via NET/IB/1/GDRDMA
2024-08-03T04:57:40.474254644Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 01/0 : 1[1] -> 17[1] [send] via NET/IB/1/GDRDMA
2024-08-03T04:57:40.474422121Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 09/0 : 9[1] -> 1[1] [receive] via NET/IB/1/GDRDMA
2024-08-03T04:57:40.474988874Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 01/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.475600343Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 03/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.475637676Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 01/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.476306266Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 04/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.476309778Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 01/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.476317759Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 03/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.477338802Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 07/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.477343595Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 07/0 : 23[7] -> 7[7] [receive] via NET/IB/8/GDRDMA
2024-08-03T04:57:40.477362299Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 05/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.477417068Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 07/0 : 7[7] -> 23[7] [send] via NET/IB/8/GDRDMA
2024-08-03T04:57:40.477422115Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 03/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.477512522Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 00/0 : 16[0] -> 0[0] [receive] via NET/IB/0/GDRDMA
2024-08-03T04:57:40.477567244Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 15/0 : 15[7] -> 7[7] [receive] via NET/IB/8/GDRDMA
2024-08-03T04:57:40.477569002Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 00/0 : 0[0] -> 16[0] [send] via NET/IB/0/GDRDMA
2024-08-03T04:57:40.477837601Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 02/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.478048328Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Channel 08/0 : 8[0] -> 0[0] [receive] via NET/IB/0/GDRDMA
2024-08-03T04:57:40.478452873Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 09/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.478836122Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 07/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.479162734Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 11/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.479303686Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 09/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.479696097Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.479741818Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 12/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.479764335Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 11/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.480594878Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 03/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.480616820Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Channel 15/0 : 5[5] -> 4[4] via P2P/CUMEM
2024-08-03T04:57:40.480650540Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 13/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.481481366Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 05/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.481570937Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Channel 15/0 : 4[4] -> 3[3] via P2P/CUMEM
2024-08-03T04:57:40.482253413Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.483502008Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 04/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.483524633Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 05/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.483974244Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 09/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.484727666Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 06/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.484817866Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 07/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.485151342Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 11/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.486320376Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 10/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.486343001Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 09/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.487072307Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 13/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.487534757Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 11/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.487905813Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Channel 15/0 : 2[2] -> 1[1] via P2P/CUMEM
2024-08-03T04:57:40.488462452Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 13/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.488841015Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.489052257Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.489064296Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Channel 15/0 : 6[6] -> 5[5] via P2P/CUMEM
2024-08-03T04:57:40.489079266Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 12/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.489383730Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.489948567Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 02/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.490044222Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 14/0 : 7[7] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.490111541Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.491314703Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 01/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.491413312Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.492077743Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 03/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.492134793Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 07/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.492574269Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.492891504Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 05/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.492944654Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 09/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.493618641Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.494811160Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 06/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.494855839Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 10/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.496399024Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 11/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.496786258Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 09/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.496904133Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 13/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.497552184Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 11/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.497598910Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Channel 15/0 : 1[1] -> 0[0] via P2P/CUMEM
2024-08-03T04:57:40.498327641Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 13/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.498830890Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 13/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.499390941Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Channel 15/0 : 3[3] -> 2[2] via P2P/CUMEM
2024-08-03T04:57:40.500500016Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Channel 14/0 : 7[7] -> 6[6] via P2P/CUMEM
2024-08-03T04:57:40.914410061Z jupiter-cs-aus-207:97:689 [2] NCCL INFO Connected all trees
2024-08-03T04:57:40.914430605Z jupiter-cs-aus-207:97:689 [2] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512
2024-08-03T04:57:40.914433398Z jupiter-cs-aus-207:97:689 [2] NCCL INFO 16 coll channels, 0 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer
2024-08-03T04:57:40.921618716Z jupiter-cs-aus-207:98:690 [3] NCCL INFO Connected all trees
2024-08-03T04:57:40.921627436Z jupiter-cs-aus-207:98:690 [3] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512
2024-08-03T04:57:40.921629905Z jupiter-cs-aus-207:98:690 [3] NCCL INFO 16 coll channels, 0 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer
2024-08-03T04:57:40.923032027Z jupiter-cs-aus-207:99:688 [4] NCCL INFO Connected all trees
2024-08-03T04:57:40.923044152Z jupiter-cs-aus-207:99:688 [4] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512
2024-08-03T04:57:40.923046164Z jupiter-cs-aus-207:99:688 [4] NCCL INFO 16 coll channels, 0 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer
2024-08-03T04:57:40.923073500Z jupiter-cs-aus-207:100:687 [5] NCCL INFO Connected all trees
2024-08-03T04:57:40.923078539Z jupiter-cs-aus-207:100:687 [5] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512
2024-08-03T04:57:40.923081210Z jupiter-cs-aus-207:100:687 [5] NCCL INFO 16 coll channels, 0 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer
2024-08-03T04:57:40.925576994Z jupiter-cs-aus-207:95:683 [0] NCCL INFO Connected all trees
2024-08-03T04:57:40.925579376Z jupiter-cs-aus-207:101:686 [6] NCCL INFO Connected all trees
2024-08-03T04:57:40.925580811Z jupiter-cs-aus-207:102:685 [7] NCCL INFO Connected all trees
2024-08-03T04:57:40.925582051Z jupiter-cs-aus-207:101:686 [6] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512
2024-08-03T04:57:40.925583563Z jupiter-cs-aus-207:101:686 [6] NCCL INFO 16 coll channels, 0 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer
2024-08-03T04:57:40.925584954Z jupiter-cs-aus-207:102:685 [7] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512
2024-08-03T04:57:40.925586651Z jupiter-cs-aus-207:96:684 [1] NCCL INFO Connected all trees
2024-08-03T04:57:40.925587959Z jupiter-cs-aus-207:102:685 [7] NCCL INFO 16 coll channels, 0 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer
2024-08-03T04:57:40.925605667Z jupiter-cs-aus-207:96:684 [1] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512
2024-08-03T04:57:40.925614982Z jupiter-cs-aus-207:96:684 [1] NCCL INFO 16 coll channels, 0 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer
2024-08-03T04:57:40.925630177Z jupiter-cs-aus-207:95:683 [0] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512
2024-08-03T04:57:40.925633114Z jupiter-cs-aus-207:95:683 [0] NCCL INFO 16 coll channels, 0 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer
2024-08-03T04:57:40.974105637Z jupiter-cs-aus-207:102:685 [7] NCCL INFO comm 0x7fb0fc08dce0 rank 7 nranks 32 cudaDev 7 nvmlDev 7 busId e4000 commId 0xfb389571c16e2b31 - Init COMPLETE
2024-08-03T04:57:40.974148839Z jupiter-cs-aus-207:98:690 [3] NCCL INFO comm 0x7fa8e00b2640 rank 3 nranks 32 cudaDev 3 nvmlDev 3 busId 5d000 commId 0xfb389571c16e2b31 - Init COMPLETE
2024-08-03T04:57:40.974151507Z jupiter-cs-aus-207:99:688 [4] NCCL INFO comm 0x7fb03408cf40 rank 4 nranks 32 cudaDev 4 nvmlDev 4 busId 84000 commId 0xfb389571c16e2b31 - Init COMPLETE
2024-08-03T04:57:40.974153983Z jupiter-cs-aus-207:100:687 [5] NCCL INFO comm 0x7fc68c07bc10 rank 5 nranks 32 cudaDev 5 nvmlDev 5 busId 8b000 commId 0xfb389571c16e2b31 - Init COMPLETE
2024-08-03T04:57:40.974156205Z jupiter-cs-aus-207:95:683 [0] NCCL INFO comm 0x7f860807d1e0 rank 0 nranks 32 cudaDev 0 nvmlDev 0 busId 18000 commId 0xfb389571c16e2b31 - Init COMPLETE
2024-08-03T04:57:40.974158431Z jupiter-cs-aus-207:96:684 [1] NCCL INFO comm 0x7f68380a5750 rank 1 nranks 32 cudaDev 1 nvmlDev 1 busId 2a000 commId 0xfb389571c16e2b31 - Init COMPLETE
2024-08-03T04:57:40.976556780Z jupiter-cs-aus-207:101:686 [6] NCCL INFO comm 0x7f86f008dc80 rank 6 nranks 32 cudaDev 6 nvmlDev 6 busId 91000 commId 0xfb389571c16e2b31 - Init COMPLETE
2024-08-03T04:57:40.976567888Z jupiter-cs-aus-207:97:689 [2] NCCL INFO comm 0x7faef40900e0 rank 2 nranks 32 cudaDev 2 nvmlDev 2 busId 3a000 commId 0xfb389571c16e2b31 - Init COMPLETE
2024-08-03T04:58:48.487524938Z 
  0%|          | 1/9500 [01:18<206:55:43, 78.42s/it]08/02/2024 21:58:48 - INFO - __main__ -   Step: 1, LR: 7.018313411558286e-08, Loss: 810.426513671875
2024-08-03T04:59:01.041364269Z 
  0%|          | 2/9500 [01:30<104:40:54, 39.68s/it]08/02/2024 21:59:01 - INFO - __main__ -   Step: 2, LR: 1.4036626823116571e-07, Loss: 1173.222900390625
2024-08-03T04:59:13.132981838Z 
  0%|          | 3/9500 [01:43<71:26:29, 27.08s/it] 08/02/2024 21:59:13 - INFO - __main__ -   Step: 3, LR: 2.1054940234674854e-07, Loss: 845.58349609375
2024-08-03T04:59:25.157600879Z 
  0%|          | 4/9500 [01:55<55:45:16, 21.14s/it]08/02/2024 21:59:25 - INFO - __main__ -   Step: 4, LR: 2.8073253646233143e-07, Loss: 959.32763671875
2024-08-03T04:59:37.253936412Z 
  0%|          | 5/9500 [02:07<47:08:59, 17.88s/it]08/02/2024 21:59:37 - INFO - __main__ -   Step: 5, LR: 3.5091567057791426e-07, Loss: 927.6259155273438
2024-08-03T04:59:49.851036268Z 
  0%|          | 6/9500 [02:19<42:24:39, 16.08s/it]08/02/2024 21:59:49 - INFO - __main__ -   Step: 6, LR: 4.210988046934971e-07, Loss: 915.8277587890625
2024-08-03T05:00:01.887155134Z 
  0%|          | 7/9500 [02:31<38:55:07, 14.76s/it]08/02/2024 22:00:01 - INFO - __main__ -   Step: 7, LR: 4.912819388090799e-07, Loss: 880.4804077148438
2024-08-03T05:00:14.702557408Z 
  0%|          | 8/9500 [02:44<37:16:57, 14.14s/it]08/02/2024 22:00:14 - INFO - __main__ -   Step: 8, LR: 5.614650729246629e-07, Loss: 744.208984375
2024-08-03T05:00:26.793994135Z 
  0%|          | 9/9500 [02:56<35:35:27, 13.50s/it]08/02/2024 22:00:26 - INFO - __main__ -   Step: 9, LR: 6.316482070402456e-07, Loss: 1005.769775390625
2024-08-03T05:00:38.853850789Z 
  0%|          | 10/9500 [03:08<34:24:55, 13.06s/it]08/02/2024 22:00:38 - INFO - __main__ -   Step: 10, LR: 7.018313411558285e-07, Loss: 750.9688720703125
2024-08-03T05:00:51.205316769Z 
  0%|          | 11/9500 [03:21<33:50:37, 12.84s/it]08/02/2024 22:00:51 - INFO - __main__ -   Step: 11, LR: 7.720144752714113e-07, Loss: 913.9368896484375
2024-08-03T05:01:03.764522919Z 
  0%|          | 12/9500 [03:33<33:36:55, 12.75s/it]08/02/2024 22:01:03 - INFO - __main__ -   Step: 12, LR: 8.421976093869942e-07, Loss: 1049.71142578125
2024-08-03T05:01:15.598878445Z 
  0%|          | 13/9500 [03:45<32:52:37, 12.48s/it]08/02/2024 22:01:15 - INFO - __main__ -   Step: 13, LR: 9.123807435025771e-07, Loss: 837.425048828125
2024-08-03T05:01:27.697842786Z 
  0%|          | 14/9500 [03:57<32:34:24, 12.36s/it]08/02/2024 22:01:27 - INFO - __main__ -   Step: 14, LR: 9.825638776181598e-07, Loss: 846.2078857421875
2024-08-03T05:01:40.210659387Z 
  0%|          | 15/9500 [04:10<32:41:23, 12.41s/it]08/02/2024 22:01:40 - INFO - __main__ -   Step: 15, LR: 1.0527470117337428e-06, Loss: 954.1826782226562
2024-08-03T05:01:52.463349537Z 
  0%|          | 16/9500 [04:22<32:33:50, 12.36s/it]08/02/2024 22:01:52 - INFO - __main__ -   Step: 16, LR: 1.1229301458493257e-06, Loss: 756.9301147460938
2024-08-03T05:02:04.271015963Z 
  0%|          | 17/9500 [04:34<32:07:20, 12.19s/it]08/02/2024 22:02:04 - INFO - __main__ -   Step: 17, LR: 1.1931132799649084e-06, Loss: 746.9092407226562
2024-08-03T05:02:17.208726022Z 
  0%|          | 18/9500 [04:47<32:42:25, 12.42s/it]08/02/2024 22:02:17 - INFO - __main__ -   Step: 18, LR: 1.2632964140804912e-06, Loss: 726.599609375
2024-08-03T05:02:29.432389496Z 
  0%|          | 19/9500 [04:59<32:32:59, 12.36s/it]08/02/2024 22:02:29 - INFO - __main__ -   Step: 19, LR: 1.333479548196074e-06, Loss: 834.3648681640625
2024-08-03T05:02:41.421436502Z 
  0%|          | 20/9500 [05:11<32:15:13, 12.25s/it]08/02/2024 22:02:41 - INFO - __main__ -   Step: 20, LR: 1.403662682311657e-06, Loss: 677.414794921875
2024-08-03T05:02:54.549192605Z 
  0%|          | 21/9500 [05:24<32:56:43, 12.51s/it]08/02/2024 22:02:54 - INFO - __main__ -   Step: 21, LR: 1.4738458164272398e-06, Loss: 1100.613037109375
2024-08-03T05:03:06.688718558Z 
  0%|          | 22/9500 [05:36<32:38:50, 12.40s/it]08/02/2024 22:03:06 - INFO - __main__ -   Step: 22, LR: 1.5440289505428227e-06, Loss: 774.4853515625
2024-08-03T05:03:18.694726926Z 
  0%|          | 23/9500 [05:48<32:19:56, 12.28s/it]08/02/2024 22:03:18 - INFO - __main__ -   Step: 23, LR: 1.6142120846584056e-06, Loss: 813.7279052734375
2024-08-03T05:03:31.344557125Z 
  0%|          | 24/9500 [06:01<32:37:09, 12.39s/it]08/02/2024 22:03:31 - INFO - __main__ -   Step: 24, LR: 1.6843952187739884e-06, Loss: 830.3314819335938
2024-08-03T05:03:43.485556262Z 
  0%|          | 25/9500 [06:13<32:25:03, 12.32s/it]08/02/2024 22:03:43 - INFO - __main__ -   Step: 25, LR: 1.7545783528895713e-06, Loss: 893.2393798828125
2024-08-03T05:03:55.623772985Z 
  0%|          | 26/9500 [06:25<32:16:24, 12.26s/it]08/02/2024 22:03:55 - INFO - __main__ -   Step: 26, LR: 1.8247614870051542e-06, Loss: 659.5628662109375
2024-08-03T05:04:08.015615267Z 
  0%|          | 27/9500 [06:37<32:22:16, 12.30s/it]08/02/2024 22:04:08 - INFO - __main__ -   Step: 27, LR: 1.894944621120737e-06, Loss: 671.7074584960938
2024-08-03T05:04:20.288554112Z 
  0%|          | 28/9500 [06:50<32:20:39, 12.29s/it]08/02/2024 22:04:20 - INFO - __main__ -   Step: 28, LR: 1.9651277552363197e-06, Loss: 1036.3692626953125
2024-08-03T05:04:32.512412441Z 
  0%|          | 29/9500 [07:02<32:17:11, 12.27s/it]08/02/2024 22:04:32 - INFO - __main__ -   Step: 29, LR: 2.0353108893519026e-06, Loss: 981.8345947265625
2024-08-03T05:04:45.321509565Z 
  0%|          | 30/9500 [07:15<32:42:24, 12.43s/it]08/02/2024 22:04:45 - INFO - __main__ -   Step: 30, LR: 2.1054940234674856e-06, Loss: 877.7186889648438
2024-08-03T05:04:57.583715171Z 
  0%|          | 31/9500 [07:27<32:34:06, 12.38s/it]08/02/2024 22:04:57 - INFO - __main__ -   Step: 31, LR: 2.1756771575830685e-06, Loss: 796.18701171875
2024-08-03T05:05:09.410143212Z 
  0%|          | 32/9500 [07:39<32:07:35, 12.22s/it]08/02/2024 22:05:09 - INFO - __main__ -   Step: 32, LR: 2.2458602916986514e-06, Loss: 683.1591186523438
2024-08-03T05:05:21.920345397Z 
  0%|          | 33/9500 [07:51<32:21:19, 12.30s/it]08/02/2024 22:05:21 - INFO - __main__ -   Step: 33, LR: 2.3160434258142344e-06, Loss: 739.842529296875
2024-08-03T05:05:34.546135904Z 
  0%|          | 34/9500 [08:04<32:36:21, 12.40s/it]08/02/2024 22:05:34 - INFO - __main__ -   Step: 34, LR: 2.386226559929817e-06, Loss: 874.0208740234375
2024-08-03T05:05:46.553888408Z 
  0%|          | 35/9500 [08:16<32:17:36, 12.28s/it]08/02/2024 22:05:46 - INFO - __main__ -   Step: 35, LR: 2.4564096940454e-06, Loss: 913.9013671875
2024-08-03T05:05:59.183909388Z 
  0%|          | 36/9500 [08:29<32:33:48, 12.39s/it]08/02/2024 22:05:59 - INFO - __main__ -   Step: 36, LR: 2.5265928281609823e-06, Loss: 816.8082885742188
2024-08-03T05:06:11.593610504Z 
  0%|          | 37/9500 [08:41<32:34:41, 12.39s/it]08/02/2024 22:06:11 - INFO - __main__ -   Step: 37, LR: 2.5967759622765653e-06, Loss: 846.5671997070312
2024-08-03T05:06:23.628256486Z 
  0%|          | 38/9500 [08:53<32:17:31, 12.29s/it]08/02/2024 22:06:23 - INFO - __main__ -   Step: 38, LR: 2.666959096392148e-06, Loss: 606.8117065429688
2024-08-03T05:06:36.404006261Z 
  0%|          | 39/9500 [09:06<32:40:27, 12.43s/it]08/02/2024 22:06:36 - INFO - __main__ -   Step: 39, LR: 2.737142230507731e-06, Loss: 870.6192626953125
2024-08-03T05:06:48.553623756Z 
  0%|          | 40/9500 [09:18<32:26:51, 12.35s/it]08/02/2024 22:06:48 - INFO - __main__ -   Step: 40, LR: 2.807325364623314e-06, Loss: 563.3338623046875
2024-08-03T05:07:00.565963444Z 
  0%|          | 41/9500 [09:30<32:10:47, 12.25s/it]08/02/2024 22:07:00 - INFO - __main__ -   Step: 41, LR: 2.877508498738897e-06, Loss: 688.9085693359375
2024-08-03T05:07:13.311641601Z 
  0%|          | 42/9500 [09:43<32:34:06, 12.40s/it]08/02/2024 22:07:13 - INFO - __main__ -   Step: 42, LR: 2.9476916328544795e-06, Loss: 757.5635986328125
2024-08-03T05:07:25.568042469Z 
  0%|          | 43/9500 [09:55<32:27:18, 12.35s/it]08/02/2024 22:07:25 - INFO - __main__ -   Step: 43, LR: 3.0178747669700625e-06, Loss: 730.6018676757812
2024-08-03T05:07:37.807102217Z 
  0%|          | 44/9500 [10:07<32:21:38, 12.32s/it]08/02/2024 22:07:37 - INFO - __main__ -   Step: 44, LR: 3.0880579010856454e-06, Loss: 682.5567016601562
2024-08-03T05:07:50.400361784Z 
  0%|          | 45/9500 [10:20<32:34:20, 12.40s/it]08/02/2024 22:07:50 - INFO - __main__ -   Step: 45, LR: 3.1582410352012283e-06, Loss: 829.3323974609375
2024-08-03T05:08:02.578936998Z 
  0%|          | 46/9500 [10:32<32:23:31, 12.33s/it]08/02/2024 22:08:02 - INFO - __main__ -   Step: 46, LR: 3.2284241693168113e-06, Loss: 689.4173583984375
2024-08-03T05:08:14.614806169Z 
  0%|          | 47/9500 [10:44<32:09:14, 12.25s/it]08/02/2024 22:08:14 - INFO - __main__ -   Step: 47, LR: 3.2986073034323938e-06, Loss: 650.022705078125
2024-08-03T05:08:27.506840543Z 
  1%|          | 48/9500 [10:57<32:39:37, 12.44s/it]08/02/2024 22:08:27 - INFO - __main__ -   Step: 48, LR: 3.3687904375479767e-06, Loss: 641.2435302734375
2024-08-03T05:08:39.825548056Z 
  1%|          | 49/9500 [11:09<32:33:41, 12.40s/it]08/02/2024 22:08:39 - INFO - __main__ -   Step: 49, LR: 3.4389735716635597e-06, Loss: 749.7808837890625
2024-08-03T05:08:51.902635208Z 
  1%|          | 50/9500 [11:21<32:18:06, 12.31s/it]08/02/2024 22:08:51 - INFO - __main__ -   Step: 50, LR: 3.5091567057791426e-06, Loss: 552.93359375
2024-08-03T05:09:04.309080201Z 
  1%|          | 51/9500 [11:34<32:22:39, 12.34s/it]08/02/2024 22:09:04 - INFO - __main__ -   Step: 51, LR: 3.5793398398947255e-06, Loss: 818.320556640625
2024-08-03T05:09:16.571542769Z 
  1%|          | 52/9500 [11:46<32:18:59, 12.31s/it]08/02/2024 22:09:16 - INFO - __main__ -   Step: 52, LR: 3.6495229740103085e-06, Loss: 695.0732421875
2024-08-03T05:09:28.566677272Z 
  1%|          | 53/9500 [11:58<32:03:45, 12.22s/it]08/02/2024 22:09:28 - INFO - __main__ -   Step: 53, LR: 3.719706108125891e-06, Loss: 668.8046264648438
2024-08-03T05:09:41.051787940Z 
  1%|          | 54/9500 [12:10<32:16:09, 12.30s/it]08/02/2024 22:09:41 - INFO - __main__ -   Step: 54, LR: 3.789889242241474e-06, Loss: 784.37060546875
2024-08-03T05:09:53.457649656Z 
  1%|          | 55/9500 [12:23<32:21:01, 12.33s/it]08/02/2024 22:09:53 - INFO - __main__ -   Step: 55, LR: 3.860072376357057e-06, Loss: 516.0864868164062
2024-08-03T05:10:05.381850773Z 
  1%|          | 56/9500 [12:35<32:01:37, 12.21s/it]08/02/2024 22:10:05 - INFO - __main__ -   Step: 56, LR: 3.930255510472639e-06, Loss: 740.6519775390625
2024-08-03T05:10:17.984826433Z 
  1%|          | 57/9500 [12:47<32:20:02, 12.33s/it]08/02/2024 22:10:17 - INFO - __main__ -   Step: 57, LR: 4.000438644588223e-06, Loss: 609.7257690429688
2024-08-03T05:10:30.433079040Z 
  1%|          | 58/9500 [13:00<32:25:35, 12.36s/it]08/02/2024 22:10:30 - INFO - __main__ -   Step: 58, LR: 4.070621778703805e-06, Loss: 704.318359375
2024-08-03T05:10:42.380166801Z 
  1%|          | 59/9500 [13:12<32:05:43, 12.24s/it]08/02/2024 22:10:42 - INFO - __main__ -   Step: 59, LR: 4.140804912819389e-06, Loss: 675.7437744140625
2024-08-03T05:10:55.007171709Z 
  1%|          | 60/9500 [13:24<32:23:51, 12.36s/it]08/02/2024 22:10:55 - INFO - __main__ -   Step: 60, LR: 4.210988046934971e-06, Loss: 677.7422485351562
2024-08-03T05:11:07.815778011Z 
  1%|          | 61/9500 [13:37<32:45:03, 12.49s/it]08/02/2024 22:11:07 - INFO - __main__ -   Step: 61, LR: 4.2811711810505545e-06, Loss: 643.6136474609375
2024-08-03T05:11:20.036593867Z 
  1%|          | 62/9500 [13:49<32:32:05, 12.41s/it]08/02/2024 22:11:20 - INFO - __main__ -   Step: 62, LR: 4.351354315166137e-06, Loss: 725.4798583984375
2024-08-03T05:11:32.869456219Z 
  1%|          | 63/9500 [14:02<32:51:50, 12.54s/it]08/02/2024 22:11:32 - INFO - __main__ -   Step: 63, LR: 4.4215374492817195e-06, Loss: 774.119384765625
2024-08-03T05:11:45.210898416Z 
  1%|          | 64/9500 [14:15<32:42:25, 12.48s/it]08/02/2024 22:11:45 - INFO - __main__ -   Step: 64, LR: 4.491720583397303e-06, Loss: 690.6632080078125
2024-08-03T05:11:57.397755551Z 
  1%|          | 65/9500 [14:27<32:28:27, 12.39s/it]08/02/2024 22:11:57 - INFO - __main__ -   Step: 65, LR: 4.561903717512885e-06, Loss: 766.937744140625
2024-08-03T05:12:09.986614411Z 
  1%|          | 66/9500 [14:39<32:37:32, 12.45s/it]08/02/2024 22:12:09 - INFO - __main__ -   Step: 66, LR: 4.632086851628469e-06, Loss: 852.6328125
2024-08-03T05:12:22.462481068Z 
  1%|          | 67/9500 [14:52<32:38:36, 12.46s/it]08/02/2024 22:12:22 - INFO - __main__ -   Step: 67, LR: 4.702269985744051e-06, Loss: 650.8765869140625
2024-08-03T05:12:34.573401040Z 
  1%|          | 68/9500 [15:04<32:22:01, 12.35s/it]08/02/2024 22:12:34 - INFO - __main__ -   Step: 68, LR: 4.772453119859634e-06, Loss: 850.2139282226562
2024-08-03T05:12:47.314821800Z 
  1%|          | 69/9500 [15:17<32:40:05, 12.47s/it]08/02/2024 22:12:47 - INFO - __main__ -   Step: 69, LR: 4.842636253975217e-06, Loss: 838.8920288085938
2024-08-03T05:13:00.170681068Z 
  1%|          | 70/9500 [15:30<32:58:04, 12.59s/it]08/02/2024 22:13:00 - INFO - __main__ -   Step: 70, LR: 4.9128193880908e-06, Loss: 877.9049072265625
2024-08-03T05:13:12.396523835Z 
  1%|          | 71/9500 [15:42<32:40:54, 12.48s/it]08/02/2024 22:13:12 - INFO - __main__ -   Step: 71, LR: 4.983002522206383e-06, Loss: 804.1677856445312
2024-08-03T05:13:25.181209514Z 
  1%|          | 72/9500 [15:55<32:55:08, 12.57s/it]08/02/2024 22:13:25 - INFO - __main__ -   Step: 72, LR: 5.053185656321965e-06, Loss: 941.557861328125
2024-08-03T05:13:37.631202538Z 
  1%|          | 73/9500 [16:07<32:49:17, 12.53s/it]08/02/2024 22:13:37 - INFO - __main__ -   Step: 73, LR: 5.123368790437548e-06, Loss: 646.1536865234375
2024-08-03T05:13:50.010457609Z 
  1%|          | 74/9500 [16:19<32:41:47, 12.49s/it]08/02/2024 22:13:50 - INFO - __main__ -   Step: 74, LR: 5.1935519245531305e-06, Loss: 753.7862548828125
2024-08-03T05:14:02.094058418Z 
  1%|          | 75/9500 [16:32<32:22:33, 12.37s/it]08/02/2024 22:14:02 - INFO - __main__ -   Step: 75, LR: 5.263735058668714e-06, Loss: 583.718994140625
2024-08-03T05:14:14.214487874Z 
  1%|          | 76/9500 [16:44<32:10:45, 12.29s/it]08/02/2024 22:14:14 - INFO - __main__ -   Step: 76, LR: 5.333918192784296e-06, Loss: 493.2238464355469
2024-08-03T05:14:26.554942312Z 
  1%|          | 77/9500 [16:56<32:12:48, 12.31s/it]08/02/2024 22:14:26 - INFO - __main__ -   Step: 77, LR: 5.40410132689988e-06, Loss: 785.79541015625
2024-08-03T05:14:38.837162975Z 
  1%|          | 78/9500 [17:08<32:11:26, 12.30s/it]08/02/2024 22:14:38 - INFO - __main__ -   Step: 78, LR: 5.474284461015462e-06, Loss: 857.4005126953125
2024-08-03T05:14:51.215164442Z 
  1%|          | 79/9500 [17:21<32:14:55, 12.32s/it]08/02/2024 22:14:51 - INFO - __main__ -   Step: 79, LR: 5.544467595131046e-06, Loss: 798.7342529296875
2024-08-03T05:15:03.220041402Z 
  1%|          | 80/9500 [17:33<31:59:44, 12.23s/it]08/02/2024 22:15:03 - INFO - __main__ -   Step: 80, LR: 5.614650729246628e-06, Loss: 547.795166015625
2024-08-03T05:15:15.631331906Z 
  1%|          | 81/9500 [17:45<32:08:10, 12.28s/it]08/02/2024 22:15:15 - INFO - __main__ -   Step: 81, LR: 5.6848338633622115e-06, Loss: 1013.0118408203125
2024-08-03T05:15:28.091716655Z 
  1%|          | 82/9500 [17:58<32:16:20, 12.34s/it]08/02/2024 22:15:28 - INFO - __main__ -   Step: 82, LR: 5.755016997477794e-06, Loss: 601.5081787109375
2024-08-03T05:15:40.316174192Z 
  1%|          | 83/9500 [18:10<32:10:52, 12.30s/it]08/02/2024 22:15:40 - INFO - __main__ -   Step: 83, LR: 5.825200131593377e-06, Loss: 785.239990234375
2024-08-03T05:15:52.322731772Z 
  1%|          | 84/9500 [18:22<31:56:44, 12.21s/it]08/02/2024 22:15:52 - INFO - __main__ -   Step: 84, LR: 5.895383265708959e-06, Loss: 742.2146606445312
2024-08-03T05:16:05.144083421Z 
  1%|          | 85/9500 [18:35<32:25:09, 12.40s/it]08/02/2024 22:16:05 - INFO - __main__ -   Step: 85, LR: 5.965566399824542e-06, Loss: 756.3757934570312
2024-08-03T05:16:17.341644177Z 
  1%|          | 86/9500 [18:47<32:15:35, 12.34s/it]08/02/2024 22:16:17 - INFO - __main__ -   Step: 86, LR: 6.035749533940125e-06, Loss: 605.3018798828125
2024-08-03T05:16:29.449321226Z 
  1%|          | 87/9500 [18:59<32:04:38, 12.27s/it]08/02/2024 22:16:29 - INFO - __main__ -   Step: 87, LR: 6.105932668055708e-06, Loss: 644.5073852539062
2024-08-03T05:16:42.444095670Z 
  1%|          | 88/9500 [19:12<32:38:36, 12.49s/it]08/02/2024 22:16:42 - INFO - __main__ -   Step: 88, LR: 6.176115802171291e-06, Loss: 700.3208618164062
2024-08-03T05:16:54.753789064Z 
  1%|          | 89/9500 [19:24<32:30:08, 12.43s/it]08/02/2024 22:16:54 - INFO - __main__ -   Step: 89, LR: 6.246298936286874e-06, Loss: 581.6559448242188
2024-08-03T05:17:07.038940277Z 
  1%|          | 90/9500 [19:36<32:22:57, 12.39s/it]08/02/2024 22:17:07 - INFO - __main__ -   Step: 90, LR: 6.316482070402457e-06, Loss: 934.1231689453125
2024-08-03T05:17:20.065760425Z 
  1%|          | 91/9500 [19:50<32:52:45, 12.58s/it]08/02/2024 22:17:20 - INFO - __main__ -   Step: 91, LR: 6.38666520451804e-06, Loss: 873.905517578125
2024-08-03T05:17:32.145280653Z 
  1%|          | 92/9500 [20:02<32:29:01, 12.43s/it]08/02/2024 22:17:32 - INFO - __main__ -   Step: 92, LR: 6.4568483386336225e-06, Loss: 848.2255859375
2024-08-03T05:17:44.543588071Z 
  1%|          | 93/9500 [20:14<32:27:18, 12.42s/it]08/02/2024 22:17:44 - INFO - __main__ -   Step: 93, LR: 6.527031472749206e-06, Loss: 885.1182861328125
2024-08-03T05:17:57.647697194Z 
  1%|          | 94/9500 [20:27<32:59:16, 12.63s/it]08/02/2024 22:17:57 - INFO - __main__ -   Step: 94, LR: 6.5972146068647876e-06, Loss: 670.2018432617188
2024-08-03T05:18:10.026576287Z 
  1%|          | 95/9500 [20:39<32:47:27, 12.55s/it]08/02/2024 22:18:10 - INFO - __main__ -   Step: 95, LR: 6.667397740980372e-06, Loss: 665.0640869140625
2024-08-03T05:18:22.029743714Z 
  1%|          | 96/9500 [20:51<32:21:27, 12.39s/it]08/02/2024 22:18:22 - INFO - __main__ -   Step: 96, LR: 6.7375808750959534e-06, Loss: 634.0916137695312
2024-08-03T05:18:34.750149072Z 
  1%|          | 97/9500 [21:04<32:36:54, 12.49s/it]08/02/2024 22:18:34 - INFO - __main__ -   Step: 97, LR: 6.807764009211537e-06, Loss: 743.126220703125
2024-08-03T05:18:47.043609601Z 
  1%|          | 98/9500 [21:16<32:27:36, 12.43s/it]08/02/2024 22:18:47 - INFO - __main__ -   Step: 98, LR: 6.877947143327119e-06, Loss: 672.9864501953125
2024-08-03T05:18:59.358733799Z 
  1%|          | 99/9500 [21:29<32:22:03, 12.39s/it]08/02/2024 22:18:59 - INFO - __main__ -   Step: 99, LR: 6.948130277442703e-06, Loss: 782.7645263671875
2024-08-03T05:19:12.236039830Z 
  1%|          | 100/9500 [21:42<32:44:31, 12.54s/it]08/02/2024 22:19:12 - INFO - __main__ -   Step: 100, LR: 7.018313411558285e-06, Loss: 756.8104248046875
2024-08-03T05:19:24.632971235Z 
  1%|          | 101/9500 [21:54<32:37:37, 12.50s/it]08/02/2024 22:19:24 - INFO - __main__ -   Step: 101, LR: 7.0884965456738685e-06, Loss: 926.7112426757812
2024-08-03T05:19:36.629002521Z 
  1%|          | 102/9500 [22:06<32:13:53, 12.35s/it]08/02/2024 22:19:36 - INFO - __main__ -   Step: 102, LR: 7.158679679789451e-06, Loss: 704.9784545898438
2024-08-03T05:19:49.370607601Z 
  1%|          | 103/9500 [22:19<32:32:14, 12.47s/it]08/02/2024 22:19:49 - INFO - __main__ -   Step: 103, LR: 7.228862813905034e-06, Loss: 704.937255859375
2024-08-03T05:20:02.111549826Z 
  1%|          | 104/9500 [22:32<32:44:59, 12.55s/it]08/02/2024 22:20:02 - INFO - __main__ -   Step: 104, LR: 7.299045948020617e-06, Loss: 777.0748291015625
2024-08-03T05:20:14.277765416Z 
  1%|          | 105/9500 [22:44<32:26:50, 12.43s/it]08/02/2024 22:20:14 - INFO - __main__ -   Step: 105, LR: 7.3692290821362e-06, Loss: 793.6656494140625
2024-08-03T05:20:26.760269885Z 
  1%|          | 106/9500 [22:56<32:28:56, 12.45s/it]08/02/2024 22:20:26 - INFO - __main__ -   Step: 106, LR: 7.439412216251782e-06, Loss: 828.3385620117188
2024-08-03T05:20:38.947958248Z 
  1%|          | 107/9500 [23:08<32:16:31, 12.37s/it]08/02/2024 22:20:38 - INFO - __main__ -   Step: 107, LR: 7.509595350367366e-06, Loss: 712.111328125
2024-08-03T05:20:51.139513202Z 
  1%|          | 108/9500 [23:21<32:07:55, 12.32s/it]08/02/2024 22:20:51 - INFO - __main__ -   Step: 108, LR: 7.579778484482948e-06, Loss: 679.3271484375
2024-08-03T05:21:03.825485925Z 
  1%|          | 109/9500 [23:33<32:25:05, 12.43s/it]08/02/2024 22:21:03 - INFO - __main__ -   Step: 109, LR: 7.649961618598532e-06, Loss: 824.7820434570312
2024-08-03T05:21:16.075460217Z 
  1%|          | 110/9500 [23:46<32:16:33, 12.37s/it]08/02/2024 22:21:16 - INFO - __main__ -   Step: 110, LR: 7.720144752714114e-06, Loss: 754.0005493164062
2024-08-03T05:21:28.570927301Z 
  1%|          | 111/9500 [23:58<32:22:01, 12.41s/it]08/02/2024 22:21:28 - INFO - __main__ -   Step: 111, LR: 7.790327886829697e-06, Loss: 952.9481201171875
2024-08-03T05:21:40.714939218Z 
  1%|          | 112/9500 [24:10<32:09:19, 12.33s/it]08/02/2024 22:21:40 - INFO - __main__ -   Step: 112, LR: 7.860511020945279e-06, Loss: 598.6690063476562
2024-08-03T05:21:53.164591554Z 
  1%|          | 113/9500 [24:23<32:14:42, 12.37s/it]08/02/2024 22:21:53 - INFO - __main__ -   Step: 113, LR: 7.930694155060862e-06, Loss: 752.345947265625
2024-08-03T05:22:05.383086091Z 
  1%|          | 114/9500 [24:35<32:07:34, 12.32s/it]08/02/2024 22:22:05 - INFO - __main__ -   Step: 114, LR: 8.000877289176445e-06, Loss: 782.0490112304688
2024-08-03T05:22:17.862791932Z 
  1%|          | 115/9500 [24:47<32:14:40, 12.37s/it]08/02/2024 22:22:17 - INFO - __main__ -   Step: 115, LR: 8.071060423292029e-06, Loss: 634.1140747070312
2024-08-03T05:22:29.936089090Z 
  1%|          | 116/9500 [24:59<32:00:41, 12.28s/it]08/02/2024 22:22:29 - INFO - __main__ -   Step: 116, LR: 8.14124355740761e-06, Loss: 777.3321533203125
2024-08-03T05:22:42.017030941Z 
  1%|          | 117/9500 [25:11<31:51:06, 12.22s/it]08/02/2024 22:22:42 - INFO - __main__ -   Step: 117, LR: 8.211426691523194e-06, Loss: 692.0517578125
2024-08-03T05:22:54.405130649Z 
  1%|          | 118/9500 [25:24<31:58:47, 12.27s/it]08/02/2024 22:22:54 - INFO - __main__ -   Step: 118, LR: 8.281609825638777e-06, Loss: 602.9234619140625
2024-08-03T05:23:06.704588755Z 
  1%|▏         | 119/9500 [25:36<31:59:53, 12.28s/it]08/02/2024 22:23:06 - INFO - __main__ -   Step: 119, LR: 8.35179295975436e-06, Loss: 703.8271484375
2024-08-03T05:23:19.032614383Z 
  1%|▏         | 120/9500 [25:48<32:01:58, 12.29s/it]08/02/2024 22:23:19 - INFO - __main__ -   Step: 120, LR: 8.421976093869942e-06, Loss: 778.3741455078125
2024-08-03T05:23:31.652324125Z 
  1%|▏         | 121/9500 [26:01<32:17:02, 12.39s/it]08/02/2024 22:23:31 - INFO - __main__ -   Step: 121, LR: 8.492159227985526e-06, Loss: 833.2286376953125
2024-08-03T05:23:44.266928333Z 
  1%|▏         | 122/9500 [26:14<32:27:16, 12.46s/it]08/02/2024 22:23:44 - INFO - __main__ -   Step: 122, LR: 8.562342362101109e-06, Loss: 925.1005859375
2024-08-03T05:23:56.712068890Z 
  1%|▏         | 123/9500 [26:26<32:26:25, 12.45s/it]08/02/2024 22:23:56 - INFO - __main__ -   Step: 123, LR: 8.632525496216692e-06, Loss: 750.6466064453125
2024-08-03T05:24:08.804946928Z 
  1%|▏         | 124/9500 [26:38<32:09:16, 12.35s/it]08/02/2024 22:24:08 - INFO - __main__ -   Step: 124, LR: 8.702708630332274e-06, Loss: 682.558837890625
2024-08-03T05:24:21.902203488Z 
  1%|▏         | 125/9500 [26:51<32:44:17, 12.57s/it]08/02/2024 22:24:21 - INFO - __main__ -   Step: 125, LR: 8.772891764447857e-06, Loss: 793.0782470703125
2024-08-03T05:24:34.234716081Z 
  1%|▏         | 126/9500 [27:04<32:32:52, 12.50s/it]08/02/2024 22:24:34 - INFO - __main__ -   Step: 126, LR: 8.843074898563439e-06, Loss: 802.8662109375
2024-08-03T05:24:46.644812399Z 
  1%|▏         | 127/9500 [27:16<32:28:27, 12.47s/it]08/02/2024 22:24:46 - INFO - __main__ -   Step: 127, LR: 8.913258032679022e-06, Loss: 933.0335083007812
2024-08-03T05:24:59.420195819Z 
  1%|▏         | 128/9500 [27:29<32:42:25, 12.56s/it]08/02/2024 22:24:59 - INFO - __main__ -   Step: 128, LR: 8.983441166794606e-06, Loss: 687.5196533203125
2024-08-03T05:25:11.534586909Z 
  1%|▏         | 129/9500 [27:41<32:21:10, 12.43s/it]08/02/2024 22:25:11 - INFO - __main__ -   Step: 129, LR: 9.053624300910189e-06, Loss: 665.2922973632812
2024-08-03T05:25:23.860551183Z 
  1%|▏         | 130/9500 [27:53<32:16:09, 12.40s/it]08/02/2024 22:25:23 - INFO - __main__ -   Step: 130, LR: 9.12380743502577e-06, Loss: 734.6893310546875
2024-08-03T05:25:36.358990828Z 
  1%|▏         | 131/9500 [28:06<32:20:39, 12.43s/it]08/02/2024 22:25:36 - INFO - __main__ -   Step: 131, LR: 9.193990569141354e-06, Loss: 660.3497314453125
2024-08-03T05:25:48.806403670Z 
  1%|▏         | 132/9500 [28:18<32:21:21, 12.43s/it]08/02/2024 22:25:48 - INFO - __main__ -   Step: 132, LR: 9.264173703256937e-06, Loss: 804.8412475585938
2024-08-03T05:26:01.062977004Z 
  1%|▏         | 133/9500 [28:30<32:12:49, 12.38s/it]08/02/2024 22:26:01 - INFO - __main__ -   Step: 133, LR: 9.33435683737252e-06, Loss: 832.4326171875
2024-08-03T05:26:13.868179035Z 
  1%|▏         | 134/9500 [28:43<32:32:30, 12.51s/it]08/02/2024 22:26:13 - INFO - __main__ -   Step: 134, LR: 9.404539971488102e-06, Loss: 963.359130859375
2024-08-03T05:26:26.230412099Z 
  1%|▏         | 135/9500 [28:56<32:25:28, 12.46s/it]08/02/2024 22:26:26 - INFO - __main__ -   Step: 135, LR: 9.474723105603686e-06, Loss: 813.9011840820312
2024-08-03T05:26:38.305265704Z 
  1%|▏         | 136/9500 [29:08<32:07:01, 12.35s/it]08/02/2024 22:26:38 - INFO - __main__ -   Step: 136, LR: 9.544906239719268e-06, Loss: 837.3336791992188
2024-08-03T05:26:51.076888938Z 
  1%|▏         | 137/9500 [29:21<32:26:40, 12.47s/it]08/02/2024 22:26:51 - INFO - __main__ -   Step: 137, LR: 9.615089373834851e-06, Loss: 755.1436767578125
2024-08-03T05:27:03.143406671Z 
  1%|▏         | 138/9500 [29:33<32:07:21, 12.35s/it]08/02/2024 22:27:03 - INFO - __main__ -   Step: 138, LR: 9.685272507950434e-06, Loss: 732.1945190429688
2024-08-03T05:27:15.107339386Z 
  1%|▏         | 139/9500 [29:45<31:48:59, 12.24s/it]08/02/2024 22:27:15 - INFO - __main__ -   Step: 139, LR: 9.755455642066018e-06, Loss: 673.3106689453125
2024-08-03T05:27:27.496821228Z 
  1%|▏         | 140/9500 [29:57<31:55:58, 12.28s/it]08/02/2024 22:27:27 - INFO - __main__ -   Step: 140, LR: 9.8256387761816e-06, Loss: 677.6710205078125
2024-08-03T05:27:39.601794987Z 
  1%|▏         | 141/9500 [30:09<31:47:29, 12.23s/it]08/02/2024 22:27:39 - INFO - __main__ -   Step: 141, LR: 9.895821910297183e-06, Loss: 717.5072021484375
2024-08-03T05:27:51.700852733Z 
  1%|▏         | 142/9500 [30:21<31:41:12, 12.19s/it]08/02/2024 22:27:51 - INFO - __main__ -   Step: 142, LR: 9.966005044412766e-06, Loss: 726.0784912109375
2024-08-03T05:28:04.593971094Z 
  2%|▏         | 143/9500 [30:34<32:13:55, 12.40s/it]08/02/2024 22:28:04 - INFO - __main__ -   Step: 143, LR: 1.0036188178528348e-05, Loss: 816.5350341796875
2024-08-03T05:28:17.130337101Z 
  2%|▏         | 144/9500 [30:47<32:20:00, 12.44s/it]08/02/2024 22:28:17 - INFO - __main__ -   Step: 144, LR: 1.010637131264393e-05, Loss: 636.94287109375
2024-08-03T05:28:29.444664006Z 
  2%|▏         | 145/9500 [30:59<32:13:53, 12.40s/it]08/02/2024 22:28:29 - INFO - __main__ -   Step: 145, LR: 1.0176554446759514e-05, Loss: 651.7576293945312
2024-08-03T05:28:41.864809051Z 
  2%|▏         | 146/9500 [31:11<32:14:28, 12.41s/it]08/02/2024 22:28:41 - INFO - __main__ -   Step: 146, LR: 1.0246737580875096e-05, Loss: 696.9599609375
2024-08-03T05:28:53.979997054Z 
  2%|▏         | 147/9500 [31:23<32:00:33, 12.32s/it]08/02/2024 22:28:53 - INFO - __main__ -   Step: 147, LR: 1.031692071499068e-05, Loss: 657.1322631835938
2024-08-03T05:29:06.042717792Z 
  2%|▏         | 148/9500 [31:35<31:48:18, 12.24s/it]08/02/2024 22:29:06 - INFO - __main__ -   Step: 148, LR: 1.0387103849106261e-05, Loss: 597.7425537109375
2024-08-03T05:29:18.659223195Z 
  2%|▏         | 149/9500 [31:48<32:05:32, 12.36s/it]08/02/2024 22:29:18 - INFO - __main__ -   Step: 149, LR: 1.0457286983221846e-05, Loss: 796.6029052734375
2024-08-03T05:29:30.702444148Z 
  2%|▏         | 150/9500 [32:00<31:50:45, 12.26s/it]08/02/2024 22:29:30 - INFO - __main__ -   Step: 150, LR: 1.0527470117337428e-05, Loss: 642.94873046875
2024-08-03T05:29:42.856510097Z 
  2%|▏         | 151/9500 [32:12<31:45:32, 12.23s/it]08/02/2024 22:29:42 - INFO - __main__ -   Step: 151, LR: 1.0597653251453011e-05, Loss: 717.316650390625
2024-08-03T05:29:55.539060293Z 
  2%|▏         | 152/9500 [32:25<32:06:29, 12.37s/it]08/02/2024 22:29:55 - INFO - __main__ -   Step: 152, LR: 1.0667836385568593e-05, Loss: 851.762939453125
2024-08-03T05:30:07.638821576Z 
  2%|▏         | 153/9500 [32:37<31:53:54, 12.29s/it]08/02/2024 22:30:07 - INFO - __main__ -   Step: 153, LR: 1.0738019519684178e-05, Loss: 935.242919921875
2024-08-03T05:30:20.049640697Z 
  2%|▏         | 154/9500 [32:49<31:59:32, 12.32s/it]08/02/2024 22:30:20 - INFO - __main__ -   Step: 154, LR: 1.080820265379976e-05, Loss: 748.546875
2024-08-03T05:30:32.666092637Z 
  2%|▏         | 155/9500 [33:02<32:13:03, 12.41s/it]08/02/2024 22:30:32 - INFO - __main__ -   Step: 155, LR: 1.0878385787915343e-05, Loss: 844.513671875
2024-08-03T05:30:44.860910154Z 
  2%|▏         | 156/9500 [33:14<32:02:43, 12.35s/it]08/02/2024 22:30:44 - INFO - __main__ -   Step: 156, LR: 1.0948568922030925e-05, Loss: 749.2213134765625
2024-08-03T05:30:57.100170683Z 
  2%|▏         | 157/9500 [33:27<31:57:31, 12.31s/it]08/02/2024 22:30:57 - INFO - __main__ -   Step: 157, LR: 1.101875205614651e-05, Loss: 666.4827880859375
2024-08-03T05:31:09.298966387Z 
  2%|▏         | 158/9500 [33:39<31:51:55, 12.28s/it]08/02/2024 22:31:09 - INFO - __main__ -   Step: 158, LR: 1.1088935190262091e-05, Loss: 622.7744750976562
2024-08-03T05:31:21.614689146Z 
  2%|▏         | 159/9500 [33:51<31:53:24, 12.29s/it]08/02/2024 22:31:21 - INFO - __main__ -   Step: 159, LR: 1.1159118324377673e-05, Loss: 787.6405639648438
2024-08-03T05:31:34.390017273Z 
  2%|▏         | 160/9500 [34:04<32:15:51, 12.44s/it]08/02/2024 22:31:34 - INFO - __main__ -   Step: 160, LR: 1.1229301458493256e-05, Loss: 940.3912353515625
2024-08-03T05:31:47.058237734Z 
  2%|▏         | 161/9500 [34:16<32:26:30, 12.51s/it]08/02/2024 22:31:47 - INFO - __main__ -   Step: 161, LR: 1.129948459260884e-05, Loss: 984.712646484375
2024-08-03T05:31:59.340519622Z 
  2%|▏         | 162/9500 [34:29<32:15:51, 12.44s/it]08/02/2024 22:31:59 - INFO - __main__ -   Step: 162, LR: 1.1369667726724423e-05, Loss: 848.109375
2024-08-03T05:32:11.437438530Z 
  2%|▏         | 163/9500 [34:41<31:59:41, 12.34s/it]08/02/2024 22:32:11 - INFO - __main__ -   Step: 163, LR: 1.1439850860840005e-05, Loss: 651.5557861328125
2024-08-03T05:32:24.368200013Z 
  2%|▏         | 164/9500 [34:54<32:27:15, 12.51s/it]08/02/2024 22:32:24 - INFO - __main__ -   Step: 164, LR: 1.1510033994955588e-05, Loss: 880.024169921875
2024-08-03T05:32:36.367036634Z 
  2%|▏         | 165/9500 [35:06<32:02:58, 12.36s/it]08/02/2024 22:32:36 - INFO - __main__ -   Step: 165, LR: 1.1580217129071171e-05, Loss: 711.703125
2024-08-03T05:32:48.909630327Z 
  2%|▏         | 166/9500 [35:18<32:11:17, 12.41s/it]08/02/2024 22:32:48 - INFO - __main__ -   Step: 166, LR: 1.1650400263186755e-05, Loss: 875.571533203125
2024-08-03T05:33:01.172313998Z 
  2%|▏         | 167/9500 [35:31<32:03:59, 12.37s/it]08/02/2024 22:33:01 - INFO - __main__ -   Step: 167, LR: 1.1720583397302336e-05, Loss: 798.6446533203125
2024-08-03T05:33:13.531283526Z 
  2%|▏         | 168/9500 [35:43<32:03:19, 12.37s/it]08/02/2024 22:33:13 - INFO - __main__ -   Step: 168, LR: 1.1790766531417918e-05, Loss: 731.406494140625
2024-08-03T05:33:25.962756158Z 
  2%|▏         | 169/9500 [35:55<32:06:10, 12.39s/it]08/02/2024 22:33:25 - INFO - __main__ -   Step: 169, LR: 1.1860949665533503e-05, Loss: 697.09375
2024-08-03T05:33:38.012401007Z 
  2%|▏         | 170/9500 [36:07<31:50:18, 12.28s/it]08/02/2024 22:33:38 - INFO - __main__ -   Step: 170, LR: 1.1931132799649085e-05, Loss: 874.01318359375
2024-08-03T05:33:50.762327979Z 
  2%|▏         | 171/9500 [36:20<32:11:46, 12.42s/it]08/02/2024 22:33:50 - INFO - __main__ -   Step: 171, LR: 1.2001315933764668e-05, Loss: 879.4803466796875
2024-08-03T05:34:03.220063740Z 
  2%|▏         | 172/9500 [36:33<32:13:08, 12.43s/it]08/02/2024 22:34:03 - INFO - __main__ -   Step: 172, LR: 1.207149906788025e-05, Loss: 877.0220947265625
2024-08-03T05:34:15.480859043Z 
  2%|▏         | 173/9500 [36:45<32:04:49, 12.38s/it]08/02/2024 22:34:15 - INFO - __main__ -   Step: 173, LR: 1.2141682201995835e-05, Loss: 794.9292602539062
2024-08-03T05:34:27.859183386Z 
  2%|▏         | 174/9500 [36:57<32:04:26, 12.38s/it]08/02/2024 22:34:27 - INFO - __main__ -   Step: 174, LR: 1.2211865336111417e-05, Loss: 650.895751953125
2024-08-03T05:34:40.277536627Z 
  2%|▏         | 175/9500 [37:10<32:05:58, 12.39s/it]08/02/2024 22:34:40 - INFO - __main__ -   Step: 175, LR: 1.2282048470227e-05, Loss: 782.0654296875
2024-08-03T05:34:52.334643004Z 
  2%|▏         | 176/9500 [37:22<31:50:07, 12.29s/it]08/02/2024 22:34:52 - INFO - __main__ -   Step: 176, LR: 1.2352231604342582e-05, Loss: 831.3199462890625
2024-08-03T05:35:04.964589911Z 
  2%|▏         | 177/9500 [37:34<32:05:41, 12.39s/it]08/02/2024 22:35:04 - INFO - __main__ -   Step: 177, LR: 1.2422414738458167e-05, Loss: 820.0050048828125
2024-08-03T05:35:17.380228936Z 
  2%|▏         | 178/9500 [37:47<32:06:28, 12.40s/it]08/02/2024 22:35:17 - INFO - __main__ -   Step: 178, LR: 1.2492597872573748e-05, Loss: 597.3533325195312
2024-08-03T05:35:29.227691511Z 
  2%|▏         | 179/9500 [37:59<31:40:35, 12.23s/it]08/02/2024 22:35:29 - INFO - __main__ -   Step: 179, LR: 1.256278100668933e-05, Loss: 556.50537109375
2024-08-03T05:35:41.992679126Z 
  2%|▏         | 180/9500 [38:11<32:05:08, 12.39s/it]08/02/2024 22:35:41 - INFO - __main__ -   Step: 180, LR: 1.2632964140804913e-05, Loss: 974.1165161132812
2024-08-03T05:35:54.272365778Z 
  2%|▏         | 181/9500 [38:24<31:59:35, 12.36s/it]08/02/2024 22:35:54 - INFO - __main__ -   Step: 181, LR: 1.2703147274920498e-05, Loss: 603.52197265625
2024-08-03T05:36:06.664918334Z 
  2%|▏         | 182/9500 [38:36<32:00:57, 12.37s/it]08/02/2024 22:36:06 - INFO - __main__ -   Step: 182, LR: 1.277333040903608e-05, Loss: 746.4427490234375
2024-08-03T05:36:19.439094516Z 
  2%|▏         | 183/9500 [38:49<32:19:36, 12.49s/it]08/02/2024 22:36:19 - INFO - __main__ -   Step: 183, LR: 1.2843513543151662e-05, Loss: 838.945556640625
2024-08-03T05:36:32.148315723Z 
  2%|▏         | 184/9500 [39:02<32:29:34, 12.56s/it]08/02/2024 22:36:32 - INFO - __main__ -   Step: 184, LR: 1.2913696677267245e-05, Loss: 693.936767578125
2024-08-03T05:36:44.410772038Z 
  2%|▏         | 185/9500 [39:14<32:15:40, 12.47s/it]08/02/2024 22:36:44 - INFO - __main__ -   Step: 185, LR: 1.2983879811382828e-05, Loss: 790.921875
2024-08-03T05:36:56.804616276Z 
  2%|▏         | 186/9500 [39:26<32:12:00, 12.45s/it]08/02/2024 22:36:56 - INFO - __main__ -   Step: 186, LR: 1.3054062945498412e-05, Loss: 795.994873046875
2024-08-03T05:37:08.959798694Z 
  2%|▏         | 187/9500 [39:38<31:58:15, 12.36s/it]08/02/2024 22:37:08 - INFO - __main__ -   Step: 187, LR: 1.3124246079613993e-05, Loss: 728.35205078125
2024-08-03T05:37:21.362903593Z 
  2%|▏         | 188/9500 [39:51<32:00:07, 12.37s/it]08/02/2024 22:37:21 - INFO - __main__ -   Step: 188, LR: 1.3194429213729575e-05, Loss: 806.0184326171875
2024-08-03T05:37:33.975452701Z 
  2%|▏         | 189/9500 [40:03<32:11:08, 12.44s/it]08/02/2024 22:37:33 - INFO - __main__ -   Step: 189, LR: 1.326461234784516e-05, Loss: 730.7520751953125
2024-08-03T05:37:46.341668021Z 
  2%|▏         | 190/9500 [40:16<32:07:17, 12.42s/it]08/02/2024 22:37:46 - INFO - __main__ -   Step: 190, LR: 1.3334795481960744e-05, Loss: 824.1508178710938
2024-08-03T05:37:58.300796681Z 
  2%|▏         | 191/9500 [40:28<31:45:35, 12.28s/it]08/02/2024 22:37:58 - INFO - __main__ -   Step: 191, LR: 1.3404978616076325e-05, Loss: 655.0346069335938
2024-08-03T05:38:10.677999070Z 
  2%|▏         | 192/9500 [40:40<31:49:49, 12.31s/it]08/02/2024 22:38:10 - INFO - __main__ -   Step: 192, LR: 1.3475161750191907e-05, Loss: 798.9993896484375
2024-08-03T05:38:22.853182917Z 
  2%|▏         | 193/9500 [40:52<31:43:16, 12.27s/it]08/02/2024 22:38:22 - INFO - __main__ -   Step: 193, LR: 1.3545344884307492e-05, Loss: 725.7881469726562
2024-08-03T05:38:34.883834939Z 
  2%|▏         | 194/9500 [41:04<31:31:57, 12.20s/it]08/02/2024 22:38:34 - INFO - __main__ -   Step: 194, LR: 1.3615528018423074e-05, Loss: 546.0762939453125
2024-08-03T05:38:47.169972162Z 
  2%|▏         | 195/9500 [41:17<31:35:50, 12.22s/it]08/02/2024 22:38:47 - INFO - __main__ -   Step: 195, LR: 1.3685711152538657e-05, Loss: 691.5408325195312
2024-08-03T05:38:59.428421812Z 
  2%|▏         | 196/9500 [41:29<31:37:12, 12.23s/it]08/02/2024 22:38:59 - INFO - __main__ -   Step: 196, LR: 1.3755894286654239e-05, Loss: 627.0926513671875
2024-08-03T05:39:11.451933666Z 
  2%|▏         | 197/9500 [41:41<31:27:10, 12.17s/it]08/02/2024 22:39:11 - INFO - __main__ -   Step: 197, LR: 1.3826077420769824e-05, Loss: 577.7701416015625
2024-08-03T05:39:23.986859190Z 
  2%|▏         | 198/9500 [41:53<31:43:53, 12.28s/it]08/02/2024 22:39:23 - INFO - __main__ -   Step: 198, LR: 1.3896260554885405e-05, Loss: 691.1166381835938
2024-08-03T05:39:35.969942318Z 
  2%|▏         | 199/9500 [42:05<31:29:51, 12.19s/it]08/02/2024 22:39:35 - INFO - __main__ -   Step: 199, LR: 1.3966443689000989e-05, Loss: 583.2474975585938
2024-08-03T05:39:48.316695176Z 
  2%|▏         | 200/9500 [42:18<31:36:52, 12.24s/it]08/02/2024 22:39:48 - INFO - __main__ -   Step: 200, LR: 1.403662682311657e-05, Loss: 714.06103515625
2024-08-03T05:40:00.953721440Z 
  2%|▏         | 201/9500 [42:30<31:55:13, 12.36s/it]08/02/2024 22:40:00 - INFO - __main__ -   Step: 201, LR: 1.4106809957232155e-05, Loss: 785.7332763671875
2024-08-03T05:40:12.840562370Z 
  2%|▏         | 202/9500 [42:42<31:33:08, 12.22s/it]08/02/2024 22:40:12 - INFO - __main__ -   Step: 202, LR: 1.4176993091347737e-05, Loss: 630.9425048828125
2024-08-03T05:40:24.975794747Z 
  2%|▏         | 203/9500 [42:54<31:29:08, 12.19s/it]08/02/2024 22:40:24 - INFO - __main__ -   Step: 203, LR: 1.4247176225463319e-05, Loss: 664.0426025390625
2024-08-03T05:40:37.217415782Z 
  2%|▏         | 204/9500 [43:07<31:31:15, 12.21s/it]08/02/2024 22:40:37 - INFO - __main__ -   Step: 204, LR: 1.4317359359578902e-05, Loss: 751.9893798828125
2024-08-03T05:40:49.200926419Z 
  2%|▏         | 205/9500 [43:19<31:20:40, 12.14s/it]08/02/2024 22:40:49 - INFO - __main__ -   Step: 205, LR: 1.4387542493694485e-05, Loss: 859.8456420898438
2024-08-03T05:41:01.205209602Z 
  2%|▏         | 206/9500 [43:31<31:14:10, 12.10s/it]08/02/2024 22:41:01 - INFO - __main__ -   Step: 206, LR: 1.4457725627810069e-05, Loss: 536.3072509765625
2024-08-03T05:41:13.605776492Z 
  2%|▏         | 207/9500 [43:43<31:27:58, 12.19s/it]08/02/2024 22:41:13 - INFO - __main__ -   Step: 207, LR: 1.452790876192565e-05, Loss: 668.3511962890625
2024-08-03T05:41:25.990709666Z 
  2%|▏         | 208/9500 [43:55<31:36:50, 12.25s/it]08/02/2024 22:41:25 - INFO - __main__ -   Step: 208, LR: 1.4598091896041234e-05, Loss: 621.4524536132812
2024-08-03T05:41:38.159676723Z 
  2%|▏         | 209/9500 [44:08<31:32:57, 12.22s/it]08/02/2024 22:41:38 - INFO - __main__ -   Step: 209, LR: 1.4668275030156816e-05, Loss: 766.4388427734375
2024-08-03T05:41:50.223294250Z 
  2%|▏         | 210/9500 [44:20<31:25:15, 12.18s/it]08/02/2024 22:41:50 - INFO - __main__ -   Step: 210, LR: 1.47384581642724e-05, Loss: 433.8630065917969
2024-08-03T05:42:02.708825822Z 
  2%|▏         | 211/9500 [44:32<31:39:26, 12.27s/it]08/02/2024 22:42:02 - INFO - __main__ -   Step: 211, LR: 1.4808641298387982e-05, Loss: 838.9127807617188
2024-08-03T05:42:15.005795606Z 
  2%|▏         | 212/9500 [44:44<31:40:32, 12.28s/it]08/02/2024 22:42:15 - INFO - __main__ -   Step: 212, LR: 1.4878824432503564e-05, Loss: 654.8119506835938
2024-08-03T05:42:27.212379452Z 
  2%|▏         | 213/9500 [44:57<31:37:02, 12.26s/it]08/02/2024 22:42:27 - INFO - __main__ -   Step: 213, LR: 1.4949007566619147e-05, Loss: 890.7342529296875
2024-08-03T05:42:40.026300206Z 
  2%|▏         | 214/9500 [45:09<32:02:45, 12.42s/it]08/02/2024 22:42:40 - INFO - __main__ -   Step: 214, LR: 1.5019190700734732e-05, Loss: 639.0191040039062
2024-08-03T05:42:52.229202174Z 
  2%|▏         | 215/9500 [45:22<31:52:17, 12.36s/it]08/02/2024 22:42:52 - INFO - __main__ -   Step: 215, LR: 1.5089373834850314e-05, Loss: 698.5289306640625
2024-08-03T05:43:04.196667266Z 
  2%|▏         | 216/9500 [45:34<31:33:59, 12.24s/it]08/02/2024 22:43:04 - INFO - __main__ -   Step: 216, LR: 1.5159556968965896e-05, Loss: 806.38037109375
2024-08-03T05:43:16.690145057Z 
  2%|▏         | 217/9500 [45:46<31:45:32, 12.32s/it]08/02/2024 22:43:16 - INFO - __main__ -   Step: 217, LR: 1.5229740103081479e-05, Loss: 946.4337158203125
2024-08-03T05:43:28.770751066Z 
  2%|▏         | 218/9500 [45:58<31:34:23, 12.25s/it]08/02/2024 22:43:28 - INFO - __main__ -   Step: 218, LR: 1.5299923237197064e-05, Loss: 793.99609375
2024-08-03T05:43:40.861790581Z 
  2%|▏         | 219/9500 [46:10<31:27:00, 12.20s/it]08/02/2024 22:43:40 - INFO - __main__ -   Step: 219, LR: 1.5370106371312644e-05, Loss: 594.45556640625
2024-08-03T05:43:53.421783435Z 
  2%|▏         | 220/9500 [46:23<31:43:32, 12.31s/it]08/02/2024 22:43:53 - INFO - __main__ -   Step: 220, LR: 1.5440289505428227e-05, Loss: 740.625732421875
2024-08-03T05:44:05.753583841Z 
  2%|▏         | 221/9500 [46:35<31:44:28, 12.31s/it]08/02/2024 22:44:05 - INFO - __main__ -   Step: 221, LR: 1.551047263954381e-05, Loss: 705.68359375
2024-08-03T05:44:17.691483809Z 
  2%|▏         | 222/9500 [46:47<31:26:47, 12.20s/it]08/02/2024 22:44:17 - INFO - __main__ -   Step: 222, LR: 1.5580655773659394e-05, Loss: 738.27783203125
2024-08-03T05:44:30.093616813Z 
  2%|▏         | 223/9500 [47:00<31:35:52, 12.26s/it]08/02/2024 22:44:30 - INFO - __main__ -   Step: 223, LR: 1.5650838907774977e-05, Loss: 704.003662109375
2024-08-03T05:44:42.687118492Z 
  2%|▏         | 224/9500 [47:12<31:51:03, 12.36s/it]08/02/2024 22:44:42 - INFO - __main__ -   Step: 224, LR: 1.5721022041890557e-05, Loss: 705.8162841796875
2024-08-03T05:44:54.729388398Z 
  2%|▏         | 225/9500 [47:24<31:36:03, 12.27s/it]08/02/2024 22:44:54 - INFO - __main__ -   Step: 225, LR: 1.579120517600614e-05, Loss: 819.1229248046875
2024-08-03T05:45:06.975860788Z 
  2%|▏         | 226/9500 [47:36<31:34:58, 12.26s/it]08/02/2024 22:45:06 - INFO - __main__ -   Step: 226, LR: 1.5861388310121724e-05, Loss: 483.5325622558594
2024-08-03T05:45:19.411618081Z 
  2%|▏         | 227/9500 [47:49<31:42:55, 12.31s/it]08/02/2024 22:45:19 - INFO - __main__ -   Step: 227, LR: 1.5931571444237308e-05, Loss: 753.5121459960938
2024-08-03T05:45:31.694979273Z 
  2%|▏         | 228/9500 [48:01<31:41:21, 12.30s/it]08/02/2024 22:45:31 - INFO - __main__ -   Step: 228, LR: 1.600175457835289e-05, Loss: 657.1305541992188
2024-08-03T05:45:44.297925698Z 
  2%|▏         | 229/9500 [48:14<31:55:00, 12.39s/it]08/02/2024 22:45:44 - INFO - __main__ -   Step: 229, LR: 1.6071937712468474e-05, Loss: 623.4622192382812
2024-08-03T05:45:56.259807453Z 
  2%|▏         | 230/9500 [48:26<31:34:47, 12.26s/it]08/02/2024 22:45:56 - INFO - __main__ -   Step: 230, LR: 1.6142120846584058e-05, Loss: 608.7257080078125
2024-08-03T05:46:08.834191914Z 
  2%|▏         | 231/9500 [48:38<31:48:52, 12.36s/it]08/02/2024 22:46:08 - INFO - __main__ -   Step: 231, LR: 1.621230398069964e-05, Loss: 766.97412109375
2024-08-03T05:46:21.279952087Z 
  2%|▏         | 232/9500 [48:51<31:52:54, 12.38s/it]08/02/2024 22:46:21 - INFO - __main__ -   Step: 232, LR: 1.628248711481522e-05, Loss: 788.23828125
2024-08-03T05:46:33.882507426Z 
  2%|▏         | 233/9500 [49:03<32:02:49, 12.45s/it]08/02/2024 22:46:33 - INFO - __main__ -   Step: 233, LR: 1.6352670248930804e-05, Loss: 874.7152099609375
2024-08-03T05:46:46.082540114Z 
  2%|▏         | 234/9500 [49:16<31:51:04, 12.37s/it]08/02/2024 22:46:46 - INFO - __main__ -   Step: 234, LR: 1.6422853383046388e-05, Loss: 620.3897705078125
2024-08-03T05:46:58.476715158Z 
  2%|▏         | 235/9500 [49:28<31:51:45, 12.38s/it]08/02/2024 22:46:58 - INFO - __main__ -   Step: 235, LR: 1.649303651716197e-05, Loss: 710.447265625
2024-08-03T05:47:10.664786062Z 
  2%|▏         | 236/9500 [49:40<31:42:38, 12.32s/it]08/02/2024 22:47:10 - INFO - __main__ -   Step: 236, LR: 1.6563219651277554e-05, Loss: 855.3374633789062
2024-08-03T05:47:22.673133138Z 
  2%|▏         | 237/9500 [49:52<31:27:52, 12.23s/it]08/02/2024 22:47:22 - INFO - __main__ -   Step: 237, LR: 1.6633402785393134e-05, Loss: 749.738037109375
2024-08-03T05:47:35.102772900Z 
  3%|▎         | 238/9500 [50:05<31:36:58, 12.29s/it]08/02/2024 22:47:35 - INFO - __main__ -   Step: 238, LR: 1.670358591950872e-05, Loss: 695.332763671875
2024-08-03T05:47:46.999931452Z 
  3%|▎         | 239/9500 [50:16<31:18:39, 12.17s/it]08/02/2024 22:47:46 - INFO - __main__ -   Step: 239, LR: 1.67737690536243e-05, Loss: 640.4534912109375
2024-08-03T05:47:59.275583190Z 
  3%|▎         | 240/9500 [50:29<31:23:16, 12.20s/it]08/02/2024 22:47:59 - INFO - __main__ -   Step: 240, LR: 1.6843952187739884e-05, Loss: 781.849609375
2024-08-03T05:48:11.698620803Z 
  3%|▎         | 241/9500 [50:41<31:33:16, 12.27s/it]08/02/2024 22:48:11 - INFO - __main__ -   Step: 241, LR: 1.6914135321855468e-05, Loss: 591.4241943359375
2024-08-03T05:48:24.097290011Z 
  3%|▎         | 242/9500 [50:54<31:39:04, 12.31s/it]08/02/2024 22:48:24 - INFO - __main__ -   Step: 242, LR: 1.698431845597105e-05, Loss: 768.766357421875
2024-08-03T05:48:36.432947489Z 
  3%|▎         | 243/9500 [51:06<31:40:09, 12.32s/it]08/02/2024 22:48:36 - INFO - __main__ -   Step: 243, LR: 1.7054501590086635e-05, Loss: 625.5841674804688
2024-08-03T05:48:48.954507399Z 
  3%|▎         | 244/9500 [51:18<31:49:28, 12.38s/it]08/02/2024 22:48:48 - INFO - __main__ -   Step: 244, LR: 1.7124684724202218e-05, Loss: 782.8697509765625
2024-08-03T05:49:01.139401928Z 
  3%|▎         | 245/9500 [51:31<31:40:20, 12.32s/it]08/02/2024 22:49:01 - INFO - __main__ -   Step: 245, LR: 1.7194867858317798e-05, Loss: 583.4791870117188
2024-08-03T05:49:13.714505664Z 
  3%|▎         | 246/9500 [51:43<31:51:56, 12.40s/it]08/02/2024 22:49:13 - INFO - __main__ -   Step: 246, LR: 1.7265050992433385e-05, Loss: 879.625244140625
2024-08-03T05:49:26.166023144Z 
  3%|▎         | 247/9500 [51:56<31:54:17, 12.41s/it]08/02/2024 22:49:26 - INFO - __main__ -   Step: 247, LR: 1.7335234126548965e-05, Loss: 824.7745361328125
2024-08-03T05:49:38.657926213Z 
  3%|▎         | 248/9500 [52:08<31:57:43, 12.44s/it]08/02/2024 22:49:38 - INFO - __main__ -   Step: 248, LR: 1.7405417260664548e-05, Loss: 697.550537109375
2024-08-03T05:49:50.675672162Z 
  3%|▎         | 249/9500 [52:20<31:38:08, 12.31s/it]08/02/2024 22:49:50 - INFO - __main__ -   Step: 249, LR: 1.747560039478013e-05, Loss: 640.2730102539062
2024-08-03T05:50:03.010090209Z 
  3%|▎         | 250/9500 [52:32<31:39:01, 12.32s/it]08/02/2024 22:50:03 - INFO - __main__ -   Step: 250, LR: 1.7545783528895715e-05, Loss: 600.7548828125
2024-08-03T05:50:15.160189788Z 
  3%|▎         | 251/9500 [52:45<31:31:03, 12.27s/it]08/02/2024 22:50:15 - INFO - __main__ -   Step: 251, LR: 1.7615966663011298e-05, Loss: 790.4762573242188
2024-08-03T05:50:27.590877941Z 
  3%|▎         | 252/9500 [52:57<31:38:23, 12.32s/it]08/02/2024 22:50:27 - INFO - __main__ -   Step: 252, LR: 1.7686149797126878e-05, Loss: 877.4134521484375
2024-08-03T05:50:39.801724332Z 
  3%|▎         | 253/9500 [53:09<31:33:17, 12.28s/it]08/02/2024 22:50:39 - INFO - __main__ -   Step: 253, LR: 1.775633293124246e-05, Loss: 724.4337158203125
2024-08-03T05:50:52.265508290Z 
  3%|▎         | 254/9500 [53:22<31:41:22, 12.34s/it]08/02/2024 22:50:52 - INFO - __main__ -   Step: 254, LR: 1.7826516065358045e-05, Loss: 737.2691650390625
2024-08-03T05:51:04.572025042Z 
  3%|▎         | 255/9500 [53:34<31:39:40, 12.33s/it]08/02/2024 22:51:04 - INFO - __main__ -   Step: 255, LR: 1.7896699199473628e-05, Loss: 827.083740234375
2024-08-03T05:51:16.716927835Z 
  3%|▎         | 256/9500 [53:46<31:30:57, 12.27s/it]08/02/2024 22:51:16 - INFO - __main__ -   Step: 256, LR: 1.796688233358921e-05, Loss: 665.2674560546875
2024-08-03T05:51:29.126816358Z 
  3%|▎         | 257/9500 [53:59<31:37:03, 12.31s/it]08/02/2024 22:51:29 - INFO - __main__ -   Step: 257, LR: 1.803706546770479e-05, Loss: 697.8802490234375
2024-08-03T05:51:41.662728158Z 
  3%|▎         | 258/9500 [54:11<31:47:04, 12.38s/it]08/02/2024 22:51:41 - INFO - __main__ -   Step: 258, LR: 1.8107248601820378e-05, Loss: 764.6265258789062
2024-08-03T05:51:53.928208190Z 
  3%|▎         | 259/9500 [54:23<31:41:32, 12.35s/it]08/02/2024 22:51:53 - INFO - __main__ -   Step: 259, LR: 1.8177431735935958e-05, Loss: 680.6611938476562
2024-08-03T05:52:06.714903094Z 
  3%|▎         | 260/9500 [54:36<32:01:40, 12.48s/it]08/02/2024 22:52:06 - INFO - __main__ -   Step: 260, LR: 1.824761487005154e-05, Loss: 645.4142456054688
2024-08-03T05:52:18.805585310Z 
  3%|▎         | 261/9500 [54:48<31:43:33, 12.36s/it]08/02/2024 22:52:18 - INFO - __main__ -   Step: 261, LR: 1.8317798004167125e-05, Loss: 717.3328857421875
2024-08-03T05:52:30.835472277Z 
  3%|▎         | 262/9500 [55:00<31:28:00, 12.26s/it]08/02/2024 22:52:30 - INFO - __main__ -   Step: 262, LR: 1.8387981138282708e-05, Loss: 806.57763671875
2024-08-03T05:52:43.649471176Z 
  3%|▎         | 263/9500 [55:13<31:53:17, 12.43s/it]08/02/2024 22:52:43 - INFO - __main__ -   Step: 263, LR: 1.845816427239829e-05, Loss: 892.8135375976562
2024-08-03T05:52:56.381231909Z 
  3%|▎         | 264/9500 [55:26<32:07:06, 12.52s/it]08/02/2024 22:52:56 - INFO - __main__ -   Step: 264, LR: 1.8528347406513875e-05, Loss: 910.898193359375
2024-08-03T05:53:08.498382875Z 
  3%|▎         | 265/9500 [55:38<31:48:20, 12.40s/it]08/02/2024 22:53:08 - INFO - __main__ -   Step: 265, LR: 1.8598530540629455e-05, Loss: 798.2625732421875
2024-08-03T05:53:20.671228957Z 
  3%|▎         | 266/9500 [55:50<31:37:42, 12.33s/it]08/02/2024 22:53:20 - INFO - __main__ -   Step: 266, LR: 1.866871367474504e-05, Loss: 640.5384521484375
2024-08-03T05:53:32.775697512Z 
  3%|▎         | 267/9500 [56:02<31:27:03, 12.26s/it]08/02/2024 22:53:32 - INFO - __main__ -   Step: 267, LR: 1.873889680886062e-05, Loss: 693.033935546875
2024-08-03T05:53:44.803651825Z 
  3%|▎         | 268/9500 [56:14<31:16:00, 12.19s/it]08/02/2024 22:53:44 - INFO - __main__ -   Step: 268, LR: 1.8809079942976205e-05, Loss: 723.269775390625
2024-08-03T05:53:57.191542449Z 
  3%|▎         | 269/9500 [56:27<31:24:49, 12.25s/it]08/02/2024 22:53:57 - INFO - __main__ -   Step: 269, LR: 1.887926307709179e-05, Loss: 690.099609375
2024-08-03T05:54:09.348996610Z 
  3%|▎         | 270/9500 [56:39<31:20:17, 12.22s/it]08/02/2024 22:54:09 - INFO - __main__ -   Step: 270, LR: 1.894944621120737e-05, Loss: 700.1948852539062
2024-08-03T05:54:22.056953817Z 
  3%|▎         | 271/9500 [56:51<31:42:28, 12.37s/it]08/02/2024 22:54:22 - INFO - __main__ -   Step: 271, LR: 1.9019629345322955e-05, Loss: 840.7520141601562
2024-08-03T05:54:34.422771913Z 
  3%|▎         | 272/9500 [57:04<31:42:08, 12.37s/it]08/02/2024 22:54:34 - INFO - __main__ -   Step: 272, LR: 1.9089812479438535e-05, Loss: 603.132568359375
2024-08-03T05:54:46.379677154Z 
  3%|▎         | 273/9500 [57:16<31:22:59, 12.24s/it]08/02/2024 22:54:46 - INFO - __main__ -   Step: 273, LR: 1.915999561355412e-05, Loss: 584.04833984375
2024-08-03T05:54:58.648888529Z 
  3%|▎         | 274/9500 [57:28<31:23:54, 12.25s/it]08/02/2024 22:54:58 - INFO - __main__ -   Step: 274, LR: 1.9230178747669702e-05, Loss: 657.1204223632812
2024-08-03T05:55:11.049873932Z 
  3%|▎         | 275/9500 [57:40<31:30:37, 12.30s/it]08/02/2024 22:55:11 - INFO - __main__ -   Step: 275, LR: 1.9300361881785285e-05, Loss: 775.8446044921875
2024-08-03T05:55:22.883066347Z 
  3%|▎         | 276/9500 [57:52<31:09:01, 12.16s/it]08/02/2024 22:55:22 - INFO - __main__ -   Step: 276, LR: 1.937054501590087e-05, Loss: 528.2828369140625
2024-08-03T05:55:35.528947948Z 
  3%|▎         | 277/9500 [58:05<31:31:20, 12.30s/it]08/02/2024 22:55:35 - INFO - __main__ -   Step: 277, LR: 1.9440728150016452e-05, Loss: 675.3955078125
2024-08-03T05:55:48.920080038Z [2024-08-02 22:55:48,919] [WARNING] [stage3.py:2069:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
2024-08-03T05:55:48.921800945Z 
  3%|▎         | 278/9500 [58:18<32:21:21, 12.63s/it]08/02/2024 22:55:48 - INFO - __main__ -   Step: 278, LR: 1.9510911284132035e-05, Loss: 833.6566162109375
2024-08-03T05:56:01.276156376Z 
  3%|▎         | 279/9500 [58:31<32:08:22, 12.55s/it]08/02/2024 22:56:01 - INFO - __main__ -   Step: 279, LR: 1.958109441824762e-05, Loss: 722.0355224609375
2024-08-03T05:56:13.646477655Z 
  3%|▎         | 280/9500 [58:43<31:59:59, 12.49s/it]08/02/2024 22:56:13 - INFO - __main__ -   Step: 280, LR: 1.96512775523632e-05, Loss: 726.9227294921875
2024-08-03T05:56:26.352593141Z 
  3%|▎         | 281/9500 [58:56<32:09:31, 12.56s/it]08/02/2024 22:56:26 - INFO - __main__ -   Step: 281, LR: 1.9721460686478782e-05, Loss: 700.638427734375
2024-08-03T05:56:38.433108325Z 
  3%|▎         | 282/9500 [59:08<31:47:18, 12.41s/it]08/02/2024 22:56:38 - INFO - __main__ -   Step: 282, LR: 1.9791643820594365e-05, Loss: 748.341552734375
2024-08-03T05:56:50.949567966Z 
  3%|▎         | 283/9500 [59:20<31:51:48, 12.45s/it]08/02/2024 22:56:50 - INFO - __main__ -   Step: 283, LR: 1.986182695470995e-05, Loss: 804.1047973632812
2024-08-03T05:57:03.632647600Z 
  3%|▎         | 284/9500 [59:33<32:02:33, 12.52s/it]08/02/2024 22:57:03 - INFO - __main__ -   Step: 284, LR: 1.9932010088825532e-05, Loss: 931.688720703125
2024-08-03T05:57:15.920819688Z 
  3%|▎         | 285/9500 [59:45<31:51:49, 12.45s/it]08/02/2024 22:57:15 - INFO - __main__ -   Step: 285, LR: 1.9999932170509774e-05, Loss: 915.9358520507812
2024-08-03T05:57:28.087013017Z 
  3%|▎         | 286/9500 [59:58<31:38:37, 12.36s/it]08/02/2024 22:57:28 - INFO - __main__ -   Step: 286, LR: 1.9997761626822497e-05, Loss: 610.9818725585938
2024-08-03T05:57:40.801842795Z 
  3%|▎         | 287/9500 [1:00:10<31:54:35, 12.47s/it]08/02/2024 22:57:40 - INFO - __main__ -   Step: 287, LR: 1.9995591083135217e-05, Loss: 788.135986328125
2024-08-03T05:57:52.764479455Z 
  3%|▎         | 288/9500 [1:00:22<31:31:04, 12.32s/it]08/02/2024 22:57:52 - INFO - __main__ -   Step: 288, LR: 1.9993420539447937e-05, Loss: 590.9511108398438
2024-08-03T05:58:05.151023439Z 
  3%|▎         | 289/9500 [1:00:35<31:34:04, 12.34s/it]08/02/2024 22:58:05 - INFO - __main__ -   Step: 289, LR: 1.9991249995760657e-05, Loss: 765.1190185546875
2024-08-03T05:58:17.638356980Z 
  3%|▎         | 290/9500 [1:00:47<31:40:44, 12.38s/it]08/02/2024 22:58:17 - INFO - __main__ -   Step: 290, LR: 1.998907945207338e-05, Loss: 641.1272583007812
2024-08-03T05:58:30.040915047Z 
  3%|▎         | 291/9500 [1:00:59<31:41:26, 12.39s/it]08/02/2024 22:58:30 - INFO - __main__ -   Step: 291, LR: 1.99869089083861e-05, Loss: 676.94775390625
2024-08-03T05:58:42.232344610Z 
  3%|▎         | 292/9500 [1:01:12<31:32:09, 12.33s/it]08/02/2024 22:58:42 - INFO - __main__ -   Step: 292, LR: 1.998473836469882e-05, Loss: 742.5133056640625
2024-08-03T05:58:54.539082031Z 
  3%|▎         | 293/9500 [1:01:24<31:30:54, 12.32s/it]08/02/2024 22:58:54 - INFO - __main__ -   Step: 293, LR: 1.9982567821011543e-05, Loss: 616.7811279296875
2024-08-03T05:59:06.735331111Z 
  3%|▎         | 294/9500 [1:01:36<31:24:53, 12.28s/it]08/02/2024 22:59:06 - INFO - __main__ -   Step: 294, LR: 1.9980397277324263e-05, Loss: 735.7603759765625
2024-08-03T05:59:19.083067958Z 
  3%|▎         | 295/9500 [1:01:49<31:27:35, 12.30s/it]08/02/2024 22:59:19 - INFO - __main__ -   Step: 295, LR: 1.9978226733636986e-05, Loss: 618.8433837890625
2024-08-03T05:59:30.921839460Z 
  3%|▎         | 296/9500 [1:02:00<31:05:59, 12.16s/it]08/02/2024 22:59:30 - INFO - __main__ -   Step: 296, LR: 1.9976056189949706e-05, Loss: 722.8145141601562
2024-08-03T05:59:43.415888194Z 
  3%|▎         | 297/9500 [1:02:13<31:20:57, 12.26s/it]08/02/2024 22:59:43 - INFO - __main__ -   Step: 297, LR: 1.9973885646262426e-05, Loss: 737.3257446289062
2024-08-03T05:59:55.447435128Z 
  3%|▎         | 298/9500 [1:02:25<31:10:05, 12.19s/it]08/02/2024 22:59:55 - INFO - __main__ -   Step: 298, LR: 1.997171510257515e-05, Loss: 595.2896728515625
2024-08-03T06:00:07.544626041Z 
  3%|▎         | 299/9500 [1:02:37<31:05:27, 12.16s/it]08/02/2024 23:00:07 - INFO - __main__ -   Step: 299, LR: 1.996954455888787e-05, Loss: 731.2589111328125
2024-08-03T06:00:20.072631317Z 
  3%|▎         | 300/9500 [1:02:50<31:21:58, 12.27s/it]08/02/2024 23:00:20 - INFO - __main__ -   Step: 300, LR: 1.9967374015200593e-05, Loss: 698.3955078125
2024-08-03T06:00:32.218365744Z 
  3%|▎         | 301/9500 [1:03:02<31:15:53, 12.24s/it]08/02/2024 23:00:32 - INFO - __main__ -   Step: 301, LR: 1.9965203471513312e-05, Loss: 823.810791015625
2024-08-03T06:00:44.201631569Z 
  3%|▎         | 302/9500 [1:03:14<31:04:05, 12.16s/it]08/02/2024 23:00:44 - INFO - __main__ -   Step: 302, LR: 1.9963032927826032e-05, Loss: 563.7565307617188
2024-08-03T06:00:56.992958998Z 
  3%|▎         | 303/9500 [1:03:26<31:32:55, 12.35s/it]08/02/2024 23:00:56 - INFO - __main__ -   Step: 303, LR: 1.9960862384138752e-05, Loss: 779.7379150390625
2024-08-03T06:01:09.676556284Z 
  3%|▎         | 304/9500 [1:03:39<31:48:05, 12.45s/it]08/02/2024 23:01:09 - INFO - __main__ -   Step: 304, LR: 1.9958691840451475e-05, Loss: 728.5025634765625
2024-08-03T06:01:21.671494470Z 
  3%|▎         | 305/9500 [1:03:51<31:26:59, 12.31s/it]08/02/2024 23:01:21 - INFO - __main__ -   Step: 305, LR: 1.9956521296764195e-05, Loss: 632.9737548828125
2024-08-03T06:01:34.468797757Z 
  3%|▎         | 306/9500 [1:04:04<31:49:02, 12.46s/it]08/02/2024 23:01:34 - INFO - __main__ -   Step: 306, LR: 1.9954350753076915e-05, Loss: 895.0777587890625
2024-08-03T06:01:46.491631852Z 
  3%|▎         | 307/9500 [1:04:16<31:28:48, 12.33s/it]08/02/2024 23:01:46 - INFO - __main__ -   Step: 307, LR: 1.995218020938964e-05, Loss: 654.080078125
2024-08-03T06:01:58.763089634Z 
  3%|▎         | 308/9500 [1:04:28<31:26:00, 12.31s/it]08/02/2024 23:01:58 - INFO - __main__ -   Step: 308, LR: 1.9950009665702358e-05, Loss: 642.5975341796875
2024-08-03T06:02:11.262078304Z 
  3%|▎         | 309/9500 [1:04:41<31:34:27, 12.37s/it]08/02/2024 23:02:11 - INFO - __main__ -   Step: 309, LR: 1.994783912201508e-05, Loss: 645.3507080078125
2024-08-03T06:02:23.223744817Z 
  3%|▎         | 310/9500 [1:04:53<31:15:37, 12.25s/it]08/02/2024 23:02:23 - INFO - __main__ -   Step: 310, LR: 1.99456685783278e-05, Loss: 581.2047729492188
2024-08-03T06:02:35.158011209Z 
  3%|▎         | 311/9500 [1:05:05<31:01:06, 12.15s/it]08/02/2024 23:02:35 - INFO - __main__ -   Step: 311, LR: 1.994349803464052e-05, Loss: 621.028076171875
2024-08-03T06:02:47.564585763Z 
  3%|▎         | 312/9500 [1:05:17<31:12:35, 12.23s/it]08/02/2024 23:02:47 - INFO - __main__ -   Step: 312, LR: 1.9941327490953244e-05, Loss: 905.3905029296875
2024-08-03T06:02:59.901286074Z 
  3%|▎         | 313/9500 [1:05:29<31:17:21, 12.26s/it]08/02/2024 23:02:59 - INFO - __main__ -   Step: 313, LR: 1.9939156947265964e-05, Loss: 690.0845336914062
2024-08-03T06:03:12.022775472Z 
  3%|▎         | 314/9500 [1:05:41<31:10:45, 12.22s/it]08/02/2024 23:03:12 - INFO - __main__ -   Step: 314, LR: 1.9936986403578688e-05, Loss: 665.050048828125
2024-08-03T06:03:24.361911831Z 
  3%|▎         | 315/9500 [1:05:54<31:16:03, 12.26s/it]08/02/2024 23:03:24 - INFO - __main__ -   Step: 315, LR: 1.9934815859891407e-05, Loss: 805.748046875
2024-08-03T06:03:36.179422494Z 
  3%|▎         | 316/9500 [1:06:06<30:55:45, 12.12s/it]08/02/2024 23:03:36 - INFO - __main__ -   Step: 316, LR: 1.9932645316204127e-05, Loss: 641.7860107421875
2024-08-03T06:03:48.617234206Z 
  3%|▎         | 317/9500 [1:06:18<31:09:58, 12.22s/it]08/02/2024 23:03:48 - INFO - __main__ -   Step: 317, LR: 1.9930474772516847e-05, Loss: 634.8056030273438
2024-08-03T06:04:01.175430008Z 
  3%|▎         | 318/9500 [1:06:31<31:25:22, 12.32s/it]08/02/2024 23:04:01 - INFO - __main__ -   Step: 318, LR: 1.992830422882957e-05, Loss: 652.0555419921875
2024-08-03T06:04:13.376568028Z 
  3%|▎         | 319/9500 [1:06:43<31:19:43, 12.28s/it]08/02/2024 23:04:13 - INFO - __main__ -   Step: 319, LR: 1.992613368514229e-05, Loss: 893.7948608398438
2024-08-03T06:04:25.384404419Z 
  3%|▎         | 320/9500 [1:06:55<31:06:48, 12.20s/it]08/02/2024 23:04:25 - INFO - __main__ -   Step: 320, LR: 1.992396314145501e-05, Loss: 589.9794921875
2024-08-03T06:04:37.772981260Z 
  3%|▎         | 321/9500 [1:07:07<31:15:12, 12.26s/it]08/02/2024 23:04:37 - INFO - __main__ -   Step: 321, LR: 1.9921792597767733e-05, Loss: 704.6620483398438
2024-08-03T06:04:49.711649421Z 
  3%|▎         | 322/9500 [1:07:19<31:00:21, 12.16s/it]08/02/2024 23:04:49 - INFO - __main__ -   Step: 322, LR: 1.9919622054080453e-05, Loss: 742.01123046875
2024-08-03T06:05:01.744581338Z 
  3%|▎         | 323/9500 [1:07:31<30:54:15, 12.12s/it]08/02/2024 23:05:01 - INFO - __main__ -   Step: 323, LR: 1.9917451510393177e-05, Loss: 555.886962890625
2024-08-03T06:05:14.101162989Z 
  3%|▎         | 324/9500 [1:07:44<31:04:45, 12.19s/it]08/02/2024 23:05:14 - INFO - __main__ -   Step: 324, LR: 1.9915280966705896e-05, Loss: 679.8665771484375
2024-08-03T06:05:25.990487106Z 
  3%|▎         | 325/9500 [1:07:55<30:50:36, 12.10s/it]08/02/2024 23:05:25 - INFO - __main__ -   Step: 325, LR: 1.9913110423018616e-05, Loss: 535.6173095703125
2024-08-03T06:05:38.073058736Z 
  3%|▎         | 326/9500 [1:08:08<30:49:30, 12.10s/it]08/02/2024 23:05:38 - INFO - __main__ -   Step: 326, LR: 1.991093987933134e-05, Loss: 558.5751342773438
2024-08-03T06:05:50.561848634Z 
  3%|▎         | 327/9500 [1:08:20<31:07:18, 12.21s/it]08/02/2024 23:05:50 - INFO - __main__ -   Step: 327, LR: 1.990876933564406e-05, Loss: 771.981689453125
2024-08-03T06:06:02.916301856Z 
  3%|▎         | 328/9500 [1:08:32<31:13:33, 12.26s/it]08/02/2024 23:06:02 - INFO - __main__ -   Step: 328, LR: 1.9906598791956783e-05, Loss: 685.6788940429688
2024-08-03T06:06:15.094874356Z 
  3%|▎         | 329/9500 [1:08:45<31:09:46, 12.23s/it]08/02/2024 23:06:15 - INFO - __main__ -   Step: 329, LR: 1.9904428248269503e-05, Loss: 816.892822265625
2024-08-03T06:06:27.585283475Z 
  3%|▎         | 330/9500 [1:08:57<31:21:23, 12.31s/it]08/02/2024 23:06:27 - INFO - __main__ -   Step: 330, LR: 1.9902257704582222e-05, Loss: 684.1998901367188
2024-08-03T06:06:39.787812028Z 
  3%|▎         | 331/9500 [1:09:09<31:16:13, 12.28s/it]08/02/2024 23:06:39 - INFO - __main__ -   Step: 331, LR: 1.9900087160894942e-05, Loss: 659.9783325195312
2024-08-03T06:06:52.094533724Z 
  3%|▎         | 332/9500 [1:09:22<31:17:23, 12.29s/it]08/02/2024 23:06:52 - INFO - __main__ -   Step: 332, LR: 1.9897916617207666e-05, Loss: 867.8949584960938
2024-08-03T06:07:04.432566310Z 
  4%|▎         | 333/9500 [1:09:34<31:19:32, 12.30s/it]08/02/2024 23:07:04 - INFO - __main__ -   Step: 333, LR: 1.9895746073520385e-05, Loss: 567.1591796875
2024-08-03T06:07:16.445307230Z 
  4%|▎         | 334/9500 [1:09:46<31:06:05, 12.22s/it]08/02/2024 23:07:16 - INFO - __main__ -   Step: 334, LR: 1.9893575529833105e-05, Loss: 749.1220092773438
2024-08-03T06:07:28.715962601Z 
  4%|▎         | 335/9500 [1:09:58<31:08:24, 12.23s/it]08/02/2024 23:07:28 - INFO - __main__ -   Step: 335, LR: 1.989140498614583e-05, Loss: 764.4747314453125
2024-08-03T06:07:41.121780395Z 
  4%|▎         | 336/9500 [1:10:11<31:16:11, 12.28s/it]08/02/2024 23:07:41 - INFO - __main__ -   Step: 336, LR: 1.988923444245855e-05, Loss: 606.6558837890625
2024-08-03T06:07:53.457961858Z 
  4%|▎         | 337/9500 [1:10:23<31:18:20, 12.30s/it]08/02/2024 23:07:53 - INFO - __main__ -   Step: 337, LR: 1.988706389877127e-05, Loss: 596.6922607421875
2024-08-03T06:08:05.733546245Z 
  4%|▎         | 338/9500 [1:10:35<31:17:03, 12.29s/it]08/02/2024 23:08:05 - INFO - __main__ -   Step: 338, LR: 1.988489335508399e-05, Loss: 607.2015380859375
2024-08-03T06:08:17.931222402Z 
  4%|▎         | 339/9500 [1:10:47<31:12:30, 12.26s/it]08/02/2024 23:08:17 - INFO - __main__ -   Step: 339, LR: 1.988272281139671e-05, Loss: 762.8096923828125
2024-08-03T06:08:30.526317083Z 
  4%|▎         | 340/9500 [1:11:00<31:27:28, 12.36s/it]08/02/2024 23:08:30 - INFO - __main__ -   Step: 340, LR: 1.9880552267709435e-05, Loss: 619.324462890625
2024-08-03T06:08:42.749822895Z 
  4%|▎         | 341/9500 [1:11:12<31:20:51, 12.32s/it]08/02/2024 23:08:42 - INFO - __main__ -   Step: 341, LR: 1.9878381724022154e-05, Loss: 845.0535888671875
2024-08-03T06:08:54.847629641Z 
  4%|▎         | 342/9500 [1:11:24<31:10:24, 12.25s/it]08/02/2024 23:08:54 - INFO - __main__ -   Step: 342, LR: 1.9876211180334878e-05, Loss: 729.1438598632812
2024-08-03T06:09:07.261406196Z 
  4%|▎         | 343/9500 [1:11:37<31:17:31, 12.30s/it]08/02/2024 23:09:07 - INFO - __main__ -   Step: 343, LR: 1.9874040636647598e-05, Loss: 638.932861328125
2024-08-03T06:09:19.411131173Z 
  4%|▎         | 344/9500 [1:11:49<31:10:18, 12.26s/it]08/02/2024 23:09:19 - INFO - __main__ -   Step: 344, LR: 1.9871870092960317e-05, Loss: 710.978759765625
2024-08-03T06:09:31.479500379Z 
  4%|▎         | 345/9500 [1:12:01<31:01:30, 12.20s/it]08/02/2024 23:09:31 - INFO - __main__ -   Step: 345, LR: 1.9869699549273037e-05, Loss: 638.6541748046875
2024-08-03T06:09:44.016003365Z 
  4%|▎         | 346/9500 [1:12:13<31:16:44, 12.30s/it]08/02/2024 23:09:44 - INFO - __main__ -   Step: 346, LR: 1.986752900558576e-05, Loss: 765.4022216796875
2024-08-03T06:09:56.180948161Z 
  4%|▎         | 347/9500 [1:12:26<31:10:17, 12.26s/it]08/02/2024 23:09:56 - INFO - __main__ -   Step: 347, LR: 1.986535846189848e-05, Loss: 614.8328857421875
2024-08-03T06:10:08.527261205Z 
  4%|▎         | 348/9500 [1:12:38<31:14:01, 12.29s/it]08/02/2024 23:10:08 - INFO - __main__ -   Step: 348, LR: 1.98631879182112e-05, Loss: 731.6517333984375
2024-08-03T06:10:21.060926141Z 
  4%|▎         | 349/9500 [1:12:50<31:25:09, 12.36s/it]08/02/2024 23:10:21 - INFO - __main__ -   Step: 349, LR: 1.9861017374523924e-05, Loss: 762.666259765625
2024-08-03T06:10:33.166812895Z 
  4%|▎         | 350/9500 [1:13:03<31:13:18, 12.28s/it]08/02/2024 23:10:33 - INFO - __main__ -   Step: 350, LR: 1.9858846830836643e-05, Loss: 797.7299194335938
2024-08-03T06:10:45.654898670Z 
  4%|▎         | 351/9500 [1:13:15<31:22:26, 12.35s/it]08/02/2024 23:10:45 - INFO - __main__ -   Step: 351, LR: 1.9856676287149367e-05, Loss: 819.3013305664062
2024-08-03T06:10:58.024881085Z 
  4%|▎         | 352/9500 [1:13:27<31:23:22, 12.35s/it]08/02/2024 23:10:58 - INFO - __main__ -   Step: 352, LR: 1.9854505743462087e-05, Loss: 723.2139282226562
2024-08-03T06:11:10.189601267Z 
  4%|▎         | 353/9500 [1:13:40<31:14:33, 12.30s/it]08/02/2024 23:11:10 - INFO - __main__ -   Step: 353, LR: 1.985233519977481e-05, Loss: 617.231201171875
2024-08-03T06:11:22.408765977Z 
  4%|▎         | 354/9500 [1:13:52<31:10:50, 12.27s/it]08/02/2024 23:11:22 - INFO - __main__ -   Step: 354, LR: 1.985016465608753e-05, Loss: 642.3848876953125
2024-08-03T06:11:34.785278805Z 
  4%|▎         | 355/9500 [1:14:04<31:15:21, 12.30s/it]08/02/2024 23:11:34 - INFO - __main__ -   Step: 355, LR: 1.984799411240025e-05, Loss: 859.0439453125
2024-08-03T06:11:47.131706469Z 
  4%|▎         | 356/9500 [1:14:17<31:17:05, 12.32s/it]08/02/2024 23:11:47 - INFO - __main__ -   Step: 356, LR: 1.9845823568712973e-05, Loss: 657.815673828125
2024-08-03T06:11:59.331735607Z 
  4%|▍         | 357/9500 [1:14:29<31:11:31, 12.28s/it]08/02/2024 23:11:59 - INFO - __main__ -   Step: 357, LR: 1.9843653025025693e-05, Loss: 633.5601196289062
2024-08-03T06:12:11.779840984Z 
  4%|▍         | 358/9500 [1:14:41<31:18:55, 12.33s/it]08/02/2024 23:12:11 - INFO - __main__ -   Step: 358, LR: 1.9841482481338413e-05, Loss: 727.4357299804688
2024-08-03T06:12:23.924150491Z 
  4%|▍         | 359/9500 [1:14:53<31:10:10, 12.28s/it]08/02/2024 23:12:23 - INFO - __main__ -   Step: 359, LR: 1.9839311937651132e-05, Loss: 659.6218872070312
2024-08-03T06:12:36.381124675Z 
  4%|▍         | 360/9500 [1:15:06<31:18:15, 12.33s/it]08/02/2024 23:12:36 - INFO - __main__ -   Step: 360, LR: 1.9837141393963856e-05, Loss: 691.08642578125
2024-08-03T06:12:48.763741089Z 
  4%|▍         | 361/9500 [1:15:18<31:20:27, 12.35s/it]08/02/2024 23:12:48 - INFO - __main__ -   Step: 361, LR: 1.9834970850276575e-05, Loss: 636.2112426757812
2024-08-03T06:13:01.049229259Z 
  4%|▍         | 362/9500 [1:15:30<31:17:30, 12.33s/it]08/02/2024 23:13:01 - INFO - __main__ -   Step: 362, LR: 1.98328003065893e-05, Loss: 673.708984375
2024-08-03T06:13:13.125436994Z 
  4%|▍         | 363/9500 [1:15:43<31:05:48, 12.25s/it]08/02/2024 23:13:13 - INFO - __main__ -   Step: 363, LR: 1.983062976290202e-05, Loss: 713.3922119140625
2024-08-03T06:13:25.591184713Z 
  4%|▍         | 364/9500 [1:15:55<31:15:21, 12.32s/it]08/02/2024 23:13:25 - INFO - __main__ -   Step: 364, LR: 1.982845921921474e-05, Loss: 757.3699951171875
2024-08-03T06:13:37.745444475Z 
  4%|▍         | 365/9500 [1:16:07<31:07:45, 12.27s/it]08/02/2024 23:13:37 - INFO - __main__ -   Step: 365, LR: 1.9826288675527462e-05, Loss: 826.5706787109375
2024-08-03T06:13:49.910631061Z 
  4%|▍         | 366/9500 [1:16:19<31:02:51, 12.24s/it]08/02/2024 23:13:49 - INFO - __main__ -   Step: 366, LR: 1.982411813184018e-05, Loss: 531.85302734375
2024-08-03T06:14:02.303505019Z 
  4%|▍         | 367/9500 [1:16:32<31:09:47, 12.28s/it]08/02/2024 23:14:02 - INFO - __main__ -   Step: 367, LR: 1.9821947588152905e-05, Loss: 791.1414794921875
2024-08-03T06:14:14.439071736Z 
  4%|▍         | 368/9500 [1:16:44<31:02:48, 12.24s/it]08/02/2024 23:14:14 - INFO - __main__ -   Step: 368, LR: 1.9819777044465625e-05, Loss: 732.2603759765625
2024-08-03T06:14:26.688852293Z 
  4%|▍         | 369/9500 [1:16:56<31:03:05, 12.24s/it]08/02/2024 23:14:26 - INFO - __main__ -   Step: 369, LR: 1.9817606500778345e-05, Loss: 755.0555419921875
2024-08-03T06:14:39.370941061Z 
  4%|▍         | 370/9500 [1:17:09<31:22:57, 12.37s/it]08/02/2024 23:14:39 - INFO - __main__ -   Step: 370, LR: 1.9815435957091068e-05, Loss: 643.94287109375
2024-08-03T06:14:51.610625824Z 
  4%|▍         | 371/9500 [1:17:21<31:16:36, 12.33s/it]08/02/2024 23:14:51 - INFO - __main__ -   Step: 371, LR: 1.9813265413403788e-05, Loss: 867.4365234375
2024-08-03T06:15:03.962573532Z 
  4%|▍         | 372/9500 [1:17:33<31:17:10, 12.34s/it]08/02/2024 23:15:03 - INFO - __main__ -   Step: 372, LR: 1.9811094869716508e-05, Loss: 613.5740966796875
2024-08-03T06:15:16.415351128Z 
  4%|▍         | 373/9500 [1:17:46<31:22:12, 12.37s/it]08/02/2024 23:15:16 - INFO - __main__ -   Step: 373, LR: 1.9808924326029227e-05, Loss: 647.14404296875
2024-08-03T06:15:28.650159191Z 
  4%|▍         | 374/9500 [1:17:58<31:15:37, 12.33s/it]08/02/2024 23:15:28 - INFO - __main__ -   Step: 374, LR: 1.980675378234195e-05, Loss: 717.0784912109375
2024-08-03T06:15:40.972961899Z 
  4%|▍         | 375/9500 [1:18:10<31:15:04, 12.33s/it]08/02/2024 23:15:40 - INFO - __main__ -   Step: 375, LR: 1.980458323865467e-05, Loss: 674.4645385742188
2024-08-03T06:15:53.629615167Z 
  4%|▍         | 376/9500 [1:18:23<31:29:48, 12.43s/it]08/02/2024 23:15:53 - INFO - __main__ -   Step: 376, LR: 1.9802412694967394e-05, Loss: 953.6165161132812
2024-08-03T06:16:05.664057823Z 
  4%|▍         | 377/9500 [1:18:35<31:11:40, 12.31s/it]08/02/2024 23:16:05 - INFO - __main__ -   Step: 377, LR: 1.9800242151280114e-05, Loss: 674.3040161132812
2024-08-03T06:16:17.843882848Z 
  4%|▍         | 378/9500 [1:18:47<31:05:32, 12.27s/it]08/02/2024 23:16:17 - INFO - __main__ -   Step: 378, LR: 1.9798071607592834e-05, Loss: 772.3612670898438
2024-08-03T06:16:30.176587151Z 
  4%|▍         | 379/9500 [1:19:00<31:08:10, 12.29s/it]08/02/2024 23:16:30 - INFO - __main__ -   Step: 379, LR: 1.9795901063905557e-05, Loss: 540.8782958984375
2024-08-03T06:16:42.636067324Z 
  4%|▍         | 380/9500 [1:19:12<31:15:43, 12.34s/it]08/02/2024 23:16:42 - INFO - __main__ -   Step: 380, LR: 1.9793730520218277e-05, Loss: 620.1791381835938
2024-08-03T06:16:54.849330488Z 
  4%|▍         | 381/9500 [1:19:24<31:09:43, 12.30s/it]08/02/2024 23:16:54 - INFO - __main__ -   Step: 381, LR: 1.9791559976531e-05, Loss: 730.1820068359375
2024-08-03T06:17:06.903891379Z 
  4%|▍         | 382/9500 [1:19:36<30:58:14, 12.23s/it]08/02/2024 23:17:06 - INFO - __main__ -   Step: 382, LR: 1.978938943284372e-05, Loss: 753.9471435546875
2024-08-03T06:17:19.434365226Z 
  4%|▍         | 383/9500 [1:19:49<31:11:49, 12.32s/it]08/02/2024 23:17:19 - INFO - __main__ -   Step: 383, LR: 1.978721888915644e-05, Loss: 648.8341674804688
2024-08-03T06:17:31.434486957Z 
  4%|▍         | 384/9500 [1:20:01<30:57:05, 12.22s/it]08/02/2024 23:17:31 - INFO - __main__ -   Step: 384, LR: 1.9785048345469163e-05, Loss: 863.946533203125
2024-08-03T06:17:43.749112994Z 
  4%|▍         | 385/9500 [1:20:13<31:01:03, 12.25s/it]08/02/2024 23:17:43 - INFO - __main__ -   Step: 385, LR: 1.9782877801781883e-05, Loss: 653.4896850585938
2024-08-03T06:17:56.110734416Z 
  4%|▍         | 386/9500 [1:20:26<31:05:55, 12.28s/it]08/02/2024 23:17:56 - INFO - __main__ -   Step: 386, LR: 1.9780707258094603e-05, Loss: 580.040771484375
2024-08-03T06:18:08.361086542Z 
  4%|▍         | 387/9500 [1:20:38<31:04:11, 12.27s/it]08/02/2024 23:18:08 - INFO - __main__ -   Step: 387, LR: 1.9778536714407322e-05, Loss: 673.6163330078125
2024-08-03T06:18:21.402139265Z 
  4%|▍         | 388/9500 [1:20:51<31:38:56, 12.50s/it]08/02/2024 23:18:21 - INFO - __main__ -   Step: 388, LR: 1.9776366170720046e-05, Loss: 751.693603515625
2024-08-03T06:18:33.685371516Z 
  4%|▍         | 389/9500 [1:21:03<31:28:40, 12.44s/it]08/02/2024 23:18:33 - INFO - __main__ -   Step: 389, LR: 1.9774195627032766e-05, Loss: 623.6876831054688
2024-08-03T06:18:45.783872525Z 
  4%|▍         | 390/9500 [1:21:15<31:13:00, 12.34s/it]08/02/2024 23:18:45 - INFO - __main__ -   Step: 390, LR: 1.977202508334549e-05, Loss: 642.286865234375
2024-08-03T06:18:57.903438169Z 
  4%|▍         | 391/9500 [1:21:27<31:02:56, 12.27s/it]08/02/2024 23:18:57 - INFO - __main__ -   Step: 391, LR: 1.976985453965821e-05, Loss: 741.844970703125
2024-08-03T06:19:10.238583975Z 
  4%|▍         | 392/9500 [1:21:40<31:05:40, 12.29s/it]08/02/2024 23:19:10 - INFO - __main__ -   Step: 392, LR: 1.976768399597093e-05, Loss: 636.5910034179688
2024-08-03T06:19:22.625079769Z 
  4%|▍         | 393/9500 [1:21:52<31:09:50, 12.32s/it]08/02/2024 23:19:22 - INFO - __main__ -   Step: 393, LR: 1.9765513452283652e-05, Loss: 677.24951171875
2024-08-03T06:19:35.099576368Z 
  4%|▍         | 394/9500 [1:22:05<31:16:42, 12.37s/it]08/02/2024 23:19:35 - INFO - __main__ -   Step: 394, LR: 1.9763342908596372e-05, Loss: 667.9368896484375
2024-08-03T06:19:47.519394069Z 
  4%|▍         | 395/9500 [1:22:17<31:18:58, 12.38s/it]08/02/2024 23:19:47 - INFO - __main__ -   Step: 395, LR: 1.9761172364909095e-05, Loss: 569.4988403320312
2024-08-03T06:19:59.818872483Z 
  4%|▍         | 396/9500 [1:22:29<31:14:59, 12.36s/it]08/02/2024 23:19:59 - INFO - __main__ -   Step: 396, LR: 1.9759001821221815e-05, Loss: 649.9729614257812
2024-08-03T06:20:12.253897387Z 
  4%|▍         | 397/9500 [1:22:42<31:18:20, 12.38s/it]08/02/2024 23:20:12 - INFO - __main__ -   Step: 397, LR: 1.9756831277534535e-05, Loss: 859.6892700195312
2024-08-03T06:20:24.780530388Z 
  4%|▍         | 398/9500 [1:22:54<31:24:46, 12.42s/it]08/02/2024 23:20:24 - INFO - __main__ -   Step: 398, LR: 1.9754660733847258e-05, Loss: 713.95751953125
2024-08-03T06:20:36.841594490Z 
  4%|▍         | 399/9500 [1:23:06<31:08:02, 12.32s/it]08/02/2024 23:20:36 - INFO - __main__ -   Step: 399, LR: 1.9752490190159978e-05, Loss: 629.6963500976562
2024-08-03T06:20:48.899736090Z 
  4%|▍         | 400/9500 [1:23:18<30:56:07, 12.24s/it]08/02/2024 23:20:48 - INFO - __main__ -   Step: 400, LR: 1.9750319646472698e-05, Loss: 605.3978271484375
2024-08-03T06:21:01.479314193Z 
  4%|▍         | 401/9500 [1:23:31<31:11:27, 12.34s/it]08/02/2024 23:21:01 - INFO - __main__ -   Step: 401, LR: 1.9748149102785418e-05, Loss: 753.27587890625
2024-08-03T06:21:13.739411158Z 
  4%|▍         | 402/9500 [1:23:43<31:07:35, 12.32s/it]08/02/2024 23:21:13 - INFO - __main__ -   Step: 402, LR: 1.974597855909814e-05, Loss: 868.56396484375
2024-08-03T06:21:26.234956334Z 
  4%|▍         | 403/9500 [1:23:56<31:15:32, 12.37s/it]08/02/2024 23:21:26 - INFO - __main__ -   Step: 403, LR: 1.974380801541086e-05, Loss: 801.1940307617188
2024-08-03T06:21:38.628933878Z 
  4%|▍         | 404/9500 [1:24:08<31:16:24, 12.38s/it]08/02/2024 23:21:38 - INFO - __main__ -   Step: 404, LR: 1.9741637471723584e-05, Loss: 683.8402709960938
2024-08-03T06:21:50.765673886Z 
  4%|▍         | 405/9500 [1:24:20<31:05:15, 12.31s/it]08/02/2024 23:21:50 - INFO - __main__ -   Step: 405, LR: 1.9739466928036304e-05, Loss: 619.495361328125
2024-08-03T06:22:02.955561344Z 
  4%|▍         | 406/9500 [1:24:32<30:59:48, 12.27s/it]08/02/2024 23:22:02 - INFO - __main__ -   Step: 406, LR: 1.9737296384349024e-05, Loss: 522.4138793945312
2024-08-03T06:22:15.573611003Z 
  4%|▍         | 407/9500 [1:24:45<31:15:23, 12.37s/it]08/02/2024 23:22:15 - INFO - __main__ -   Step: 407, LR: 1.9735125840661747e-05, Loss: 915.572509765625
2024-08-03T06:22:27.697448028Z 
  4%|▍         | 408/9500 [1:24:57<31:03:46, 12.30s/it]08/02/2024 23:22:27 - INFO - __main__ -   Step: 408, LR: 1.9732955296974467e-05, Loss: 705.90673828125
2024-08-03T06:22:40.219110537Z 
  4%|▍         | 409/9500 [1:25:10<31:13:41, 12.37s/it]08/02/2024 23:22:40 - INFO - __main__ -   Step: 409, LR: 1.973078475328719e-05, Loss: 674.070556640625
2024-08-03T06:22:52.817248107Z 
  4%|▍         | 410/9500 [1:25:22<31:24:01, 12.44s/it]08/02/2024 23:22:52 - INFO - __main__ -   Step: 410, LR: 1.972861420959991e-05, Loss: 636.7470703125
2024-08-03T06:23:04.929866772Z 
  4%|▍         | 411/9500 [1:25:34<31:09:07, 12.34s/it]08/02/2024 23:23:04 - INFO - __main__ -   Step: 411, LR: 1.972644366591263e-05, Loss: 873.6583251953125
2024-08-03T06:23:16.868443632Z 
  4%|▍         | 412/9500 [1:25:46<30:50:43, 12.22s/it]08/02/2024 23:23:16 - INFO - __main__ -   Step: 412, LR: 1.9724273122225353e-05, Loss: 666.410888671875
2024-08-03T06:23:29.444178628Z 
  4%|▍         | 413/9500 [1:25:59<31:06:44, 12.33s/it]08/02/2024 23:23:29 - INFO - __main__ -   Step: 413, LR: 1.9722102578538073e-05, Loss: 731.1241455078125
2024-08-03T06:23:41.715982038Z 
  4%|▍         | 414/9500 [1:26:11<31:04:05, 12.31s/it]08/02/2024 23:23:41 - INFO - __main__ -   Step: 414, LR: 1.9719932034850793e-05, Loss: 591.2382202148438
2024-08-03T06:23:53.758209926Z 
  4%|▍         | 415/9500 [1:26:23<30:51:44, 12.23s/it]08/02/2024 23:23:53 - INFO - __main__ -   Step: 415, LR: 1.9717761491163513e-05, Loss: 803.3895263671875
2024-08-03T06:24:06.480630176Z 
  4%|▍         | 416/9500 [1:26:36<31:13:55, 12.38s/it]08/02/2024 23:24:06 - INFO - __main__ -   Step: 416, LR: 1.9715590947476236e-05, Loss: 680.630859375
2024-08-03T06:24:18.440455591Z 
  4%|▍         | 417/9500 [1:26:48<30:54:45, 12.25s/it]08/02/2024 23:24:18 - INFO - __main__ -   Step: 417, LR: 1.9713420403788956e-05, Loss: 853.145263671875
2024-08-03T06:24:31.326782494Z 
  4%|▍         | 418/9500 [1:27:01<31:23:20, 12.44s/it]08/02/2024 23:24:31 - INFO - __main__ -   Step: 418, LR: 1.971124986010168e-05, Loss: 1061.139404296875
2024-08-03T06:24:43.693897322Z 
  4%|▍         | 419/9500 [1:27:13<31:19:44, 12.42s/it]08/02/2024 23:24:43 - INFO - __main__ -   Step: 419, LR: 1.97090793164144e-05, Loss: 533.7657470703125
2024-08-03T06:24:55.758191525Z 
  4%|▍         | 420/9500 [1:27:25<31:03:23, 12.31s/it]08/02/2024 23:24:55 - INFO - __main__ -   Step: 420, LR: 1.970690877272712e-05, Loss: 709.5706176757812
2024-08-03T06:25:08.088813059Z 
  4%|▍         | 421/9500 [1:27:38<31:03:59, 12.32s/it]08/02/2024 23:25:08 - INFO - __main__ -   Step: 421, LR: 1.9704738229039842e-05, Loss: 795.0972900390625
2024-08-03T06:25:21.473142751Z 
  4%|▍         | 422/9500 [1:27:51<31:52:09, 12.64s/it]08/02/2024 23:25:21 - INFO - __main__ -   Step: 422, LR: 1.9702567685352562e-05, Loss: 977.797607421875
2024-08-03T06:25:33.768656751Z 
  4%|▍         | 423/9500 [1:28:03<31:36:23, 12.54s/it]08/02/2024 23:25:33 - INFO - __main__ -   Step: 423, LR: 1.9700397141665285e-05, Loss: 865.7545776367188
2024-08-03T06:25:45.681537380Z 
  4%|▍         | 424/9500 [1:28:15<31:07:55, 12.35s/it]08/02/2024 23:25:45 - INFO - __main__ -   Step: 424, LR: 1.9698226597978005e-05, Loss: 676.05078125
2024-08-03T06:25:57.915461781Z 
  4%|▍         | 425/9500 [1:28:27<31:02:31, 12.31s/it]08/02/2024 23:25:57 - INFO - __main__ -   Step: 425, LR: 1.9696056054290725e-05, Loss: 659.6317138671875
2024-08-03T06:26:10.502309457Z 
  4%|▍         | 426/9500 [1:28:40<31:14:40, 12.40s/it]08/02/2024 23:26:10 - INFO - __main__ -   Step: 426, LR: 1.9693885510603448e-05, Loss: 673.6114501953125
2024-08-03T06:26:22.756146142Z 
  4%|▍         | 427/9500 [1:28:52<31:08:02, 12.35s/it]08/02/2024 23:26:22 - INFO - __main__ -   Step: 427, LR: 1.9691714966916168e-05, Loss: 688.9948120117188
2024-08-03T06:26:34.962909244Z 
  5%|▍         | 428/9500 [1:29:04<31:01:10, 12.31s/it]08/02/2024 23:26:34 - INFO - __main__ -   Step: 428, LR: 1.9689544423228888e-05, Loss: 829.604736328125
2024-08-03T06:26:47.332156145Z 
  5%|▍         | 429/9500 [1:29:17<31:03:41, 12.33s/it]08/02/2024 23:26:47 - INFO - __main__ -   Step: 429, LR: 1.9687373879541608e-05, Loss: 625.0380249023438
2024-08-03T06:26:59.805620662Z 
  5%|▍         | 430/9500 [1:29:29<31:10:06, 12.37s/it]08/02/2024 23:26:59 - INFO - __main__ -   Step: 430, LR: 1.968520333585433e-05, Loss: 753.1527099609375
2024-08-03T06:27:12.108597284Z 
  5%|▍         | 431/9500 [1:29:42<31:06:49, 12.35s/it]08/02/2024 23:27:12 - INFO - __main__ -   Step: 431, LR: 1.968303279216705e-05, Loss: 765.0003662109375
2024-08-03T06:27:24.562168477Z 
  5%|▍         | 432/9500 [1:29:54<31:11:15, 12.38s/it]08/02/2024 23:27:24 - INFO - __main__ -   Step: 432, LR: 1.9680862248479774e-05, Loss: 693.1602783203125
2024-08-03T06:27:36.692650954Z 
  5%|▍         | 433/9500 [1:30:06<30:59:41, 12.31s/it]08/02/2024 23:27:36 - INFO - __main__ -   Step: 433, LR: 1.9678691704792494e-05, Loss: 752.15673828125
2024-08-03T06:27:48.952102163Z 
  5%|▍         | 434/9500 [1:30:18<30:57:20, 12.29s/it]08/02/2024 23:27:48 - INFO - __main__ -   Step: 434, LR: 1.9676521161105214e-05, Loss: 736.2980346679688
2024-08-03T06:28:01.411244055Z 
  5%|▍         | 435/9500 [1:30:31<31:04:41, 12.34s/it]08/02/2024 23:28:01 - INFO - __main__ -   Step: 435, LR: 1.9674350617417937e-05, Loss: 779.8369140625
2024-08-03T06:28:13.459447119Z 
  5%|▍         | 436/9500 [1:30:43<30:51:11, 12.25s/it]08/02/2024 23:28:13 - INFO - __main__ -   Step: 436, LR: 1.9672180073730657e-05, Loss: 614.4959716796875
2024-08-03T06:28:25.630857866Z 
  5%|▍         | 437/9500 [1:30:55<30:47:14, 12.23s/it]08/02/2024 23:28:25 - INFO - __main__ -   Step: 437, LR: 1.967000953004338e-05, Loss: 809.6599731445312
2024-08-03T06:28:38.396373748Z 
  5%|▍         | 438/9500 [1:31:08<31:11:19, 12.39s/it]08/02/2024 23:28:38 - INFO - __main__ -   Step: 438, LR: 1.96678389863561e-05, Loss: 874.9729614257812
2024-08-03T06:28:50.688649606Z 
  5%|▍         | 439/9500 [1:31:20<31:06:41, 12.36s/it]08/02/2024 23:28:50 - INFO - __main__ -   Step: 439, LR: 1.9665668442668823e-05, Loss: 689.558349609375
2024-08-03T06:29:03.270906716Z 
  5%|▍         | 440/9500 [1:31:33<31:16:30, 12.43s/it]08/02/2024 23:29:03 - INFO - __main__ -   Step: 440, LR: 1.9663497898981543e-05, Loss: 634.2291870117188
2024-08-03T06:29:15.883757563Z 
  5%|▍         | 441/9500 [1:31:45<31:24:43, 12.48s/it]08/02/2024 23:29:15 - INFO - __main__ -   Step: 441, LR: 1.9661327355294263e-05, Loss: 884.2178955078125
2024-08-03T06:29:28.050792294Z 
  5%|▍         | 442/9500 [1:31:57<31:10:11, 12.39s/it]08/02/2024 23:29:28 - INFO - __main__ -   Step: 442, LR: 1.9659156811606983e-05, Loss: 887.3457641601562
2024-08-03T06:29:40.083085183Z 
  5%|▍         | 443/9500 [1:32:10<30:53:52, 12.28s/it]08/02/2024 23:29:40 - INFO - __main__ -   Step: 443, LR: 1.9656986267919703e-05, Loss: 689.2947998046875
2024-08-03T06:29:52.714888622Z 
  5%|▍         | 444/9500 [1:32:22<31:09:32, 12.39s/it]08/02/2024 23:29:52 - INFO - __main__ -   Step: 444, LR: 1.9654815724232426e-05, Loss: 873.3381958007812
2024-08-03T06:30:04.638879301Z 
  5%|▍         | 445/9500 [1:32:34<30:48:23, 12.25s/it]08/02/2024 23:30:04 - INFO - __main__ -   Step: 445, LR: 1.9652645180545146e-05, Loss: 660.6295166015625
2024-08-03T06:30:16.782441635Z 
  5%|▍         | 446/9500 [1:32:46<30:43:27, 12.22s/it]08/02/2024 23:30:16 - INFO - __main__ -   Step: 446, LR: 1.965047463685787e-05, Loss: 764.553955078125
2024-08-03T06:30:29.348911022Z 
  5%|▍         | 447/9500 [1:32:59<30:59:06, 12.32s/it]08/02/2024 23:30:29 - INFO - __main__ -   Step: 447, LR: 1.964830409317059e-05, Loss: 947.6945190429688
2024-08-03T06:30:41.779790556Z 
  5%|▍         | 448/9500 [1:33:11<31:03:51, 12.35s/it]08/02/2024 23:30:41 - INFO - __main__ -   Step: 448, LR: 1.9646133549483312e-05, Loss: 807.9619140625
2024-08-03T06:30:53.935665364Z 
  5%|▍         | 449/9500 [1:33:23<30:54:40, 12.29s/it]08/02/2024 23:30:53 - INFO - __main__ -   Step: 449, LR: 1.9643963005796032e-05, Loss: 688.7421264648438
2024-08-03T06:31:06.546934303Z 
  5%|▍         | 450/9500 [1:33:36<31:08:46, 12.39s/it]08/02/2024 23:31:06 - INFO - __main__ -   Step: 450, LR: 1.9641792462108752e-05, Loss: 626.8707275390625
2024-08-03T06:31:18.704600017Z 
  5%|▍         | 451/9500 [1:33:48<30:58:04, 12.32s/it]08/02/2024 23:31:18 - INFO - __main__ -   Step: 451, LR: 1.9639621918421475e-05, Loss: 587.2490234375
2024-08-03T06:31:30.993066427Z 
  5%|▍         | 452/9500 [1:34:00<30:56:26, 12.31s/it]08/02/2024 23:31:30 - INFO - __main__ -   Step: 452, LR: 1.9637451374734195e-05, Loss: 554.8001098632812
2024-08-03T06:31:43.509170286Z 
  5%|▍         | 453/9500 [1:34:13<31:05:31, 12.37s/it]08/02/2024 23:31:43 - INFO - __main__ -   Step: 453, LR: 1.963528083104692e-05, Loss: 625.6378173828125
2024-08-03T06:31:55.452078699Z 
  5%|▍         | 454/9500 [1:34:25<30:45:54, 12.24s/it]08/02/2024 23:31:55 - INFO - __main__ -   Step: 454, LR: 1.9633110287359638e-05, Loss: 618.630615234375
2024-08-03T06:32:08.003060366Z 
  5%|▍         | 455/9500 [1:34:37<30:59:35, 12.34s/it]08/02/2024 23:32:08 - INFO - __main__ -   Step: 455, LR: 1.9630939743672358e-05, Loss: 644.6248779296875
2024-08-03T06:32:20.414526042Z 
  5%|▍         | 456/9500 [1:34:50<31:02:50, 12.36s/it]08/02/2024 23:32:20 - INFO - __main__ -   Step: 456, LR: 1.9628769199985078e-05, Loss: 717.3422241210938
2024-08-03T06:32:32.477479668Z 
  5%|▍         | 457/9500 [1:35:02<30:49:15, 12.27s/it]08/02/2024 23:32:32 - INFO - __main__ -   Step: 457, LR: 1.96265986562978e-05, Loss: 729.721435546875
2024-08-03T06:32:44.511674401Z 
  5%|▍         | 458/9500 [1:35:14<30:38:24, 12.20s/it]08/02/2024 23:32:44 - INFO - __main__ -   Step: 458, LR: 1.962442811261052e-05, Loss: 704.1012573242188
2024-08-03T06:32:57.075577225Z 
  5%|▍         | 459/9500 [1:35:27<30:54:41, 12.31s/it]08/02/2024 23:32:57 - INFO - __main__ -   Step: 459, LR: 1.962225756892324e-05, Loss: 612.1209716796875
2024-08-03T06:33:09.162348153Z 
  5%|▍         | 460/9500 [1:35:39<30:44:27, 12.24s/it]08/02/2024 23:33:09 - INFO - __main__ -   Step: 460, LR: 1.9620087025235964e-05, Loss: 643.7096557617188
2024-08-03T06:33:21.578172797Z 
  5%|▍         | 461/9500 [1:35:51<30:52:06, 12.29s/it]08/02/2024 23:33:21 - INFO - __main__ -   Step: 461, LR: 1.9617916481548684e-05, Loss: 682.7646484375
2024-08-03T06:33:33.956482061Z 
  5%|▍         | 462/9500 [1:36:03<30:55:43, 12.32s/it]08/02/2024 23:33:33 - INFO - __main__ -   Step: 462, LR: 1.9615745937861407e-05, Loss: 476.56671142578125
2024-08-03T06:33:46.045587396Z 
  5%|▍         | 463/9500 [1:36:15<30:45:06, 12.25s/it]08/02/2024 23:33:46 - INFO - __main__ -   Step: 463, LR: 1.9613575394174127e-05, Loss: 713.1748657226562
2024-08-03T06:33:57.916272101Z 
  5%|▍         | 464/9500 [1:36:27<30:27:45, 12.14s/it]08/02/2024 23:33:57 - INFO - __main__ -   Step: 464, LR: 1.9611404850486847e-05, Loss: 603.9369506835938
2024-08-03T06:34:10.822551882Z 
  5%|▍         | 465/9500 [1:36:40<31:02:19, 12.37s/it]08/02/2024 23:34:10 - INFO - __main__ -   Step: 465, LR: 1.960923430679957e-05, Loss: 811.3717041015625
2024-08-03T06:34:22.751955035Z 
  5%|▍         | 466/9500 [1:36:52<30:42:19, 12.24s/it]08/02/2024 23:34:22 - INFO - __main__ -   Step: 466, LR: 1.960706376311229e-05, Loss: 638.1512451171875
2024-08-03T06:34:34.902608475Z 
  5%|▍         | 467/9500 [1:37:04<30:38:16, 12.21s/it]08/02/2024 23:34:34 - INFO - __main__ -   Step: 467, LR: 1.9604893219425013e-05, Loss: 726.1263427734375
2024-08-03T06:34:47.458540948Z 
  5%|▍         | 468/9500 [1:37:17<30:53:40, 12.31s/it]08/02/2024 23:34:47 - INFO - __main__ -   Step: 468, LR: 1.9602722675737733e-05, Loss: 656.3861083984375
2024-08-03T06:35:00.019275959Z 
  5%|▍         | 469/9500 [1:37:29<31:04:36, 12.39s/it]08/02/2024 23:35:00 - INFO - __main__ -   Step: 469, LR: 1.9600552132050453e-05, Loss: 759.0458374023438
2024-08-03T06:35:12.142348244Z 
  5%|▍         | 470/9500 [1:37:42<30:52:25, 12.31s/it]08/02/2024 23:35:12 - INFO - __main__ -   Step: 470, LR: 1.9598381588363173e-05, Loss: 658.7054443359375
2024-08-03T06:35:24.211931847Z 
  5%|▍         | 471/9500 [1:37:54<30:41:26, 12.24s/it]08/02/2024 23:35:24 - INFO - __main__ -   Step: 471, LR: 1.9596211044675896e-05, Loss: 612.12646484375
2024-08-03T06:35:36.397656857Z 
  5%|▍         | 472/9500 [1:38:06<30:38:55, 12.22s/it]08/02/2024 23:35:36 - INFO - __main__ -   Step: 472, LR: 1.9594040500988616e-05, Loss: 674.596435546875
2024-08-03T06:35:48.506899882Z 
  5%|▍         | 473/9500 [1:38:18<30:33:39, 12.19s/it]08/02/2024 23:35:48 - INFO - __main__ -   Step: 473, LR: 1.9591869957301336e-05, Loss: 667.654296875
2024-08-03T06:36:00.776670622Z 
  5%|▍         | 474/9500 [1:38:30<30:37:08, 12.21s/it]08/02/2024 23:36:00 - INFO - __main__ -   Step: 474, LR: 1.958969941361406e-05, Loss: 723.9376220703125
2024-08-03T06:36:13.254714813Z 
  5%|▌         | 475/9500 [1:38:43<30:48:56, 12.29s/it]08/02/2024 23:36:13 - INFO - __main__ -   Step: 475, LR: 1.958752886992678e-05, Loss: 678.2221069335938
2024-08-03T06:36:25.292362029Z 
  5%|▌         | 476/9500 [1:38:55<30:37:14, 12.22s/it]08/02/2024 23:36:25 - INFO - __main__ -   Step: 476, LR: 1.9585358326239502e-05, Loss: 535.2278442382812
2024-08-03T06:36:37.359452171Z 
  5%|▌         | 477/9500 [1:39:07<30:30:20, 12.17s/it]08/02/2024 23:36:37 - INFO - __main__ -   Step: 477, LR: 1.9583187782552222e-05, Loss: 617.447021484375
2024-08-03T06:36:49.862552729Z 
  5%|▌         | 478/9500 [1:39:19<30:45:06, 12.27s/it]08/02/2024 23:36:49 - INFO - __main__ -   Step: 478, LR: 1.9581017238864942e-05, Loss: 746.2391357421875
2024-08-03T06:37:02.008857527Z 
  5%|▌         | 479/9500 [1:39:31<30:39:18, 12.23s/it]08/02/2024 23:37:02 - INFO - __main__ -   Step: 479, LR: 1.9578846695177665e-05, Loss: 864.9736328125
2024-08-03T06:37:14.459186246Z 
  5%|▌         | 480/9500 [1:39:44<30:48:51, 12.30s/it]08/02/2024 23:37:14 - INFO - __main__ -   Step: 480, LR: 1.9576676151490385e-05, Loss: 546.63818359375
2024-08-03T06:37:26.939051040Z 
  5%|▌         | 481/9500 [1:39:56<30:56:51, 12.35s/it]08/02/2024 23:37:26 - INFO - __main__ -   Step: 481, LR: 1.957450560780311e-05, Loss: 754.4591064453125
2024-08-03T06:37:39.271697153Z 
  5%|▌         | 482/9500 [1:40:09<30:55:40, 12.35s/it]08/02/2024 23:37:39 - INFO - __main__ -   Step: 482, LR: 1.9572335064115828e-05, Loss: 787.4139404296875
2024-08-03T06:37:51.918483119Z 
  5%|▌         | 483/9500 [1:40:21<31:09:04, 12.44s/it]08/02/2024 23:37:51 - INFO - __main__ -   Step: 483, LR: 1.9570164520428548e-05, Loss: 938.053466796875
2024-08-03T06:38:04.406192636Z 
  5%|▌         | 484/9500 [1:40:34<31:11:08, 12.45s/it]08/02/2024 23:38:04 - INFO - __main__ -   Step: 484, LR: 1.9567993976741268e-05, Loss: 703.6124877929688
2024-08-03T06:38:16.631530524Z 
  5%|▌         | 485/9500 [1:40:46<31:00:42, 12.38s/it]08/02/2024 23:38:16 - INFO - __main__ -   Step: 485, LR: 1.956582343305399e-05, Loss: 787.53466796875
2024-08-03T06:38:29.028348643Z 
  5%|▌         | 486/9500 [1:40:58<31:01:02, 12.39s/it]08/02/2024 23:38:29 - INFO - __main__ -   Step: 486, LR: 1.956365288936671e-05, Loss: 547.3529052734375
2024-08-03T06:38:41.403021063Z 
  5%|▌         | 487/9500 [1:41:11<31:00:17, 12.38s/it]08/02/2024 23:38:41 - INFO - __main__ -   Step: 487, LR: 1.956148234567943e-05, Loss: 681.3701171875
2024-08-03T06:38:53.726731980Z 
  5%|▌         | 488/9500 [1:41:23<30:57:21, 12.37s/it]08/02/2024 23:38:53 - INFO - __main__ -   Step: 488, LR: 1.9559311801992154e-05, Loss: 577.3984375
2024-08-03T06:39:06.086975672Z 
  5%|▌         | 489/9500 [1:41:36<30:56:53, 12.36s/it]08/02/2024 23:39:06 - INFO - __main__ -   Step: 489, LR: 1.9557141258304874e-05, Loss: 807.8284912109375
2024-08-03T06:39:18.817156995Z 
  5%|▌         | 490/9500 [1:41:48<31:13:10, 12.47s/it]08/02/2024 23:39:18 - INFO - __main__ -   Step: 490, LR: 1.9554970714617597e-05, Loss: 909.19384765625
2024-08-03T06:39:31.209534742Z 
  5%|▌         | 491/9500 [1:42:01<31:09:17, 12.45s/it]08/02/2024 23:39:31 - INFO - __main__ -   Step: 491, LR: 1.9552800170930317e-05, Loss: 784.555419921875
2024-08-03T06:39:43.422169853Z 
  5%|▌         | 492/9500 [1:42:13<30:58:24, 12.38s/it]08/02/2024 23:39:43 - INFO - __main__ -   Step: 492, LR: 1.9550629627243037e-05, Loss: 703.514892578125
2024-08-03T06:39:56.194656912Z 
  5%|▌         | 493/9500 [1:42:26<31:15:55, 12.50s/it]08/02/2024 23:39:56 - INFO - __main__ -   Step: 493, LR: 1.954845908355576e-05, Loss: 687.0004272460938
2024-08-03T06:40:08.196592964Z 
  5%|▌         | 494/9500 [1:42:38<30:53:28, 12.35s/it]08/02/2024 23:40:08 - INFO - __main__ -   Step: 494, LR: 1.954628853986848e-05, Loss: 658.8687744140625
2024-08-03T06:40:20.420278031Z 
  5%|▌         | 495/9500 [1:42:50<30:47:40, 12.31s/it]08/02/2024 23:40:20 - INFO - __main__ -   Step: 495, LR: 1.9544117996181203e-05, Loss: 690.4560546875
2024-08-03T06:40:32.865674159Z 
  5%|▌         | 496/9500 [1:43:02<30:53:30, 12.35s/it]08/02/2024 23:40:32 - INFO - __main__ -   Step: 496, LR: 1.9541947452493923e-05, Loss: 713.1408081054688
2024-08-03T06:40:45.367796243Z 
  5%|▌         | 497/9500 [1:43:15<31:00:05, 12.40s/it]08/02/2024 23:40:45 - INFO - __main__ -   Step: 497, LR: 1.9539776908806643e-05, Loss: 754.2989501953125
2024-08-03T06:40:57.681933451Z 
  5%|▌         | 498/9500 [1:43:27<30:56:07, 12.37s/it]08/02/2024 23:40:57 - INFO - __main__ -   Step: 498, LR: 1.9537606365119363e-05, Loss: 712.0247802734375
2024-08-03T06:41:10.164156513Z 
  5%|▌         | 499/9500 [1:43:40<31:00:57, 12.41s/it]08/02/2024 23:41:10 - INFO - __main__ -   Step: 499, LR: 1.9535435821432086e-05, Loss: 745.3817138671875
2024-08-03T06:41:22.750939542Z 
  5%|▌         | 500/9500 [1:43:52<31:08:56, 12.46s/it]08/02/2024 23:41:22 - INFO - __main__ -   Step: 500, LR: 1.9533265277744806e-05, Loss: 854.43408203125
2024-08-03T06:41:34.575512617Z 
  5%|▌         | 501/9500 [1:44:04<30:40:09, 12.27s/it]08/02/2024 23:41:34 - INFO - __main__ -   Step: 501, LR: 1.9531094734057526e-05, Loss: 527.9759521484375
2024-08-03T06:41:46.968417879Z 
  5%|▌         | 502/9500 [1:44:16<30:45:30, 12.31s/it]08/02/2024 23:41:46 - INFO - __main__ -   Step: 502, LR: 1.952892419037025e-05, Loss: 650.1702880859375
2024-08-03T06:41:59.259841324Z 
  5%|▌         | 503/9500 [1:44:29<30:44:38, 12.30s/it]08/02/2024 23:41:59 - INFO - __main__ -   Step: 503, LR: 1.952675364668297e-05, Loss: 543.3785400390625
2024-08-03T06:42:11.221189816Z 
  5%|▌         | 504/9500 [1:44:41<30:29:08, 12.20s/it]08/02/2024 23:42:11 - INFO - __main__ -   Step: 504, LR: 1.9524583102995692e-05, Loss: 589.267333984375
2024-08-03T06:42:23.682024629Z 
  5%|▌         | 505/9500 [1:44:53<30:40:40, 12.28s/it]08/02/2024 23:42:23 - INFO - __main__ -   Step: 505, LR: 1.9522412559308412e-05, Loss: 723.4403686523438
2024-08-03T06:42:36.068188377Z 
  5%|▌         | 506/9500 [1:45:06<30:45:20, 12.31s/it]08/02/2024 23:42:36 - INFO - __main__ -   Step: 506, LR: 1.9520242015621132e-05, Loss: 616.66259765625
2024-08-03T06:42:48.528664999Z 
  5%|▌         | 507/9500 [1:45:18<30:51:52, 12.36s/it]08/02/2024 23:42:48 - INFO - __main__ -   Step: 507, LR: 1.9518071471933855e-05, Loss: 761.5570678710938
2024-08-03T06:43:01.789838465Z 
  5%|▌         | 508/9500 [1:45:31<31:32:22, 12.63s/it]08/02/2024 23:43:01 - INFO - __main__ -   Step: 508, LR: 1.9515900928246575e-05, Loss: 995.9322509765625
2024-08-03T06:43:14.523460675Z 
  5%|▌         | 509/9500 [1:45:44<31:36:58, 12.66s/it]08/02/2024 23:43:14 - INFO - __main__ -   Step: 509, LR: 1.95137303845593e-05, Loss: 693.3236083984375
2024-08-03T06:43:26.703244290Z 
  5%|▌         | 510/9500 [1:45:56<31:15:13, 12.52s/it]08/02/2024 23:43:26 - INFO - __main__ -   Step: 510, LR: 1.951155984087202e-05, Loss: 801.7774658203125
2024-08-03T06:43:39.484924254Z 
  5%|▌         | 511/9500 [1:46:09<31:26:58, 12.60s/it]08/02/2024 23:43:39 - INFO - __main__ -   Step: 511, LR: 1.9509389297184738e-05, Loss: 720.7113037109375
2024-08-03T06:43:52.114429943Z 
  5%|▌         | 512/9500 [1:46:22<31:28:17, 12.61s/it]08/02/2024 23:43:52 - INFO - __main__ -   Step: 512, LR: 1.9507218753497458e-05, Loss: 915.5374145507812
2024-08-03T06:44:04.142205733Z 
  5%|▌         | 513/9500 [1:46:34<31:02:08, 12.43s/it]08/02/2024 23:44:04 - INFO - __main__ -   Step: 513, LR: 1.950504820981018e-05, Loss: 860.249267578125
2024-08-03T06:44:16.585416951Z 
  5%|▌         | 514/9500 [1:46:46<31:02:25, 12.44s/it]08/02/2024 23:44:16 - INFO - __main__ -   Step: 514, LR: 1.95028776661229e-05, Loss: 637.2806396484375
2024-08-03T06:44:29.197362922Z 
  5%|▌         | 515/9500 [1:46:59<31:10:08, 12.49s/it]08/02/2024 23:44:29 - INFO - __main__ -   Step: 515, LR: 1.950070712243562e-05, Loss: 694.3663330078125
2024-08-03T06:44:41.322337284Z 
  5%|▌         | 516/9500 [1:47:11<30:53:36, 12.38s/it]08/02/2024 23:44:41 - INFO - __main__ -   Step: 516, LR: 1.9498536578748344e-05, Loss: 614.7853393554688
2024-08-03T06:44:53.651665042Z 
  5%|▌         | 517/9500 [1:47:23<30:51:08, 12.36s/it]08/02/2024 23:44:53 - INFO - __main__ -   Step: 517, LR: 1.9496366035061064e-05, Loss: 843.4547119140625
2024-08-03T06:45:06.256596523Z 
  5%|▌         | 518/9500 [1:47:36<31:01:45, 12.44s/it]08/02/2024 23:45:06 - INFO - __main__ -   Step: 518, LR: 1.9494195491373787e-05, Loss: 624.665283203125
2024-08-03T06:45:18.422913270Z 
  5%|▌         | 519/9500 [1:47:48<30:49:23, 12.36s/it]08/02/2024 23:45:18 - INFO - __main__ -   Step: 519, LR: 1.9492024947686507e-05, Loss: 623.3548583984375
2024-08-03T06:45:30.409064505Z 
  5%|▌         | 520/9500 [1:48:00<30:32:37, 12.24s/it]08/02/2024 23:45:30 - INFO - __main__ -   Step: 520, LR: 1.9489854403999227e-05, Loss: 697.0093383789062
2024-08-03T06:45:42.889392906Z 
  5%|▌         | 521/9500 [1:48:12<30:42:59, 12.32s/it]08/02/2024 23:45:42 - INFO - __main__ -   Step: 521, LR: 1.948768386031195e-05, Loss: 582.26123046875
2024-08-03T06:45:54.997667607Z 
  5%|▌         | 522/9500 [1:48:24<30:33:29, 12.25s/it]08/02/2024 23:45:54 - INFO - __main__ -   Step: 522, LR: 1.948551331662467e-05, Loss: 783.6581420898438
2024-08-03T06:46:07.250040525Z 
  6%|▌         | 523/9500 [1:48:37<30:33:14, 12.25s/it]08/02/2024 23:46:07 - INFO - __main__ -   Step: 523, LR: 1.9483342772937394e-05, Loss: 813.8816528320312
2024-08-03T06:46:19.883471079Z 
  6%|▌         | 524/9500 [1:48:49<30:50:07, 12.37s/it]08/02/2024 23:46:19 - INFO - __main__ -   Step: 524, LR: 1.9481172229250113e-05, Loss: 730.3236694335938
2024-08-03T06:46:31.834452379Z 
  6%|▌         | 525/9500 [1:49:01<30:31:14, 12.24s/it]08/02/2024 23:46:31 - INFO - __main__ -   Step: 525, LR: 1.9479001685562833e-05, Loss: 606.6871337890625
2024-08-03T06:46:44.583386849Z 
  6%|▌         | 526/9500 [1:49:14<30:53:46, 12.39s/it]08/02/2024 23:46:44 - INFO - __main__ -   Step: 526, LR: 1.9476831141875553e-05, Loss: 873.9811401367188
2024-08-03T06:46:56.666285455Z 
  6%|▌         | 527/9500 [1:49:26<30:39:35, 12.30s/it]08/02/2024 23:46:56 - INFO - __main__ -   Step: 527, LR: 1.9474660598188276e-05, Loss: 523.497314453125
2024-08-03T06:47:08.701851327Z 
  6%|▌         | 528/9500 [1:49:38<30:27:28, 12.22s/it]08/02/2024 23:47:08 - INFO - __main__ -   Step: 528, LR: 1.9472490054500996e-05, Loss: 609.2318115234375
2024-08-03T06:47:20.973731111Z 
  6%|▌         | 529/9500 [1:49:50<30:29:33, 12.24s/it]08/02/2024 23:47:20 - INFO - __main__ -   Step: 529, LR: 1.9470319510813716e-05, Loss: 796.4359741210938
2024-08-03T06:47:33.493904968Z 
  6%|▌         | 530/9500 [1:50:03<30:42:04, 12.32s/it]08/02/2024 23:47:33 - INFO - __main__ -   Step: 530, LR: 1.946814896712644e-05, Loss: 715.0264892578125
2024-08-03T06:47:45.542721599Z 
  6%|▌         | 531/9500 [1:50:15<30:29:38, 12.24s/it]08/02/2024 23:47:45 - INFO - __main__ -   Step: 531, LR: 1.946597842343916e-05, Loss: 709.0610961914062
2024-08-03T06:47:57.879521417Z 
  6%|▌         | 532/9500 [1:50:27<30:33:46, 12.27s/it]08/02/2024 23:47:57 - INFO - __main__ -   Step: 532, LR: 1.9463807879751883e-05, Loss: 728.3740234375
2024-08-03T06:48:10.227644600Z 
  6%|▌         | 533/9500 [1:50:40<30:37:08, 12.29s/it]08/02/2024 23:48:10 - INFO - __main__ -   Step: 533, LR: 1.9461637336064602e-05, Loss: 574.78759765625
2024-08-03T06:48:22.245811711Z 
  6%|▌         | 534/9500 [1:50:52<30:24:37, 12.21s/it]08/02/2024 23:48:22 - INFO - __main__ -   Step: 534, LR: 1.9459466792377322e-05, Loss: 564.4134521484375
2024-08-03T06:48:34.260298282Z 
  6%|▌         | 535/9500 [1:51:04<30:15:38, 12.15s/it]08/02/2024 23:48:34 - INFO - __main__ -   Step: 535, LR: 1.9457296248690046e-05, Loss: 659.838134765625
2024-08-03T06:48:46.889342582Z 
  6%|▌         | 536/9500 [1:51:16<30:36:50, 12.29s/it]08/02/2024 23:48:46 - INFO - __main__ -   Step: 536, LR: 1.9455125705002765e-05, Loss: 819.524658203125
2024-08-03T06:48:58.862873879Z 
  6%|▌         | 537/9500 [1:51:28<30:22:13, 12.20s/it]08/02/2024 23:48:58 - INFO - __main__ -   Step: 537, LR: 1.945295516131549e-05, Loss: 553.0435180664062
2024-08-03T06:49:10.866108132Z 
  6%|▌         | 538/9500 [1:51:40<30:13:17, 12.14s/it]08/02/2024 23:49:10 - INFO - __main__ -   Step: 538, LR: 1.945078461762821e-05, Loss: 668.1063232421875
2024-08-03T06:49:23.358295201Z 
  6%|▌         | 539/9500 [1:51:53<30:28:52, 12.25s/it]08/02/2024 23:49:23 - INFO - __main__ -   Step: 539, LR: 1.944861407394093e-05, Loss: 823.8182983398438
2024-08-03T06:49:35.539075052Z 
  6%|▌         | 540/9500 [1:52:05<30:25:46, 12.23s/it]08/02/2024 23:49:35 - INFO - __main__ -   Step: 540, LR: 1.9446443530253648e-05, Loss: 854.947998046875
2024-08-03T06:49:47.414904534Z 
  6%|▌         | 541/9500 [1:52:17<30:09:52, 12.12s/it]08/02/2024 23:49:47 - INFO - __main__ -   Step: 541, LR: 1.944427298656637e-05, Loss: 737.0498046875
2024-08-03T06:50:00.231585496Z 
  6%|▌         | 542/9500 [1:52:30<30:40:49, 12.33s/it]08/02/2024 23:50:00 - INFO - __main__ -   Step: 542, LR: 1.944210244287909e-05, Loss: 714.6844482421875
2024-08-03T06:50:12.120679263Z 
  6%|▌         | 543/9500 [1:52:42<30:20:53, 12.20s/it]08/02/2024 23:50:12 - INFO - __main__ -   Step: 543, LR: 1.943993189919181e-05, Loss: 688.73486328125
2024-08-03T06:50:24.399007662Z 
  6%|▌         | 544/9500 [1:52:54<30:24:17, 12.22s/it]08/02/2024 23:50:24 - INFO - __main__ -   Step: 544, LR: 1.9437761355504534e-05, Loss: 667.1062622070312
2024-08-03T06:50:36.974130202Z 
  6%|▌         | 545/9500 [1:53:06<30:39:55, 12.33s/it]08/02/2024 23:50:36 - INFO - __main__ -   Step: 545, LR: 1.9435590811817254e-05, Loss: 615.6881103515625
2024-08-03T06:50:49.236668820Z 
  6%|▌         | 546/9500 [1:53:19<30:36:48, 12.31s/it]08/02/2024 23:50:49 - INFO - __main__ -   Step: 546, LR: 1.9433420268129978e-05, Loss: 724.5928955078125
2024-08-03T06:51:01.543383961Z 
  6%|▌         | 547/9500 [1:53:31<30:36:30, 12.31s/it]08/02/2024 23:51:01 - INFO - __main__ -   Step: 547, LR: 1.9431249724442697e-05, Loss: 716.0111083984375
2024-08-03T06:51:14.180561505Z 
  6%|▌         | 548/9500 [1:53:44<30:51:04, 12.41s/it]08/02/2024 23:51:14 - INFO - __main__ -   Step: 548, LR: 1.942907918075542e-05, Loss: 740.0570068359375
2024-08-03T06:51:26.180446982Z 
  6%|▌         | 549/9500 [1:53:56<30:32:38, 12.28s/it]08/02/2024 23:51:26 - INFO - __main__ -   Step: 549, LR: 1.942690863706814e-05, Loss: 644.8089599609375
2024-08-03T06:51:38.244617160Z 
  6%|▌         | 550/9500 [1:54:08<30:22:35, 12.22s/it]08/02/2024 23:51:38 - INFO - __main__ -   Step: 550, LR: 1.942473809338086e-05, Loss: 696.4190673828125
2024-08-03T06:51:50.983084028Z 
  6%|▌         | 551/9500 [1:54:20<30:45:39, 12.37s/it]08/02/2024 23:51:50 - INFO - __main__ -   Step: 551, LR: 1.9422567549693584e-05, Loss: 547.2679443359375
2024-08-03T06:52:02.780252587Z 
  6%|▌         | 552/9500 [1:54:32<30:19:36, 12.20s/it]08/02/2024 23:52:02 - INFO - __main__ -   Step: 552, LR: 1.94203970060063e-05, Loss: 594.9498291015625
2024-08-03T06:52:14.869271598Z 
  6%|▌         | 553/9500 [1:54:44<30:14:23, 12.17s/it]08/02/2024 23:52:14 - INFO - __main__ -   Step: 553, LR: 1.9418226462319023e-05, Loss: 789.6697387695312
2024-08-03T06:52:27.273116997Z 
  6%|▌         | 554/9500 [1:54:57<30:24:45, 12.24s/it]08/02/2024 23:52:27 - INFO - __main__ -   Step: 554, LR: 1.9416055918631743e-05, Loss: 661.4986572265625
2024-08-03T06:52:39.636712899Z 
  6%|▌         | 555/9500 [1:55:09<30:30:08, 12.28s/it]08/02/2024 23:52:39 - INFO - __main__ -   Step: 555, LR: 1.9413885374944467e-05, Loss: 749.7777709960938
2024-08-03T06:52:51.626012981Z 
  6%|▌         | 556/9500 [1:55:21<30:17:07, 12.19s/it]08/02/2024 23:52:51 - INFO - __main__ -   Step: 556, LR: 1.9411714831257186e-05, Loss: 607.8793334960938
2024-08-03T06:53:03.910028147Z 
  6%|▌         | 557/9500 [1:55:33<30:21:07, 12.22s/it]08/02/2024 23:53:03 - INFO - __main__ -   Step: 557, LR: 1.940954428756991e-05, Loss: 825.1702270507812
2024-08-03T06:53:16.439107016Z 
  6%|▌         | 558/9500 [1:55:46<30:34:49, 12.31s/it]08/02/2024 23:53:16 - INFO - __main__ -   Step: 558, LR: 1.940737374388263e-05, Loss: 860.0911865234375
2024-08-03T06:53:28.394954596Z 
  6%|▌         | 559/9500 [1:55:58<30:18:43, 12.20s/it]08/02/2024 23:53:28 - INFO - __main__ -   Step: 559, LR: 1.940520320019535e-05, Loss: 701.4305419921875
2024-08-03T06:53:40.481074503Z 
  6%|▌         | 560/9500 [1:56:10<30:13:12, 12.17s/it]08/02/2024 23:53:40 - INFO - __main__ -   Step: 560, LR: 1.9403032656508073e-05, Loss: 673.2363891601562
2024-08-03T06:53:52.953210773Z 
  6%|▌         | 561/9500 [1:56:22<30:26:32, 12.26s/it]08/02/2024 23:53:52 - INFO - __main__ -   Step: 561, LR: 1.9400862112820793e-05, Loss: 726.3997802734375
2024-08-03T06:54:05.091869100Z 
  6%|▌         | 562/9500 [1:56:35<30:20:54, 12.22s/it]08/02/2024 23:54:05 - INFO - __main__ -   Step: 562, LR: 1.9398691569133516e-05, Loss: 714.299072265625
2024-08-03T06:54:17.310160365Z 
  6%|▌         | 563/9500 [1:56:47<30:20:28, 12.22s/it]08/02/2024 23:54:17 - INFO - __main__ -   Step: 563, LR: 1.9396521025446236e-05, Loss: 668.43505859375
2024-08-03T06:54:29.834422664Z 
  6%|▌         | 564/9500 [1:56:59<30:33:46, 12.31s/it]08/02/2024 23:54:29 - INFO - __main__ -   Step: 564, LR: 1.9394350481758956e-05, Loss: 802.97802734375
2024-08-03T06:54:41.953315695Z 
  6%|▌         | 565/9500 [1:57:11<30:24:54, 12.25s/it]08/02/2024 23:54:41 - INFO - __main__ -   Step: 565, LR: 1.939217993807168e-05, Loss: 666.980224609375
2024-08-03T06:54:54.609445680Z 
  6%|▌         | 566/9500 [1:57:24<30:42:38, 12.38s/it]08/02/2024 23:54:54 - INFO - __main__ -   Step: 566, LR: 1.93900093943844e-05, Loss: 823.4419555664062
2024-08-03T06:55:07.085005845Z 
  6%|▌         | 567/9500 [1:57:37<30:46:55, 12.41s/it]08/02/2024 23:55:07 - INFO - __main__ -   Step: 567, LR: 1.938783885069712e-05, Loss: 679.782470703125
2024-08-03T06:55:19.269014137Z 
  6%|▌         | 568/9500 [1:57:49<30:36:50, 12.34s/it]08/02/2024 23:55:19 - INFO - __main__ -   Step: 568, LR: 1.938566830700984e-05, Loss: 589.5823974609375
2024-08-03T06:55:31.648205791Z 
  6%|▌         | 569/9500 [1:58:01<30:38:25, 12.35s/it]08/02/2024 23:55:31 - INFO - __main__ -   Step: 569, LR: 1.938349776332256e-05, Loss: 787.6243286132812
2024-08-03T06:55:44.001773611Z 
  6%|▌         | 570/9500 [1:58:13<30:38:21, 12.35s/it]08/02/2024 23:55:44 - INFO - __main__ -   Step: 570, LR: 1.938132721963528e-05, Loss: 715.4884643554688
2024-08-03T06:55:56.123486899Z 
  6%|▌         | 571/9500 [1:58:26<30:27:52, 12.28s/it]08/02/2024 23:55:56 - INFO - __main__ -   Step: 571, LR: 1.9379156675948005e-05, Loss: 671.7222900390625
2024-08-03T06:56:08.236429261Z 
  6%|▌         | 572/9500 [1:58:38<30:20:03, 12.23s/it]08/02/2024 23:56:08 - INFO - __main__ -   Step: 572, LR: 1.9376986132260725e-05, Loss: 782.8305053710938
2024-08-03T06:56:20.670809457Z 
  6%|▌         | 573/9500 [1:58:50<30:28:55, 12.29s/it]08/02/2024 23:56:20 - INFO - __main__ -   Step: 573, LR: 1.9374815588573444e-05, Loss: 714.9619140625
2024-08-03T06:56:32.852200425Z 
  6%|▌         | 574/9500 [1:59:02<30:23:46, 12.26s/it]08/02/2024 23:56:32 - INFO - __main__ -   Step: 574, LR: 1.9372645044886168e-05, Loss: 824.0125732421875
2024-08-03T06:56:45.052730137Z 
  6%|▌         | 575/9500 [1:59:14<30:20:56, 12.24s/it]08/02/2024 23:56:45 - INFO - __main__ -   Step: 575, LR: 1.9370474501198888e-05, Loss: 767.4454345703125
2024-08-03T06:56:57.514389776Z 
  6%|▌         | 576/9500 [1:59:27<30:30:33, 12.31s/it]08/02/2024 23:56:57 - INFO - __main__ -   Step: 576, LR: 1.936830395751161e-05, Loss: 728.48486328125
2024-08-03T06:57:09.658391280Z 
  6%|▌         | 577/9500 [1:59:39<30:23:03, 12.26s/it]08/02/2024 23:57:09 - INFO - __main__ -   Step: 577, LR: 1.936613341382433e-05, Loss: 691.0138549804688
2024-08-03T06:57:22.042624478Z 
  6%|▌         | 578/9500 [1:59:51<30:28:26, 12.30s/it]08/02/2024 23:57:22 - INFO - __main__ -   Step: 578, LR: 1.936396287013705e-05, Loss: 841.4717407226562
2024-08-03T06:57:34.604186653Z 
  6%|▌         | 579/9500 [2:00:04<30:40:05, 12.38s/it]08/02/2024 23:57:34 - INFO - __main__ -   Step: 579, LR: 1.9361792326449774e-05, Loss: 746.8446044921875
2024-08-03T06:57:46.807081785Z 
  6%|▌         | 580/9500 [2:00:16<30:32:09, 12.32s/it]08/02/2024 23:57:46 - INFO - __main__ -   Step: 580, LR: 1.9359621782762494e-05, Loss: 830.4296875
2024-08-03T06:57:58.982349609Z 
  6%|▌         | 581/9500 [2:00:28<30:25:19, 12.28s/it]08/02/2024 23:57:58 - INFO - __main__ -   Step: 581, LR: 1.9357451239075214e-05, Loss: 614.2439575195312
2024-08-03T06:58:11.943501144Z 
  6%|▌         | 582/9500 [2:00:41<30:55:31, 12.48s/it]08/02/2024 23:58:11 - INFO - __main__ -   Step: 582, LR: 1.9355280695387933e-05, Loss: 647.65966796875
2024-08-03T06:58:23.912651424Z 
  6%|▌         | 583/9500 [2:00:53<30:32:22, 12.33s/it]08/02/2024 23:58:23 - INFO - __main__ -   Step: 583, LR: 1.9353110151700657e-05, Loss: 641.46044921875
2024-08-03T06:58:36.166532974Z 
  6%|▌         | 584/9500 [2:01:06<30:28:47, 12.31s/it]08/02/2024 23:58:36 - INFO - __main__ -   Step: 584, LR: 1.9350939608013377e-05, Loss: 736.7098388671875
2024-08-03T06:58:48.719806374Z 
  6%|▌         | 585/9500 [2:01:18<30:39:34, 12.38s/it]08/02/2024 23:58:48 - INFO - __main__ -   Step: 585, LR: 1.93487690643261e-05, Loss: 809.7500610351562
2024-08-03T06:59:00.830744741Z 
  6%|▌         | 586/9500 [2:01:30<30:27:20, 12.30s/it]08/02/2024 23:59:00 - INFO - __main__ -   Step: 586, LR: 1.934659852063882e-05, Loss: 623.8739624023438
2024-08-03T06:59:12.890958644Z 
  6%|▌         | 587/9500 [2:01:42<30:16:27, 12.23s/it]08/02/2024 23:59:12 - INFO - __main__ -   Step: 587, LR: 1.934442797695154e-05, Loss: 759.3527221679688
2024-08-03T06:59:25.439628919Z 
  6%|▌         | 588/9500 [2:01:55<30:30:32, 12.32s/it]08/02/2024 23:59:25 - INFO - __main__ -   Step: 588, LR: 1.9342257433264263e-05, Loss: 767.2371826171875
2024-08-03T06:59:37.532764628Z 
  6%|▌         | 589/9500 [2:02:07<30:20:03, 12.25s/it]08/02/2024 23:59:37 - INFO - __main__ -   Step: 589, LR: 1.9340086889576983e-05, Loss: 679.1614379882812
2024-08-03T06:59:49.511838791Z 
  6%|▌         | 590/9500 [2:02:19<30:07:33, 12.17s/it]08/02/2024 23:59:49 - INFO - __main__ -   Step: 590, LR: 1.9337916345889706e-05, Loss: 601.9056396484375
2024-08-03T07:00:02.047600373Z 
  6%|▌         | 591/9500 [2:02:31<30:23:33, 12.28s/it]08/03/2024 00:00:02 - INFO - __main__ -   Step: 591, LR: 1.9335745802202426e-05, Loss: 684.572509765625
2024-08-03T07:00:14.162875703Z 
  6%|▌         | 592/9500 [2:02:44<30:15:57, 12.23s/it]08/03/2024 00:00:14 - INFO - __main__ -   Step: 592, LR: 1.9333575258515146e-05, Loss: 710.452392578125
2024-08-03T07:00:26.211119795Z 
  6%|▌         | 593/9500 [2:02:56<30:07:35, 12.18s/it]08/03/2024 00:00:26 - INFO - __main__ -   Step: 593, LR: 1.933140471482787e-05, Loss: 477.35198974609375
2024-08-03T07:00:39.061857792Z 
  6%|▋         | 594/9500 [2:03:08<30:37:25, 12.38s/it]08/03/2024 00:00:39 - INFO - __main__ -   Step: 594, LR: 1.932923417114059e-05, Loss: 689.626220703125
2024-08-03T07:00:51.147859795Z 
  6%|▋         | 595/9500 [2:03:21<30:24:10, 12.29s/it]08/03/2024 00:00:51 - INFO - __main__ -   Step: 595, LR: 1.932706362745331e-05, Loss: 630.9347534179688
2024-08-03T07:01:03.266019508Z 
  6%|▋         | 596/9500 [2:03:33<30:16:17, 12.24s/it]08/03/2024 00:01:03 - INFO - __main__ -   Step: 596, LR: 1.932489308376603e-05, Loss: 730.00341796875
2024-08-03T07:01:15.683703685Z 
  6%|▋         | 597/9500 [2:03:45<30:24:01, 12.29s/it]08/03/2024 00:01:15 - INFO - __main__ -   Step: 597, LR: 1.9322722540078752e-05, Loss: 729.3018188476562
2024-08-03T07:01:28.171043820Z 
  6%|▋         | 598/9500 [2:03:58<30:32:30, 12.35s/it]08/03/2024 00:01:28 - INFO - __main__ -   Step: 598, LR: 1.932055199639147e-05, Loss: 925.2485961914062
2024-08-03T07:01:40.205540945Z 
  6%|▋         | 599/9500 [2:04:10<30:18:11, 12.26s/it]08/03/2024 00:01:40 - INFO - __main__ -   Step: 599, LR: 1.9318381452704195e-05, Loss: 632.72509765625
2024-08-03T07:01:52.314653519Z 
  6%|▋         | 600/9500 [2:04:22<30:11:26, 12.21s/it]08/03/2024 00:01:52 - INFO - __main__ -   Step: 600, LR: 1.9316210909016915e-05, Loss: 728.0486450195312
2024-08-03T07:02:04.761248908Z 
  6%|▋         | 601/9500 [2:04:34<30:21:40, 12.28s/it]08/03/2024 00:02:04 - INFO - __main__ -   Step: 601, LR: 1.9314040365329635e-05, Loss: 693.9671020507812
2024-08-03T07:02:16.978832437Z 
  6%|▋         | 602/9500 [2:04:46<30:18:35, 12.26s/it]08/03/2024 00:02:16 - INFO - __main__ -   Step: 602, LR: 1.9311869821642358e-05, Loss: 677.727294921875
2024-08-03T07:02:28.973391598Z 
  6%|▋         | 603/9500 [2:04:58<30:06:27, 12.18s/it]08/03/2024 00:02:28 - INFO - __main__ -   Step: 603, LR: 1.9309699277955078e-05, Loss: 630.2489624023438
2024-08-03T07:02:41.552317485Z 
  6%|▋         | 604/9500 [2:05:11<30:23:52, 12.30s/it]08/03/2024 00:02:41 - INFO - __main__ -   Step: 604, LR: 1.93075287342678e-05, Loss: 741.4151000976562
2024-08-03T07:02:53.980901458Z 
  6%|▋         | 605/9500 [2:05:23<30:29:20, 12.34s/it]08/03/2024 00:02:53 - INFO - __main__ -   Step: 605, LR: 1.930535819058052e-05, Loss: 830.76513671875
2024-08-03T07:03:06.149778723Z 
  6%|▋         | 606/9500 [2:05:36<30:21:31, 12.29s/it]08/03/2024 00:03:06 - INFO - __main__ -   Step: 606, LR: 1.930318764689324e-05, Loss: 818.5368041992188
2024-08-03T07:03:18.594127400Z 
  6%|▋         | 607/9500 [2:05:48<30:28:16, 12.34s/it]08/03/2024 00:03:18 - INFO - __main__ -   Step: 607, LR: 1.9301017103205964e-05, Loss: 615.8649291992188
2024-08-03T07:03:30.654747123Z 
  6%|▋         | 608/9500 [2:06:00<30:15:51, 12.25s/it]08/03/2024 00:03:30 - INFO - __main__ -   Step: 608, LR: 1.9298846559518684e-05, Loss: 806.60791015625
2024-08-03T07:03:42.706738876Z 
  6%|▋         | 609/9500 [2:06:12<30:06:44, 12.19s/it]08/03/2024 00:03:42 - INFO - __main__ -   Step: 609, LR: 1.9296676015831404e-05, Loss: 679.8645629882812
2024-08-03T07:03:55.136262851Z 
  6%|▋         | 610/9500 [2:06:25<30:17:03, 12.26s/it]08/03/2024 00:03:55 - INFO - __main__ -   Step: 610, LR: 1.9294505472144124e-05, Loss: 547.4286499023438
2024-08-03T07:04:07.228702102Z 
  6%|▋         | 611/9500 [2:06:37<30:09:14, 12.21s/it]08/03/2024 00:04:07 - INFO - __main__ -   Step: 611, LR: 1.9292334928456847e-05, Loss: 690.4898681640625
2024-08-03T07:04:19.093443075Z 
  6%|▋         | 612/9500 [2:06:49<29:53:36, 12.11s/it]08/03/2024 00:04:19 - INFO - __main__ -   Step: 612, LR: 1.9290164384769567e-05, Loss: 689.6290283203125
2024-08-03T07:04:31.579496511Z 
  6%|▋         | 613/9500 [2:07:01<30:10:11, 12.22s/it]08/03/2024 00:04:31 - INFO - __main__ -   Step: 613, LR: 1.928799384108229e-05, Loss: 742.8798217773438
2024-08-03T07:04:43.625224881Z 
  6%|▋         | 614/9500 [2:07:13<30:02:11, 12.17s/it]08/03/2024 00:04:43 - INFO - __main__ -   Step: 614, LR: 1.928582329739501e-05, Loss: 618.5643310546875
2024-08-03T07:04:55.560211637Z 
  6%|▋         | 615/9500 [2:07:25<29:51:36, 12.10s/it]08/03/2024 00:04:55 - INFO - __main__ -   Step: 615, LR: 1.928365275370773e-05, Loss: 638.9580078125
2024-08-03T07:05:07.923863236Z 
  6%|▋         | 616/9500 [2:07:37<30:03:10, 12.18s/it]08/03/2024 00:05:07 - INFO - __main__ -   Step: 616, LR: 1.9281482210020453e-05, Loss: 631.667724609375
2024-08-03T07:05:20.017425409Z 
  6%|▋         | 617/9500 [2:07:49<29:59:12, 12.15s/it]08/03/2024 00:05:20 - INFO - __main__ -   Step: 617, LR: 1.9279311666333173e-05, Loss: 651.3274536132812
2024-08-03T07:05:32.135630288Z 
  7%|▋         | 618/9500 [2:08:02<29:57:28, 12.14s/it]08/03/2024 00:05:32 - INFO - __main__ -   Step: 618, LR: 1.9277141122645896e-05, Loss: 745.1690673828125
2024-08-03T07:05:44.570102907Z 
  7%|▋         | 619/9500 [2:08:14<30:10:15, 12.23s/it]08/03/2024 00:05:44 - INFO - __main__ -   Step: 619, LR: 1.9274970578958616e-05, Loss: 633.4281005859375
2024-08-03T07:05:56.776399752Z 
  7%|▋         | 620/9500 [2:08:26<30:09:00, 12.22s/it]08/03/2024 00:05:56 - INFO - __main__ -   Step: 620, LR: 1.9272800035271336e-05, Loss: 684.8284912109375
2024-08-03T07:06:08.961091581Z 
  7%|▋         | 621/9500 [2:08:38<30:07:04, 12.21s/it]08/03/2024 00:06:08 - INFO - __main__ -   Step: 621, LR: 1.927062949158406e-05, Loss: 798.8668212890625
2024-08-03T07:06:21.664726572Z 
  7%|▋         | 622/9500 [2:08:51<30:28:44, 12.36s/it]08/03/2024 00:06:21 - INFO - __main__ -   Step: 622, LR: 1.926845894789678e-05, Loss: 705.059814453125
2024-08-03T07:06:33.447028776Z 
  7%|▋         | 623/9500 [2:09:03<30:02:54, 12.19s/it]08/03/2024 00:06:33 - INFO - __main__ -   Step: 623, LR: 1.92662884042095e-05, Loss: 584.34619140625
2024-08-03T07:06:45.662600615Z 
  7%|▋         | 624/9500 [2:09:15<30:04:02, 12.19s/it]08/03/2024 00:06:45 - INFO - __main__ -   Step: 624, LR: 1.926411786052222e-05, Loss: 805.7607421875
2024-08-03T07:06:58.450113256Z 
  7%|▋         | 625/9500 [2:09:28<30:30:06, 12.37s/it]08/03/2024 00:06:58 - INFO - __main__ -   Step: 625, LR: 1.9261947316834942e-05, Loss: 665.5355224609375
2024-08-03T07:07:10.630464596Z 
  7%|▋         | 626/9500 [2:09:40<30:21:23, 12.32s/it]08/03/2024 00:07:10 - INFO - __main__ -   Step: 626, LR: 1.9259776773147662e-05, Loss: 703.24267578125
2024-08-03T07:07:22.842541184Z 
  7%|▋         | 627/9500 [2:09:52<30:16:37, 12.28s/it]08/03/2024 00:07:22 - INFO - __main__ -   Step: 627, LR: 1.9257606229460385e-05, Loss: 888.526611328125
2024-08-03T07:07:35.580981315Z 
  7%|▋         | 628/9500 [2:10:05<30:36:33, 12.42s/it]08/03/2024 00:07:35 - INFO - __main__ -   Step: 628, LR: 1.9255435685773105e-05, Loss: 808.5631713867188
2024-08-03T07:07:47.957068651Z 
  7%|▋         | 629/9500 [2:10:17<30:34:23, 12.41s/it]08/03/2024 00:07:47 - INFO - __main__ -   Step: 629, LR: 1.9253265142085825e-05, Loss: 794.2822265625
2024-08-03T07:08:00.316639368Z 
  7%|▋         | 630/9500 [2:10:30<30:32:04, 12.39s/it]08/03/2024 00:08:00 - INFO - __main__ -   Step: 630, LR: 1.9251094598398548e-05, Loss: 702.9164428710938
2024-08-03T07:08:12.709294893Z 
  7%|▋         | 631/9500 [2:10:42<30:31:51, 12.39s/it]08/03/2024 00:08:12 - INFO - __main__ -   Step: 631, LR: 1.9248924054711268e-05, Loss: 717.5318603515625
2024-08-03T07:08:25.017807021Z 
  7%|▋         | 632/9500 [2:10:54<30:27:54, 12.37s/it]08/03/2024 00:08:25 - INFO - __main__ -   Step: 632, LR: 1.924675351102399e-05, Loss: 814.7881469726562
2024-08-03T07:08:37.307242813Z 
  7%|▋         | 633/9500 [2:11:07<30:24:15, 12.34s/it]08/03/2024 00:08:37 - INFO - __main__ -   Step: 633, LR: 1.924458296733671e-05, Loss: 771.5604858398438
2024-08-03T07:08:49.764392434Z 
  7%|▋         | 634/9500 [2:11:19<30:29:03, 12.38s/it]08/03/2024 00:08:49 - INFO - __main__ -   Step: 634, LR: 1.9242412423649434e-05, Loss: 750.1754150390625
2024-08-03T07:09:01.863874045Z 
  7%|▋         | 635/9500 [2:11:31<30:16:30, 12.29s/it]08/03/2024 00:09:01 - INFO - __main__ -   Step: 635, LR: 1.9240241879962154e-05, Loss: 862.8841552734375
2024-08-03T07:09:13.867414832Z 
  7%|▋         | 636/9500 [2:11:43<30:03:24, 12.21s/it]08/03/2024 00:09:13 - INFO - __main__ -   Step: 636, LR: 1.9238071336274874e-05, Loss: 712.0114135742188
2024-08-03T07:09:26.402899728Z 
  7%|▋         | 637/9500 [2:11:56<30:17:45, 12.31s/it]08/03/2024 00:09:26 - INFO - __main__ -   Step: 637, LR: 1.9235900792587594e-05, Loss: 814.4560546875
2024-08-03T07:09:38.532762026Z 
  7%|▋         | 638/9500 [2:12:08<30:09:44, 12.25s/it]08/03/2024 00:09:38 - INFO - __main__ -   Step: 638, LR: 1.9233730248900314e-05, Loss: 737.16796875
2024-08-03T07:09:50.713666932Z 
  7%|▋         | 639/9500 [2:12:20<30:06:22, 12.23s/it]08/03/2024 00:09:50 - INFO - __main__ -   Step: 639, LR: 1.9231559705213037e-05, Loss: 845.6951904296875
2024-08-03T07:10:03.270754779Z 
  7%|▋         | 640/9500 [2:12:33<30:20:34, 12.33s/it]08/03/2024 00:10:03 - INFO - __main__ -   Step: 640, LR: 1.9229389161525757e-05, Loss: 826.8399658203125
2024-08-03T07:10:16.119017334Z 
  7%|▋         | 641/9500 [2:12:46<30:43:22, 12.48s/it]08/03/2024 00:10:16 - INFO - __main__ -   Step: 641, LR: 1.922721861783848e-05, Loss: 694.2147216796875
2024-08-03T07:10:28.257267004Z 
  7%|▋         | 642/9500 [2:12:58<30:27:49, 12.38s/it]08/03/2024 00:10:28 - INFO - __main__ -   Step: 642, LR: 1.92250480741512e-05, Loss: 744.7777099609375
2024-08-03T07:10:40.327198564Z 
  7%|▋         | 643/9500 [2:13:10<30:13:51, 12.29s/it]08/03/2024 00:10:40 - INFO - __main__ -   Step: 643, LR: 1.9222877530463923e-05, Loss: 604.343505859375
2024-08-03T07:10:52.839345652Z 
  7%|▋         | 644/9500 [2:13:22<30:23:35, 12.35s/it]08/03/2024 00:10:52 - INFO - __main__ -   Step: 644, LR: 1.9220706986776643e-05, Loss: 616.1453857421875
2024-08-03T07:11:05.364774004Z 
  7%|▋         | 645/9500 [2:13:35<30:30:55, 12.41s/it]08/03/2024 00:11:05 - INFO - __main__ -   Step: 645, LR: 1.9218536443089363e-05, Loss: 671.92578125
2024-08-03T07:11:18.076879543Z 
  7%|▋         | 646/9500 [2:13:48<30:44:15, 12.50s/it]08/03/2024 00:11:18 - INFO - __main__ -   Step: 646, LR: 1.9216365899402086e-05, Loss: 774.9754638671875
2024-08-03T07:11:30.567700164Z 
  7%|▋         | 647/9500 [2:14:00<30:43:45, 12.50s/it]08/03/2024 00:11:30 - INFO - __main__ -   Step: 647, LR: 1.9214195355714806e-05, Loss: 691.1343994140625
2024-08-03T07:11:42.977475225Z 
  7%|▋         | 648/9500 [2:14:12<30:39:43, 12.47s/it]08/03/2024 00:11:42 - INFO - __main__ -   Step: 648, LR: 1.921202481202753e-05, Loss: 799.1025390625
2024-08-03T07:11:54.828017992Z 
  7%|▋         | 649/9500 [2:14:24<30:12:07, 12.28s/it]08/03/2024 00:11:54 - INFO - __main__ -   Step: 649, LR: 1.920985426834025e-05, Loss: 494.58905029296875
2024-08-03T07:12:07.382074863Z 
  7%|▋         | 650/9500 [2:14:37<30:23:51, 12.37s/it]08/03/2024 00:12:07 - INFO - __main__ -   Step: 650, LR: 1.920768372465297e-05, Loss: 679.344482421875
2024-08-03T07:12:19.671743366Z 
  7%|▋         | 651/9500 [2:14:49<30:20:18, 12.34s/it]08/03/2024 00:12:19 - INFO - __main__ -   Step: 651, LR: 1.920551318096569e-05, Loss: 718.5826416015625
2024-08-03T07:12:32.381381244Z 
  7%|▋         | 652/9500 [2:15:02<30:36:20, 12.45s/it]08/03/2024 00:12:32 - INFO - __main__ -   Step: 652, LR: 1.9203342637278412e-05, Loss: 898.375
2024-08-03T07:12:45.089309247Z 
  7%|▋         | 653/9500 [2:15:15<30:47:26, 12.53s/it]08/03/2024 00:12:45 - INFO - __main__ -   Step: 653, LR: 1.9201172093591132e-05, Loss: 675.4249267578125
2024-08-03T07:12:57.228092670Z 
  7%|▋         | 654/9500 [2:15:27<30:29:57, 12.41s/it]08/03/2024 00:12:57 - INFO - __main__ -   Step: 654, LR: 1.9199001549903852e-05, Loss: 646.1123657226562
2024-08-03T07:13:09.359621832Z 
  7%|▋         | 655/9500 [2:15:39<30:17:19, 12.33s/it]08/03/2024 00:13:09 - INFO - __main__ -   Step: 655, LR: 1.9196831006216575e-05, Loss: 891.930419921875
2024-08-03T07:13:21.704118834Z 
  7%|▋         | 656/9500 [2:15:51<30:17:52, 12.33s/it]08/03/2024 00:13:21 - INFO - __main__ -   Step: 656, LR: 1.9194660462529295e-05, Loss: 661.22265625
2024-08-03T07:13:34.009347562Z 
  7%|▋         | 657/9500 [2:16:03<30:16:26, 12.32s/it]08/03/2024 00:13:34 - INFO - __main__ -   Step: 657, LR: 1.9192489918842018e-05, Loss: 734.4171142578125
2024-08-03T07:13:46.186602025Z 
  7%|▋         | 658/9500 [2:16:16<30:09:42, 12.28s/it]08/03/2024 00:13:46 - INFO - __main__ -   Step: 658, LR: 1.9190319375154738e-05, Loss: 657.494873046875
2024-08-03T07:13:58.573786429Z 
  7%|▋         | 659/9500 [2:16:28<30:14:14, 12.31s/it]08/03/2024 00:13:58 - INFO - __main__ -   Step: 659, LR: 1.9188148831467458e-05, Loss: 498.906982421875
2024-08-03T07:14:10.693923628Z 
  7%|▋         | 660/9500 [2:16:40<30:05:31, 12.25s/it]08/03/2024 00:14:10 - INFO - __main__ -   Step: 660, LR: 1.918597828778018e-05, Loss: 717.8242797851562
2024-08-03T07:14:22.659609262Z 
  7%|▋         | 661/9500 [2:16:52<29:52:33, 12.17s/it]08/03/2024 00:14:22 - INFO - __main__ -   Step: 661, LR: 1.91838077440929e-05, Loss: 745.7434692382812
2024-08-03T07:14:35.679910620Z 
  7%|▋         | 662/9500 [2:17:05<30:30:00, 12.42s/it]08/03/2024 00:14:35 - INFO - __main__ -   Step: 662, LR: 1.9181637200405624e-05, Loss: 707.423828125
2024-08-03T07:14:47.791481166Z 
  7%|▋         | 663/9500 [2:17:17<30:16:00, 12.33s/it]08/03/2024 00:14:47 - INFO - __main__ -   Step: 663, LR: 1.9179466656718344e-05, Loss: 662.419189453125
2024-08-03T07:15:00.244964890Z 
  7%|▋         | 664/9500 [2:17:30<30:21:15, 12.37s/it]08/03/2024 00:15:00 - INFO - __main__ -   Step: 664, LR: 1.9177296113031064e-05, Loss: 729.676025390625
2024-08-03T07:15:12.693693718Z 
  7%|▋         | 665/9500 [2:17:42<30:24:40, 12.39s/it]08/03/2024 00:15:12 - INFO - __main__ -   Step: 665, LR: 1.9175125569343784e-05, Loss: 578.44140625
2024-08-03T07:15:24.545322637Z 
  7%|▋         | 666/9500 [2:17:54<30:00:36, 12.23s/it]08/03/2024 00:15:24 - INFO - __main__ -   Step: 666, LR: 1.9172955025656507e-05, Loss: 600.5528564453125
2024-08-03T07:15:36.770267453Z 
  7%|▋         | 667/9500 [2:18:06<30:00:11, 12.23s/it]08/03/2024 00:15:36 - INFO - __main__ -   Step: 667, LR: 1.9170784481969227e-05, Loss: 799.031494140625
2024-08-03T07:15:49.349197733Z 
  7%|▋         | 668/9500 [2:18:19<30:15:28, 12.33s/it]08/03/2024 00:15:49 - INFO - __main__ -   Step: 668, LR: 1.9168613938281947e-05, Loss: 876.8937377929688
2024-08-03T07:16:01.572468801Z 
  7%|▋         | 669/9500 [2:18:31<30:10:25, 12.30s/it]08/03/2024 00:16:01 - INFO - __main__ -   Step: 669, LR: 1.916644339459467e-05, Loss: 711.8125610351562
2024-08-03T07:16:13.720855620Z 
  7%|▋         | 670/9500 [2:18:43<30:03:30, 12.25s/it]08/03/2024 00:16:13 - INFO - __main__ -   Step: 670, LR: 1.916427285090739e-05, Loss: 764.9593505859375
2024-08-03T07:16:26.158718605Z 
  7%|▋         | 671/9500 [2:18:56<30:11:22, 12.31s/it]08/03/2024 00:16:26 - INFO - __main__ -   Step: 671, LR: 1.9162102307220113e-05, Loss: 651.8121948242188
2024-08-03T07:16:38.225507763Z 
  7%|▋         | 672/9500 [2:19:08<30:00:26, 12.24s/it]08/03/2024 00:16:38 - INFO - __main__ -   Step: 672, LR: 1.9159931763532833e-05, Loss: 608.8763427734375
2024-08-03T07:16:50.403749442Z 
  7%|▋         | 673/9500 [2:19:20<29:57:38, 12.22s/it]08/03/2024 00:16:50 - INFO - __main__ -   Step: 673, LR: 1.9157761219845553e-05, Loss: 809.6666259765625
2024-08-03T07:17:03.174127944Z 
  7%|▋         | 674/9500 [2:19:33<30:21:46, 12.38s/it]08/03/2024 00:17:03 - INFO - __main__ -   Step: 674, LR: 1.9155590676158276e-05, Loss: 599.4547729492188
2024-08-03T07:17:15.261554826Z 
  7%|▋         | 675/9500 [2:19:45<30:08:26, 12.30s/it]08/03/2024 00:17:15 - INFO - __main__ -   Step: 675, LR: 1.9153420132470996e-05, Loss: 748.841796875
2024-08-03T07:17:27.421719363Z 
  7%|▋         | 676/9500 [2:19:57<30:02:17, 12.25s/it]08/03/2024 00:17:27 - INFO - __main__ -   Step: 676, LR: 1.915124958878372e-05, Loss: 718.5767211914062
2024-08-03T07:17:40.501898884Z 
  7%|▋         | 677/9500 [2:20:10<30:38:29, 12.50s/it]08/03/2024 00:17:40 - INFO - __main__ -   Step: 677, LR: 1.914907904509644e-05, Loss: 733.1981201171875
2024-08-03T07:17:52.614644814Z 
  7%|▋         | 678/9500 [2:20:22<30:21:05, 12.39s/it]08/03/2024 00:17:52 - INFO - __main__ -   Step: 678, LR: 1.914690850140916e-05, Loss: 857.1978759765625
2024-08-03T07:18:04.777335568Z 
  7%|▋         | 679/9500 [2:20:34<30:11:02, 12.32s/it]08/03/2024 00:18:04 - INFO - __main__ -   Step: 679, LR: 1.914473795772188e-05, Loss: 676.5284423828125
2024-08-03T07:18:17.255403339Z 
  7%|▋         | 680/9500 [2:20:47<30:17:52, 12.37s/it]08/03/2024 00:18:17 - INFO - __main__ -   Step: 680, LR: 1.9142567414034602e-05, Loss: 749.5280151367188
2024-08-03T07:18:29.767562489Z 
  7%|▋         | 681/9500 [2:20:59<30:24:05, 12.41s/it]08/03/2024 00:18:29 - INFO - __main__ -   Step: 681, LR: 1.9140396870347322e-05, Loss: 616.357666015625
2024-08-03T07:18:41.949365662Z 
  7%|▋         | 682/9500 [2:21:11<30:13:48, 12.34s/it]08/03/2024 00:18:41 - INFO - __main__ -   Step: 682, LR: 1.9138226326660042e-05, Loss: 596.894775390625
2024-08-03T07:18:54.313715722Z 
  7%|▋         | 683/9500 [2:21:24<30:14:36, 12.35s/it]08/03/2024 00:18:54 - INFO - __main__ -   Step: 683, LR: 1.9136055782972765e-05, Loss: 704.1961669921875
2024-08-03T07:19:07.113951215Z 
  7%|▋         | 684/9500 [2:21:37<30:34:19, 12.48s/it]08/03/2024 00:19:07 - INFO - __main__ -   Step: 684, LR: 1.9133885239285485e-05, Loss: 625.2471923828125
2024-08-03T07:19:19.618547794Z 
  7%|▋         | 685/9500 [2:21:49<30:35:00, 12.49s/it]08/03/2024 00:19:19 - INFO - __main__ -   Step: 685, LR: 1.913171469559821e-05, Loss: 949.9993896484375
2024-08-03T07:19:32.067380350Z 
  7%|▋         | 686/9500 [2:22:02<30:32:59, 12.48s/it]08/03/2024 00:19:32 - INFO - __main__ -   Step: 686, LR: 1.9129544151910928e-05, Loss: 757.095703125
2024-08-03T07:19:44.827319074Z 
  7%|▋         | 687/9500 [2:22:14<30:45:13, 12.56s/it]08/03/2024 00:19:44 - INFO - __main__ -   Step: 687, LR: 1.9127373608223648e-05, Loss: 832.380859375
2024-08-03T07:19:56.911224587Z 
  7%|▋         | 688/9500 [2:22:26<30:23:54, 12.42s/it]08/03/2024 00:19:56 - INFO - __main__ -   Step: 688, LR: 1.912520306453637e-05, Loss: 656.5248413085938
2024-08-03T07:20:09.086096041Z 
  7%|▋         | 689/9500 [2:22:39<30:12:58, 12.35s/it]08/03/2024 00:20:09 - INFO - __main__ -   Step: 689, LR: 1.912303252084909e-05, Loss: 808.6757202148438
2024-08-03T07:20:21.578300597Z 
  7%|▋         | 690/9500 [2:22:51<30:19:11, 12.39s/it]08/03/2024 00:20:21 - INFO - __main__ -   Step: 690, LR: 1.9120861977161814e-05, Loss: 584.613525390625
2024-08-03T07:20:33.567248655Z 
  7%|▋         | 691/9500 [2:23:03<30:01:21, 12.27s/it]08/03/2024 00:20:33 - INFO - __main__ -   Step: 691, LR: 1.9118691433474534e-05, Loss: 773.5736083984375
2024-08-03T07:20:45.822122702Z 
  7%|▋         | 692/9500 [2:23:15<30:00:30, 12.27s/it]08/03/2024 00:20:45 - INFO - __main__ -   Step: 692, LR: 1.9116520889787254e-05, Loss: 823.0386962890625
2024-08-03T07:20:58.280181581Z 
  7%|▋         | 693/9500 [2:23:28<30:08:48, 12.32s/it]08/03/2024 00:20:58 - INFO - __main__ -   Step: 693, LR: 1.9114350346099974e-05, Loss: 656.1011962890625
2024-08-03T07:21:10.464244254Z 
  7%|▋         | 694/9500 [2:23:40<30:02:28, 12.28s/it]08/03/2024 00:21:10 - INFO - __main__ -   Step: 694, LR: 1.9112179802412697e-05, Loss: 801.3583374023438
2024-08-03T07:21:22.617179608Z 
  7%|▋         | 695/9500 [2:23:52<29:56:37, 12.24s/it]08/03/2024 00:21:22 - INFO - __main__ -   Step: 695, LR: 1.9110009258725417e-05, Loss: 674.302001953125
2024-08-03T07:21:35.130778841Z 
  7%|▋         | 696/9500 [2:24:05<30:08:20, 12.32s/it]08/03/2024 00:21:35 - INFO - __main__ -   Step: 696, LR: 1.9107838715038137e-05, Loss: 620.8285522460938
2024-08-03T07:21:47.101002427Z 
  7%|▋         | 697/9500 [2:24:17<29:52:34, 12.22s/it]08/03/2024 00:21:47 - INFO - __main__ -   Step: 697, LR: 1.910566817135086e-05, Loss: 673.4520874023438
2024-08-03T07:21:59.304913474Z 
  7%|▋         | 698/9500 [2:24:29<29:51:44, 12.21s/it]08/03/2024 00:21:59 - INFO - __main__ -   Step: 698, LR: 1.910349762766358e-05, Loss: 772.3582763671875
2024-08-03T07:22:11.880643252Z 
  7%|▋         | 699/9500 [2:24:41<30:07:28, 12.32s/it]08/03/2024 00:22:11 - INFO - __main__ -   Step: 699, LR: 1.9101327083976303e-05, Loss: 613.956787109375
2024-08-03T07:22:24.043964337Z 
  7%|▋         | 700/9500 [2:24:53<30:00:16, 12.27s/it]08/03/2024 00:22:24 - INFO - __main__ -   Step: 700, LR: 1.9099156540289023e-05, Loss: 846.216064453125
2024-08-03T07:22:36.223354202Z 
  7%|▋         | 701/9500 [2:25:06<29:55:52, 12.25s/it]08/03/2024 00:22:36 - INFO - __main__ -   Step: 701, LR: 1.9096985996601743e-05, Loss: 648.121337890625
2024-08-03T07:22:49.251538364Z 
  7%|▋         | 702/9500 [2:25:19<30:30:05, 12.48s/it]08/03/2024 00:22:49 - INFO - __main__ -   Step: 702, LR: 1.9094815452914466e-05, Loss: 777.3641967773438
2024-08-03T07:23:01.663477718Z 
  7%|▋         | 703/9500 [2:25:31<30:26:51, 12.46s/it]08/03/2024 00:23:01 - INFO - __main__ -   Step: 703, LR: 1.9092644909227186e-05, Loss: 672.5228271484375
2024-08-03T07:23:13.853260584Z 
  7%|▋         | 704/9500 [2:25:43<30:14:45, 12.38s/it]08/03/2024 00:23:13 - INFO - __main__ -   Step: 704, LR: 1.909047436553991e-05, Loss: 601.846435546875
2024-08-03T07:23:26.289798069Z 
  7%|▋         | 705/9500 [2:25:56<30:17:05, 12.40s/it]08/03/2024 00:23:26 - INFO - __main__ -   Step: 705, LR: 1.908830382185263e-05, Loss: 751.12158203125
2024-08-03T07:23:38.213555372Z 
  7%|▋         | 706/9500 [2:26:08<29:56:06, 12.25s/it]08/03/2024 00:23:38 - INFO - __main__ -   Step: 706, LR: 1.908613327816535e-05, Loss: 616.357421875
2024-08-03T07:23:50.560607385Z 
  7%|▋         | 707/9500 [2:26:20<29:59:57, 12.28s/it]08/03/2024 00:23:50 - INFO - __main__ -   Step: 707, LR: 1.908396273447807e-05, Loss: 695.886474609375
2024-08-03T07:24:02.981182242Z 
  7%|▋         | 708/9500 [2:26:32<30:05:50, 12.32s/it]08/03/2024 00:24:02 - INFO - __main__ -   Step: 708, LR: 1.9081792190790792e-05, Loss: 584.384765625
2024-08-03T07:24:15.235480587Z 
  7%|▋         | 709/9500 [2:26:45<30:02:35, 12.30s/it]08/03/2024 00:24:15 - INFO - __main__ -   Step: 709, LR: 1.9079621647103512e-05, Loss: 778.667236328125
2024-08-03T07:24:27.428529638Z 
  7%|▋         | 710/9500 [2:26:57<29:57:32, 12.27s/it]08/03/2024 00:24:27 - INFO - __main__ -   Step: 710, LR: 1.9077451103416232e-05, Loss: 704.4082641601562
2024-08-03T07:24:39.759868728Z 
  7%|▋         | 711/9500 [2:27:09<30:00:02, 12.29s/it]08/03/2024 00:24:39 - INFO - __main__ -   Step: 711, LR: 1.9075280559728955e-05, Loss: 643.7241821289062
2024-08-03T07:24:52.014097355Z 
  7%|▋         | 712/9500 [2:27:21<29:58:20, 12.28s/it]08/03/2024 00:24:52 - INFO - __main__ -   Step: 712, LR: 1.9073110016041675e-05, Loss: 652.2778930664062
2024-08-03T07:25:03.983591193Z 
  8%|▊         | 713/9500 [2:27:33<29:44:34, 12.19s/it]08/03/2024 00:25:03 - INFO - __main__ -   Step: 713, LR: 1.90709394723544e-05, Loss: 604.731201171875
2024-08-03T07:25:16.710779875Z 
  8%|▊         | 714/9500 [2:27:46<30:08:07, 12.35s/it]08/03/2024 00:25:16 - INFO - __main__ -   Step: 714, LR: 1.9068768928667118e-05, Loss: 680.9030151367188
2024-08-03T07:25:28.768243110Z 
  8%|▊         | 715/9500 [2:27:58<29:55:12, 12.26s/it]08/03/2024 00:25:28 - INFO - __main__ -   Step: 715, LR: 1.9066598384979838e-05, Loss: 862.072998046875
2024-08-03T07:25:40.906074249Z 
  8%|▊         | 716/9500 [2:28:10<29:49:35, 12.22s/it]08/03/2024 00:25:40 - INFO - __main__ -   Step: 716, LR: 1.906442784129256e-05, Loss: 938.9302978515625
2024-08-03T07:25:53.122240100Z 
  8%|▊         | 717/9500 [2:28:23<29:49:02, 12.22s/it]08/03/2024 00:25:53 - INFO - __main__ -   Step: 717, LR: 1.906225729760528e-05, Loss: 560.6527099609375
2024-08-03T07:26:05.352977047Z 
  8%|▊         | 718/9500 [2:28:35<29:49:14, 12.22s/it]08/03/2024 00:26:05 - INFO - __main__ -   Step: 718, LR: 1.9060086753918005e-05, Loss: 670.2271728515625
2024-08-03T07:26:17.276932185Z 
  8%|▊         | 719/9500 [2:28:47<29:35:50, 12.13s/it]08/03/2024 00:26:17 - INFO - __main__ -   Step: 719, LR: 1.905791621023072e-05, Loss: 578.6900634765625
2024-08-03T07:26:29.483004098Z 
  8%|▊         | 720/9500 [2:28:59<29:38:48, 12.16s/it]08/03/2024 00:26:29 - INFO - __main__ -   Step: 720, LR: 1.9055745666543444e-05, Loss: 668.1783447265625
2024-08-03T07:26:41.879774595Z 
  8%|▊         | 721/9500 [2:29:11<29:49:10, 12.23s/it]08/03/2024 00:26:41 - INFO - __main__ -   Step: 721, LR: 1.9053575122856164e-05, Loss: 679.6222534179688
2024-08-03T07:26:54.146752343Z 
  8%|▊         | 722/9500 [2:29:24<29:50:40, 12.24s/it]08/03/2024 00:26:54 - INFO - __main__ -   Step: 722, LR: 1.9051404579168887e-05, Loss: 794.202880859375
2024-08-03T07:27:06.550258813Z 
  8%|▊         | 723/9500 [2:29:36<29:57:39, 12.29s/it]08/03/2024 00:27:06 - INFO - __main__ -   Step: 723, LR: 1.9049234035481607e-05, Loss: 663.4125366210938
2024-08-03T07:27:19.160801521Z 
  8%|▊         | 724/9500 [2:29:49<30:11:33, 12.39s/it]08/03/2024 00:27:19 - INFO - __main__ -   Step: 724, LR: 1.9047063491794327e-05, Loss: 824.9921875
2024-08-03T07:27:31.110236976Z 
  8%|▊         | 725/9500 [2:30:01<29:52:14, 12.25s/it]08/03/2024 00:27:31 - INFO - __main__ -   Step: 725, LR: 1.904489294810705e-05, Loss: 689.34228515625
2024-08-03T07:27:43.151795542Z 
  8%|▊         | 726/9500 [2:30:13<29:42:40, 12.19s/it]08/03/2024 00:27:43 - INFO - __main__ -   Step: 726, LR: 1.904272240441977e-05, Loss: 736.41357421875
2024-08-03T07:27:55.791179931Z 
  8%|▊         | 727/9500 [2:30:25<30:02:10, 12.33s/it]08/03/2024 00:27:55 - INFO - __main__ -   Step: 727, LR: 1.9040551860732493e-05, Loss: 863.035888671875
2024-08-03T07:28:07.670390709Z 
  8%|▊         | 728/9500 [2:30:37<29:42:23, 12.19s/it]08/03/2024 00:28:07 - INFO - __main__ -   Step: 728, LR: 1.9038381317045213e-05, Loss: 628.9674072265625
2024-08-03T07:28:19.754577828Z 
  8%|▊         | 729/9500 [2:30:49<29:37:29, 12.16s/it]08/03/2024 00:28:19 - INFO - __main__ -   Step: 729, LR: 1.9036210773357933e-05, Loss: 775.3599853515625
2024-08-03T07:28:32.573565183Z 
  8%|▊         | 730/9500 [2:31:02<30:06:12, 12.36s/it]08/03/2024 00:28:32 - INFO - __main__ -   Step: 730, LR: 1.9034040229670656e-05, Loss: 746.8606567382812
2024-08-03T07:28:44.664998725Z 
  8%|▊         | 731/9500 [2:31:14<29:54:21, 12.28s/it]08/03/2024 00:28:44 - INFO - __main__ -   Step: 731, LR: 1.9031869685983376e-05, Loss: 619.8712768554688
2024-08-03T07:28:56.846427161Z 
  8%|▊         | 732/9500 [2:31:26<29:49:56, 12.25s/it]08/03/2024 00:28:56 - INFO - __main__ -   Step: 732, LR: 1.90296991422961e-05, Loss: 736.353271484375
2024-08-03T07:29:09.299162000Z 
  8%|▊         | 733/9500 [2:31:39<29:58:40, 12.31s/it]08/03/2024 00:29:09 - INFO - __main__ -   Step: 733, LR: 1.9027528598608816e-05, Loss: 572.628173828125
2024-08-03T07:29:21.211591695Z 
  8%|▊         | 734/9500 [2:31:51<29:41:03, 12.19s/it]08/03/2024 00:29:21 - INFO - __main__ -   Step: 734, LR: 1.902535805492154e-05, Loss: 707.872802734375
2024-08-03T07:29:33.327387768Z 
  8%|▊         | 735/9500 [2:32:03<29:37:33, 12.17s/it]08/03/2024 00:29:33 - INFO - __main__ -   Step: 735, LR: 1.902318751123426e-05, Loss: 736.523681640625
2024-08-03T07:29:46.455043804Z 
  8%|▊         | 736/9500 [2:32:16<30:19:24, 12.46s/it]08/03/2024 00:29:46 - INFO - __main__ -   Step: 736, LR: 1.9021016967546982e-05, Loss: 778.6344604492188
2024-08-03T07:29:58.426899549Z 
  8%|▊         | 737/9500 [2:32:28<29:57:59, 12.31s/it]08/03/2024 00:29:58 - INFO - __main__ -   Step: 737, LR: 1.9018846423859702e-05, Loss: 692.8726806640625
2024-08-03T07:30:10.438290242Z 
  8%|▊         | 738/9500 [2:32:40<29:44:39, 12.22s/it]08/03/2024 00:30:10 - INFO - __main__ -   Step: 738, LR: 1.9016675880172422e-05, Loss: 654.3253173828125
2024-08-03T07:30:22.967539859Z 
  8%|▊         | 739/9500 [2:32:52<29:57:58, 12.31s/it]08/03/2024 00:30:22 - INFO - __main__ -   Step: 739, LR: 1.9014505336485145e-05, Loss: 763.8274536132812
2024-08-03T07:30:35.112506102Z 
  8%|▊         | 740/9500 [2:33:05<29:50:23, 12.26s/it]08/03/2024 00:30:35 - INFO - __main__ -   Step: 740, LR: 1.9012334792797865e-05, Loss: 689.7530517578125
2024-08-03T07:30:47.436658976Z 
  8%|▊         | 741/9500 [2:33:17<29:52:51, 12.28s/it]08/03/2024 00:30:47 - INFO - __main__ -   Step: 741, LR: 1.901016424911059e-05, Loss: 871.3836669921875
2024-08-03T07:31:00.161963874Z 
  8%|▊         | 742/9500 [2:33:30<30:12:05, 12.41s/it]08/03/2024 00:31:00 - INFO - __main__ -   Step: 742, LR: 1.900799370542331e-05, Loss: 805.6663208007812
2024-08-03T07:31:12.352383260Z 
  8%|▊         | 743/9500 [2:33:42<30:02:05, 12.35s/it]08/03/2024 00:31:12 - INFO - __main__ -   Step: 743, LR: 1.900582316173603e-05, Loss: 757.8726806640625
2024-08-03T07:31:24.514856091Z 
  8%|▊         | 744/9500 [2:33:54<29:53:46, 12.29s/it]08/03/2024 00:31:24 - INFO - __main__ -   Step: 744, LR: 1.900365261804875e-05, Loss: 805.5051879882812
2024-08-03T07:31:37.278883166Z 
  8%|▊         | 745/9500 [2:34:07<30:14:15, 12.43s/it]08/03/2024 00:31:37 - INFO - __main__ -   Step: 745, LR: 1.900148207436147e-05, Loss: 590.8416748046875
2024-08-03T07:31:49.181409908Z 
  8%|▊         | 746/9500 [2:34:19<29:50:47, 12.27s/it]08/03/2024 00:31:49 - INFO - __main__ -   Step: 746, LR: 1.8999311530674195e-05, Loss: 565.4534912109375
2024-08-03T07:32:01.270805019Z 
  8%|▊         | 747/9500 [2:34:31<29:42:30, 12.22s/it]08/03/2024 00:32:01 - INFO - __main__ -   Step: 747, LR: 1.899714098698691e-05, Loss: 518.4998779296875
2024-08-03T07:32:13.506242999Z 
  8%|▊         | 748/9500 [2:34:43<29:43:02, 12.22s/it]08/03/2024 00:32:13 - INFO - __main__ -   Step: 748, LR: 1.8994970443299634e-05, Loss: 566.6135864257812
2024-08-03T07:32:25.449699348Z 
  8%|▊         | 749/9500 [2:34:55<29:30:33, 12.14s/it]08/03/2024 00:32:25 - INFO - __main__ -   Step: 749, LR: 1.8992799899612354e-05, Loss: 585.763671875
2024-08-03T07:32:37.395196873Z 
  8%|▊         | 750/9500 [2:35:07<29:21:53, 12.08s/it]08/03/2024 00:32:37 - INFO - __main__ -   Step: 750, LR: 1.8990629355925077e-05, Loss: 644.4461669921875
2024-08-03T07:32:49.652031543Z 
  8%|▊         | 751/9500 [2:35:19<29:29:20, 12.13s/it]08/03/2024 00:32:49 - INFO - __main__ -   Step: 751, LR: 1.8988458812237797e-05, Loss: 567.564697265625
2024-08-03T07:33:01.646728194Z 
  8%|▊         | 752/9500 [2:35:31<29:23:02, 12.09s/it]08/03/2024 00:33:01 - INFO - __main__ -   Step: 752, LR: 1.898628826855052e-05, Loss: 572.8922119140625
2024-08-03T07:33:13.526269254Z 
  8%|▊         | 753/9500 [2:35:43<29:13:32, 12.03s/it]08/03/2024 00:33:13 - INFO - __main__ -   Step: 753, LR: 1.898411772486324e-05, Loss: 551.0777587890625
2024-08-03T07:33:25.712350605Z 
  8%|▊         | 754/9500 [2:35:55<29:20:15, 12.08s/it]08/03/2024 00:33:25 - INFO - __main__ -   Step: 754, LR: 1.898194718117596e-05, Loss: 733.533935546875
2024-08-03T07:33:37.865974755Z 
  8%|▊         | 755/9500 [2:36:07<29:23:26, 12.10s/it]08/03/2024 00:33:37 - INFO - __main__ -   Step: 755, LR: 1.8979776637488684e-05, Loss: 580.6219482421875
2024-08-03T07:33:50.145121280Z 
  8%|▊         | 756/9500 [2:36:20<29:31:06, 12.15s/it]08/03/2024 00:33:50 - INFO - __main__ -   Step: 756, LR: 1.8977606093801403e-05, Loss: 727.7445068359375
2024-08-03T07:34:02.688308877Z 
  8%|▊         | 757/9500 [2:36:32<29:47:58, 12.27s/it]08/03/2024 00:34:02 - INFO - __main__ -   Step: 757, LR: 1.8975435550114127e-05, Loss: 549.4823608398438
2024-08-03T07:34:14.704011445Z 
  8%|▊         | 758/9500 [2:36:44<29:36:38, 12.19s/it]08/03/2024 00:34:14 - INFO - __main__ -   Step: 758, LR: 1.8973265006426847e-05, Loss: 581.2212524414062
2024-08-03T07:34:26.874096320Z 
  8%|▊         | 759/9500 [2:36:56<29:35:23, 12.19s/it]08/03/2024 00:34:26 - INFO - __main__ -   Step: 759, LR: 1.8971094462739566e-05, Loss: 791.5185546875
2024-08-03T07:34:39.960299775Z 
  8%|▊         | 760/9500 [2:37:09<30:14:29, 12.46s/it]08/03/2024 00:34:39 - INFO - __main__ -   Step: 760, LR: 1.896892391905229e-05, Loss: 795.9078369140625
2024-08-03T07:34:52.517404136Z 
  8%|▊         | 761/9500 [2:37:22<30:18:41, 12.49s/it]08/03/2024 00:34:52 - INFO - __main__ -   Step: 761, LR: 1.896675337536501e-05, Loss: 824.27587890625
2024-08-03T07:35:04.948514133Z 
  8%|▊         | 762/9500 [2:37:34<30:16:02, 12.47s/it]08/03/2024 00:35:04 - INFO - __main__ -   Step: 762, LR: 1.896458283167773e-05, Loss: 612.1376953125
2024-08-03T07:35:17.578848460Z 
  8%|▊         | 763/9500 [2:37:47<30:22:51, 12.52s/it]08/03/2024 00:35:17 - INFO - __main__ -   Step: 763, LR: 1.896241228799045e-05, Loss: 771.6485595703125
2024-08-03T07:35:29.836323017Z 
  8%|▊         | 764/9500 [2:37:59<30:11:14, 12.44s/it]08/03/2024 00:35:29 - INFO - __main__ -   Step: 764, LR: 1.8960241744303173e-05, Loss: 748.6390380859375
2024-08-03T07:35:42.527219704Z 
  8%|▊         | 765/9500 [2:38:12<30:22:00, 12.52s/it]08/03/2024 00:35:42 - INFO - __main__ -   Step: 765, LR: 1.8958071200615892e-05, Loss: 862.45703125
2024-08-03T07:35:55.018424252Z 
  8%|▊         | 766/9500 [2:38:24<30:20:44, 12.51s/it]08/03/2024 00:35:55 - INFO - __main__ -   Step: 766, LR: 1.8955900656928616e-05, Loss: 599.40234375
2024-08-03T07:36:07.460114409Z 
  8%|▊         | 767/9500 [2:38:37<30:17:39, 12.49s/it]08/03/2024 00:36:07 - INFO - __main__ -   Step: 767, LR: 1.8953730113241336e-05, Loss: 889.8143920898438
2024-08-03T07:36:19.795347705Z 
  8%|▊         | 768/9500 [2:38:49<30:10:44, 12.44s/it]08/03/2024 00:36:19 - INFO - __main__ -   Step: 768, LR: 1.8951559569554055e-05, Loss: 892.919189453125
2024-08-03T07:36:31.714143653Z 
  8%|▊         | 769/9500 [2:39:01<29:47:42, 12.29s/it]08/03/2024 00:36:31 - INFO - __main__ -   Step: 769, LR: 1.894938902586678e-05, Loss: 682.1595458984375
2024-08-03T07:36:44.081858543Z 
  8%|▊         | 770/9500 [2:39:14<29:51:05, 12.31s/it]08/03/2024 00:36:44 - INFO - __main__ -   Step: 770, LR: 1.89472184821795e-05, Loss: 517.572998046875
2024-08-03T07:36:56.286176670Z 
  8%|▊         | 771/9500 [2:39:26<29:46:17, 12.28s/it]08/03/2024 00:36:56 - INFO - __main__ -   Step: 771, LR: 1.8945047938492222e-05, Loss: 790.7586059570312
2024-08-03T07:37:08.318940391Z 
  8%|▊         | 772/9500 [2:39:38<29:35:22, 12.20s/it]08/03/2024 00:37:08 - INFO - __main__ -   Step: 772, LR: 1.894287739480494e-05, Loss: 649.159912109375
2024-08-03T07:37:20.935604636Z 
  8%|▊         | 773/9500 [2:39:50<29:53:08, 12.33s/it]08/03/2024 00:37:20 - INFO - __main__ -   Step: 773, LR: 1.894070685111766e-05, Loss: 556.0614013671875
2024-08-03T07:37:33.073149333Z 
  8%|▊         | 774/9500 [2:40:03<29:44:36, 12.27s/it]08/03/2024 00:37:33 - INFO - __main__ -   Step: 774, LR: 1.8938536307430385e-05, Loss: 927.927001953125
2024-08-03T07:37:45.301074905Z 
  8%|▊         | 775/9500 [2:40:15<29:42:31, 12.26s/it]08/03/2024 00:37:45 - INFO - __main__ -   Step: 775, LR: 1.8936365763743105e-05, Loss: 650.3160400390625
2024-08-03T07:37:57.736854359Z 
  8%|▊         | 776/9500 [2:40:27<29:50:04, 12.31s/it]08/03/2024 00:37:57 - INFO - __main__ -   Step: 776, LR: 1.8934195220055824e-05, Loss: 651.1259765625
2024-08-03T07:38:09.796820824Z 
  8%|▊         | 777/9500 [2:40:39<29:38:54, 12.24s/it]08/03/2024 00:38:09 - INFO - __main__ -   Step: 777, LR: 1.8932024676368544e-05, Loss: 715.2479248046875
2024-08-03T07:38:21.983517193Z 
  8%|▊         | 778/9500 [2:40:51<29:36:33, 12.22s/it]08/03/2024 00:38:21 - INFO - __main__ -   Step: 778, LR: 1.8929854132681268e-05, Loss: 707.274169921875
2024-08-03T07:38:34.537666186Z 
  8%|▊         | 779/9500 [2:41:04<29:50:52, 12.32s/it]08/03/2024 00:38:34 - INFO - __main__ -   Step: 779, LR: 1.8927683588993987e-05, Loss: 660.6692504882812
2024-08-03T07:38:46.833028324Z 
  8%|▊         | 780/9500 [2:41:16<29:49:32, 12.31s/it]08/03/2024 00:38:46 - INFO - __main__ -   Step: 780, LR: 1.892551304530671e-05, Loss: 818.5944213867188
2024-08-03T07:38:59.095549755Z 
  8%|▊         | 781/9500 [2:41:29<29:47:07, 12.30s/it]08/03/2024 00:38:59 - INFO - __main__ -   Step: 781, LR: 1.892334250161943e-05, Loss: 691.4928588867188
2024-08-03T07:39:11.857589331Z 
  8%|▊         | 782/9500 [2:41:41<30:07:08, 12.44s/it]08/03/2024 00:39:11 - INFO - __main__ -   Step: 782, LR: 1.892117195793215e-05, Loss: 514.54638671875
2024-08-03T07:39:24.190838698Z 
  8%|▊         | 783/9500 [2:41:54<30:02:22, 12.41s/it]08/03/2024 00:39:24 - INFO - __main__ -   Step: 783, LR: 1.8919001414244874e-05, Loss: 641.4157104492188
2024-08-03T07:39:36.217538471Z 
  8%|▊         | 784/9500 [2:42:06<29:45:39, 12.29s/it]08/03/2024 00:39:36 - INFO - __main__ -   Step: 784, LR: 1.8916830870557594e-05, Loss: 540.86865234375
2024-08-03T07:39:48.656063136Z 
  8%|▊         | 785/9500 [2:42:18<29:51:50, 12.34s/it]08/03/2024 00:39:48 - INFO - __main__ -   Step: 785, LR: 1.8914660326870317e-05, Loss: 729.2393798828125
2024-08-03T07:40:00.923778132Z 
  8%|▊         | 786/9500 [2:42:30<29:48:37, 12.32s/it]08/03/2024 00:40:00 - INFO - __main__ -   Step: 786, LR: 1.8912489783183037e-05, Loss: 645.1357421875
2024-08-03T07:40:13.290906397Z 
  8%|▊         | 787/9500 [2:42:43<29:50:40, 12.33s/it]08/03/2024 00:40:13 - INFO - __main__ -   Step: 787, LR: 1.8910319239495757e-05, Loss: 764.28271484375
2024-08-03T07:40:25.914270669Z 
  8%|▊         | 788/9500 [2:42:55<30:03:11, 12.42s/it]08/03/2024 00:40:25 - INFO - __main__ -   Step: 788, LR: 1.890814869580848e-05, Loss: 694.978271484375
2024-08-03T07:40:38.229601914Z 
  8%|▊         | 789/9500 [2:43:08<29:58:29, 12.39s/it]08/03/2024 00:40:38 - INFO - __main__ -   Step: 789, LR: 1.89059781521212e-05, Loss: 757.2450561523438
2024-08-03T07:40:50.435572078Z 
  8%|▊         | 790/9500 [2:43:20<29:50:21, 12.33s/it]08/03/2024 00:40:50 - INFO - __main__ -   Step: 790, LR: 1.890380760843392e-05, Loss: 696.1607666015625
2024-08-03T07:41:03.292988383Z 
  8%|▊         | 791/9500 [2:43:33<30:12:59, 12.49s/it]08/03/2024 00:41:03 - INFO - __main__ -   Step: 791, LR: 1.890163706474664e-05, Loss: 716.880126953125
2024-08-03T07:41:15.405560292Z 
  8%|▊         | 792/9500 [2:43:45<29:56:20, 12.38s/it]08/03/2024 00:41:15 - INFO - __main__ -   Step: 792, LR: 1.8899466521059363e-05, Loss: 693.852783203125
2024-08-03T07:41:27.739867330Z 
  8%|▊         | 793/9500 [2:43:57<29:54:15, 12.36s/it]08/03/2024 00:41:27 - INFO - __main__ -   Step: 793, LR: 1.8897295977372083e-05, Loss: 748.8556518554688
2024-08-03T07:41:40.414253493Z 
  8%|▊         | 794/9500 [2:44:10<30:07:33, 12.46s/it]08/03/2024 00:41:40 - INFO - __main__ -   Step: 794, LR: 1.8895125433684806e-05, Loss: 816.0880737304688
2024-08-03T07:41:52.362615966Z 
  8%|▊         | 795/9500 [2:44:22<29:45:11, 12.30s/it]08/03/2024 00:41:52 - INFO - __main__ -   Step: 795, LR: 1.8892954889997526e-05, Loss: 604.0528564453125
2024-08-03T07:42:04.427566333Z 
  8%|▊         | 796/9500 [2:44:34<29:34:33, 12.23s/it]08/03/2024 00:42:04 - INFO - __main__ -   Step: 796, LR: 1.8890784346310246e-05, Loss: 556.0093383789062
2024-08-03T07:42:16.922415646Z 
  8%|▊         | 797/9500 [2:44:46<29:45:44, 12.31s/it]08/03/2024 00:42:16 - INFO - __main__ -   Step: 797, LR: 1.888861380262297e-05, Loss: 572.9593505859375
2024-08-03T07:42:28.948621909Z 
  8%|▊         | 798/9500 [2:44:58<29:33:09, 12.23s/it]08/03/2024 00:42:28 - INFO - __main__ -   Step: 798, LR: 1.888644325893569e-05, Loss: 735.5978393554688
2024-08-03T07:42:41.171174718Z 
  8%|▊         | 799/9500 [2:45:11<29:32:48, 12.22s/it]08/03/2024 00:42:41 - INFO - __main__ -   Step: 799, LR: 1.8884272715248412e-05, Loss: 605.939453125
2024-08-03T07:42:53.834656081Z 
  8%|▊         | 800/9500 [2:45:23<29:51:40, 12.36s/it]08/03/2024 00:42:53 - INFO - __main__ -   Step: 800, LR: 1.8882102171561132e-05, Loss: 730.248779296875
2024-08-03T07:43:06.417015711Z 
  8%|▊         | 801/9500 [2:45:36<30:01:18, 12.42s/it]08/03/2024 00:43:06 - INFO - __main__ -   Step: 801, LR: 1.887993162787385e-05, Loss: 666.5247802734375
2024-08-03T07:43:18.331244556Z 
  8%|▊         | 802/9500 [2:45:48<29:38:54, 12.27s/it]08/03/2024 00:43:18 - INFO - __main__ -   Step: 802, LR: 1.8877761084186575e-05, Loss: 560.4647216796875
2024-08-03T07:43:30.775733283Z 
  8%|▊         | 803/9500 [2:46:00<29:46:15, 12.32s/it]08/03/2024 00:43:30 - INFO - __main__ -   Step: 803, LR: 1.8875590540499295e-05, Loss: 568.1551513671875
2024-08-03T07:43:43.081231365Z 
  8%|▊         | 804/9500 [2:46:13<29:45:15, 12.32s/it]08/03/2024 00:43:43 - INFO - __main__ -   Step: 804, LR: 1.8873419996812015e-05, Loss: 708.261962890625
2024-08-03T07:43:55.075129282Z 
  8%|▊         | 805/9500 [2:46:25<29:30:59, 12.22s/it]08/03/2024 00:43:55 - INFO - __main__ -   Step: 805, LR: 1.8871249453124734e-05, Loss: 764.1436157226562
2024-08-03T07:44:07.604361944Z 
  8%|▊         | 806/9500 [2:46:37<29:44:11, 12.31s/it]08/03/2024 00:44:07 - INFO - __main__ -   Step: 806, LR: 1.8869078909437458e-05, Loss: 660.1531982421875
2024-08-03T07:44:19.872555744Z 
  8%|▊         | 807/9500 [2:46:49<29:42:01, 12.30s/it]08/03/2024 00:44:19 - INFO - __main__ -   Step: 807, LR: 1.8866908365750178e-05, Loss: 810.4544067382812
2024-08-03T07:44:32.090845438Z 
  9%|▊         | 808/9500 [2:47:02<29:38:16, 12.28s/it]08/03/2024 00:44:32 - INFO - __main__ -   Step: 808, LR: 1.88647378220629e-05, Loss: 786.2489624023438
2024-08-03T07:44:44.658811167Z 
  9%|▊         | 809/9500 [2:47:14<29:50:47, 12.36s/it]08/03/2024 00:44:44 - INFO - __main__ -   Step: 809, LR: 1.886256727837562e-05, Loss: 666.12548828125
2024-08-03T07:44:57.069738980Z 
  9%|▊         | 810/9500 [2:47:27<29:52:39, 12.38s/it]08/03/2024 00:44:57 - INFO - __main__ -   Step: 810, LR: 1.886039673468834e-05, Loss: 784.9684448242188
2024-08-03T07:45:09.158440863Z 
  9%|▊         | 811/9500 [2:47:39<29:39:55, 12.29s/it]08/03/2024 00:45:09 - INFO - __main__ -   Step: 811, LR: 1.8858226191001064e-05, Loss: 616.0133056640625
2024-08-03T07:45:21.387776495Z 
  9%|▊         | 812/9500 [2:47:51<29:37:02, 12.27s/it]08/03/2024 00:45:21 - INFO - __main__ -   Step: 812, LR: 1.8856055647313784e-05, Loss: 656.2799072265625
2024-08-03T07:45:33.759404098Z 
  9%|▊         | 813/9500 [2:48:03<29:41:08, 12.30s/it]08/03/2024 00:45:33 - INFO - __main__ -   Step: 813, LR: 1.8853885103626507e-05, Loss: 711.4862060546875
2024-08-03T07:45:46.052470593Z 
  9%|▊         | 814/9500 [2:48:15<29:40:32, 12.30s/it]08/03/2024 00:45:46 - INFO - __main__ -   Step: 814, LR: 1.8851714559939227e-05, Loss: 683.38037109375
2024-08-03T07:45:58.331017240Z 
  9%|▊         | 815/9500 [2:48:28<29:39:26, 12.29s/it]08/03/2024 00:45:58 - INFO - __main__ -   Step: 815, LR: 1.8849544016251947e-05, Loss: 832.70458984375
2024-08-03T07:46:11.015087504Z 
  9%|▊         | 816/9500 [2:48:40<29:56:11, 12.41s/it]08/03/2024 00:46:11 - INFO - __main__ -   Step: 816, LR: 1.884737347256467e-05, Loss: 743.156494140625
2024-08-03T07:46:23.101808310Z 
  9%|▊         | 817/9500 [2:48:53<29:41:56, 12.31s/it]08/03/2024 00:46:23 - INFO - __main__ -   Step: 817, LR: 1.884520292887739e-05, Loss: 587.3857421875
2024-08-03T07:46:35.174717449Z 
  9%|▊         | 818/9500 [2:49:05<29:31:18, 12.24s/it]08/03/2024 00:46:35 - INFO - __main__ -   Step: 818, LR: 1.884303238519011e-05, Loss: 671.3328857421875
2024-08-03T07:46:47.563164254Z 
  9%|▊         | 819/9500 [2:49:17<29:37:29, 12.29s/it]08/03/2024 00:46:47 - INFO - __main__ -   Step: 819, LR: 1.884086184150283e-05, Loss: 536.718994140625
2024-08-03T07:46:59.678150016Z 
  9%|▊         | 820/9500 [2:49:29<29:29:53, 12.23s/it]08/03/2024 00:46:59 - INFO - __main__ -   Step: 820, LR: 1.8838691297815553e-05, Loss: 625.45703125
2024-08-03T07:47:11.956667067Z 
  9%|▊         | 821/9500 [2:49:41<29:31:37, 12.25s/it]08/03/2024 00:47:11 - INFO - __main__ -   Step: 821, LR: 1.8836520754128273e-05, Loss: 707.4920043945312
2024-08-03T07:47:24.755515324Z 
  9%|▊         | 822/9500 [2:49:54<29:55:19, 12.41s/it]08/03/2024 00:47:24 - INFO - __main__ -   Step: 822, LR: 1.8834350210440996e-05, Loss: 817.9279174804688
2024-08-03T07:47:37.176117887Z 
  9%|▊         | 823/9500 [2:50:07<29:55:27, 12.42s/it]08/03/2024 00:47:37 - INFO - __main__ -   Step: 823, LR: 1.8832179666753716e-05, Loss: 718.0240478515625
2024-08-03T07:47:49.027036662Z 
  9%|▊         | 824/9500 [2:50:18<29:30:45, 12.25s/it]08/03/2024 00:47:49 - INFO - __main__ -   Step: 824, LR: 1.8830009123066436e-05, Loss: 552.6282958984375
2024-08-03T07:48:01.533188833Z 
  9%|▊         | 825/9500 [2:50:31<29:41:51, 12.32s/it]08/03/2024 00:48:01 - INFO - __main__ -   Step: 825, LR: 1.882783857937916e-05, Loss: 753.8643188476562
2024-08-03T07:48:13.808078213Z 
  9%|▊         | 826/9500 [2:50:43<29:39:30, 12.31s/it]08/03/2024 00:48:13 - INFO - __main__ -   Step: 826, LR: 1.882566803569188e-05, Loss: 703.5256958007812
2024-08-03T07:48:25.770360868Z 
  9%|▊         | 827/9500 [2:50:55<29:24:14, 12.21s/it]08/03/2024 00:48:25 - INFO - __main__ -   Step: 827, LR: 1.8823497492004602e-05, Loss: 550.1734619140625
2024-08-03T07:48:39.072401845Z 
  9%|▊         | 828/9500 [2:51:09<30:11:35, 12.53s/it]08/03/2024 00:48:39 - INFO - __main__ -   Step: 828, LR: 1.8821326948317322e-05, Loss: 730.161376953125
2024-08-03T07:48:51.239224115Z 
  9%|▊         | 829/9500 [2:51:21<29:55:28, 12.42s/it]08/03/2024 00:48:51 - INFO - __main__ -   Step: 829, LR: 1.8819156404630045e-05, Loss: 703.44091796875
2024-08-03T07:49:03.330281729Z 
  9%|▊         | 830/9500 [2:51:33<29:40:50, 12.32s/it]08/03/2024 00:49:03 - INFO - __main__ -   Step: 830, LR: 1.8816985860942765e-05, Loss: 720.28759765625
2024-08-03T07:49:15.592196094Z 
  9%|▊         | 831/9500 [2:51:45<29:37:56, 12.31s/it]08/03/2024 00:49:15 - INFO - __main__ -   Step: 831, LR: 1.8814815317255485e-05, Loss: 596.393310546875
2024-08-03T07:49:27.728058378Z 
  9%|▉         | 832/9500 [2:51:57<29:30:23, 12.25s/it]08/03/2024 00:49:27 - INFO - __main__ -   Step: 832, LR: 1.8812644773568205e-05, Loss: 745.0554809570312
2024-08-03T07:49:39.578926043Z 
  9%|▉         | 833/9500 [2:52:09<29:12:41, 12.13s/it]08/03/2024 00:49:39 - INFO - __main__ -   Step: 833, LR: 1.8810474229880925e-05, Loss: 598.5521240234375
2024-08-03T07:49:52.360155947Z 
  9%|▉         | 834/9500 [2:52:22<29:40:32, 12.33s/it]08/03/2024 00:49:52 - INFO - __main__ -   Step: 834, LR: 1.8808303686193648e-05, Loss: 796.2711181640625
2024-08-03T07:50:04.879584376Z 
  9%|▉         | 835/9500 [2:52:34<29:48:38, 12.39s/it]08/03/2024 00:50:04 - INFO - __main__ -   Step: 835, LR: 1.8806133142506368e-05, Loss: 661.0572509765625
2024-08-03T07:50:16.799687066Z 
  9%|▉         | 836/9500 [2:52:46<29:28:17, 12.25s/it]08/03/2024 00:50:16 - INFO - __main__ -   Step: 836, LR: 1.880396259881909e-05, Loss: 582.12890625
2024-08-03T07:50:29.211212757Z 
  9%|▉         | 837/9500 [2:52:59<29:35:15, 12.30s/it]08/03/2024 00:50:29 - INFO - __main__ -   Step: 837, LR: 1.880179205513181e-05, Loss: 573.8965454101562
2024-08-03T07:50:41.323812112Z 
  9%|▉         | 838/9500 [2:53:11<29:27:08, 12.24s/it]08/03/2024 00:50:41 - INFO - __main__ -   Step: 838, LR: 1.8799621511444534e-05, Loss: 635.5451049804688
2024-08-03T07:50:53.557600127Z 
  9%|▉         | 839/9500 [2:53:23<29:26:38, 12.24s/it]08/03/2024 00:50:53 - INFO - __main__ -   Step: 839, LR: 1.8797450967757254e-05, Loss: 792.9215087890625
2024-08-03T07:51:05.850635026Z 
  9%|▉         | 840/9500 [2:53:35<29:28:47, 12.25s/it]08/03/2024 00:51:05 - INFO - __main__ -   Step: 840, LR: 1.8795280424069974e-05, Loss: 742.7691650390625
2024-08-03T07:51:17.914849680Z 
  9%|▉         | 841/9500 [2:53:47<29:20:20, 12.20s/it]08/03/2024 00:51:17 - INFO - __main__ -   Step: 841, LR: 1.8793109880382697e-05, Loss: 529.2025146484375
2024-08-03T07:51:29.775958232Z 
  9%|▉         | 842/9500 [2:53:59<29:05:33, 12.10s/it]08/03/2024 00:51:29 - INFO - __main__ -   Step: 842, LR: 1.8790939336695417e-05, Loss: 518.4030151367188
2024-08-03T07:51:42.496692109Z 
  9%|▉         | 843/9500 [2:54:12<29:32:22, 12.28s/it]08/03/2024 00:51:42 - INFO - __main__ -   Step: 843, LR: 1.878876879300814e-05, Loss: 812.4871826171875
2024-08-03T07:51:54.656418496Z 
  9%|▉         | 844/9500 [2:54:24<29:26:46, 12.25s/it]08/03/2024 00:51:54 - INFO - __main__ -   Step: 844, LR: 1.878659824932086e-05, Loss: 665.7520751953125
2024-08-03T07:52:06.788715923Z 
  9%|▉         | 845/9500 [2:54:36<29:21:37, 12.21s/it]08/03/2024 00:52:06 - INFO - __main__ -   Step: 845, LR: 1.878442770563358e-05, Loss: 867.4614868164062
2024-08-03T07:52:19.457885533Z 
  9%|▉         | 846/9500 [2:54:49<29:41:11, 12.35s/it]08/03/2024 00:52:19 - INFO - __main__ -   Step: 846, LR: 1.87822571619463e-05, Loss: 878.8251342773438
2024-08-03T07:52:31.442482988Z 
  9%|▉         | 847/9500 [2:55:01<29:25:12, 12.24s/it]08/03/2024 00:52:31 - INFO - __main__ -   Step: 847, LR: 1.8780086618259023e-05, Loss: 591.7316284179688
2024-08-03T07:52:43.333822333Z 
  9%|▉         | 848/9500 [2:55:13<29:09:55, 12.14s/it]08/03/2024 00:52:43 - INFO - __main__ -   Step: 848, LR: 1.8777916074571743e-05, Loss: 720.757568359375
2024-08-03T07:52:55.880686073Z 
  9%|▉         | 849/9500 [2:55:25<29:27:31, 12.26s/it]08/03/2024 00:52:55 - INFO - __main__ -   Step: 849, LR: 1.8775745530884463e-05, Loss: 721.188232421875
2024-08-03T07:53:08.718336756Z 
  9%|▉         | 850/9500 [2:55:38<29:52:20, 12.43s/it]08/03/2024 00:53:08 - INFO - __main__ -   Step: 850, LR: 1.8773574987197186e-05, Loss: 686.759765625
2024-08-03T07:53:21.150685901Z 
  9%|▉         | 851/9500 [2:55:51<29:52:07, 12.43s/it]08/03/2024 00:53:21 - INFO - __main__ -   Step: 851, LR: 1.8771404443509906e-05, Loss: 662.4246826171875
2024-08-03T07:53:33.437014917Z 
  9%|▉         | 852/9500 [2:56:03<29:45:36, 12.39s/it]08/03/2024 00:53:33 - INFO - __main__ -   Step: 852, LR: 1.876923389982263e-05, Loss: 533.6498413085938
2024-08-03T07:53:45.815042558Z 
  9%|▉         | 853/9500 [2:56:15<29:44:56, 12.39s/it]08/03/2024 00:53:45 - INFO - __main__ -   Step: 853, LR: 1.876706335613535e-05, Loss: 757.82080078125
2024-08-03T07:53:57.806716406Z 
  9%|▉         | 854/9500 [2:56:27<29:27:42, 12.27s/it]08/03/2024 00:53:57 - INFO - __main__ -   Step: 854, LR: 1.876489281244807e-05, Loss: 672.9358520507812
2024-08-03T07:54:09.847725773Z 
  9%|▉         | 855/9500 [2:56:39<29:17:43, 12.20s/it]08/03/2024 00:54:09 - INFO - __main__ -   Step: 855, LR: 1.8762722268760792e-05, Loss: 703.8179931640625
2024-08-03T07:54:22.402517578Z 
  9%|▉         | 856/9500 [2:56:52<29:32:53, 12.31s/it]08/03/2024 00:54:22 - INFO - __main__ -   Step: 856, LR: 1.8760551725073512e-05, Loss: 720.780029296875
2024-08-03T07:54:34.520094567Z 
  9%|▉         | 857/9500 [2:57:04<29:24:32, 12.25s/it]08/03/2024 00:54:34 - INFO - __main__ -   Step: 857, LR: 1.8758381181386235e-05, Loss: 585.906494140625
2024-08-03T07:54:46.774938031Z 
  9%|▉         | 858/9500 [2:57:16<29:24:33, 12.25s/it]08/03/2024 00:54:46 - INFO - __main__ -   Step: 858, LR: 1.8756210637698955e-05, Loss: 704.4488525390625
2024-08-03T07:54:59.668872726Z 
  9%|▉         | 859/9500 [2:57:29<29:52:08, 12.44s/it]08/03/2024 00:54:59 - INFO - __main__ -   Step: 859, LR: 1.8754040094011675e-05, Loss: 753.2078857421875
2024-08-03T07:55:11.781205034Z 
  9%|▉         | 860/9500 [2:57:41<29:37:36, 12.34s/it]08/03/2024 00:55:11 - INFO - __main__ -   Step: 860, LR: 1.8751869550324395e-05, Loss: 570.183837890625
2024-08-03T07:55:24.164860876Z 
  9%|▉         | 861/9500 [2:57:54<29:39:06, 12.36s/it]08/03/2024 00:55:24 - INFO - __main__ -   Step: 861, LR: 1.8749699006637118e-05, Loss: 743.3356323242188
2024-08-03T07:55:36.746271989Z 
  9%|▉         | 862/9500 [2:58:06<29:48:36, 12.42s/it]08/03/2024 00:55:36 - INFO - __main__ -   Step: 862, LR: 1.8747528462949838e-05, Loss: 544.31201171875
2024-08-03T07:55:49.232602509Z 
  9%|▉         | 863/9500 [2:58:19<29:51:06, 12.44s/it]08/03/2024 00:55:49 - INFO - __main__ -   Step: 863, LR: 1.8745357919262558e-05, Loss: 763.310302734375
2024-08-03T07:56:01.357256331Z 
  9%|▉         | 864/9500 [2:58:31<29:37:09, 12.35s/it]08/03/2024 00:56:01 - INFO - __main__ -   Step: 864, LR: 1.874318737557528e-05, Loss: 507.4539794921875
2024-08-03T07:56:13.986250655Z 
  9%|▉         | 865/9500 [2:58:43<29:49:08, 12.43s/it]08/03/2024 00:56:13 - INFO - __main__ -   Step: 865, LR: 1.8741016831888e-05, Loss: 745.2327880859375
2024-08-03T07:56:26.231890163Z 
  9%|▉         | 866/9500 [2:58:56<29:40:53, 12.38s/it]08/03/2024 00:56:26 - INFO - __main__ -   Step: 866, LR: 1.8738846288200724e-05, Loss: 709.1285400390625
2024-08-03T07:56:38.218018674Z 
  9%|▉         | 867/9500 [2:59:08<29:23:52, 12.26s/it]08/03/2024 00:56:38 - INFO - __main__ -   Step: 867, LR: 1.8736675744513444e-05, Loss: 667.1201171875
2024-08-03T07:56:50.919278529Z 
  9%|▉         | 868/9500 [2:59:20<29:42:44, 12.39s/it]08/03/2024 00:56:50 - INFO - __main__ -   Step: 868, LR: 1.8734505200826164e-05, Loss: 649.1683349609375
2024-08-03T07:57:03.026303512Z 
  9%|▉         | 869/9500 [2:59:32<29:30:15, 12.31s/it]08/03/2024 00:57:03 - INFO - __main__ -   Step: 869, LR: 1.8732334657138887e-05, Loss: 808.3944091796875
2024-08-03T07:57:15.142652977Z 
  9%|▉         | 870/9500 [2:59:45<29:21:51, 12.25s/it]08/03/2024 00:57:15 - INFO - __main__ -   Step: 870, LR: 1.8730164113451607e-05, Loss: 866.949462890625
2024-08-03T07:57:27.639038263Z 
  9%|▉         | 871/9500 [2:59:57<29:32:19, 12.32s/it]08/03/2024 00:57:27 - INFO - __main__ -   Step: 871, LR: 1.872799356976433e-05, Loss: 777.7607421875
2024-08-03T07:57:40.087425678Z 
  9%|▉         | 872/9500 [3:00:10<29:37:30, 12.36s/it]08/03/2024 00:57:40 - INFO - __main__ -   Step: 872, LR: 1.872582302607705e-05, Loss: 591.2545166015625
2024-08-03T07:57:51.968612736Z 
  9%|▉         | 873/9500 [3:00:21<29:16:35, 12.22s/it]08/03/2024 00:57:51 - INFO - __main__ -   Step: 873, LR: 1.872365248238977e-05, Loss: 536.88134765625
2024-08-03T07:58:04.697393770Z 
  9%|▉         | 874/9500 [3:00:34<29:38:27, 12.37s/it]08/03/2024 00:58:04 - INFO - __main__ -   Step: 874, LR: 1.872148193870249e-05, Loss: 700.982177734375
2024-08-03T07:58:17.144375607Z 
  9%|▉         | 875/9500 [3:00:47<29:41:33, 12.39s/it]08/03/2024 00:58:17 - INFO - __main__ -   Step: 875, LR: 1.8719311395015213e-05, Loss: 831.9200439453125
2024-08-03T07:58:29.135569355Z 
  9%|▉         | 876/9500 [3:00:59<29:24:00, 12.27s/it]08/03/2024 00:58:29 - INFO - __main__ -   Step: 876, LR: 1.8717140851327933e-05, Loss: 658.1060791015625
2024-08-03T07:58:41.977899117Z 
  9%|▉         | 877/9500 [3:01:11<29:48:20, 12.44s/it]08/03/2024 00:58:41 - INFO - __main__ -   Step: 877, LR: 1.8714970307640653e-05, Loss: 697.577392578125
2024-08-03T07:58:54.065063378Z 
  9%|▉         | 878/9500 [3:01:24<29:32:47, 12.34s/it]08/03/2024 00:58:54 - INFO - __main__ -   Step: 878, LR: 1.8712799763953376e-05, Loss: 660.5262451171875
2024-08-03T07:59:06.210584676Z 
  9%|▉         | 879/9500 [3:01:36<29:24:20, 12.28s/it]08/03/2024 00:59:06 - INFO - __main__ -   Step: 879, LR: 1.8710629220266096e-05, Loss: 587.840087890625
2024-08-03T07:59:18.695723862Z 
  9%|▉         | 880/9500 [3:01:48<29:33:00, 12.34s/it]08/03/2024 00:59:18 - INFO - __main__ -   Step: 880, LR: 1.870845867657882e-05, Loss: 662.6251220703125
2024-08-03T07:59:30.761553022Z 
  9%|▉         | 881/9500 [3:02:00<29:20:55, 12.26s/it]08/03/2024 00:59:30 - INFO - __main__ -   Step: 881, LR: 1.870628813289154e-05, Loss: 593.7738647460938
2024-08-03T07:59:43.103945399Z 
  9%|▉         | 882/9500 [3:02:13<29:24:21, 12.28s/it]08/03/2024 00:59:43 - INFO - __main__ -   Step: 882, LR: 1.870411758920426e-05, Loss: 695.5160522460938
2024-08-03T07:59:55.833959962Z 
  9%|▉         | 883/9500 [3:02:25<29:43:22, 12.42s/it]08/03/2024 00:59:55 - INFO - __main__ -   Step: 883, LR: 1.8701947045516982e-05, Loss: 754.1246337890625
2024-08-03T08:00:08.008842928Z 
  9%|▉         | 884/9500 [3:02:37<29:32:42, 12.34s/it]08/03/2024 01:00:08 - INFO - __main__ -   Step: 884, LR: 1.8699776501829702e-05, Loss: 643.3868408203125
2024-08-03T08:00:20.225819938Z 
  9%|▉         | 885/9500 [3:02:50<29:27:00, 12.31s/it]08/03/2024 01:00:20 - INFO - __main__ -   Step: 885, LR: 1.8697605958142425e-05, Loss: 673.156982421875
2024-08-03T08:00:32.792314523Z 
  9%|▉         | 886/9500 [3:03:02<29:37:59, 12.38s/it]08/03/2024 01:00:32 - INFO - __main__ -   Step: 886, LR: 1.8695435414455142e-05, Loss: 667.874755859375
2024-08-03T08:00:45.016423747Z 
  9%|▉         | 887/9500 [3:03:14<29:30:52, 12.34s/it]08/03/2024 01:00:45 - INFO - __main__ -   Step: 887, LR: 1.8693264870767865e-05, Loss: 678.5452270507812
2024-08-03T08:00:57.168222978Z 
  9%|▉         | 888/9500 [3:03:27<29:22:43, 12.28s/it]08/03/2024 01:00:57 - INFO - __main__ -   Step: 888, LR: 1.8691094327080585e-05, Loss: 555.529296875
2024-08-03T08:01:09.649186090Z 
  9%|▉         | 889/9500 [3:03:39<29:31:08, 12.34s/it]08/03/2024 01:01:09 - INFO - __main__ -   Step: 889, LR: 1.8688923783393308e-05, Loss: 690.6402587890625
2024-08-03T08:01:21.980346671Z 
  9%|▉         | 890/9500 [3:03:51<29:30:29, 12.34s/it]08/03/2024 01:01:21 - INFO - __main__ -   Step: 890, LR: 1.8686753239706028e-05, Loss: 739.958984375
2024-08-03T08:01:34.082892699Z 
  9%|▉         | 891/9500 [3:04:04<29:20:09, 12.27s/it]08/03/2024 01:01:34 - INFO - __main__ -   Step: 891, LR: 1.8684582696018748e-05, Loss: 597.204345703125
2024-08-03T08:01:46.416202085Z 
  9%|▉         | 892/9500 [3:04:16<29:22:47, 12.29s/it]08/03/2024 01:01:46 - INFO - __main__ -   Step: 892, LR: 1.868241215233147e-05, Loss: 672.890380859375
2024-08-03T08:01:58.680735842Z 
  9%|▉         | 893/9500 [3:04:28<29:21:37, 12.28s/it]08/03/2024 01:01:58 - INFO - __main__ -   Step: 893, LR: 1.868024160864419e-05, Loss: 765.6068115234375
2024-08-03T08:02:10.914394938Z 
  9%|▉         | 894/9500 [3:04:40<29:19:24, 12.27s/it]08/03/2024 01:02:10 - INFO - __main__ -   Step: 894, LR: 1.8678071064956914e-05, Loss: 713.149169921875
2024-08-03T08:02:23.492500546Z 
  9%|▉         | 895/9500 [3:04:53<29:32:36, 12.36s/it]08/03/2024 01:02:23 - INFO - __main__ -   Step: 895, LR: 1.8675900521269634e-05, Loss: 706.15380859375
2024-08-03T08:02:35.813015213Z 
  9%|▉         | 896/9500 [3:05:05<29:30:42, 12.35s/it]08/03/2024 01:02:35 - INFO - __main__ -   Step: 896, LR: 1.8673729977582354e-05, Loss: 597.8824462890625
2024-08-03T08:02:48.053449412Z 
  9%|▉         | 897/9500 [3:05:17<29:25:53, 12.32s/it]08/03/2024 01:02:48 - INFO - __main__ -   Step: 897, LR: 1.8671559433895077e-05, Loss: 752.4053955078125
2024-08-03T08:03:00.794187954Z 
  9%|▉         | 898/9500 [3:05:30<29:43:57, 12.44s/it]08/03/2024 01:03:00 - INFO - __main__ -   Step: 898, LR: 1.8669388890207797e-05, Loss: 747.467041015625
2024-08-03T08:03:14.192099312Z 
  9%|▉         | 899/9500 [3:05:44<30:24:47, 12.73s/it]08/03/2024 01:03:14 - INFO - __main__ -   Step: 899, LR: 1.866721834652052e-05, Loss: 652.113037109375
2024-08-03T08:03:27.275964261Z 
  9%|▉         | 900/9500 [3:05:57<30:39:49, 12.84s/it]08/03/2024 01:03:27 - INFO - __main__ -   Step: 900, LR: 1.8665047802833237e-05, Loss: 763.4013671875
2024-08-03T08:03:40.133018176Z 
  9%|▉         | 901/9500 [3:06:10<30:40:30, 12.84s/it]08/03/2024 01:03:40 - INFO - __main__ -   Step: 901, LR: 1.866287725914596e-05, Loss: 783.076171875
2024-08-03T08:03:53.051578325Z 
  9%|▉         | 902/9500 [3:06:22<30:43:34, 12.87s/it]08/03/2024 01:03:53 - INFO - __main__ -   Step: 902, LR: 1.866070671545868e-05, Loss: 679.0548095703125
2024-08-03T08:04:05.382546082Z 
 10%|▉         | 903/9500 [3:06:35<30:20:23, 12.70s/it]08/03/2024 01:04:05 - INFO - __main__ -   Step: 903, LR: 1.8658536171771403e-05, Loss: 645.6751708984375
2024-08-03T08:04:17.619876913Z 
 10%|▉         | 904/9500 [3:06:47<30:00:04, 12.56s/it]08/03/2024 01:04:17 - INFO - __main__ -   Step: 904, LR: 1.8656365628084123e-05, Loss: 683.2030029296875
2024-08-03T08:04:30.551876714Z 
 10%|▉         | 905/9500 [3:07:00<30:15:40, 12.67s/it]08/03/2024 01:04:30 - INFO - __main__ -   Step: 905, LR: 1.8654195084396843e-05, Loss: 684.8403930664062
2024-08-03T08:04:43.142785327Z 
 10%|▉         | 906/9500 [3:07:13<30:11:50, 12.65s/it]08/03/2024 01:04:43 - INFO - __main__ -   Step: 906, LR: 1.8652024540709566e-05, Loss: 655.0647583007812
2024-08-03T08:04:56.148488352Z 
 10%|▉         | 907/9500 [3:07:26<30:26:55, 12.76s/it]08/03/2024 01:04:56 - INFO - __main__ -   Step: 907, LR: 1.8649853997022286e-05, Loss: 692.199462890625
2024-08-03T08:05:09.728729914Z 
 10%|▉         | 908/9500 [3:07:39<31:02:07, 13.00s/it]08/03/2024 01:05:09 - INFO - __main__ -   Step: 908, LR: 1.864768345333501e-05, Loss: 716.8869018554688
2024-08-03T08:05:22.301022908Z 
 10%|▉         | 909/9500 [3:07:52<30:43:22, 12.87s/it]08/03/2024 01:05:22 - INFO - __main__ -   Step: 909, LR: 1.864551290964773e-05, Loss: 747.5015869140625
2024-08-03T08:05:34.678455015Z 
 10%|▉         | 910/9500 [3:08:04<30:21:49, 12.73s/it]08/03/2024 01:05:34 - INFO - __main__ -   Step: 910, LR: 1.864334236596045e-05, Loss: 726.0423583984375
2024-08-03T08:05:47.807584123Z 
 10%|▉         | 911/9500 [3:08:17<30:38:56, 12.85s/it]08/03/2024 01:05:47 - INFO - __main__ -   Step: 911, LR: 1.8641171822273172e-05, Loss: 586.6616821289062
2024-08-03T08:06:00.479741852Z 
 10%|▉         | 912/9500 [3:08:30<30:31:15, 12.79s/it]08/03/2024 01:06:00 - INFO - __main__ -   Step: 912, LR: 1.8639001278585892e-05, Loss: 620.7532958984375
2024-08-03T08:06:13.305609507Z 
 10%|▉         | 913/9500 [3:08:43<30:32:22, 12.80s/it]08/03/2024 01:06:13 - INFO - __main__ -   Step: 913, LR: 1.8636830734898615e-05, Loss: 621.1403198242188
2024-08-03T08:06:25.795342785Z 
 10%|▉         | 914/9500 [3:08:55<30:18:44, 12.71s/it]08/03/2024 01:06:25 - INFO - __main__ -   Step: 914, LR: 1.8634660191211332e-05, Loss: 485.67803955078125
2024-08-03T08:06:38.783869792Z 
 10%|▉         | 915/9500 [3:09:08<30:30:28, 12.79s/it]08/03/2024 01:06:38 - INFO - __main__ -   Step: 915, LR: 1.8632489647524055e-05, Loss: 743.3848876953125
2024-08-03T08:06:51.557703671Z 
 10%|▉         | 916/9500 [3:09:21<30:29:26, 12.79s/it]08/03/2024 01:06:51 - INFO - __main__ -   Step: 916, LR: 1.8630319103836775e-05, Loss: 576.258056640625
2024-08-03T08:07:05.071266989Z 
 10%|▉         | 917/9500 [3:09:35<31:00:22, 13.01s/it]08/03/2024 01:07:05 - INFO - __main__ -   Step: 917, LR: 1.86281485601495e-05, Loss: 594.7640991210938
2024-08-03T08:07:18.024562363Z 
 10%|▉         | 918/9500 [3:09:47<30:57:57, 12.99s/it]08/03/2024 01:07:18 - INFO - __main__ -   Step: 918, LR: 1.8625978016462218e-05, Loss: 745.1307983398438
2024-08-03T08:07:31.077267554Z 
 10%|▉         | 919/9500 [3:10:01<31:00:26, 13.01s/it]08/03/2024 01:07:31 - INFO - __main__ -   Step: 919, LR: 1.8623807472774938e-05, Loss: 690.5862426757812
2024-08-03T08:07:44.090412862Z 
 10%|▉         | 920/9500 [3:10:14<31:00:24, 13.01s/it]08/03/2024 01:07:44 - INFO - __main__ -   Step: 920, LR: 1.862163692908766e-05, Loss: 610.3355102539062
2024-08-03T08:07:56.521892528Z 
 10%|▉         | 921/9500 [3:10:26<30:35:24, 12.84s/it]08/03/2024 01:07:56 - INFO - __main__ -   Step: 921, LR: 1.861946638540038e-05, Loss: 695.5458984375
2024-08-03T08:08:09.376343468Z 
 10%|▉         | 922/9500 [3:10:39<30:35:57, 12.84s/it]08/03/2024 01:08:09 - INFO - __main__ -   Step: 922, LR: 1.8617295841713104e-05, Loss: 681.6514892578125
2024-08-03T08:08:22.669959258Z 
 10%|▉         | 923/9500 [3:10:52<30:55:06, 12.98s/it]08/03/2024 01:08:22 - INFO - __main__ -   Step: 923, LR: 1.8615125298025824e-05, Loss: 749.47119140625
2024-08-03T08:08:34.905179924Z 
 10%|▉         | 924/9500 [3:11:04<30:23:04, 12.75s/it]08/03/2024 01:08:34 - INFO - __main__ -   Step: 924, LR: 1.8612954754338544e-05, Loss: 477.51458740234375
2024-08-03T08:08:47.544873312Z 
 10%|▉         | 925/9500 [3:11:17<30:17:55, 12.72s/it]08/03/2024 01:08:47 - INFO - __main__ -   Step: 925, LR: 1.8610784210651267e-05, Loss: 623.1087036132812
2024-08-03T08:09:00.727555980Z 
 10%|▉         | 926/9500 [3:11:30<30:37:32, 12.86s/it]08/03/2024 01:09:00 - INFO - __main__ -   Step: 926, LR: 1.8608613666963987e-05, Loss: 748.770751953125
2024-08-03T08:09:13.461811851Z 
 10%|▉         | 927/9500 [3:11:43<30:31:59, 12.82s/it]08/03/2024 01:09:13 - INFO - __main__ -   Step: 927, LR: 1.860644312327671e-05, Loss: 812.0148315429688
2024-08-03T08:09:26.236679804Z 
 10%|▉         | 928/9500 [3:11:56<30:29:46, 12.81s/it]08/03/2024 01:09:26 - INFO - __main__ -   Step: 928, LR: 1.8604272579589427e-05, Loss: 623.7017822265625
2024-08-03T08:09:39.153209564Z 
 10%|▉         | 929/9500 [3:12:09<30:34:13, 12.84s/it]08/03/2024 01:09:39 - INFO - __main__ -   Step: 929, LR: 1.860210203590215e-05, Loss: 544.0756225585938
2024-08-03T08:09:52.047266521Z 
 10%|▉         | 930/9500 [3:12:21<30:36:19, 12.86s/it]08/03/2024 01:09:52 - INFO - __main__ -   Step: 930, LR: 1.859993149221487e-05, Loss: 727.24072265625
2024-08-03T08:10:04.638088714Z 
 10%|▉         | 931/9500 [3:12:34<30:24:43, 12.78s/it]08/03/2024 01:10:04 - INFO - __main__ -   Step: 931, LR: 1.8597760948527593e-05, Loss: 644.288818359375
2024-08-03T08:10:17.630221199Z 
 10%|▉         | 932/9500 [3:12:47<30:33:43, 12.84s/it]08/03/2024 01:10:17 - INFO - __main__ -   Step: 932, LR: 1.8595590404840313e-05, Loss: 661.5323486328125
2024-08-03T08:10:30.831677805Z 
 10%|▉         | 933/9500 [3:13:00<30:48:57, 12.95s/it]08/03/2024 01:10:30 - INFO - __main__ -   Step: 933, LR: 1.8593419861153033e-05, Loss: 696.5723876953125
2024-08-03T08:10:43.289365234Z 
 10%|▉         | 934/9500 [3:13:13<30:27:40, 12.80s/it]08/03/2024 01:10:43 - INFO - __main__ -   Step: 934, LR: 1.8591249317465756e-05, Loss: 623.7330322265625
2024-08-03T08:10:56.229020321Z 
 10%|▉         | 935/9500 [3:13:26<30:33:22, 12.84s/it]08/03/2024 01:10:56 - INFO - __main__ -   Step: 935, LR: 1.8589078773778476e-05, Loss: 701.783447265625
2024-08-03T08:11:08.765188457Z 
 10%|▉         | 936/9500 [3:13:38<30:20:00, 12.75s/it]08/03/2024 01:11:08 - INFO - __main__ -   Step: 936, LR: 1.85869082300912e-05, Loss: 468.93701171875
2024-08-03T08:11:21.278771201Z 
 10%|▉         | 937/9500 [3:13:51<30:09:38, 12.68s/it]08/03/2024 01:11:21 - INFO - __main__ -   Step: 937, LR: 1.858473768640392e-05, Loss: 637.0530395507812
2024-08-03T08:11:34.165211445Z 
 10%|▉         | 938/9500 [3:14:04<30:18:14, 12.74s/it]08/03/2024 01:11:34 - INFO - __main__ -   Step: 938, LR: 1.8582567142716643e-05, Loss: 755.31640625
2024-08-03T08:11:46.711337574Z 
 10%|▉         | 939/9500 [3:14:16<30:09:39, 12.68s/it]08/03/2024 01:11:46 - INFO - __main__ -   Step: 939, LR: 1.8580396599029362e-05, Loss: 620.038818359375
2024-08-03T08:11:59.387354462Z 
 10%|▉         | 940/9500 [3:14:29<30:09:09, 12.68s/it]08/03/2024 01:11:59 - INFO - __main__ -   Step: 940, LR: 1.8578226055342082e-05, Loss: 735.0814819335938
2024-08-03T08:12:12.270585561Z 
 10%|▉         | 941/9500 [3:14:42<30:17:36, 12.74s/it]08/03/2024 01:12:12 - INFO - __main__ -   Step: 941, LR: 1.8576055511654806e-05, Loss: 747.328857421875
2024-08-03T08:12:26.001917595Z 
 10%|▉         | 942/9500 [3:14:55<30:59:41, 13.04s/it]08/03/2024 01:12:26 - INFO - __main__ -   Step: 942, LR: 1.8573884967967522e-05, Loss: 517.6998291015625
2024-08-03T08:12:38.554233855Z 
 10%|▉         | 943/9500 [3:15:08<30:38:41, 12.89s/it]08/03/2024 01:12:38 - INFO - __main__ -   Step: 943, LR: 1.8571714424280245e-05, Loss: 532.4199829101562
2024-08-03T08:12:51.467552870Z 
 10%|▉         | 944/9500 [3:15:21<30:39:24, 12.90s/it]08/03/2024 01:12:51 - INFO - __main__ -   Step: 944, LR: 1.8569543880592965e-05, Loss: 772.300537109375
2024-08-03T08:13:04.696682468Z 
 10%|▉         | 945/9500 [3:15:34<30:53:18, 13.00s/it]08/03/2024 01:13:04 - INFO - __main__ -   Step: 945, LR: 1.856737333690569e-05, Loss: 873.5951538085938
2024-08-03T08:13:17.442898122Z 
 10%|▉         | 946/9500 [3:15:47<30:42:19, 12.92s/it]08/03/2024 01:13:17 - INFO - __main__ -   Step: 946, LR: 1.8565202793218408e-05, Loss: 819.5807495117188
2024-08-03T08:13:30.150868068Z 
 10%|▉         | 947/9500 [3:16:00<30:32:55, 12.86s/it]08/03/2024 01:13:30 - INFO - __main__ -   Step: 947, LR: 1.856303224953113e-05, Loss: 724.211669921875
2024-08-03T08:13:43.913885175Z 
 10%|▉         | 948/9500 [3:16:13<31:11:23, 13.13s/it]08/03/2024 01:13:43 - INFO - __main__ -   Step: 948, LR: 1.856086170584385e-05, Loss: 762.97607421875
2024-08-03T08:13:56.633404456Z 
 10%|▉         | 949/9500 [3:16:26<30:53:39, 13.01s/it]08/03/2024 01:13:56 - INFO - __main__ -   Step: 949, LR: 1.855869116215657e-05, Loss: 672.4511108398438
2024-08-03T08:14:09.466416638Z 
 10%|█         | 950/9500 [3:16:39<30:46:00, 12.95s/it]08/03/2024 01:14:09 - INFO - __main__ -   Step: 950, LR: 1.8556520618469295e-05, Loss: 726.5078735351562
2024-08-03T08:14:22.469537450Z 
 10%|█         | 951/9500 [3:16:52<30:47:52, 12.97s/it]08/03/2024 01:14:22 - INFO - __main__ -   Step: 951, LR: 1.8554350074782014e-05, Loss: 568.5682373046875
2024-08-03T08:14:35.575465808Z 
 10%|█         | 952/9500 [3:17:05<30:53:31, 13.01s/it]08/03/2024 01:14:35 - INFO - __main__ -   Step: 952, LR: 1.8552179531094738e-05, Loss: 564.4234008789062
2024-08-03T08:14:48.458608666Z 
 10%|█         | 953/9500 [3:17:18<30:47:51, 12.97s/it]08/03/2024 01:14:48 - INFO - __main__ -   Step: 953, LR: 1.8550008987407458e-05, Loss: 702.8212890625
2024-08-03T08:15:01.413302196Z 
 10%|█         | 954/9500 [3:17:31<30:46:54, 12.97s/it]08/03/2024 01:15:01 - INFO - __main__ -   Step: 954, LR: 1.8547838443720177e-05, Loss: 560.5126953125
2024-08-03T08:15:13.894845037Z 
 10%|█         | 955/9500 [3:17:43<30:25:57, 12.82s/it]08/03/2024 01:15:13 - INFO - __main__ -   Step: 955, LR: 1.85456679000329e-05, Loss: 683.4965209960938
2024-08-03T08:15:26.415202074Z 
 10%|█         | 956/9500 [3:17:56<30:12:53, 12.73s/it]08/03/2024 01:15:26 - INFO - __main__ -   Step: 956, LR: 1.854349735634562e-05, Loss: 757.4067993164062
2024-08-03T08:15:39.430202635Z 
 10%|█         | 957/9500 [3:18:09<30:24:48, 12.82s/it]08/03/2024 01:15:39 - INFO - __main__ -   Step: 957, LR: 1.854132681265834e-05, Loss: 610.849609375
2024-08-03T08:15:51.807475827Z 
 10%|█         | 958/9500 [3:18:21<30:05:51, 12.68s/it]08/03/2024 01:15:51 - INFO - __main__ -   Step: 958, LR: 1.853915626897106e-05, Loss: 653.815673828125
2024-08-03T08:16:04.508839246Z 
 10%|█         | 959/9500 [3:18:34<30:06:21, 12.69s/it]08/03/2024 01:16:04 - INFO - __main__ -   Step: 959, LR: 1.8536985725283783e-05, Loss: 757.7366943359375
2024-08-03T08:16:17.490647788Z 
 10%|█         | 960/9500 [3:18:47<30:18:37, 12.78s/it]08/03/2024 01:16:17 - INFO - __main__ -   Step: 960, LR: 1.8534815181596503e-05, Loss: 852.8426513671875
2024-08-03T08:16:30.154222791Z 
 10%|█         | 961/9500 [3:19:00<30:13:33, 12.74s/it]08/03/2024 01:16:30 - INFO - __main__ -   Step: 961, LR: 1.8532644637909227e-05, Loss: 730.486572265625
2024-08-03T08:16:43.000999458Z 
 10%|█         | 962/9500 [3:19:12<30:17:45, 12.77s/it]08/03/2024 01:16:43 - INFO - __main__ -   Step: 962, LR: 1.8530474094221946e-05, Loss: 799.63427734375
2024-08-03T08:16:56.101508669Z 
 10%|█         | 963/9500 [3:19:26<30:31:29, 12.87s/it]08/03/2024 01:16:56 - INFO - __main__ -   Step: 963, LR: 1.8528303550534666e-05, Loss: 617.427734375
2024-08-03T08:17:09.288216777Z 
 10%|█         | 964/9500 [3:19:39<30:44:41, 12.97s/it]08/03/2024 01:17:09 - INFO - __main__ -   Step: 964, LR: 1.852613300684739e-05, Loss: 734.1448364257812
2024-08-03T08:17:22.065533281Z 
 10%|█         | 965/9500 [3:19:52<30:36:24, 12.91s/it]08/03/2024 01:17:22 - INFO - __main__ -   Step: 965, LR: 1.852396246316011e-05, Loss: 807.3134765625
2024-08-03T08:17:35.010332061Z 
 10%|█         | 966/9500 [3:20:04<30:37:41, 12.92s/it]08/03/2024 01:17:35 - INFO - __main__ -   Step: 966, LR: 1.8521791919472833e-05, Loss: 541.131591796875
2024-08-03T08:17:47.868063127Z 
 10%|█         | 967/9500 [3:20:17<30:34:48, 12.90s/it]08/03/2024 01:17:47 - INFO - __main__ -   Step: 967, LR: 1.8519621375785553e-05, Loss: 602.2786865234375
2024-08-03T08:18:00.153496871Z 
 10%|█         | 968/9500 [3:20:30<30:08:18, 12.72s/it]08/03/2024 01:18:00 - INFO - __main__ -   Step: 968, LR: 1.8517450832098272e-05, Loss: 637.4425048828125
2024-08-03T08:18:13.100635546Z 
 10%|█         | 969/9500 [3:20:43<30:17:56, 12.79s/it]08/03/2024 01:18:13 - INFO - __main__ -   Step: 969, LR: 1.8515280288410996e-05, Loss: 658.4751586914062
2024-08-03T08:18:25.776934055Z 
 10%|█         | 970/9500 [3:20:55<30:13:01, 12.75s/it]08/03/2024 01:18:25 - INFO - __main__ -   Step: 970, LR: 1.8513109744723716e-05, Loss: 843.9013671875
2024-08-03T08:18:38.340762419Z 
 10%|█         | 971/9500 [3:21:08<30:04:47, 12.70s/it]08/03/2024 01:18:38 - INFO - __main__ -   Step: 971, LR: 1.8510939201036435e-05, Loss: 629.0921020507812
2024-08-03T08:18:51.007877147Z 
 10%|█         | 972/9500 [3:21:20<30:03:18, 12.69s/it]08/03/2024 01:18:51 - INFO - __main__ -   Step: 972, LR: 1.8508768657349155e-05, Loss: 501.14349365234375
2024-08-03T08:19:04.298615961Z 
 10%|█         | 973/9500 [3:21:34<30:28:44, 12.87s/it]08/03/2024 01:19:04 - INFO - __main__ -   Step: 973, LR: 1.850659811366188e-05, Loss: 900.2806396484375
2024-08-03T08:19:16.737573501Z 
 10%|█         | 974/9500 [3:21:46<30:10:19, 12.74s/it]08/03/2024 01:19:16 - INFO - __main__ -   Step: 974, LR: 1.85044275699746e-05, Loss: 666.255859375
2024-08-03T08:19:29.297121348Z 
 10%|█         | 975/9500 [3:21:59<30:02:25, 12.69s/it]08/03/2024 01:19:29 - INFO - __main__ -   Step: 975, LR: 1.850225702628732e-05, Loss: 486.0537414550781
2024-08-03T08:19:41.871312123Z 
 10%|█         | 976/9500 [3:22:11<29:57:28, 12.65s/it]08/03/2024 01:19:41 - INFO - __main__ -   Step: 976, LR: 1.850008648260004e-05, Loss: 736.9159545898438
2024-08-03T08:19:54.702206834Z 
 10%|█         | 977/9500 [3:22:24<30:04:52, 12.71s/it]08/03/2024 01:19:54 - INFO - __main__ -   Step: 977, LR: 1.849791593891276e-05, Loss: 637.1673583984375
2024-08-03T08:20:07.470683285Z 
 10%|█         | 978/9500 [3:22:37<30:07:19, 12.72s/it]08/03/2024 01:20:07 - INFO - __main__ -   Step: 978, LR: 1.8495745395225485e-05, Loss: 597.3640747070312
2024-08-03T08:20:20.401415152Z 
 10%|█         | 979/9500 [3:22:50<30:15:53, 12.79s/it]08/03/2024 01:20:20 - INFO - __main__ -   Step: 979, LR: 1.8493574851538205e-05, Loss: 741.8491821289062
2024-08-03T08:20:33.234719745Z 
 10%|█         | 980/9500 [3:23:03<30:17:40, 12.80s/it]08/03/2024 01:20:33 - INFO - __main__ -   Step: 980, LR: 1.8491404307850928e-05, Loss: 687.1873168945312
2024-08-03T08:20:46.135011484Z 
 10%|█         | 981/9500 [3:23:16<30:21:42, 12.83s/it]08/03/2024 01:20:46 - INFO - __main__ -   Step: 981, LR: 1.8489233764163648e-05, Loss: 525.149169921875
2024-08-03T08:20:58.850917521Z 
 10%|█         | 982/9500 [3:23:28<30:16:37, 12.80s/it]08/03/2024 01:20:58 - INFO - __main__ -   Step: 982, LR: 1.8487063220476367e-05, Loss: 694.5408935546875
2024-08-03T08:21:11.596126109Z 
 10%|█         | 983/9500 [3:23:41<30:14:14, 12.78s/it]08/03/2024 01:21:11 - INFO - __main__ -   Step: 983, LR: 1.848489267678909e-05, Loss: 617.434814453125
2024-08-03T08:21:24.097107360Z 
 10%|█         | 984/9500 [3:23:54<30:02:05, 12.70s/it]08/03/2024 01:21:24 - INFO - __main__ -   Step: 984, LR: 1.848272213310181e-05, Loss: 712.1435546875
2024-08-03T08:21:36.783077102Z 
 10%|█         | 985/9500 [3:24:06<30:01:26, 12.69s/it]08/03/2024 01:21:36 - INFO - __main__ -   Step: 985, LR: 1.848055158941453e-05, Loss: 555.425537109375
2024-08-03T08:21:49.194747932Z 
 10%|█         | 986/9500 [3:24:19<29:49:12, 12.61s/it]08/03/2024 01:21:49 - INFO - __main__ -   Step: 986, LR: 1.847838104572725e-05, Loss: 663.632080078125
2024-08-03T08:22:01.361693648Z 
 10%|█         | 987/9500 [3:24:31<29:30:11, 12.48s/it]08/03/2024 01:22:01 - INFO - __main__ -   Step: 987, LR: 1.8476210502039974e-05, Loss: 825.613525390625
2024-08-03T08:22:14.287064415Z 
 10%|█         | 988/9500 [3:24:44<29:49:04, 12.61s/it]08/03/2024 01:22:14 - INFO - __main__ -   Step: 988, LR: 1.8474039958352693e-05, Loss: 767.2216796875
2024-08-03T08:22:26.420566977Z 
 10%|█         | 989/9500 [3:24:56<29:28:33, 12.47s/it]08/03/2024 01:22:26 - INFO - __main__ -   Step: 989, LR: 1.8471869414665417e-05, Loss: 668.7801513671875
2024-08-03T08:22:38.486564279Z 
 10%|█         | 990/9500 [3:25:08<29:11:16, 12.35s/it]08/03/2024 01:22:38 - INFO - __main__ -   Step: 990, LR: 1.8469698870978137e-05, Loss: 594.0677490234375
2024-08-03T08:22:51.052135575Z 
 10%|█         | 991/9500 [3:25:20<29:20:19, 12.41s/it]08/03/2024 01:22:51 - INFO - __main__ -   Step: 991, LR: 1.8467528327290856e-05, Loss: 585.555908203125
2024-08-03T08:23:03.413845684Z 
 10%|█         | 992/9500 [3:25:33<29:17:57, 12.40s/it]08/03/2024 01:23:03 - INFO - __main__ -   Step: 992, LR: 1.846535778360358e-05, Loss: 872.9381103515625
2024-08-03T08:23:15.937168192Z 
 10%|█         | 993/9500 [3:25:45<29:23:06, 12.44s/it]08/03/2024 01:23:15 - INFO - __main__ -   Step: 993, LR: 1.84631872399163e-05, Loss: 565.80810546875
2024-08-03T08:23:28.573457103Z 
 10%|█         | 994/9500 [3:25:58<29:31:26, 12.50s/it]08/03/2024 01:23:28 - INFO - __main__ -   Step: 994, LR: 1.8461016696229023e-05, Loss: 640.7679443359375
2024-08-03T08:23:40.992392702Z 
 10%|█         | 995/9500 [3:26:10<29:27:59, 12.47s/it]08/03/2024 01:23:40 - INFO - __main__ -   Step: 995, LR: 1.8458846152541743e-05, Loss: 572.03173828125
2024-08-03T08:23:53.283142859Z 
 10%|█         | 996/9500 [3:26:23<29:20:02, 12.42s/it]08/03/2024 01:23:53 - INFO - __main__ -   Step: 996, LR: 1.8456675608854463e-05, Loss: 812.9053955078125
2024-08-03T08:24:06.123804230Z 
 10%|█         | 997/9500 [3:26:36<29:37:48, 12.54s/it]08/03/2024 01:24:06 - INFO - __main__ -   Step: 997, LR: 1.8454505065167186e-05, Loss: 717.95654296875
2024-08-03T08:24:18.275254462Z 
 11%|█         | 998/9500 [3:26:48<29:20:53, 12.43s/it]08/03/2024 01:24:18 - INFO - __main__ -   Step: 998, LR: 1.8452334521479906e-05, Loss: 733.5176391601562
2024-08-03T08:24:30.245103546Z 
 11%|█         | 999/9500 [3:27:00<29:01:15, 12.29s/it]08/03/2024 01:24:30 - INFO - __main__ -   Step: 999, LR: 1.8450163977792626e-05, Loss: 555.6267700195312
2024-08-03T08:24:42.569491607Z 
 11%|█         | 1000/9500 [3:27:12<29:02:30, 12.30s/it]08/03/2024 01:24:42 - INFO - __main__ -   Step: 1000, LR: 1.8447993434105345e-05, Loss: 634.9744262695312
2024-08-03T08:24:54.692085519Z 
 11%|█         | 1001/9500 [3:27:24<28:54:45, 12.25s/it]08/03/2024 01:24:54 - INFO - __main__ -   Step: 1001, LR: 1.844582289041807e-05, Loss: 696.3159790039062
2024-08-03T08:25:06.535483561Z 
 11%|█         | 1002/9500 [3:27:36<28:37:24, 12.13s/it]08/03/2024 01:25:06 - INFO - __main__ -   Step: 1002, LR: 1.844365234673079e-05, Loss: 698.4991455078125
2024-08-03T08:25:19.107133670Z 
 11%|█         | 1003/9500 [3:27:49<28:56:09, 12.26s/it]08/03/2024 01:25:19 - INFO - __main__ -   Step: 1003, LR: 1.8441481803043512e-05, Loss: 671.6441650390625
2024-08-03T08:25:31.291593257Z 
 11%|█         | 1004/9500 [3:28:01<28:52:45, 12.24s/it]08/03/2024 01:25:31 - INFO - __main__ -   Step: 1004, LR: 1.843931125935623e-05, Loss: 602.8018798828125
2024-08-03T08:25:43.819875538Z 
 11%|█         | 1005/9500 [3:28:13<29:04:55, 12.32s/it]08/03/2024 01:25:43 - INFO - __main__ -   Step: 1005, LR: 1.843714071566895e-05, Loss: 798.53173828125
2024-08-03T08:25:56.033795718Z 
 11%|█         | 1006/9500 [3:28:25<29:00:01, 12.29s/it]08/03/2024 01:25:56 - INFO - __main__ -   Step: 1006, LR: 1.8434970171981675e-05, Loss: 561.6405029296875
2024-08-03T08:26:08.183883800Z 
 11%|█         | 1007/9500 [3:28:38<28:53:50, 12.25s/it]08/03/2024 01:26:08 - INFO - __main__ -   Step: 1007, LR: 1.8432799628294395e-05, Loss: 715.302001953125
2024-08-03T08:26:20.273143713Z 
 11%|█         | 1008/9500 [3:28:50<28:46:51, 12.20s/it]08/03/2024 01:26:20 - INFO - __main__ -   Step: 1008, LR: 1.8430629084607118e-05, Loss: 623.4044189453125
2024-08-03T08:26:32.635344780Z 
 11%|█         | 1009/9500 [3:29:02<28:53:29, 12.25s/it]08/03/2024 01:26:32 - INFO - __main__ -   Step: 1009, LR: 1.8428458540919838e-05, Loss: 612.5126342773438
2024-08-03T08:26:44.965974103Z 
 11%|█         | 1010/9500 [3:29:14<28:56:43, 12.27s/it]08/03/2024 01:26:44 - INFO - __main__ -   Step: 1010, LR: 1.8426287997232558e-05, Loss: 660.6895751953125
2024-08-03T08:26:57.346875770Z 
 11%|█         | 1011/9500 [3:29:27<29:01:04, 12.31s/it]08/03/2024 01:26:57 - INFO - __main__ -   Step: 1011, LR: 1.842411745354528e-05, Loss: 593.6776123046875
2024-08-03T08:27:09.929656605Z 
 11%|█         | 1012/9500 [3:29:39<29:12:37, 12.39s/it]08/03/2024 01:27:09 - INFO - __main__ -   Step: 1012, LR: 1.8421946909858e-05, Loss: 640.59326171875
2024-08-03T08:27:22.147008571Z 
 11%|█         | 1013/9500 [3:29:52<29:05:08, 12.34s/it]08/03/2024 01:27:22 - INFO - __main__ -   Step: 1013, LR: 1.841977636617072e-05, Loss: 700.2235107421875
2024-08-03T08:27:34.257195678Z 
 11%|█         | 1014/9500 [3:30:04<28:55:16, 12.27s/it]08/03/2024 01:27:34 - INFO - __main__ -   Step: 1014, LR: 1.841760582248344e-05, Loss: 589.1827392578125
2024-08-03T08:27:46.787252768Z 
 11%|█         | 1015/9500 [3:30:16<29:06:08, 12.35s/it]08/03/2024 01:27:46 - INFO - __main__ -   Step: 1015, LR: 1.8415435278796164e-05, Loss: 745.6657104492188
2024-08-03T08:27:58.861881961Z 
 11%|█         | 1016/9500 [3:30:28<28:54:21, 12.27s/it]08/03/2024 01:27:58 - INFO - __main__ -   Step: 1016, LR: 1.8413264735108884e-05, Loss: 655.1329345703125
2024-08-03T08:28:10.966956767Z 
 11%|█         | 1017/9500 [3:30:40<28:47:20, 12.22s/it]08/03/2024 01:28:10 - INFO - __main__ -   Step: 1017, LR: 1.8411094191421607e-05, Loss: 599.262451171875
2024-08-03T08:28:23.542445768Z 
 11%|█         | 1018/9500 [3:30:53<29:02:19, 12.32s/it]08/03/2024 01:28:23 - INFO - __main__ -   Step: 1018, LR: 1.8408923647734327e-05, Loss: 538.0007934570312
2024-08-03T08:28:35.888133003Z 
 11%|█         | 1019/9500 [3:31:05<29:03:00, 12.33s/it]08/03/2024 01:28:35 - INFO - __main__ -   Step: 1019, LR: 1.8406753104047047e-05, Loss: 789.061767578125
2024-08-03T08:28:48.242260607Z 
 11%|█         | 1020/9500 [3:31:18<29:03:46, 12.34s/it]08/03/2024 01:28:48 - INFO - __main__ -   Step: 1020, LR: 1.840458256035977e-05, Loss: 661.6148681640625
2024-08-03T08:29:00.708809816Z 
 11%|█         | 1021/9500 [3:31:30<29:09:00, 12.38s/it]08/03/2024 01:29:00 - INFO - __main__ -   Step: 1021, LR: 1.840241201667249e-05, Loss: 494.01416015625
2024-08-03T08:29:12.988794832Z 
 11%|█         | 1022/9500 [3:31:42<29:04:42, 12.35s/it]08/03/2024 01:29:12 - INFO - __main__ -   Step: 1022, LR: 1.8400241472985213e-05, Loss: 853.9723510742188
2024-08-03T08:29:25.061434660Z 
 11%|█         | 1023/9500 [3:31:54<28:52:51, 12.27s/it]08/03/2024 01:29:25 - INFO - __main__ -   Step: 1023, LR: 1.8398070929297933e-05, Loss: 722.3643798828125
2024-08-03T08:29:37.655764295Z 
 11%|█         | 1024/9500 [3:32:07<29:06:36, 12.36s/it]08/03/2024 01:29:37 - INFO - __main__ -   Step: 1024, LR: 1.8395900385610656e-05, Loss: 882.2139892578125
2024-08-03T08:29:50.184700224Z 
 11%|█         | 1025/9500 [3:32:20<29:13:23, 12.41s/it]08/03/2024 01:29:50 - INFO - __main__ -   Step: 1025, LR: 1.8393729841923376e-05, Loss: 757.8972778320312
2024-08-03T08:30:02.513667712Z 
 11%|█         | 1026/9500 [3:32:32<29:09:36, 12.39s/it]08/03/2024 01:30:02 - INFO - __main__ -   Step: 1026, LR: 1.8391559298236096e-05, Loss: 902.6826782226562
2024-08-03T08:30:14.709474084Z 
 11%|█         | 1027/9500 [3:32:44<29:01:15, 12.33s/it]08/03/2024 01:30:14 - INFO - __main__ -   Step: 1027, LR: 1.8389388754548816e-05, Loss: 771.73388671875
2024-08-03T08:30:27.307600631Z 
 11%|█         | 1028/9500 [3:32:57<29:12:23, 12.41s/it]08/03/2024 01:30:27 - INFO - __main__ -   Step: 1028, LR: 1.8387218210861536e-05, Loss: 661.1575927734375
2024-08-03T08:30:39.602377836Z 
 11%|█         | 1029/9500 [3:33:09<29:07:16, 12.38s/it]08/03/2024 01:30:39 - INFO - __main__ -   Step: 1029, LR: 1.838504766717426e-05, Loss: 675.440185546875
2024-08-03T08:30:51.449020049Z 
 11%|█         | 1030/9500 [3:33:21<28:44:39, 12.22s/it]08/03/2024 01:30:51 - INFO - __main__ -   Step: 1030, LR: 1.838287712348698e-05, Loss: 648.3720092773438
2024-08-03T08:31:03.962054498Z 
 11%|█         | 1031/9500 [3:33:33<28:56:58, 12.31s/it]08/03/2024 01:31:03 - INFO - __main__ -   Step: 1031, LR: 1.8380706579799702e-05, Loss: 643.5792236328125
2024-08-03T08:31:16.222708528Z 
 11%|█         | 1032/9500 [3:33:46<28:54:51, 12.29s/it]08/03/2024 01:31:16 - INFO - __main__ -   Step: 1032, LR: 1.8378536036112422e-05, Loss: 697.707763671875
2024-08-03T08:31:28.278956574Z 
 11%|█         | 1033/9500 [3:33:58<28:44:39, 12.22s/it]08/03/2024 01:31:28 - INFO - __main__ -   Step: 1033, LR: 1.8376365492425145e-05, Loss: 887.6432495117188
2024-08-03T08:31:40.843324369Z 
 11%|█         | 1034/9500 [3:34:10<28:58:57, 12.32s/it]08/03/2024 01:31:40 - INFO - __main__ -   Step: 1034, LR: 1.8374194948737865e-05, Loss: 584.784912109375
2024-08-03T08:31:52.926349664Z 
 11%|█         | 1035/9500 [3:34:22<28:48:33, 12.25s/it]08/03/2024 01:31:52 - INFO - __main__ -   Step: 1035, LR: 1.8372024405050585e-05, Loss: 629.9099731445312
2024-08-03T08:32:05.288115082Z 
 11%|█         | 1036/9500 [3:34:35<28:52:58, 12.28s/it]08/03/2024 01:32:05 - INFO - __main__ -   Step: 1036, LR: 1.8369853861363308e-05, Loss: 670.2833251953125
2024-08-03T08:32:17.781501318Z 
 11%|█         | 1037/9500 [3:34:47<29:01:36, 12.35s/it]08/03/2024 01:32:17 - INFO - __main__ -   Step: 1037, LR: 1.8367683317676028e-05, Loss: 654.0634765625
2024-08-03T08:32:29.961832924Z 
 11%|█         | 1038/9500 [3:34:59<28:54:19, 12.30s/it]08/03/2024 01:32:29 - INFO - __main__ -   Step: 1038, LR: 1.836551277398875e-05, Loss: 667.9464111328125
2024-08-03T08:32:42.007086145Z 
 11%|█         | 1039/9500 [3:35:11<28:43:28, 12.22s/it]08/03/2024 01:32:42 - INFO - __main__ -   Step: 1039, LR: 1.8363342230301468e-05, Loss: 704.8613891601562
2024-08-03T08:32:54.434669697Z 
 11%|█         | 1040/9500 [3:35:24<28:51:56, 12.28s/it]08/03/2024 01:32:54 - INFO - __main__ -   Step: 1040, LR: 1.836117168661419e-05, Loss: 633.0266723632812
2024-08-03T08:33:06.888097519Z 
 11%|█         | 1041/9500 [3:35:36<28:58:57, 12.33s/it]08/03/2024 01:33:06 - INFO - __main__ -   Step: 1041, LR: 1.835900114292691e-05, Loss: 565.1168212890625
2024-08-03T08:33:18.855420762Z 
 11%|█         | 1042/9500 [3:35:48<28:43:12, 12.22s/it]08/03/2024 01:33:18 - INFO - __main__ -   Step: 1042, LR: 1.8356830599239634e-05, Loss: 573.7685546875
2024-08-03T08:33:31.437797162Z 
 11%|█         | 1043/9500 [3:36:01<28:58:09, 12.33s/it]08/03/2024 01:33:31 - INFO - __main__ -   Step: 1043, LR: 1.8354660055552354e-05, Loss: 635.8681640625
2024-08-03T08:33:43.963563196Z 
 11%|█         | 1044/9500 [3:36:13<29:06:08, 12.39s/it]08/03/2024 01:33:43 - INFO - __main__ -   Step: 1044, LR: 1.8352489511865074e-05, Loss: 748.7257690429688
2024-08-03T08:33:56.428267156Z 
 11%|█         | 1045/9500 [3:36:26<29:09:06, 12.41s/it]08/03/2024 01:33:56 - INFO - __main__ -   Step: 1045, LR: 1.8350318968177797e-05, Loss: 607.578369140625
2024-08-03T08:34:09.244207652Z 
 11%|█         | 1046/9500 [3:36:39<29:25:57, 12.53s/it]08/03/2024 01:34:09 - INFO - __main__ -   Step: 1046, LR: 1.8348148424490517e-05, Loss: 566.7734375
2024-08-03T08:34:21.454270604Z 
 11%|█         | 1047/9500 [3:36:51<29:12:04, 12.44s/it]08/03/2024 01:34:21 - INFO - __main__ -   Step: 1047, LR: 1.834597788080324e-05, Loss: 643.2651977539062
2024-08-03T08:34:33.596478418Z 
 11%|█         | 1048/9500 [3:37:03<28:59:27, 12.35s/it]08/03/2024 01:34:33 - INFO - __main__ -   Step: 1048, LR: 1.834380733711596e-05, Loss: 729.2467651367188
2024-08-03T08:34:46.268548247Z 
 11%|█         | 1049/9500 [3:37:16<29:12:56, 12.45s/it]08/03/2024 01:34:46 - INFO - __main__ -   Step: 1049, LR: 1.834163679342868e-05, Loss: 629.3782958984375
2024-08-03T08:34:58.235623125Z 
 11%|█         | 1050/9500 [3:37:28<28:52:29, 12.30s/it]08/03/2024 01:34:58 - INFO - __main__ -   Step: 1050, LR: 1.8339466249741403e-05, Loss: 736.2548828125
2024-08-03T08:35:10.875309503Z 
 11%|█         | 1051/9500 [3:37:40<29:06:34, 12.40s/it]08/03/2024 01:35:10 - INFO - __main__ -   Step: 1051, LR: 1.8337295706054123e-05, Loss: 656.3004150390625
2024-08-03T08:35:23.152498621Z 
 11%|█         | 1052/9500 [3:37:53<29:01:02, 12.37s/it]08/03/2024 01:35:23 - INFO - __main__ -   Step: 1052, LR: 1.8335125162366846e-05, Loss: 538.0950927734375
2024-08-03T08:35:35.341839461Z 
 11%|█         | 1053/9500 [3:38:05<28:53:24, 12.31s/it]08/03/2024 01:35:35 - INFO - __main__ -   Step: 1053, LR: 1.8332954618679563e-05, Loss: 737.967041015625
2024-08-03T08:35:47.435326623Z 
 11%|█         | 1054/9500 [3:38:17<28:43:57, 12.25s/it]08/03/2024 01:35:47 - INFO - __main__ -   Step: 1054, LR: 1.8330784074992286e-05, Loss: 643.802978515625
2024-08-03T08:36:00.004575470Z 
 11%|█         | 1055/9500 [3:38:29<28:57:21, 12.34s/it]08/03/2024 01:36:00 - INFO - __main__ -   Step: 1055, LR: 1.8328613531305006e-05, Loss: 842.3916015625
2024-08-03T08:36:12.450793739Z 
 11%|█         | 1056/9500 [3:38:42<29:01:25, 12.37s/it]08/03/2024 01:36:12 - INFO - __main__ -   Step: 1056, LR: 1.832644298761773e-05, Loss: 698.0850830078125
2024-08-03T08:36:24.671679108Z 
 11%|█         | 1057/9500 [3:38:54<28:54:48, 12.33s/it]08/03/2024 01:36:24 - INFO - __main__ -   Step: 1057, LR: 1.832427244393045e-05, Loss: 618.17724609375
2024-08-03T08:36:37.401902842Z 
 11%|█         | 1058/9500 [3:39:07<29:11:34, 12.45s/it]08/03/2024 01:36:37 - INFO - __main__ -   Step: 1058, LR: 1.832210190024317e-05, Loss: 674.83349609375
2024-08-03T08:36:49.617964362Z 
 11%|█         | 1059/9500 [3:39:19<29:01:32, 12.38s/it]08/03/2024 01:36:49 - INFO - __main__ -   Step: 1059, LR: 1.8319931356555892e-05, Loss: 675.596435546875
2024-08-03T08:37:01.807639151Z 
 11%|█         | 1060/9500 [3:39:31<28:53:20, 12.32s/it]08/03/2024 01:37:01 - INFO - __main__ -   Step: 1060, LR: 1.8317760812868612e-05, Loss: 667.0802001953125
2024-08-03T08:37:14.126558017Z 
 11%|█         | 1061/9500 [3:39:44<28:52:58, 12.32s/it]08/03/2024 01:37:14 - INFO - __main__ -   Step: 1061, LR: 1.8315590269181335e-05, Loss: 757.5985107421875
2024-08-03T08:37:26.154108165Z 
 11%|█         | 1062/9500 [3:39:56<28:40:23, 12.23s/it]08/03/2024 01:37:26 - INFO - __main__ -   Step: 1062, LR: 1.8313419725494055e-05, Loss: 652.6693725585938
2024-08-03T08:37:38.427977397Z 
 11%|█         | 1063/9500 [3:40:08<28:41:54, 12.25s/it]08/03/2024 01:37:38 - INFO - __main__ -   Step: 1063, LR: 1.8311249181806775e-05, Loss: 538.0618286132812
2024-08-03T08:37:50.816057125Z 
 11%|█         | 1064/9500 [3:40:20<28:47:43, 12.29s/it]08/03/2024 01:37:50 - INFO - __main__ -   Step: 1064, LR: 1.8309078638119498e-05, Loss: 599.171142578125
2024-08-03T08:38:02.942477245Z 
 11%|█         | 1065/9500 [3:40:32<28:40:41, 12.24s/it]08/03/2024 01:38:02 - INFO - __main__ -   Step: 1065, LR: 1.8306908094432218e-05, Loss: 641.8128662109375
2024-08-03T08:38:14.914374677Z 
 11%|█         | 1066/9500 [3:40:44<28:29:12, 12.16s/it]08/03/2024 01:38:14 - INFO - __main__ -   Step: 1066, LR: 1.830473755074494e-05, Loss: 520.2544555664062
2024-08-03T08:38:27.397336202Z 
 11%|█         | 1067/9500 [3:40:57<28:42:38, 12.26s/it]08/03/2024 01:38:27 - INFO - __main__ -   Step: 1067, LR: 1.8302567007057658e-05, Loss: 682.8382568359375
2024-08-03T08:38:39.332420359Z 
 11%|█         | 1068/9500 [3:41:09<28:28:53, 12.16s/it]08/03/2024 01:38:39 - INFO - __main__ -   Step: 1068, LR: 1.830039646337038e-05, Loss: 684.7496337890625
2024-08-03T08:38:51.741723441Z 
 11%|█▏        | 1069/9500 [3:41:21<28:39:11, 12.23s/it]08/03/2024 01:38:51 - INFO - __main__ -   Step: 1069, LR: 1.82982259196831e-05, Loss: 650.1361083984375
2024-08-03T08:39:03.599327789Z 
 11%|█▏        | 1070/9500 [3:41:33<28:23:05, 12.12s/it]08/03/2024 01:39:03 - INFO - __main__ -   Step: 1070, LR: 1.8296055375995824e-05, Loss: 615.4716796875
2024-08-03T08:39:16.171410367Z 
 11%|█▏        | 1071/9500 [3:41:46<28:41:52, 12.26s/it]08/03/2024 01:39:16 - INFO - __main__ -   Step: 1071, LR: 1.8293884832308544e-05, Loss: 774.4517822265625
2024-08-03T08:39:28.942857349Z 
 11%|█▏        | 1072/9500 [3:41:58<29:03:21, 12.41s/it]08/03/2024 01:39:28 - INFO - __main__ -   Step: 1072, LR: 1.8291714288621264e-05, Loss: 828.6537475585938
2024-08-03T08:39:40.998162638Z 
 11%|█▏        | 1073/9500 [3:42:10<28:48:09, 12.30s/it]08/03/2024 01:39:40 - INFO - __main__ -   Step: 1073, LR: 1.8289543744933987e-05, Loss: 683.86962890625
2024-08-03T08:39:53.474273523Z 
 11%|█▏        | 1074/9500 [3:42:23<28:55:10, 12.36s/it]08/03/2024 01:39:53 - INFO - __main__ -   Step: 1074, LR: 1.8287373201246707e-05, Loss: 833.608642578125
2024-08-03T08:40:05.667531296Z 
 11%|█▏        | 1075/9500 [3:42:35<28:48:08, 12.31s/it]08/03/2024 01:40:05 - INFO - __main__ -   Step: 1075, LR: 1.828520265755943e-05, Loss: 685.587646484375
2024-08-03T08:40:18.102095959Z 
 11%|█▏        | 1076/9500 [3:42:48<28:53:17, 12.35s/it]08/03/2024 01:40:18 - INFO - __main__ -   Step: 1076, LR: 1.828303211387215e-05, Loss: 737.043212890625
2024-08-03T08:40:30.802454200Z 
 11%|█▏        | 1077/9500 [3:43:00<29:08:01, 12.45s/it]08/03/2024 01:40:30 - INFO - __main__ -   Step: 1077, LR: 1.828086157018487e-05, Loss: 930.10302734375
2024-08-03T08:40:43.320136870Z 
 11%|█▏        | 1078/9500 [3:43:13<29:10:35, 12.47s/it]08/03/2024 01:40:43 - INFO - __main__ -   Step: 1078, LR: 1.8278691026497593e-05, Loss: 594.1514892578125
2024-08-03T08:40:55.347810905Z 
 11%|█▏        | 1079/9500 [3:43:25<28:51:42, 12.34s/it]08/03/2024 01:40:55 - INFO - __main__ -   Step: 1079, LR: 1.8276520482810313e-05, Loss: 872.0355224609375
2024-08-03T08:41:07.743819562Z 
 11%|█▏        | 1080/9500 [3:43:37<28:53:54, 12.36s/it]08/03/2024 01:41:07 - INFO - __main__ -   Step: 1080, LR: 1.8274349939123036e-05, Loss: 693.4793090820312
2024-08-03T08:41:20.302089118Z 
 11%|█▏        | 1081/9500 [3:43:50<29:02:14, 12.42s/it]08/03/2024 01:41:20 - INFO - __main__ -   Step: 1081, LR: 1.8272179395435753e-05, Loss: 839.7640380859375
2024-08-03T08:41:32.523936069Z 
 11%|█▏        | 1082/9500 [3:44:02<28:53:49, 12.36s/it]08/03/2024 01:41:32 - INFO - __main__ -   Step: 1082, LR: 1.8270008851748476e-05, Loss: 564.2745971679688
2024-08-03T08:41:45.000244031Z 
 11%|█▏        | 1083/9500 [3:44:14<28:58:36, 12.39s/it]08/03/2024 01:41:45 - INFO - __main__ -   Step: 1083, LR: 1.8267838308061196e-05, Loss: 618.355712890625
2024-08-03T08:41:57.289514201Z 
 11%|█▏        | 1084/9500 [3:44:27<28:54:00, 12.36s/it]08/03/2024 01:41:57 - INFO - __main__ -   Step: 1084, LR: 1.826566776437392e-05, Loss: 567.7299194335938
2024-08-03T08:42:09.332244236Z 
 11%|█▏        | 1085/9500 [3:44:39<28:40:22, 12.27s/it]08/03/2024 01:42:09 - INFO - __main__ -   Step: 1085, LR: 1.826349722068664e-05, Loss: 633.5647583007812
2024-08-03T08:42:22.025353608Z 
 11%|█▏        | 1086/9500 [3:44:51<28:58:06, 12.39s/it]08/03/2024 01:42:22 - INFO - __main__ -   Step: 1086, LR: 1.826132667699936e-05, Loss: 704.7791748046875
2024-08-03T08:42:34.598598532Z 
 11%|█▏        | 1087/9500 [3:45:04<29:05:25, 12.45s/it]08/03/2024 01:42:34 - INFO - __main__ -   Step: 1087, LR: 1.8259156133312082e-05, Loss: 604.0397338867188
2024-08-03T08:42:46.742292870Z 
 11%|█▏        | 1088/9500 [3:45:16<28:52:24, 12.36s/it]08/03/2024 01:42:46 - INFO - __main__ -   Step: 1088, LR: 1.8256985589624802e-05, Loss: 670.8446044921875
2024-08-03T08:42:59.230503411Z 
 11%|█▏        | 1089/9500 [3:45:29<28:57:44, 12.40s/it]08/03/2024 01:42:59 - INFO - __main__ -   Step: 1089, LR: 1.8254815045937525e-05, Loss: 555.010986328125
2024-08-03T08:43:11.333432773Z 
 11%|█▏        | 1090/9500 [3:45:41<28:45:12, 12.31s/it]08/03/2024 01:43:11 - INFO - __main__ -   Step: 1090, LR: 1.8252644502250245e-05, Loss: 760.8265380859375
2024-08-03T08:43:23.322831457Z 
 11%|█▏        | 1091/9500 [3:45:53<28:31:35, 12.21s/it]08/03/2024 01:43:23 - INFO - __main__ -   Step: 1091, LR: 1.8250473958562965e-05, Loss: 640.2736206054688
2024-08-03T08:43:35.761534358Z 
 11%|█▏        | 1092/9500 [3:46:05<28:40:54, 12.28s/it]08/03/2024 01:43:35 - INFO - __main__ -   Step: 1092, LR: 1.8248303414875688e-05, Loss: 532.1466064453125
2024-08-03T08:43:47.949650855Z 
 12%|█▏        | 1093/9500 [3:46:17<28:36:48, 12.25s/it]08/03/2024 01:43:47 - INFO - __main__ -   Step: 1093, LR: 1.8246132871188408e-05, Loss: 674.6358032226562
2024-08-03T08:44:00.033542799Z 
 12%|█▏        | 1094/9500 [3:46:29<28:29:30, 12.20s/it]08/03/2024 01:44:00 - INFO - __main__ -   Step: 1094, LR: 1.824396232750113e-05, Loss: 704.562255859375
2024-08-03T08:44:12.473195073Z 
 12%|█▏        | 1095/9500 [3:46:42<28:39:17, 12.27s/it]08/03/2024 01:44:12 - INFO - __main__ -   Step: 1095, LR: 1.8241791783813848e-05, Loss: 803.1926879882812
2024-08-03T08:44:24.995480910Z 
 12%|█▏        | 1096/9500 [3:46:54<28:49:32, 12.35s/it]08/03/2024 01:44:24 - INFO - __main__ -   Step: 1096, LR: 1.823962124012657e-05, Loss: 778.6553955078125
2024-08-03T08:44:37.076272361Z 
 12%|█▏        | 1097/9500 [3:47:07<28:38:07, 12.27s/it]08/03/2024 01:44:37 - INFO - __main__ -   Step: 1097, LR: 1.823745069643929e-05, Loss: 880.5399169921875
2024-08-03T08:44:49.625472567Z 
 12%|█▏        | 1098/9500 [3:47:19<28:49:43, 12.35s/it]08/03/2024 01:44:49 - INFO - __main__ -   Step: 1098, LR: 1.8235280152752014e-05, Loss: 659.1915893554688
2024-08-03T08:45:01.661210783Z 
 12%|█▏        | 1099/9500 [3:47:31<28:36:13, 12.26s/it]08/03/2024 01:45:01 - INFO - __main__ -   Step: 1099, LR: 1.8233109609064734e-05, Loss: 669.1710205078125
2024-08-03T08:45:14.030342283Z 
 12%|█▏        | 1100/9500 [3:47:43<28:40:42, 12.29s/it]08/03/2024 01:45:14 - INFO - __main__ -   Step: 1100, LR: 1.8230939065377454e-05, Loss: 513.6604614257812
2024-08-03T08:45:26.626600562Z 
 12%|█▏        | 1101/9500 [3:47:56<28:53:20, 12.38s/it]08/03/2024 01:45:26 - INFO - __main__ -   Step: 1101, LR: 1.8228768521690177e-05, Loss: 734.436767578125
2024-08-03T08:45:39.194690917Z 
 12%|█▏        | 1102/9500 [3:48:09<29:00:56, 12.44s/it]08/03/2024 01:45:39 - INFO - __main__ -   Step: 1102, LR: 1.8226597978002897e-05, Loss: 653.2352294921875
2024-08-03T08:45:51.356482913Z 
 12%|█▏        | 1103/9500 [3:48:21<28:49:06, 12.36s/it]08/03/2024 01:45:51 - INFO - __main__ -   Step: 1103, LR: 1.822442743431562e-05, Loss: 654.1016845703125
2024-08-03T08:46:03.929783976Z 
 12%|█▏        | 1104/9500 [3:48:33<28:58:03, 12.42s/it]08/03/2024 01:46:03 - INFO - __main__ -   Step: 1104, LR: 1.822225689062834e-05, Loss: 682.81591796875
2024-08-03T08:46:16.147218928Z 
 12%|█▏        | 1105/9500 [3:48:46<28:49:20, 12.36s/it]08/03/2024 01:46:16 - INFO - __main__ -   Step: 1105, LR: 1.822008634694106e-05, Loss: 672.953125
2024-08-03T08:46:28.153774820Z 
 12%|█▏        | 1106/9500 [3:48:58<28:34:17, 12.25s/it]08/03/2024 01:46:28 - INFO - __main__ -   Step: 1106, LR: 1.8217915803253783e-05, Loss: 768.4597778320312
2024-08-03T08:46:40.821848975Z 
 12%|█▏        | 1107/9500 [3:49:10<28:51:28, 12.38s/it]08/03/2024 01:46:40 - INFO - __main__ -   Step: 1107, LR: 1.8215745259566503e-05, Loss: 910.9175415039062
2024-08-03T08:46:52.863753442Z 
 12%|█▏        | 1108/9500 [3:49:22<28:37:10, 12.28s/it]08/03/2024 01:46:52 - INFO - __main__ -   Step: 1108, LR: 1.8213574715879226e-05, Loss: 634.2501831054688
2024-08-03T08:47:05.287921862Z 
 12%|█▏        | 1109/9500 [3:49:35<28:43:07, 12.32s/it]08/03/2024 01:47:05 - INFO - __main__ -   Step: 1109, LR: 1.8211404172191943e-05, Loss: 574.4744873046875
2024-08-03T08:47:17.967415484Z 
 12%|█▏        | 1110/9500 [3:49:47<28:57:56, 12.43s/it]08/03/2024 01:47:17 - INFO - __main__ -   Step: 1110, LR: 1.8209233628504666e-05, Loss: 693.6182861328125
2024-08-03T08:47:30.168680770Z 
 12%|█▏        | 1111/9500 [3:50:00<28:48:12, 12.36s/it]08/03/2024 01:47:30 - INFO - __main__ -   Step: 1111, LR: 1.8207063084817386e-05, Loss: 650.777099609375
2024-08-03T08:47:42.160665280Z 
 12%|█▏        | 1112/9500 [3:50:12<28:32:32, 12.25s/it]08/03/2024 01:47:42 - INFO - __main__ -   Step: 1112, LR: 1.820489254113011e-05, Loss: 670.5292358398438
2024-08-03T08:47:54.159725237Z 
 12%|█▏        | 1113/9500 [3:50:24<28:21:48, 12.17s/it]08/03/2024 01:47:54 - INFO - __main__ -   Step: 1113, LR: 1.820272199744283e-05, Loss: 671.904052734375
2024-08-03T08:48:06.682056858Z 
 12%|█▏        | 1114/9500 [3:50:36<28:36:10, 12.28s/it]08/03/2024 01:48:06 - INFO - __main__ -   Step: 1114, LR: 1.820055145375555e-05, Loss: 731.6448364257812
2024-08-03T08:48:19.121775486Z 
 12%|█▏        | 1115/9500 [3:50:49<28:42:43, 12.33s/it]08/03/2024 01:48:19 - INFO - __main__ -   Step: 1115, LR: 1.8198380910068272e-05, Loss: 725.617431640625
2024-08-03T08:48:31.210921169Z 
 12%|█▏        | 1116/9500 [3:51:01<28:32:32, 12.26s/it]08/03/2024 01:48:31 - INFO - __main__ -   Step: 1116, LR: 1.8196210366380992e-05, Loss: 628.15673828125
2024-08-03T08:48:43.909643131Z 
 12%|█▏        | 1117/9500 [3:51:13<28:50:54, 12.39s/it]08/03/2024 01:48:43 - INFO - __main__ -   Step: 1117, LR: 1.8194039822693715e-05, Loss: 699.150390625
2024-08-03T08:48:55.920716960Z 
 12%|█▏        | 1118/9500 [3:51:25<28:34:52, 12.28s/it]08/03/2024 01:48:55 - INFO - __main__ -   Step: 1118, LR: 1.8191869279006435e-05, Loss: 599.7708740234375
2024-08-03T08:49:07.956834152Z 
 12%|█▏        | 1119/9500 [3:51:37<28:24:38, 12.20s/it]08/03/2024 01:49:07 - INFO - __main__ -   Step: 1119, LR: 1.8189698735319155e-05, Loss: 707.0633544921875
2024-08-03T08:49:20.527118808Z 
 12%|█▏        | 1120/9500 [3:51:50<28:39:47, 12.31s/it]08/03/2024 01:49:20 - INFO - __main__ -   Step: 1120, LR: 1.818752819163188e-05, Loss: 568.5616455078125
2024-08-03T08:49:32.673707923Z 
 12%|█▏        | 1121/9500 [3:52:02<28:32:36, 12.26s/it]08/03/2024 01:49:32 - INFO - __main__ -   Step: 1121, LR: 1.8185357647944598e-05, Loss: 576.7554931640625
2024-08-03T08:49:44.871788757Z 
 12%|█▏        | 1122/9500 [3:52:14<28:29:39, 12.24s/it]08/03/2024 01:49:44 - INFO - __main__ -   Step: 1122, LR: 1.818318710425732e-05, Loss: 664.831787109375
2024-08-03T08:49:57.344172625Z 
 12%|█▏        | 1123/9500 [3:52:27<28:39:01, 12.31s/it]08/03/2024 01:49:57 - INFO - __main__ -   Step: 1123, LR: 1.8181016560570038e-05, Loss: 680.56787109375
2024-08-03T08:50:09.737653563Z 
 12%|█▏        | 1124/9500 [3:52:39<28:42:12, 12.34s/it]08/03/2024 01:50:09 - INFO - __main__ -   Step: 1124, LR: 1.817884601688276e-05, Loss: 811.5374755859375
2024-08-03T08:50:21.924131062Z 
 12%|█▏        | 1125/9500 [3:52:51<28:35:43, 12.29s/it]08/03/2024 01:50:21 - INFO - __main__ -   Step: 1125, LR: 1.817667547319548e-05, Loss: 955.4331665039062
2024-08-03T08:50:34.274131812Z 
 12%|█▏        | 1126/9500 [3:53:04<28:37:56, 12.31s/it]08/03/2024 01:50:34 - INFO - __main__ -   Step: 1126, LR: 1.8174504929508204e-05, Loss: 556.6685180664062
2024-08-03T08:50:46.417987586Z 
 12%|█▏        | 1127/9500 [3:53:16<28:30:48, 12.26s/it]08/03/2024 01:50:46 - INFO - __main__ -   Step: 1127, LR: 1.8172334385820924e-05, Loss: 683.6221923828125
2024-08-03T08:50:58.754353021Z 
 12%|█▏        | 1128/9500 [3:53:28<28:33:50, 12.28s/it]08/03/2024 01:50:58 - INFO - __main__ -   Step: 1128, LR: 1.8170163842133644e-05, Loss: 749.311767578125
2024-08-03T08:51:11.498014533Z 
 12%|█▏        | 1129/9500 [3:53:41<28:52:55, 12.42s/it]08/03/2024 01:51:11 - INFO - __main__ -   Step: 1129, LR: 1.8167993298446367e-05, Loss: 622.916748046875
2024-08-03T08:51:23.693423900Z 
 12%|█▏        | 1130/9500 [3:53:53<28:43:16, 12.35s/it]08/03/2024 01:51:23 - INFO - __main__ -   Step: 1130, LR: 1.8165822754759087e-05, Loss: 792.9892578125
2024-08-03T08:51:35.679852139Z 
 12%|█▏        | 1131/9500 [3:54:05<28:27:43, 12.24s/it]08/03/2024 01:51:35 - INFO - __main__ -   Step: 1131, LR: 1.816365221107181e-05, Loss: 609.2242431640625
2024-08-03T08:51:48.149390451Z 
 12%|█▏        | 1132/9500 [3:54:18<28:36:55, 12.31s/it]08/03/2024 01:51:48 - INFO - __main__ -   Step: 1132, LR: 1.816148166738453e-05, Loss: 493.3519287109375
2024-08-03T08:52:00.388249503Z 
 12%|█▏        | 1133/9500 [3:54:30<28:33:46, 12.29s/it]08/03/2024 01:52:00 - INFO - __main__ -   Step: 1133, LR: 1.8159311123697254e-05, Loss: 724.2859497070312
2024-08-03T08:52:12.671277742Z 
 12%|█▏        | 1134/9500 [3:54:42<28:33:18, 12.29s/it]08/03/2024 01:52:12 - INFO - __main__ -   Step: 1134, LR: 1.8157140580009973e-05, Loss: 583.5364990234375
2024-08-03T08:52:25.279616505Z 
 12%|█▏        | 1135/9500 [3:54:55<28:46:30, 12.38s/it]08/03/2024 01:52:25 - INFO - __main__ -   Step: 1135, LR: 1.8154970036322693e-05, Loss: 865.3777465820312
2024-08-03T08:52:37.636470751Z 
 12%|█▏        | 1136/9500 [3:55:07<28:45:11, 12.38s/it]08/03/2024 01:52:37 - INFO - __main__ -   Step: 1136, LR: 1.8152799492635417e-05, Loss: 665.4461669921875
2024-08-03T08:52:49.878634714Z 
 12%|█▏        | 1137/9500 [3:55:19<28:39:21, 12.34s/it]08/03/2024 01:52:49 - INFO - __main__ -   Step: 1137, LR: 1.8150628948948133e-05, Loss: 592.8129272460938
2024-08-03T08:53:02.130694193Z 
 12%|█▏        | 1138/9500 [3:55:32<28:35:41, 12.31s/it]08/03/2024 01:53:02 - INFO - __main__ -   Step: 1138, LR: 1.8148458405260856e-05, Loss: 548.82470703125
2024-08-03T08:53:14.085177685Z 
 12%|█▏        | 1139/9500 [3:55:44<28:20:36, 12.20s/it]08/03/2024 01:53:14 - INFO - __main__ -   Step: 1139, LR: 1.8146287861573576e-05, Loss: 469.0942077636719
2024-08-03T08:53:26.588704755Z 
 12%|█▏        | 1140/9500 [3:55:56<28:32:54, 12.29s/it]08/03/2024 01:53:26 - INFO - __main__ -   Step: 1140, LR: 1.81441173178863e-05, Loss: 946.9347534179688
2024-08-03T08:53:38.910859573Z 
 12%|█▏        | 1141/9500 [3:56:08<28:33:54, 12.30s/it]08/03/2024 01:53:38 - INFO - __main__ -   Step: 1141, LR: 1.814194677419902e-05, Loss: 506.72369384765625
2024-08-03T08:53:50.939941383Z 
 12%|█▏        | 1142/9500 [3:56:20<28:22:16, 12.22s/it]08/03/2024 01:53:50 - INFO - __main__ -   Step: 1142, LR: 1.8139776230511742e-05, Loss: 630.4844360351562
2024-08-03T08:54:03.226303244Z 
 12%|█▏        | 1143/9500 [3:56:33<28:24:51, 12.24s/it]08/03/2024 01:54:03 - INFO - __main__ -   Step: 1143, LR: 1.8137605686824462e-05, Loss: 737.3201904296875
2024-08-03T08:54:15.741004235Z 
 12%|█▏        | 1144/9500 [3:56:45<28:36:06, 12.32s/it]08/03/2024 01:54:15 - INFO - __main__ -   Step: 1144, LR: 1.8135435143137182e-05, Loss: 783.1453857421875
2024-08-03T08:54:27.830027018Z 
 12%|█▏        | 1145/9500 [3:56:57<28:26:09, 12.25s/it]08/03/2024 01:54:27 - INFO - __main__ -   Step: 1145, LR: 1.8133264599449905e-05, Loss: 571.4151611328125
2024-08-03T08:54:40.139519576Z 
 12%|█▏        | 1146/9500 [3:57:10<28:28:19, 12.27s/it]08/03/2024 01:54:40 - INFO - __main__ -   Step: 1146, LR: 1.8131094055762625e-05, Loss: 929.2479248046875
2024-08-03T08:54:52.817727620Z 
 12%|█▏        | 1147/9500 [3:57:22<28:45:12, 12.39s/it]08/03/2024 01:54:52 - INFO - __main__ -   Step: 1147, LR: 1.812892351207535e-05, Loss: 734.4258422851562
2024-08-03T08:55:05.150808905Z 
 12%|█▏        | 1148/9500 [3:57:35<28:42:31, 12.37s/it]08/03/2024 01:55:05 - INFO - __main__ -   Step: 1148, LR: 1.812675296838807e-05, Loss: 766.9132080078125
2024-08-03T08:55:17.582695952Z 
 12%|█▏        | 1149/9500 [3:57:47<28:44:42, 12.39s/it]08/03/2024 01:55:17 - INFO - __main__ -   Step: 1149, LR: 1.812458242470079e-05, Loss: 736.2989501953125
2024-08-03T08:55:30.433783593Z 
 12%|█▏        | 1150/9500 [3:58:00<29:03:41, 12.53s/it]08/03/2024 01:55:30 - INFO - __main__ -   Step: 1150, LR: 1.812241188101351e-05, Loss: 814.2305908203125
2024-08-03T08:55:42.677418785Z 
 12%|█▏        | 1151/9500 [3:58:12<28:51:31, 12.44s/it]08/03/2024 01:55:42 - INFO - __main__ -   Step: 1151, LR: 1.812024133732623e-05, Loss: 731.0023803710938
2024-08-03T08:55:54.763450330Z 
 12%|█▏        | 1152/9500 [3:58:24<28:36:25, 12.34s/it]08/03/2024 01:55:54 - INFO - __main__ -   Step: 1152, LR: 1.811807079363895e-05, Loss: 674.4581298828125
2024-08-03T08:56:07.340193648Z 
 12%|█▏        | 1153/9500 [3:58:37<28:46:14, 12.41s/it]08/03/2024 01:56:07 - INFO - __main__ -   Step: 1153, LR: 1.811590024995167e-05, Loss: 641.1744995117188
2024-08-03T08:56:19.014148354Z 
 12%|█▏        | 1154/9500 [3:58:48<28:15:21, 12.19s/it]08/03/2024 01:56:19 - INFO - __main__ -   Step: 1154, LR: 1.8113729706264394e-05, Loss: 463.07720947265625
2024-08-03T08:56:31.163165162Z 
 12%|█▏        | 1155/9500 [3:59:01<28:13:32, 12.18s/it]08/03/2024 01:56:31 - INFO - __main__ -   Step: 1155, LR: 1.8111559162577114e-05, Loss: 580.905029296875
2024-08-03T08:56:43.384769149Z 
 12%|█▏        | 1156/9500 [3:59:13<28:15:13, 12.19s/it]08/03/2024 01:56:43 - INFO - __main__ -   Step: 1156, LR: 1.8109388618889838e-05, Loss: 718.7743530273438
2024-08-03T08:56:56.095523104Z 
 12%|█▏        | 1157/9500 [3:59:26<28:36:44, 12.35s/it]08/03/2024 01:56:56 - INFO - __main__ -   Step: 1157, LR: 1.8107218075202557e-05, Loss: 625.7150268554688
2024-08-03T08:57:08.632019375Z 
 12%|█▏        | 1158/9500 [3:59:38<28:44:28, 12.40s/it]08/03/2024 01:57:08 - INFO - __main__ -   Step: 1158, LR: 1.8105047531515277e-05, Loss: 576.9934692382812
2024-08-03T08:57:20.610455028Z 
 12%|█▏        | 1159/9500 [3:59:50<28:26:32, 12.28s/it]08/03/2024 01:57:20 - INFO - __main__ -   Step: 1159, LR: 1.8102876987828e-05, Loss: 811.59814453125
2024-08-03T08:57:33.233381616Z 
 12%|█▏        | 1160/9500 [4:00:03<28:40:48, 12.38s/it]08/03/2024 01:57:33 - INFO - __main__ -   Step: 1160, LR: 1.810070644414072e-05, Loss: 686.89892578125
2024-08-03T08:57:45.610899191Z 
 12%|█▏        | 1161/9500 [4:00:15<28:40:30, 12.38s/it]08/03/2024 01:57:45 - INFO - __main__ -   Step: 1161, LR: 1.8098535900453444e-05, Loss: 744.2915649414062
2024-08-03T08:57:57.957598291Z 
 12%|█▏        | 1162/9500 [4:00:27<28:38:57, 12.37s/it]08/03/2024 01:57:57 - INFO - __main__ -   Step: 1162, LR: 1.8096365356766164e-05, Loss: 775.6167602539062
2024-08-03T08:58:10.365741652Z 
 12%|█▏        | 1163/9500 [4:00:40<28:40:21, 12.38s/it]08/03/2024 01:58:10 - INFO - __main__ -   Step: 1163, LR: 1.8094194813078883e-05, Loss: 565.37353515625
2024-08-03T08:58:22.530489586Z 
 12%|█▏        | 1164/9500 [4:00:52<28:31:07, 12.32s/it]08/03/2024 01:58:22 - INFO - __main__ -   Step: 1164, LR: 1.8092024269391607e-05, Loss: 684.7987060546875
2024-08-03T08:58:34.562648361Z 
 12%|█▏        | 1165/9500 [4:01:04<28:19:05, 12.23s/it]08/03/2024 01:58:34 - INFO - __main__ -   Step: 1165, LR: 1.8089853725704326e-05, Loss: 701.7530517578125
2024-08-03T08:58:46.981018483Z 
 12%|█▏        | 1166/9500 [4:01:16<28:26:41, 12.29s/it]08/03/2024 01:58:46 - INFO - __main__ -   Step: 1166, LR: 1.8087683182017046e-05, Loss: 659.5419921875
2024-08-03T08:58:59.481587231Z 
 12%|█▏        | 1167/9500 [4:01:29<28:35:23, 12.35s/it]08/03/2024 01:58:59 - INFO - __main__ -   Step: 1167, LR: 1.8085512638329766e-05, Loss: 766.298828125
2024-08-03T08:59:11.497073213Z 
 12%|█▏        | 1168/9500 [4:01:41<28:21:10, 12.25s/it]08/03/2024 01:59:11 - INFO - __main__ -   Step: 1168, LR: 1.808334209464249e-05, Loss: 613.843017578125
2024-08-03T08:59:24.035084807Z 
 12%|█▏        | 1169/9500 [4:01:53<28:32:57, 12.34s/it]08/03/2024 01:59:24 - INFO - __main__ -   Step: 1169, LR: 1.808117155095521e-05, Loss: 624.71923828125
2024-08-03T08:59:36.481352698Z 
 12%|█▏        | 1170/9500 [4:02:06<28:37:14, 12.37s/it]08/03/2024 01:59:36 - INFO - __main__ -   Step: 1170, LR: 1.8079001007267933e-05, Loss: 597.1588134765625
2024-08-03T08:59:49.009861735Z 
 12%|█▏        | 1171/9500 [4:02:18<28:43:44, 12.42s/it]08/03/2024 01:59:49 - INFO - __main__ -   Step: 1171, LR: 1.8076830463580652e-05, Loss: 691.09521484375
2024-08-03T09:00:01.292878278Z 
 12%|█▏        | 1172/9500 [4:02:31<28:37:56, 12.38s/it]08/03/2024 02:00:01 - INFO - __main__ -   Step: 1172, LR: 1.8074659919893372e-05, Loss: 628.482177734375
2024-08-03T09:00:13.298224314Z 
 12%|█▏        | 1173/9500 [4:02:43<28:22:14, 12.27s/it]08/03/2024 02:00:13 - INFO - __main__ -   Step: 1173, LR: 1.8072489376206096e-05, Loss: 726.5693359375
2024-08-03T09:00:25.430615693Z 
 12%|█▏        | 1174/9500 [4:02:55<28:16:29, 12.23s/it]08/03/2024 02:00:25 - INFO - __main__ -   Step: 1174, LR: 1.8070318832518815e-05, Loss: 617.7557373046875
2024-08-03T09:00:37.939898708Z 
 12%|█▏        | 1175/9500 [4:03:07<28:28:06, 12.31s/it]08/03/2024 02:00:37 - INFO - __main__ -   Step: 1175, LR: 1.806814828883154e-05, Loss: 676.548828125
2024-08-03T09:00:50.313405988Z 
 12%|█▏        | 1176/9500 [4:03:20<28:30:29, 12.33s/it]08/03/2024 02:00:50 - INFO - __main__ -   Step: 1176, LR: 1.806597774514426e-05, Loss: 745.771484375
2024-08-03T09:01:02.422122953Z 
 12%|█▏        | 1177/9500 [4:03:32<28:21:07, 12.26s/it]08/03/2024 02:01:02 - INFO - __main__ -   Step: 1177, LR: 1.806380720145698e-05, Loss: 560.2750244140625
2024-08-03T09:01:14.849004106Z 
 12%|█▏        | 1178/9500 [4:03:44<28:27:44, 12.31s/it]08/03/2024 02:01:14 - INFO - __main__ -   Step: 1178, LR: 1.80616366577697e-05, Loss: 578.6475219726562
2024-08-03T09:01:26.823638427Z 
 12%|█▏        | 1179/9500 [4:03:56<28:13:28, 12.21s/it]08/03/2024 02:01:26 - INFO - __main__ -   Step: 1179, LR: 1.805946611408242e-05, Loss: 437.626220703125
2024-08-03T09:01:39.036038078Z 
 12%|█▏        | 1180/9500 [4:04:08<28:13:19, 12.21s/it]08/03/2024 02:01:39 - INFO - __main__ -   Step: 1180, LR: 1.805729557039514e-05, Loss: 708.6161499023438
2024-08-03T09:01:51.393006522Z 
 12%|█▏        | 1181/9500 [4:04:21<28:19:10, 12.26s/it]08/03/2024 02:01:51 - INFO - __main__ -   Step: 1181, LR: 1.805512502670786e-05, Loss: 501.3206787109375
2024-08-03T09:02:03.404441155Z 
 12%|█▏        | 1182/9500 [4:04:33<28:08:50, 12.18s/it]08/03/2024 02:02:03 - INFO - __main__ -   Step: 1182, LR: 1.8052954483020585e-05, Loss: 665.1871948242188
2024-08-03T09:02:15.504260664Z 
 12%|█▏        | 1183/9500 [4:04:45<28:05:12, 12.16s/it]08/03/2024 02:02:15 - INFO - __main__ -   Step: 1183, LR: 1.8050783939333304e-05, Loss: 713.263916015625
2024-08-03T09:02:27.950739715Z 
 12%|█▏        | 1184/9500 [4:04:57<28:17:02, 12.24s/it]08/03/2024 02:02:27 - INFO - __main__ -   Step: 1184, LR: 1.8048613395646028e-05, Loss: 670.7843627929688
2024-08-03T09:02:40.036659015Z 
 12%|█▏        | 1185/9500 [4:05:09<28:10:15, 12.20s/it]08/03/2024 02:02:40 - INFO - __main__ -   Step: 1185, LR: 1.8046442851958748e-05, Loss: 559.0584106445312
2024-08-03T09:02:52.184211234Z 
 12%|█▏        | 1186/9500 [4:05:22<28:08:00, 12.18s/it]08/03/2024 02:02:52 - INFO - __main__ -   Step: 1186, LR: 1.8044272308271467e-05, Loss: 715.0695190429688
2024-08-03T09:03:04.642042073Z 
 12%|█▏        | 1187/9500 [4:05:34<28:19:15, 12.26s/it]08/03/2024 02:03:04 - INFO - __main__ -   Step: 1187, LR: 1.804210176458419e-05, Loss: 590.653564453125
2024-08-03T09:03:16.793162853Z 
 13%|█▎        | 1188/9500 [4:05:46<28:14:21, 12.23s/it]08/03/2024 02:03:16 - INFO - __main__ -   Step: 1188, LR: 1.803993122089691e-05, Loss: 780.9901123046875
2024-08-03T09:03:29.153289129Z 
 13%|█▎        | 1189/9500 [4:05:59<28:19:31, 12.27s/it]08/03/2024 02:03:29 - INFO - __main__ -   Step: 1189, LR: 1.8037760677209634e-05, Loss: 581.4283447265625
2024-08-03T09:03:41.609246402Z 
 13%|█▎        | 1190/9500 [4:06:11<28:27:03, 12.33s/it]08/03/2024 02:03:41 - INFO - __main__ -   Step: 1190, LR: 1.8035590133522354e-05, Loss: 767.128662109375
2024-08-03T09:03:53.663130381Z 
 13%|█▎        | 1191/9500 [4:06:23<28:15:35, 12.24s/it]08/03/2024 02:03:53 - INFO - __main__ -   Step: 1191, LR: 1.8033419589835073e-05, Loss: 710.387939453125
2024-08-03T09:04:06.321948239Z 
 13%|█▎        | 1192/9500 [4:06:36<28:32:36, 12.37s/it]08/03/2024 02:04:06 - INFO - __main__ -   Step: 1192, LR: 1.8031249046147797e-05, Loss: 675.2470092773438
2024-08-03T09:04:18.672147932Z 
 13%|█▎        | 1193/9500 [4:06:48<28:31:39, 12.36s/it]08/03/2024 02:04:18 - INFO - __main__ -   Step: 1193, LR: 1.8029078502460517e-05, Loss: 515.4725341796875
2024-08-03T09:04:31.584040523Z 
 13%|█▎        | 1194/9500 [4:07:01<28:54:10, 12.53s/it]08/03/2024 02:04:31 - INFO - __main__ -   Step: 1194, LR: 1.8026907958773236e-05, Loss: 850.1534423828125
2024-08-03T09:04:44.160624444Z 
 13%|█▎        | 1195/9500 [4:07:14<28:56:04, 12.54s/it]08/03/2024 02:04:44 - INFO - __main__ -   Step: 1195, LR: 1.8024737415085956e-05, Loss: 669.7698974609375
2024-08-03T09:04:56.774654915Z 
 13%|█▎        | 1196/9500 [4:07:26<28:58:50, 12.56s/it]08/03/2024 02:04:56 - INFO - __main__ -   Step: 1196, LR: 1.802256687139868e-05, Loss: 672.7032470703125
2024-08-03T09:05:09.077874651Z 
 13%|█▎        | 1197/9500 [4:07:39<28:47:49, 12.49s/it]08/03/2024 02:05:09 - INFO - __main__ -   Step: 1197, LR: 1.80203963277114e-05, Loss: 611.1749267578125
2024-08-03T09:05:21.377726475Z 
 13%|█▎        | 1198/9500 [4:07:51<28:39:54, 12.43s/it]08/03/2024 02:05:21 - INFO - __main__ -   Step: 1198, LR: 1.8018225784024123e-05, Loss: 644.51171875
2024-08-03T09:05:33.666526080Z 
 13%|█▎        | 1199/9500 [4:08:03<28:33:49, 12.39s/it]08/03/2024 02:05:33 - INFO - __main__ -   Step: 1199, LR: 1.8016055240336843e-05, Loss: 804.2523193359375
2024-08-03T09:05:46.338239990Z 
 13%|█▎        | 1200/9500 [4:08:16<28:45:24, 12.47s/it]08/03/2024 02:05:46 - INFO - __main__ -   Step: 1200, LR: 1.8013884696649562e-05, Loss: 642.5935668945312
2024-08-03T09:05:58.564679412Z 
 13%|█▎        | 1201/9500 [4:08:28<28:34:59, 12.40s/it]08/03/2024 02:05:58 - INFO - __main__ -   Step: 1201, LR: 1.8011714152962286e-05, Loss: 694.463134765625
2024-08-03T09:06:10.580010370Z 
 13%|█▎        | 1202/9500 [4:08:40<28:18:51, 12.28s/it]08/03/2024 02:06:10 - INFO - __main__ -   Step: 1202, LR: 1.8009543609275006e-05, Loss: 708.8389892578125
2024-08-03T09:06:23.052449347Z 
 13%|█▎        | 1203/9500 [4:08:52<28:26:28, 12.34s/it]08/03/2024 02:06:23 - INFO - __main__ -   Step: 1203, LR: 1.800737306558773e-05, Loss: 550.2705078125
2024-08-03T09:06:35.453075505Z 
 13%|█▎        | 1204/9500 [4:09:05<28:28:46, 12.36s/it]08/03/2024 02:06:35 - INFO - __main__ -   Step: 1204, LR: 1.800520252190045e-05, Loss: 761.5311279296875
2024-08-03T09:06:47.699387886Z 
 13%|█▎        | 1205/9500 [4:09:17<28:23:54, 12.32s/it]08/03/2024 02:06:47 - INFO - __main__ -   Step: 1205, LR: 1.800303197821317e-05, Loss: 667.94873046875
2024-08-03T09:07:00.276394690Z 
 13%|█▎        | 1206/9500 [4:09:30<28:34:09, 12.40s/it]08/03/2024 02:07:00 - INFO - __main__ -   Step: 1206, LR: 1.800086143452589e-05, Loss: 686.8848876953125
2024-08-03T09:07:12.517166026Z 
 13%|█▎        | 1207/9500 [4:09:42<28:27:19, 12.35s/it]08/03/2024 02:07:12 - INFO - __main__ -   Step: 1207, LR: 1.799869089083861e-05, Loss: 620.4718017578125
2024-08-03T09:07:24.677098003Z 
 13%|█▎        | 1208/9500 [4:09:54<28:19:08, 12.29s/it]08/03/2024 02:07:24 - INFO - __main__ -   Step: 1208, LR: 1.799652034715133e-05, Loss: 681.8909301757812
2024-08-03T09:07:37.219828153Z 
 13%|█▎        | 1209/9500 [4:10:07<28:29:12, 12.37s/it]08/03/2024 02:07:37 - INFO - __main__ -   Step: 1209, LR: 1.799434980346405e-05, Loss: 540.2418823242188
2024-08-03T09:07:49.403524133Z 
 13%|█▎        | 1210/9500 [4:10:19<28:21:19, 12.31s/it]08/03/2024 02:07:49 - INFO - __main__ -   Step: 1210, LR: 1.7992179259776775e-05, Loss: 542.1070556640625
2024-08-03T09:08:01.468905329Z 
 13%|█▎        | 1211/9500 [4:10:31<28:10:49, 12.24s/it]08/03/2024 02:08:01 - INFO - __main__ -   Step: 1211, LR: 1.7990008716089495e-05, Loss: 747.3389282226562
2024-08-03T09:08:13.853155797Z 
 13%|█▎        | 1212/9500 [4:10:43<28:16:38, 12.28s/it]08/03/2024 02:08:13 - INFO - __main__ -   Step: 1212, LR: 1.7987838172402218e-05, Loss: 644.4832763671875
2024-08-03T09:08:26.043785525Z 
 13%|█▎        | 1213/9500 [4:10:55<28:12:37, 12.25s/it]08/03/2024 02:08:26 - INFO - __main__ -   Step: 1213, LR: 1.7985667628714938e-05, Loss: 625.26806640625
2024-08-03T09:08:38.234166194Z 
 13%|█▎        | 1214/9500 [4:11:08<28:09:43, 12.24s/it]08/03/2024 02:08:38 - INFO - __main__ -   Step: 1214, LR: 1.7983497085027658e-05, Loss: 696.5477905273438
2024-08-03T09:08:50.983549373Z 
 13%|█▎        | 1215/9500 [4:11:20<28:30:49, 12.39s/it]08/03/2024 02:08:50 - INFO - __main__ -   Step: 1215, LR: 1.798132654134038e-05, Loss: 735.2853393554688
2024-08-03T09:09:03.393670364Z 
 13%|█▎        | 1216/9500 [4:11:33<28:31:26, 12.40s/it]08/03/2024 02:09:03 - INFO - __main__ -   Step: 1216, LR: 1.79791559976531e-05, Loss: 896.3232421875
2024-08-03T09:09:15.645207092Z 
 13%|█▎        | 1217/9500 [4:11:45<28:25:16, 12.35s/it]08/03/2024 02:09:15 - INFO - __main__ -   Step: 1217, LR: 1.7976985453965824e-05, Loss: 652.8173828125
2024-08-03T09:09:28.284922317Z 
 13%|█▎        | 1218/9500 [4:11:58<28:36:57, 12.44s/it]08/03/2024 02:09:28 - INFO - __main__ -   Step: 1218, LR: 1.7974814910278544e-05, Loss: 582.0521240234375
2024-08-03T09:09:40.621970850Z 
 13%|█▎        | 1219/9500 [4:12:10<28:32:32, 12.41s/it]08/03/2024 02:09:40 - INFO - __main__ -   Step: 1219, LR: 1.7972644366591267e-05, Loss: 884.948486328125
2024-08-03T09:09:53.204520639Z 
 13%|█▎        | 1220/9500 [4:12:23<28:39:32, 12.46s/it]08/03/2024 02:09:53 - INFO - __main__ -   Step: 1220, LR: 1.7970473822903983e-05, Loss: 747.6715698242188
2024-08-03T09:10:05.620872653Z 
 13%|█▎        | 1221/9500 [4:12:35<28:37:31, 12.45s/it]08/03/2024 02:10:05 - INFO - __main__ -   Step: 1221, LR: 1.7968303279216707e-05, Loss: 624.4330444335938
2024-08-03T09:10:17.646572307Z 
 13%|█▎        | 1222/9500 [4:12:47<28:19:51, 12.32s/it]08/03/2024 02:10:17 - INFO - __main__ -   Step: 1222, LR: 1.7966132735529427e-05, Loss: 666.846435546875
2024-08-03T09:10:30.090514266Z 
 13%|█▎        | 1223/9500 [4:13:00<28:24:44, 12.36s/it]08/03/2024 02:10:30 - INFO - __main__ -   Step: 1223, LR: 1.7963962191842146e-05, Loss: 868.157470703125
2024-08-03T09:10:42.596304875Z 
 13%|█▎        | 1224/9500 [4:13:12<28:30:40, 12.40s/it]08/03/2024 02:10:42 - INFO - __main__ -   Step: 1224, LR: 1.796179164815487e-05, Loss: 577.1376953125
2024-08-03T09:10:54.998150283Z 
 13%|█▎        | 1225/9500 [4:13:24<28:30:26, 12.40s/it]08/03/2024 02:10:54 - INFO - __main__ -   Step: 1225, LR: 1.795962110446759e-05, Loss: 685.6943969726562
2024-08-03T09:11:06.865884398Z 
 13%|█▎        | 1226/9500 [4:13:36<28:08:08, 12.24s/it]08/03/2024 02:11:06 - INFO - __main__ -   Step: 1226, LR: 1.7957450560780313e-05, Loss: 528.656982421875
2024-08-03T09:11:19.025312797Z 
 13%|█▎        | 1227/9500 [4:13:48<28:04:31, 12.22s/it]08/03/2024 02:11:19 - INFO - __main__ -   Step: 1227, LR: 1.7955280017093033e-05, Loss: 467.43829345703125
2024-08-03T09:11:30.973855596Z 
 13%|█▎        | 1228/9500 [4:14:00<27:53:13, 12.14s/it]08/03/2024 02:11:30 - INFO - __main__ -   Step: 1228, LR: 1.7953109473405756e-05, Loss: 458.8004150390625
2024-08-03T09:11:42.952299944Z 
 13%|█▎        | 1229/9500 [4:14:12<27:46:28, 12.09s/it]08/03/2024 02:11:42 - INFO - __main__ -   Step: 1229, LR: 1.7950938929718476e-05, Loss: 661.5615844726562
2024-08-03T09:11:55.729922090Z 
 13%|█▎        | 1230/9500 [4:14:25<28:14:45, 12.30s/it]08/03/2024 02:11:55 - INFO - __main__ -   Step: 1230, LR: 1.7948768386031196e-05, Loss: 726.0101318359375
2024-08-03T09:12:08.049486737Z 
 13%|█▎        | 1231/9500 [4:14:37<28:15:32, 12.30s/it]08/03/2024 02:12:08 - INFO - __main__ -   Step: 1231, LR: 1.794659784234392e-05, Loss: 771.6758422851562
2024-08-03T09:12:20.158721422Z 
 13%|█▎        | 1232/9500 [4:14:50<28:07:19, 12.24s/it]08/03/2024 02:12:20 - INFO - __main__ -   Step: 1232, LR: 1.794442729865664e-05, Loss: 710.72607421875
2024-08-03T09:12:32.748642098Z 
 13%|█▎        | 1233/9500 [4:15:02<28:21:23, 12.35s/it]08/03/2024 02:12:32 - INFO - __main__ -   Step: 1233, LR: 1.7942256754969362e-05, Loss: 911.9159545898438
2024-08-03T09:12:44.687691635Z 
 13%|█▎        | 1234/9500 [4:15:14<28:04:16, 12.23s/it]08/03/2024 02:12:44 - INFO - __main__ -   Step: 1234, LR: 1.794008621128208e-05, Loss: 492.932373046875
2024-08-03T09:12:57.162074307Z 
 13%|█▎        | 1235/9500 [4:15:27<28:14:20, 12.30s/it]08/03/2024 02:12:57 - INFO - __main__ -   Step: 1235, LR: 1.7937915667594802e-05, Loss: 735.9608154296875
2024-08-03T09:13:09.540137808Z 
 13%|█▎        | 1236/9500 [4:15:39<28:17:21, 12.32s/it]08/03/2024 02:13:09 - INFO - __main__ -   Step: 1236, LR: 1.793574512390752e-05, Loss: 586.163818359375
2024-08-03T09:13:21.590822542Z 
 13%|█▎        | 1237/9500 [4:15:51<28:05:53, 12.24s/it]08/03/2024 02:13:21 - INFO - __main__ -   Step: 1237, LR: 1.7933574580220245e-05, Loss: 709.7205810546875
2024-08-03T09:13:34.000987064Z 
 13%|█▎        | 1238/9500 [4:16:03<28:12:38, 12.29s/it]08/03/2024 02:13:34 - INFO - __main__ -   Step: 1238, LR: 1.7931404036532965e-05, Loss: 621.3500366210938
2024-08-03T09:13:46.710080699Z 
 13%|█▎        | 1239/9500 [4:16:16<28:29:39, 12.42s/it]08/03/2024 02:13:46 - INFO - __main__ -   Step: 1239, LR: 1.7929233492845685e-05, Loss: 745.1012573242188
2024-08-03T09:13:59.098053378Z 
 13%|█▎        | 1240/9500 [4:16:29<28:28:13, 12.41s/it]08/03/2024 02:13:59 - INFO - __main__ -   Step: 1240, LR: 1.7927062949158408e-05, Loss: 695.5755004882812
2024-08-03T09:14:11.219613164Z 
 13%|█▎        | 1241/9500 [4:16:41<28:16:11, 12.32s/it]08/03/2024 02:14:11 - INFO - __main__ -   Step: 1241, LR: 1.7924892405471128e-05, Loss: 607.6805419921875
2024-08-03T09:14:23.316656821Z 
 13%|█▎        | 1242/9500 [4:16:53<28:06:40, 12.25s/it]08/03/2024 02:14:23 - INFO - __main__ -   Step: 1242, LR: 1.792272186178385e-05, Loss: 731.5184326171875
2024-08-03T09:14:35.861864288Z 
 13%|█▎        | 1243/9500 [4:17:05<28:18:26, 12.34s/it]08/03/2024 02:14:35 - INFO - __main__ -   Step: 1243, LR: 1.792055131809657e-05, Loss: 643.158203125
2024-08-03T09:14:48.250415559Z 
 13%|█▎        | 1244/9500 [4:17:18<28:20:10, 12.36s/it]08/03/2024 02:14:48 - INFO - __main__ -   Step: 1244, LR: 1.791838077440929e-05, Loss: 527.8504638671875
2024-08-03T09:15:00.414628473Z 
 13%|█▎        | 1245/9500 [4:17:30<28:12:02, 12.30s/it]08/03/2024 02:15:00 - INFO - __main__ -   Step: 1245, LR: 1.7916210230722014e-05, Loss: 694.308837890625
2024-08-03T09:15:13.307991203Z 
 13%|█▎        | 1246/9500 [4:17:43<28:36:24, 12.48s/it]08/03/2024 02:15:13 - INFO - __main__ -   Step: 1246, LR: 1.7914039687034734e-05, Loss: 801.156982421875
2024-08-03T09:15:25.607909242Z 
 13%|█▎        | 1247/9500 [4:17:55<28:28:54, 12.42s/it]08/03/2024 02:15:25 - INFO - __main__ -   Step: 1247, LR: 1.7911869143347457e-05, Loss: 568.66162109375
2024-08-03T09:15:37.857920437Z 
 13%|█▎        | 1248/9500 [4:18:07<28:21:30, 12.37s/it]08/03/2024 02:15:37 - INFO - __main__ -   Step: 1248, LR: 1.7909698599660174e-05, Loss: 634.6025390625
2024-08-03T09:15:50.368753797Z 
 13%|█▎        | 1249/9500 [4:18:20<28:27:03, 12.41s/it]08/03/2024 02:15:50 - INFO - __main__ -   Step: 1249, LR: 1.7907528055972897e-05, Loss: 542.854736328125
2024-08-03T09:16:02.882061593Z 
 13%|█▎        | 1250/9500 [4:18:32<28:30:55, 12.44s/it]08/03/2024 02:16:02 - INFO - __main__ -   Step: 1250, LR: 1.7905357512285617e-05, Loss: 813.51318359375
2024-08-03T09:16:14.923114051Z 
 13%|█▎        | 1251/9500 [4:18:44<28:14:10, 12.32s/it]08/03/2024 02:16:14 - INFO - __main__ -   Step: 1251, LR: 1.790318696859834e-05, Loss: 574.4185791015625
2024-08-03T09:16:27.299736192Z 
 13%|█▎        | 1252/9500 [4:18:57<28:16:10, 12.34s/it]08/03/2024 02:16:27 - INFO - __main__ -   Step: 1252, LR: 1.790101642491106e-05, Loss: 422.14593505859375
2024-08-03T09:16:39.717824709Z 
 13%|█▎        | 1253/9500 [4:19:09<28:19:15, 12.36s/it]08/03/2024 02:16:39 - INFO - __main__ -   Step: 1253, LR: 1.789884588122378e-05, Loss: 771.2530517578125
2024-08-03T09:16:51.911022903Z 
 13%|█▎        | 1254/9500 [4:19:21<28:12:03, 12.31s/it]08/03/2024 02:16:51 - INFO - __main__ -   Step: 1254, LR: 1.7896675337536503e-05, Loss: 670.37841796875
2024-08-03T09:17:04.912368621Z 
 13%|█▎        | 1255/9500 [4:19:34<28:40:16, 12.52s/it]08/03/2024 02:17:04 - INFO - __main__ -   Step: 1255, LR: 1.7894504793849223e-05, Loss: 735.9549560546875
2024-08-03T09:17:16.968916811Z 
 13%|█▎        | 1256/9500 [4:19:46<28:21:01, 12.38s/it]08/03/2024 02:17:16 - INFO - __main__ -   Step: 1256, LR: 1.7892334250161946e-05, Loss: 767.528076171875
2024-08-03T09:17:29.303178315Z 
 13%|█▎        | 1257/9500 [4:19:59<28:18:54, 12.37s/it]08/03/2024 02:17:29 - INFO - __main__ -   Step: 1257, LR: 1.7890163706474666e-05, Loss: 828.6923828125
2024-08-03T09:17:41.993908989Z 
 13%|█▎        | 1258/9500 [4:20:11<28:32:05, 12.46s/it]08/03/2024 02:17:41 - INFO - __main__ -   Step: 1258, LR: 1.7887993162787386e-05, Loss: 702.2122802734375
2024-08-03T09:17:54.230848527Z 
 13%|█▎        | 1259/9500 [4:20:24<28:22:32, 12.40s/it]08/03/2024 02:17:54 - INFO - __main__ -   Step: 1259, LR: 1.788582261910011e-05, Loss: 630.5216064453125
2024-08-03T09:18:06.430689252Z 
 13%|█▎        | 1260/9500 [4:20:36<28:14:15, 12.34s/it]08/03/2024 02:18:06 - INFO - __main__ -   Step: 1260, LR: 1.788365207541283e-05, Loss: 639.6021728515625
2024-08-03T09:18:19.212295260Z 
 13%|█▎        | 1261/9500 [4:20:49<28:32:23, 12.47s/it]08/03/2024 02:18:19 - INFO - __main__ -   Step: 1261, LR: 1.7881481531725552e-05, Loss: 778.5093994140625
2024-08-03T09:18:31.201586307Z 
 13%|█▎        | 1262/9500 [4:21:01<28:12:21, 12.33s/it]08/03/2024 02:18:31 - INFO - __main__ -   Step: 1262, LR: 1.787931098803827e-05, Loss: 887.90673828125
2024-08-03T09:18:43.512203768Z 
 13%|█▎        | 1263/9500 [4:21:13<28:11:30, 12.32s/it]08/03/2024 02:18:43 - INFO - __main__ -   Step: 1263, LR: 1.7877140444350992e-05, Loss: 681.5069580078125
2024-08-03T09:18:55.953549679Z 
 13%|█▎        | 1264/9500 [4:21:25<28:16:15, 12.36s/it]08/03/2024 02:18:55 - INFO - __main__ -   Step: 1264, LR: 1.7874969900663712e-05, Loss: 530.2598876953125
2024-08-03T09:19:08.259110970Z 
 13%|█▎        | 1265/9500 [4:21:38<28:13:55, 12.34s/it]08/03/2024 02:19:08 - INFO - __main__ -   Step: 1265, LR: 1.7872799356976435e-05, Loss: 900.5153198242188
2024-08-03T09:19:20.260509223Z 
 13%|█▎        | 1266/9500 [4:21:50<27:59:41, 12.24s/it]08/03/2024 02:19:20 - INFO - __main__ -   Step: 1266, LR: 1.7870628813289155e-05, Loss: 655.58740234375
2024-08-03T09:19:32.775047037Z 
 13%|█▎        | 1267/9500 [4:22:02<28:10:48, 12.32s/it]08/03/2024 02:19:32 - INFO - __main__ -   Step: 1267, LR: 1.7868458269601875e-05, Loss: 677.3848876953125
2024-08-03T09:19:45.402384929Z 
 13%|█▎        | 1268/9500 [4:22:15<28:23:09, 12.41s/it]08/03/2024 02:19:45 - INFO - __main__ -   Step: 1268, LR: 1.7866287725914598e-05, Loss: 779.2660522460938
2024-08-03T09:19:57.566967561Z 
 13%|█▎        | 1269/9500 [4:22:27<28:12:42, 12.34s/it]08/03/2024 02:19:57 - INFO - __main__ -   Step: 1269, LR: 1.7864117182227318e-05, Loss: 500.09747314453125
2024-08-03T09:20:10.508037687Z 
 13%|█▎        | 1270/9500 [4:22:40<28:37:15, 12.52s/it]08/03/2024 02:20:10 - INFO - __main__ -   Step: 1270, LR: 1.786194663854004e-05, Loss: 625.34716796875
2024-08-03T09:20:22.689976313Z 
 13%|█▎        | 1271/9500 [4:22:52<28:23:10, 12.42s/it]08/03/2024 02:20:22 - INFO - __main__ -   Step: 1271, LR: 1.785977609485276e-05, Loss: 592.3336181640625
2024-08-03T09:20:34.933288413Z 
 13%|█▎        | 1272/9500 [4:23:04<28:15:45, 12.37s/it]08/03/2024 02:20:34 - INFO - __main__ -   Step: 1272, LR: 1.785760555116548e-05, Loss: 656.653076171875
2024-08-03T09:20:47.325494012Z 
 13%|█▎        | 1273/9500 [4:23:17<28:16:38, 12.37s/it]08/03/2024 02:20:47 - INFO - __main__ -   Step: 1273, LR: 1.7855435007478204e-05, Loss: 561.7630615234375
2024-08-03T09:20:59.347565319Z 
 13%|█▎        | 1274/9500 [4:23:29<28:01:57, 12.27s/it]08/03/2024 02:20:59 - INFO - __main__ -   Step: 1274, LR: 1.7853264463790924e-05, Loss: 674.6837158203125
2024-08-03T09:21:11.448697070Z 
 13%|█▎        | 1275/9500 [4:23:41<27:54:54, 12.22s/it]08/03/2024 02:21:11 - INFO - __main__ -   Step: 1275, LR: 1.7851093920103647e-05, Loss: 563.9191284179688
2024-08-03T09:21:24.406053520Z 
 13%|█▎        | 1276/9500 [4:23:54<28:25:05, 12.44s/it]08/03/2024 02:21:24 - INFO - __main__ -   Step: 1276, LR: 1.7848923376416364e-05, Loss: 717.752685546875
2024-08-03T09:21:36.465636736Z 
 13%|█▎        | 1277/9500 [4:24:06<28:09:15, 12.33s/it]08/03/2024 02:21:36 - INFO - __main__ -   Step: 1277, LR: 1.7846752832729087e-05, Loss: 611.6490478515625
2024-08-03T09:21:48.831845648Z 
 13%|█▎        | 1278/9500 [4:24:18<28:10:40, 12.34s/it]08/03/2024 02:21:48 - INFO - __main__ -   Step: 1278, LR: 1.7844582289041807e-05, Loss: 659.7637939453125
2024-08-03T09:22:01.221848719Z 
 13%|█▎        | 1279/9500 [4:24:31<28:12:33, 12.35s/it]08/03/2024 02:22:01 - INFO - __main__ -   Step: 1279, LR: 1.784241174535453e-05, Loss: 472.6201477050781
2024-08-03T09:22:13.167543842Z 
 13%|█▎        | 1280/9500 [4:24:43<27:55:42, 12.23s/it]08/03/2024 02:22:13 - INFO - __main__ -   Step: 1280, LR: 1.784024120166725e-05, Loss: 529.9898071289062
2024-08-03T09:22:25.138963487Z 
 13%|█▎        | 1281/9500 [4:24:55<27:44:48, 12.15s/it]08/03/2024 02:22:25 - INFO - __main__ -   Step: 1281, LR: 1.783807065797997e-05, Loss: 578.53271484375
2024-08-03T09:22:37.584550072Z 
 13%|█▎        | 1282/9500 [4:25:07<27:56:37, 12.24s/it]08/03/2024 02:22:37 - INFO - __main__ -   Step: 1282, LR: 1.7835900114292693e-05, Loss: 637.0665283203125
2024-08-03T09:22:49.676188757Z 
 14%|█▎        | 1283/9500 [4:25:19<27:50:16, 12.20s/it]08/03/2024 02:22:49 - INFO - __main__ -   Step: 1283, LR: 1.7833729570605413e-05, Loss: 664.3138427734375
2024-08-03T09:23:02.173726572Z 
 14%|█▎        | 1284/9500 [4:25:32<28:02:24, 12.29s/it]08/03/2024 02:23:02 - INFO - __main__ -   Step: 1284, LR: 1.7831559026918136e-05, Loss: 827.3796997070312
2024-08-03T09:23:14.522591230Z 
 14%|█▎        | 1285/9500 [4:25:44<28:04:49, 12.31s/it]08/03/2024 02:23:14 - INFO - __main__ -   Step: 1285, LR: 1.7829388483230856e-05, Loss: 771.1630249023438
2024-08-03T09:23:27.250720410Z 
 14%|█▎        | 1286/9500 [4:25:57<28:21:58, 12.43s/it]08/03/2024 02:23:27 - INFO - __main__ -   Step: 1286, LR: 1.7827217939543576e-05, Loss: 734.9276733398438
2024-08-03T09:23:39.427992050Z 
 14%|█▎        | 1287/9500 [4:26:09<28:11:17, 12.36s/it]08/03/2024 02:23:39 - INFO - __main__ -   Step: 1287, LR: 1.78250473958563e-05, Loss: 532.5676879882812
2024-08-03T09:23:52.043771943Z 
 14%|█▎        | 1288/9500 [4:26:21<28:21:45, 12.43s/it]08/03/2024 02:23:52 - INFO - __main__ -   Step: 1288, LR: 1.782287685216902e-05, Loss: 731.924072265625
2024-08-03T09:24:04.609582880Z 
 14%|█▎        | 1289/9500 [4:26:34<28:26:59, 12.47s/it]08/03/2024 02:24:04 - INFO - __main__ -   Step: 1289, LR: 1.7820706308481742e-05, Loss: 509.0172119140625
2024-08-03T09:24:17.003581938Z 
 14%|█▎        | 1290/9500 [4:26:46<28:23:30, 12.45s/it]08/03/2024 02:24:17 - INFO - __main__ -   Step: 1290, LR: 1.781853576479446e-05, Loss: 644.5335693359375
2024-08-03T09:24:29.337700021Z 
 14%|█▎        | 1291/9500 [4:26:59<28:18:34, 12.41s/it]08/03/2024 02:24:29 - INFO - __main__ -   Step: 1291, LR: 1.7816365221107182e-05, Loss: 731.187255859375
2024-08-03T09:24:42.412275571Z 
 14%|█▎        | 1292/9500 [4:27:12<28:45:25, 12.61s/it]08/03/2024 02:24:42 - INFO - __main__ -   Step: 1292, LR: 1.7814194677419902e-05, Loss: 699.9737548828125
2024-08-03T09:24:54.495041732Z 
 14%|█▎        | 1293/9500 [4:27:24<28:23:27, 12.45s/it]08/03/2024 02:24:54 - INFO - __main__ -   Step: 1293, LR: 1.7812024133732625e-05, Loss: 512.9498291015625
2024-08-03T09:25:06.767647588Z 
 14%|█▎        | 1294/9500 [4:27:36<28:15:49, 12.40s/it]08/03/2024 02:25:06 - INFO - __main__ -   Step: 1294, LR: 1.7809853590045345e-05, Loss: 650.6334838867188
2024-08-03T09:25:19.525495729Z 
 14%|█▎        | 1295/9500 [4:27:49<28:30:20, 12.51s/it]08/03/2024 02:25:19 - INFO - __main__ -   Step: 1295, LR: 1.7807683046358065e-05, Loss: 778.714111328125
2024-08-03T09:25:31.565419455Z 
 14%|█▎        | 1296/9500 [4:28:01<28:10:57, 12.37s/it]08/03/2024 02:25:31 - INFO - __main__ -   Step: 1296, LR: 1.7805512502670788e-05, Loss: 756.8015747070312
2024-08-03T09:25:44.448021472Z 
 14%|█▎        | 1297/9500 [4:28:14<28:31:54, 12.52s/it]08/03/2024 02:25:44 - INFO - __main__ -   Step: 1297, LR: 1.7803341958983508e-05, Loss: 920.7899780273438
2024-08-03T09:25:57.003257189Z 
 14%|█▎        | 1298/9500 [4:28:26<28:33:04, 12.53s/it]08/03/2024 02:25:57 - INFO - __main__ -   Step: 1298, LR: 1.780117141529623e-05, Loss: 597.439453125
2024-08-03T09:26:09.303857345Z 
 14%|█▎        | 1299/9500 [4:28:39<28:23:23, 12.46s/it]08/03/2024 02:26:09 - INFO - __main__ -   Step: 1299, LR: 1.779900087160895e-05, Loss: 751.015869140625
2024-08-03T09:26:21.614999336Z 
 14%|█▎        | 1300/9500 [4:28:51<28:16:59, 12.42s/it]08/03/2024 02:26:21 - INFO - __main__ -   Step: 1300, LR: 1.779683032792167e-05, Loss: 739.4271850585938
2024-08-03T09:26:34.168276783Z 
 14%|█▎        | 1301/9500 [4:29:04<28:22:22, 12.46s/it]08/03/2024 02:26:34 - INFO - __main__ -   Step: 1301, LR: 1.7794659784234394e-05, Loss: 654.4796142578125
2024-08-03T09:26:46.270661183Z 
 14%|█▎        | 1302/9500 [4:29:16<28:07:35, 12.35s/it]08/03/2024 02:26:46 - INFO - __main__ -   Step: 1302, LR: 1.7792489240547114e-05, Loss: 898.584228515625
2024-08-03T09:26:58.245479849Z 
 14%|█▎        | 1303/9500 [4:29:28<27:51:57, 12.24s/it]08/03/2024 02:26:58 - INFO - __main__ -   Step: 1303, LR: 1.7790318696859837e-05, Loss: 545.5894775390625
2024-08-03T09:27:10.617268571Z 
 14%|█▎        | 1304/9500 [4:29:40<27:57:13, 12.28s/it]08/03/2024 02:27:10 - INFO - __main__ -   Step: 1304, LR: 1.7788148153172554e-05, Loss: 687.9274291992188
2024-08-03T09:27:22.864522821Z 
 14%|█▎        | 1305/9500 [4:29:52<27:55:44, 12.27s/it]08/03/2024 02:27:22 - INFO - __main__ -   Step: 1305, LR: 1.7785977609485277e-05, Loss: 824.506591796875
2024-08-03T09:27:35.222982783Z 
 14%|█▎        | 1306/9500 [4:30:05<27:59:11, 12.30s/it]08/03/2024 02:27:35 - INFO - __main__ -   Step: 1306, LR: 1.7783807065797997e-05, Loss: 545.2678833007812
2024-08-03T09:27:47.775174333Z 
 14%|█▍        | 1307/9500 [4:30:17<28:09:29, 12.37s/it]08/03/2024 02:27:47 - INFO - __main__ -   Step: 1307, LR: 1.778163652211072e-05, Loss: 888.2467041015625
2024-08-03T09:28:00.084773483Z 
 14%|█▍        | 1308/9500 [4:30:30<28:06:42, 12.35s/it]08/03/2024 02:28:00 - INFO - __main__ -   Step: 1308, LR: 1.777946597842344e-05, Loss: 772.0326538085938
2024-08-03T09:28:12.266234428Z 
 14%|█▍        | 1309/9500 [4:30:42<27:59:26, 12.30s/it]08/03/2024 02:28:12 - INFO - __main__ -   Step: 1309, LR: 1.777729543473616e-05, Loss: 701.0958251953125
2024-08-03T09:28:25.001854954Z 
 14%|█▍        | 1310/9500 [4:30:54<28:16:59, 12.43s/it]08/03/2024 02:28:25 - INFO - __main__ -   Step: 1310, LR: 1.7775124891048883e-05, Loss: 708.2244873046875
2024-08-03T09:28:36.936343779Z 
 14%|█▍        | 1311/9500 [4:31:06<27:56:24, 12.28s/it]08/03/2024 02:28:36 - INFO - __main__ -   Step: 1311, LR: 1.7772954347361603e-05, Loss: 566.7659912109375
2024-08-03T09:28:48.895473393Z 
 14%|█▍        | 1312/9500 [4:31:18<27:42:56, 12.19s/it]08/03/2024 02:28:48 - INFO - __main__ -   Step: 1312, LR: 1.7770783803674326e-05, Loss: 547.6326293945312
2024-08-03T09:29:01.705046360Z 
 14%|█▍        | 1313/9500 [4:31:31<28:08:16, 12.37s/it]08/03/2024 02:29:01 - INFO - __main__ -   Step: 1313, LR: 1.7768613259987046e-05, Loss: 562.525390625
2024-08-03T09:29:13.929114866Z 
 14%|█▍        | 1314/9500 [4:31:43<28:01:58, 12.33s/it]08/03/2024 02:29:13 - INFO - __main__ -   Step: 1314, LR: 1.7766442716299766e-05, Loss: 669.8248901367188
2024-08-03T09:29:25.878220932Z 
 14%|█▍        | 1315/9500 [4:31:55<27:46:15, 12.21s/it]08/03/2024 02:29:25 - INFO - __main__ -   Step: 1315, LR: 1.776427217261249e-05, Loss: 581.0509033203125
2024-08-03T09:29:38.348093848Z 
 14%|█▍        | 1316/9500 [4:32:08<27:56:30, 12.29s/it]08/03/2024 02:29:38 - INFO - __main__ -   Step: 1316, LR: 1.776210162892521e-05, Loss: 630.8825073242188
2024-08-03T09:29:50.142584889Z 
 14%|█▍        | 1317/9500 [4:32:20<27:35:59, 12.14s/it]08/03/2024 02:29:50 - INFO - __main__ -   Step: 1317, LR: 1.7759931085237932e-05, Loss: 524.0285034179688
2024-08-03T09:30:02.488203689Z 
 14%|█▍        | 1318/9500 [4:32:32<27:44:06, 12.20s/it]08/03/2024 02:30:02 - INFO - __main__ -   Step: 1318, LR: 1.775776054155065e-05, Loss: 660.2147216796875
2024-08-03T09:30:14.976751803Z 
 14%|█▍        | 1319/9500 [4:32:44<27:55:34, 12.29s/it]08/03/2024 02:30:14 - INFO - __main__ -   Step: 1319, LR: 1.7755589997863372e-05, Loss: 663.4957275390625
2024-08-03T09:30:26.902467329Z 
 14%|█▍        | 1320/9500 [4:32:56<27:40:31, 12.18s/it]08/03/2024 02:30:26 - INFO - __main__ -   Step: 1320, LR: 1.7753419454176092e-05, Loss: 504.6220703125
2024-08-03T09:30:39.171530547Z 
 14%|█▍        | 1321/9500 [4:33:09<27:43:58, 12.21s/it]08/03/2024 02:30:39 - INFO - __main__ -   Step: 1321, LR: 1.7751248910488815e-05, Loss: 812.428955078125
2024-08-03T09:30:51.526466261Z 
 14%|█▍        | 1322/9500 [4:33:21<27:49:50, 12.25s/it]08/03/2024 02:30:51 - INFO - __main__ -   Step: 1322, LR: 1.7749078366801535e-05, Loss: 434.7499694824219
2024-08-03T09:31:03.585735282Z 
 14%|█▍        | 1323/9500 [4:33:33<27:41:46, 12.19s/it]08/03/2024 02:31:03 - INFO - __main__ -   Step: 1323, LR: 1.7746907823114255e-05, Loss: 693.3154296875
2024-08-03T09:31:16.004084017Z 
 14%|█▍        | 1324/9500 [4:33:45<27:50:45, 12.26s/it]08/03/2024 02:31:16 - INFO - __main__ -   Step: 1324, LR: 1.7744737279426978e-05, Loss: 684.4236450195312
2024-08-03T09:31:28.489339301Z 
 14%|█▍        | 1325/9500 [4:33:58<27:59:42, 12.33s/it]08/03/2024 02:31:28 - INFO - __main__ -   Step: 1325, LR: 1.7742566735739698e-05, Loss: 625.3148193359375
2024-08-03T09:31:40.717810634Z 
 14%|█▍        | 1326/9500 [4:34:10<27:55:27, 12.30s/it]08/03/2024 02:31:40 - INFO - __main__ -   Step: 1326, LR: 1.774039619205242e-05, Loss: 785.9427490234375
2024-08-03T09:31:52.993666013Z 
 14%|█▍        | 1327/9500 [4:34:22<27:54:18, 12.29s/it]08/03/2024 02:31:52 - INFO - __main__ -   Step: 1327, LR: 1.773822564836514e-05, Loss: 762.893798828125
2024-08-03T09:32:05.280014334Z 
 14%|█▍        | 1328/9500 [4:34:35<27:53:54, 12.29s/it]08/03/2024 02:32:05 - INFO - __main__ -   Step: 1328, LR: 1.7736055104677864e-05, Loss: 593.3328857421875
2024-08-03T09:32:17.820610736Z 
 14%|█▍        | 1329/9500 [4:34:47<28:03:56, 12.37s/it]08/03/2024 02:32:17 - INFO - __main__ -   Step: 1329, LR: 1.7733884560990584e-05, Loss: 621.1004028320312
2024-08-03T09:32:30.213797928Z 
 14%|█▍        | 1330/9500 [4:35:00<28:04:48, 12.37s/it]08/03/2024 02:32:30 - INFO - __main__ -   Step: 1330, LR: 1.7731714017303304e-05, Loss: 547.8553466796875
2024-08-03T09:32:42.099184246Z 
 14%|█▍        | 1331/9500 [4:35:12<27:44:44, 12.23s/it]08/03/2024 02:32:42 - INFO - __main__ -   Step: 1331, LR: 1.7729543473616027e-05, Loss: 589.6512451171875
2024-08-03T09:32:54.587083379Z 
 14%|█▍        | 1332/9500 [4:35:24<27:55:11, 12.31s/it]08/03/2024 02:32:54 - INFO - __main__ -   Step: 1332, LR: 1.7727372929928744e-05, Loss: 545.1826171875
2024-08-03T09:33:07.058721948Z 
 14%|█▍        | 1333/9500 [4:35:36<28:01:45, 12.36s/it]08/03/2024 02:33:07 - INFO - __main__ -   Step: 1333, LR: 1.7725202386241467e-05, Loss: 633.7911987304688
2024-08-03T09:33:19.263110725Z 
 14%|█▍        | 1334/9500 [4:35:49<27:55:23, 12.31s/it]08/03/2024 02:33:19 - INFO - __main__ -   Step: 1334, LR: 1.7723031842554187e-05, Loss: 655.9820556640625
2024-08-03T09:33:32.013175101Z 
 14%|█▍        | 1335/9500 [4:36:01<28:13:09, 12.44s/it]08/03/2024 02:33:32 - INFO - __main__ -   Step: 1335, LR: 1.772086129886691e-05, Loss: 613.8323364257812
2024-08-03T09:33:44.614697935Z 
 14%|█▍        | 1336/9500 [4:36:14<28:19:26, 12.49s/it]08/03/2024 02:33:44 - INFO - __main__ -   Step: 1336, LR: 1.771869075517963e-05, Loss: 874.400634765625
2024-08-03T09:33:56.929964241Z 
 14%|█▍        | 1337/9500 [4:36:26<28:12:07, 12.44s/it]08/03/2024 02:33:56 - INFO - __main__ -   Step: 1337, LR: 1.7716520211492353e-05, Loss: 699.5419311523438
2024-08-03T09:34:09.811410781Z 
 14%|█▍        | 1338/9500 [4:36:39<28:30:01, 12.57s/it]08/03/2024 02:34:09 - INFO - __main__ -   Step: 1338, LR: 1.7714349667805073e-05, Loss: 552.30029296875
2024-08-03T09:34:21.936519957Z 
 14%|█▍        | 1339/9500 [4:36:51<28:11:38, 12.44s/it]08/03/2024 02:34:21 - INFO - __main__ -   Step: 1339, LR: 1.7712179124117793e-05, Loss: 560.251708984375
2024-08-03T09:34:33.994593236Z 
 14%|█▍        | 1340/9500 [4:37:03<27:55:58, 12.32s/it]08/03/2024 02:34:33 - INFO - __main__ -   Step: 1340, LR: 1.7710008580430516e-05, Loss: 528.5718994140625
2024-08-03T09:34:46.339909232Z 
 14%|█▍        | 1341/9500 [4:37:16<27:56:39, 12.33s/it]08/03/2024 02:34:46 - INFO - __main__ -   Step: 1341, LR: 1.7707838036743236e-05, Loss: 770.4739379882812
2024-08-03T09:34:58.368307886Z 
 14%|█▍        | 1342/9500 [4:37:28<27:44:09, 12.24s/it]08/03/2024 02:34:58 - INFO - __main__ -   Step: 1342, LR: 1.770566749305596e-05, Loss: 513.0078125
2024-08-03T09:35:10.785850158Z 
 14%|█▍        | 1343/9500 [4:37:40<27:51:13, 12.29s/it]08/03/2024 02:35:10 - INFO - __main__ -   Step: 1343, LR: 1.770349694936868e-05, Loss: 629.3748168945312
2024-08-03T09:35:23.286899424Z 
 14%|█▍        | 1344/9500 [4:37:53<27:59:30, 12.36s/it]08/03/2024 02:35:23 - INFO - __main__ -   Step: 1344, LR: 1.77013264056814e-05, Loss: 565.9686279296875
2024-08-03T09:35:35.518286162Z 
 14%|█▍        | 1345/9500 [4:38:05<27:54:14, 12.32s/it]08/03/2024 02:35:35 - INFO - __main__ -   Step: 1345, LR: 1.7699155861994123e-05, Loss: 572.4915771484375
2024-08-03T09:35:48.070998201Z 
 14%|█▍        | 1346/9500 [4:38:18<28:03:36, 12.39s/it]08/03/2024 02:35:48 - INFO - __main__ -   Step: 1346, LR: 1.7696985318306842e-05, Loss: 779.2823486328125
2024-08-03T09:36:00.859606803Z 
 14%|█▍        | 1347/9500 [4:38:30<28:19:41, 12.51s/it]08/03/2024 02:36:00 - INFO - __main__ -   Step: 1347, LR: 1.7694814774619562e-05, Loss: 684.218505859375
2024-08-03T09:36:13.024990010Z 
 14%|█▍        | 1348/9500 [4:38:42<28:05:29, 12.41s/it]08/03/2024 02:36:13 - INFO - __main__ -   Step: 1348, LR: 1.7692644230932282e-05, Loss: 693.88916015625
2024-08-03T09:36:25.058997180Z 
 14%|█▍        | 1349/9500 [4:38:54<27:50:09, 12.29s/it]08/03/2024 02:36:25 - INFO - __main__ -   Step: 1349, LR: 1.7690473687245005e-05, Loss: 449.9081726074219
2024-08-03T09:36:37.490633055Z 
 14%|█▍        | 1350/9500 [4:39:07<27:55:33, 12.34s/it]08/03/2024 02:36:37 - INFO - __main__ -   Step: 1350, LR: 1.7688303143557725e-05, Loss: 656.6838989257812
2024-08-03T09:36:49.583720256Z 
 14%|█▍        | 1351/9500 [4:39:19<27:45:29, 12.26s/it]08/03/2024 02:36:49 - INFO - __main__ -   Step: 1351, LR: 1.768613259987045e-05, Loss: 693.1600341796875
2024-08-03T09:37:01.574341927Z 
 14%|█▍        | 1352/9500 [4:39:31<27:34:11, 12.18s/it]08/03/2024 02:37:01 - INFO - __main__ -   Step: 1352, LR: 1.768396205618317e-05, Loss: 687.748291015625
2024-08-03T09:37:13.854034816Z 
 14%|█▍        | 1353/9500 [4:39:43<27:37:59, 12.21s/it]08/03/2024 02:37:13 - INFO - __main__ -   Step: 1353, LR: 1.7681791512495888e-05, Loss: 575.2507934570312
2024-08-03T09:37:26.027836285Z 
 14%|█▍        | 1354/9500 [4:39:55<27:36:17, 12.20s/it]08/03/2024 02:37:26 - INFO - __main__ -   Step: 1354, LR: 1.767962096880861e-05, Loss: 755.37744140625
2024-08-03T09:37:38.118707667Z 
 14%|█▍        | 1355/9500 [4:40:08<27:31:39, 12.17s/it]08/03/2024 02:37:38 - INFO - __main__ -   Step: 1355, LR: 1.767745042512133e-05, Loss: 599.6845703125
2024-08-03T09:37:50.459042865Z 
 14%|█▍        | 1356/9500 [4:40:20<27:38:31, 12.22s/it]08/03/2024 02:37:50 - INFO - __main__ -   Step: 1356, LR: 1.7675279881434055e-05, Loss: 678.1607666015625
2024-08-03T09:38:02.618752180Z 
 14%|█▍        | 1357/9500 [4:40:32<27:35:54, 12.20s/it]08/03/2024 02:38:02 - INFO - __main__ -   Step: 1357, LR: 1.7673109337746774e-05, Loss: 536.1301879882812
2024-08-03T09:38:14.404364880Z 
 14%|█▍        | 1358/9500 [4:40:44<27:18:46, 12.08s/it]08/03/2024 02:38:14 - INFO - __main__ -   Step: 1358, LR: 1.7670938794059494e-05, Loss: 544.3023071289062
2024-08-03T09:38:27.020849299Z 
 14%|█▍        | 1359/9500 [4:40:56<27:40:34, 12.24s/it]08/03/2024 02:38:27 - INFO - __main__ -   Step: 1359, LR: 1.7668768250372218e-05, Loss: 691.2734375
2024-08-03T09:38:39.038767700Z 
 14%|█▍        | 1360/9500 [4:41:08<27:31:22, 12.17s/it]08/03/2024 02:38:39 - INFO - __main__ -   Step: 1360, LR: 1.7666597706684937e-05, Loss: 609.2832641601562
2024-08-03T09:38:51.160160009Z 
 14%|█▍        | 1361/9500 [4:41:21<27:29:06, 12.16s/it]08/03/2024 02:38:51 - INFO - __main__ -   Step: 1361, LR: 1.7664427162997657e-05, Loss: 772.265625
2024-08-03T09:39:03.600931375Z 
 14%|█▍        | 1362/9500 [4:41:33<27:40:26, 12.24s/it]08/03/2024 02:39:03 - INFO - __main__ -   Step: 1362, LR: 1.7662256619310377e-05, Loss: 665.4354248046875
2024-08-03T09:39:16.227489073Z 
 14%|█▍        | 1363/9500 [4:41:46<27:55:53, 12.36s/it]08/03/2024 02:39:16 - INFO - __main__ -   Step: 1363, LR: 1.76600860756231e-05, Loss: 726.9346923828125
2024-08-03T09:39:28.269019848Z 
 14%|█▍        | 1364/9500 [4:41:58<27:42:48, 12.26s/it]08/03/2024 02:39:28 - INFO - __main__ -   Step: 1364, LR: 1.765791553193582e-05, Loss: 712.6952514648438
2024-08-03T09:39:40.870343300Z 
 14%|█▍        | 1365/9500 [4:42:10<27:56:24, 12.36s/it]08/03/2024 02:39:40 - INFO - __main__ -   Step: 1365, LR: 1.7655744988248544e-05, Loss: 680.0086669921875
2024-08-03T09:39:53.087712699Z 
 14%|█▍        | 1366/9500 [4:42:23<27:50:12, 12.32s/it]08/03/2024 02:39:53 - INFO - __main__ -   Step: 1366, LR: 1.7653574444561263e-05, Loss: 588.6331176757812
2024-08-03T09:40:05.549933824Z 
 14%|█▍        | 1367/9500 [4:42:35<27:55:46, 12.36s/it]08/03/2024 02:40:05 - INFO - __main__ -   Step: 1367, LR: 1.7651403900873983e-05, Loss: 869.0719604492188
2024-08-03T09:40:18.050284795Z 
 14%|█▍        | 1368/9500 [4:42:47<28:01:10, 12.40s/it]08/03/2024 02:40:18 - INFO - __main__ -   Step: 1368, LR: 1.7649233357186707e-05, Loss: 717.623779296875
2024-08-03T09:40:30.186090907Z 
 14%|█▍        | 1369/9500 [4:43:00<27:50:03, 12.32s/it]08/03/2024 02:40:30 - INFO - __main__ -   Step: 1369, LR: 1.7647062813499426e-05, Loss: 552.9725341796875
2024-08-03T09:40:42.446311815Z 
 14%|█▍        | 1370/9500 [4:43:12<27:47:16, 12.30s/it]08/03/2024 02:40:42 - INFO - __main__ -   Step: 1370, LR: 1.764489226981215e-05, Loss: 797.89501953125
2024-08-03T09:40:54.752923504Z 
 14%|█▍        | 1371/9500 [4:43:24<27:47:07, 12.31s/it]08/03/2024 02:40:54 - INFO - __main__ -   Step: 1371, LR: 1.764272172612487e-05, Loss: 723.74560546875
2024-08-03T09:41:07.083486616Z 
 14%|█▍        | 1372/9500 [4:43:37<27:47:59, 12.31s/it]08/03/2024 02:41:07 - INFO - __main__ -   Step: 1372, LR: 1.764055118243759e-05, Loss: 474.67681884765625
2024-08-03T09:41:19.349364144Z 
 14%|█▍        | 1373/9500 [4:43:49<27:45:51, 12.30s/it]08/03/2024 02:41:19 - INFO - __main__ -   Step: 1373, LR: 1.763838063875031e-05, Loss: 702.6820068359375
2024-08-03T09:41:31.777064477Z 
 14%|█▍        | 1374/9500 [4:44:01<27:50:53, 12.34s/it]08/03/2024 02:41:31 - INFO - __main__ -   Step: 1374, LR: 1.7636210095063032e-05, Loss: 807.65869140625
2024-08-03T09:41:44.333930439Z 
 14%|█▍        | 1375/9500 [4:44:14<27:59:36, 12.40s/it]08/03/2024 02:41:44 - INFO - __main__ -   Step: 1375, LR: 1.7634039551375752e-05, Loss: 704.6354370117188
2024-08-03T09:41:56.370946043Z 
 14%|█▍        | 1376/9500 [4:44:26<27:44:31, 12.29s/it]08/03/2024 02:41:56 - INFO - __main__ -   Step: 1376, LR: 1.7631869007688472e-05, Loss: 740.2655029296875
2024-08-03T09:42:08.559216157Z 
 14%|█▍        | 1377/9500 [4:44:38<27:40:03, 12.26s/it]08/03/2024 02:42:08 - INFO - __main__ -   Step: 1377, LR: 1.7629698464001195e-05, Loss: 841.23291015625
2024-08-03T09:42:21.207960066Z 
 15%|█▍        | 1378/9500 [4:44:51<27:55:30, 12.38s/it]08/03/2024 02:42:21 - INFO - __main__ -   Step: 1378, LR: 1.7627527920313915e-05, Loss: 644.405517578125
2024-08-03T09:42:33.300259836Z 
 15%|█▍        | 1379/9500 [4:45:03<27:43:45, 12.29s/it]08/03/2024 02:42:33 - INFO - __main__ -   Step: 1379, LR: 1.762535737662664e-05, Loss: 734.0555419921875
2024-08-03T09:42:46.081990716Z 
 15%|█▍        | 1380/9500 [4:45:16<28:03:26, 12.44s/it]08/03/2024 02:42:46 - INFO - __main__ -   Step: 1380, LR: 1.762318683293936e-05, Loss: 601.6852416992188
2024-08-03T09:42:58.563713246Z 
 15%|█▍        | 1381/9500 [4:45:28<28:04:57, 12.45s/it]08/03/2024 02:42:58 - INFO - __main__ -   Step: 1381, LR: 1.762101628925208e-05, Loss: 599.2965698242188
2024-08-03T09:43:10.801828191Z 
 15%|█▍        | 1382/9500 [4:45:40<27:56:04, 12.39s/it]08/03/2024 02:43:10 - INFO - __main__ -   Step: 1382, LR: 1.76188457455648e-05, Loss: 467.91339111328125
2024-08-03T09:43:22.809690939Z 
 15%|█▍        | 1383/9500 [4:45:52<27:40:26, 12.27s/it]08/03/2024 02:43:22 - INFO - __main__ -   Step: 1383, LR: 1.761667520187752e-05, Loss: 408.56988525390625
2024-08-03T09:43:35.494375751Z 
 15%|█▍        | 1384/9500 [4:46:05<27:56:53, 12.40s/it]08/03/2024 02:43:35 - INFO - __main__ -   Step: 1384, LR: 1.7614504658190245e-05, Loss: 827.4844360351562
2024-08-03T09:43:47.716214414Z 
 15%|█▍        | 1385/9500 [4:46:17<27:49:35, 12.34s/it]08/03/2024 02:43:47 - INFO - __main__ -   Step: 1385, LR: 1.7612334114502965e-05, Loss: 672.504150390625
2024-08-03T09:44:00.162529974Z 
 15%|█▍        | 1386/9500 [4:46:30<27:53:30, 12.38s/it]08/03/2024 02:44:00 - INFO - __main__ -   Step: 1386, LR: 1.7610163570815684e-05, Loss: 679.7256469726562
2024-08-03T09:44:12.488771512Z 
 15%|█▍        | 1387/9500 [4:46:42<27:51:19, 12.36s/it]08/03/2024 02:44:12 - INFO - __main__ -   Step: 1387, LR: 1.7607993027128404e-05, Loss: 649.1301879882812
2024-08-03T09:44:24.737650401Z 
 15%|█▍        | 1388/9500 [4:46:54<27:46:35, 12.33s/it]08/03/2024 02:44:24 - INFO - __main__ -   Step: 1388, LR: 1.7605822483441128e-05, Loss: 625.5640258789062
2024-08-03T09:44:36.804489815Z 
 15%|█▍        | 1389/9500 [4:47:06<27:35:51, 12.25s/it]08/03/2024 02:44:36 - INFO - __main__ -   Step: 1389, LR: 1.7603651939753847e-05, Loss: 737.8585205078125
2024-08-03T09:44:49.160991335Z 
 15%|█▍        | 1390/9500 [4:47:19<27:40:00, 12.28s/it]08/03/2024 02:44:49 - INFO - __main__ -   Step: 1390, LR: 1.7601481396066567e-05, Loss: 654.2200927734375
2024-08-03T09:45:01.600212733Z 
 15%|█▍        | 1391/9500 [4:47:31<27:46:12, 12.33s/it]08/03/2024 02:45:01 - INFO - __main__ -   Step: 1391, LR: 1.759931085237929e-05, Loss: 768.71044921875
2024-08-03T09:45:14.027075503Z 
 15%|█▍        | 1392/9500 [4:47:43<27:49:58, 12.36s/it]08/03/2024 02:45:14 - INFO - __main__ -   Step: 1392, LR: 1.759714030869201e-05, Loss: 686.4407958984375
2024-08-03T09:45:26.698421168Z 
 15%|█▍        | 1393/9500 [4:47:56<28:02:29, 12.45s/it]08/03/2024 02:45:26 - INFO - __main__ -   Step: 1393, LR: 1.7594969765004734e-05, Loss: 718.3206176757812
2024-08-03T09:45:39.159086046Z 
 15%|█▍        | 1394/9500 [4:48:09<28:02:37, 12.45s/it]08/03/2024 02:45:39 - INFO - __main__ -   Step: 1394, LR: 1.7592799221317454e-05, Loss: 675.33251953125
2024-08-03T09:45:51.191704107Z 
 15%|█▍        | 1395/9500 [4:48:21<27:45:18, 12.33s/it]08/03/2024 02:45:51 - INFO - __main__ -   Step: 1395, LR: 1.7590628677630173e-05, Loss: 654.6858520507812
2024-08-03T09:46:03.844385857Z 
 15%|█▍        | 1396/9500 [4:48:33<27:58:16, 12.43s/it]08/03/2024 02:46:03 - INFO - __main__ -   Step: 1396, LR: 1.7588458133942897e-05, Loss: 718.154296875
2024-08-03T09:46:15.932842658Z 
 15%|█▍        | 1397/9500 [4:48:45<27:44:24, 12.32s/it]08/03/2024 02:46:15 - INFO - __main__ -   Step: 1397, LR: 1.7586287590255616e-05, Loss: 542.222412109375
2024-08-03T09:46:28.105430956Z 
 15%|█▍        | 1398/9500 [4:48:58<27:37:59, 12.28s/it]08/03/2024 02:46:28 - INFO - __main__ -   Step: 1398, LR: 1.758411704656834e-05, Loss: 582.0531005859375
2024-08-03T09:46:40.871994793Z 
 15%|█▍        | 1399/9500 [4:49:10<27:57:37, 12.43s/it]08/03/2024 02:46:40 - INFO - __main__ -   Step: 1399, LR: 1.758194650288106e-05, Loss: 560.4393310546875
2024-08-03T09:46:52.963459558Z 
 15%|█▍        | 1400/9500 [4:49:22<27:43:53, 12.33s/it]08/03/2024 02:46:52 - INFO - __main__ -   Step: 1400, LR: 1.757977595919378e-05, Loss: 738.194580078125
2024-08-03T09:47:04.966995325Z 
 15%|█▍        | 1401/9500 [4:49:34<27:30:39, 12.23s/it]08/03/2024 02:47:04 - INFO - __main__ -   Step: 1401, LR: 1.75776054155065e-05, Loss: 503.4954833984375
2024-08-03T09:47:17.417181291Z 
 15%|█▍        | 1402/9500 [4:49:47<27:39:26, 12.30s/it]08/03/2024 02:47:17 - INFO - __main__ -   Step: 1402, LR: 1.7575434871819223e-05, Loss: 626.1531982421875
2024-08-03T09:47:29.390529627Z 
 15%|█▍        | 1403/9500 [4:49:59<27:26:11, 12.20s/it]08/03/2024 02:47:29 - INFO - __main__ -   Step: 1403, LR: 1.7573264328131942e-05, Loss: 659.4180908203125
2024-08-03T09:47:41.458892799Z 
 15%|█▍        | 1404/9500 [4:50:11<27:20:43, 12.16s/it]08/03/2024 02:47:41 - INFO - __main__ -   Step: 1404, LR: 1.7571093784444662e-05, Loss: 586.3377075195312
2024-08-03T09:47:54.178891190Z 
 15%|█▍        | 1405/9500 [4:50:24<27:43:12, 12.33s/it]08/03/2024 02:47:54 - INFO - __main__ -   Step: 1405, LR: 1.7568923240757386e-05, Loss: 618.613037109375
2024-08-03T09:48:06.228628096Z 
 15%|█▍        | 1406/9500 [4:50:36<27:31:45, 12.24s/it]08/03/2024 02:48:06 - INFO - __main__ -   Step: 1406, LR: 1.7566752697070105e-05, Loss: 468.91314697265625
2024-08-03T09:48:18.286705726Z 
 15%|█▍        | 1407/9500 [4:50:48<27:24:00, 12.19s/it]08/03/2024 02:48:18 - INFO - __main__ -   Step: 1407, LR: 1.756458215338283e-05, Loss: 677.1198120117188
2024-08-03T09:48:31.367984772Z 
 15%|█▍        | 1408/9500 [4:51:01<27:59:55, 12.46s/it]08/03/2024 02:48:31 - INFO - __main__ -   Step: 1408, LR: 1.756241160969555e-05, Loss: 762.6673583984375
2024-08-03T09:48:43.488382709Z 
 15%|█▍        | 1409/9500 [4:51:13<27:46:07, 12.36s/it]08/03/2024 02:48:43 - INFO - __main__ -   Step: 1409, LR: 1.756024106600827e-05, Loss: 821.6690673828125
2024-08-03T09:48:55.625349096Z 
 15%|█▍        | 1410/9500 [4:51:25<27:37:05, 12.29s/it]08/03/2024 02:48:55 - INFO - __main__ -   Step: 1410, LR: 1.755807052232099e-05, Loss: 661.5343017578125
2024-08-03T09:49:08.134654630Z 
 15%|█▍        | 1411/9500 [4:51:38<27:45:46, 12.36s/it]08/03/2024 02:49:08 - INFO - __main__ -   Step: 1411, LR: 1.755589997863371e-05, Loss: 690.787109375
2024-08-03T09:49:20.056776914Z 
 15%|█▍        | 1412/9500 [4:51:49<27:28:02, 12.23s/it]08/03/2024 02:49:20 - INFO - __main__ -   Step: 1412, LR: 1.7553729434946435e-05, Loss: 558.3283081054688
2024-08-03T09:49:32.045726349Z 
 15%|█▍        | 1413/9500 [4:52:01<27:18:14, 12.15s/it]08/03/2024 02:49:32 - INFO - __main__ -   Step: 1413, LR: 1.7551558891259155e-05, Loss: 577.858642578125
2024-08-03T09:49:44.108332102Z 
 15%|█▍        | 1414/9500 [4:52:14<27:14:18, 12.13s/it]08/03/2024 02:49:44 - INFO - __main__ -   Step: 1414, LR: 1.7549388347571878e-05, Loss: 563.1986694335938
2024-08-03T09:49:56.411610838Z 
 15%|█▍        | 1415/9500 [4:52:26<27:21:13, 12.18s/it]08/03/2024 02:49:56 - INFO - __main__ -   Step: 1415, LR: 1.7547217803884594e-05, Loss: 527.7998046875
2024-08-03T09:50:08.338314313Z 
 15%|█▍        | 1416/9500 [4:52:38<27:10:48, 12.10s/it]08/03/2024 02:50:08 - INFO - __main__ -   Step: 1416, LR: 1.7545047260197318e-05, Loss: 597.5975341796875
2024-08-03T09:50:20.752308261Z 
 15%|█▍        | 1417/9500 [4:52:50<27:23:08, 12.20s/it]08/03/2024 02:50:20 - INFO - __main__ -   Step: 1417, LR: 1.7542876716510038e-05, Loss: 711.04638671875
2024-08-03T09:50:33.243148728Z 
 15%|█▍        | 1418/9500 [4:53:03<27:34:48, 12.29s/it]08/03/2024 02:50:33 - INFO - __main__ -   Step: 1418, LR: 1.7540706172822757e-05, Loss: 685.7274169921875
2024-08-03T09:50:45.775033427Z 
 15%|█▍        | 1419/9500 [4:53:15<27:44:33, 12.36s/it]08/03/2024 02:50:45 - INFO - __main__ -   Step: 1419, LR: 1.753853562913548e-05, Loss: 667.018310546875
2024-08-03T09:50:58.406504145Z 
 15%|█▍        | 1420/9500 [4:53:28<27:55:21, 12.44s/it]08/03/2024 02:50:58 - INFO - __main__ -   Step: 1420, LR: 1.75363650854482e-05, Loss: 608.19970703125
2024-08-03T09:51:11.031778317Z 
 15%|█▍        | 1421/9500 [4:53:40<28:02:37, 12.50s/it]08/03/2024 02:51:11 - INFO - __main__ -   Step: 1421, LR: 1.7534194541760924e-05, Loss: 731.4991455078125
2024-08-03T09:51:23.211669295Z 
 15%|█▍        | 1422/9500 [4:53:53<27:49:38, 12.40s/it]08/03/2024 02:51:23 - INFO - __main__ -   Step: 1422, LR: 1.7532023998073644e-05, Loss: 795.541259765625
2024-08-03T09:51:35.587840513Z 
 15%|█▍        | 1423/9500 [4:54:05<27:48:24, 12.39s/it]08/03/2024 02:51:35 - INFO - __main__ -   Step: 1423, LR: 1.7529853454386367e-05, Loss: 680.2821044921875
2024-08-03T09:51:48.217749400Z 
 15%|█▍        | 1424/9500 [4:54:18<27:57:44, 12.46s/it]08/03/2024 02:51:48 - INFO - __main__ -   Step: 1424, LR: 1.7527682910699087e-05, Loss: 610.2218627929688
2024-08-03T09:52:00.188985904Z 
 15%|█▌        | 1425/9500 [4:54:30<27:37:36, 12.32s/it]08/03/2024 02:52:00 - INFO - __main__ -   Step: 1425, LR: 1.7525512367011807e-05, Loss: 529.65673828125
2024-08-03T09:52:12.876715970Z 
 15%|█▌        | 1426/9500 [4:54:42<27:52:23, 12.43s/it]08/03/2024 02:52:12 - INFO - __main__ -   Step: 1426, LR: 1.752334182332453e-05, Loss: 661.5068359375
2024-08-03T09:52:25.263054309Z 
 15%|█▌        | 1427/9500 [4:54:55<27:50:28, 12.42s/it]08/03/2024 02:52:25 - INFO - __main__ -   Step: 1427, LR: 1.752117127963725e-05, Loss: 680.1589965820312
2024-08-03T09:52:37.667854997Z 
 15%|█▌        | 1428/9500 [4:55:07<27:49:52, 12.41s/it]08/03/2024 02:52:37 - INFO - __main__ -   Step: 1428, LR: 1.7519000735949973e-05, Loss: 722.8073120117188
2024-08-03T09:52:50.112673313Z 
 15%|█▌        | 1429/9500 [4:55:20<27:50:58, 12.42s/it]08/03/2024 02:52:50 - INFO - __main__ -   Step: 1429, LR: 1.751683019226269e-05, Loss: 725.8253173828125
2024-08-03T09:53:02.747310268Z 
 15%|█▌        | 1430/9500 [4:55:32<27:59:21, 12.49s/it]08/03/2024 02:53:02 - INFO - __main__ -   Step: 1430, LR: 1.7514659648575413e-05, Loss: 679.230224609375
2024-08-03T09:53:14.753250359Z 
 15%|█▌        | 1431/9500 [4:55:44<27:39:46, 12.34s/it]08/03/2024 02:53:14 - INFO - __main__ -   Step: 1431, LR: 1.7512489104888133e-05, Loss: 612.9638671875
2024-08-03T09:53:26.776552585Z 
 15%|█▌        | 1432/9500 [4:55:56<27:26:43, 12.25s/it]08/03/2024 02:53:26 - INFO - __main__ -   Step: 1432, LR: 1.7510318561200856e-05, Loss: 919.057373046875
2024-08-03T09:53:39.017372363Z 
 15%|█▌        | 1433/9500 [4:56:08<27:26:17, 12.24s/it]08/03/2024 02:53:39 - INFO - __main__ -   Step: 1433, LR: 1.7508148017513576e-05, Loss: 556.861328125
2024-08-03T09:53:50.873336661Z 
 15%|█▌        | 1434/9500 [4:56:20<27:10:24, 12.13s/it]08/03/2024 02:53:50 - INFO - __main__ -   Step: 1434, LR: 1.7505977473826296e-05, Loss: 527.5355224609375
2024-08-03T09:54:03.145990097Z 
 15%|█▌        | 1435/9500 [4:56:33<27:16:01, 12.17s/it]08/03/2024 02:54:03 - INFO - __main__ -   Step: 1435, LR: 1.750380693013902e-05, Loss: 814.472900390625
2024-08-03T09:54:15.773139830Z 
 15%|█▌        | 1436/9500 [4:56:45<27:34:13, 12.31s/it]08/03/2024 02:54:15 - INFO - __main__ -   Step: 1436, LR: 1.750163638645174e-05, Loss: 864.2535400390625
2024-08-03T09:54:27.978050511Z 
 15%|█▌        | 1437/9500 [4:56:57<27:29:50, 12.28s/it]08/03/2024 02:54:27 - INFO - __main__ -   Step: 1437, LR: 1.7499465842764462e-05, Loss: 546.5504760742188
2024-08-03T09:54:40.254277901Z 
 15%|█▌        | 1438/9500 [4:57:10<27:29:36, 12.28s/it]08/03/2024 02:54:40 - INFO - __main__ -   Step: 1438, LR: 1.7497295299077182e-05, Loss: 664.596435546875
2024-08-03T09:54:52.895757915Z 
 15%|█▌        | 1439/9500 [4:57:22<27:44:06, 12.39s/it]08/03/2024 02:54:52 - INFO - __main__ -   Step: 1439, LR: 1.74951247553899e-05, Loss: 629.8043212890625
2024-08-03T09:55:04.945903479Z 
 15%|█▌        | 1440/9500 [4:57:34<27:30:20, 12.29s/it]08/03/2024 02:55:04 - INFO - __main__ -   Step: 1440, LR: 1.7492954211702625e-05, Loss: 664.6312255859375
2024-08-03T09:55:17.060091506Z 
 15%|█▌        | 1441/9500 [4:57:46<27:23:14, 12.23s/it]08/03/2024 02:55:17 - INFO - __main__ -   Step: 1441, LR: 1.7490783668015345e-05, Loss: 683.935546875
2024-08-03T09:55:29.700795789Z 
 15%|█▌        | 1442/9500 [4:57:59<27:39:24, 12.36s/it]08/03/2024 02:55:29 - INFO - __main__ -   Step: 1442, LR: 1.7488613124328068e-05, Loss: 580.7482299804688
2024-08-03T09:55:41.994060099Z 
 15%|█▌        | 1443/9500 [4:58:11<27:36:40, 12.34s/it]08/03/2024 02:55:41 - INFO - __main__ -   Step: 1443, LR: 1.7486442580640785e-05, Loss: 685.493896484375
2024-08-03T09:55:54.460104548Z 
 15%|█▌        | 1444/9500 [4:58:24<27:41:40, 12.38s/it]08/03/2024 02:55:54 - INFO - __main__ -   Step: 1444, LR: 1.7484272036953508e-05, Loss: 731.761474609375
2024-08-03T09:56:07.235783743Z 
 15%|█▌        | 1445/9500 [4:58:37<27:57:33, 12.50s/it]08/03/2024 02:56:07 - INFO - __main__ -   Step: 1445, LR: 1.7482101493266228e-05, Loss: 784.1329345703125
2024-08-03T09:56:19.338456156Z 
 15%|█▌        | 1446/9500 [4:58:49<27:41:31, 12.38s/it]08/03/2024 02:56:19 - INFO - __main__ -   Step: 1446, LR: 1.747993094957895e-05, Loss: 570.9361572265625
2024-08-03T09:56:31.749854439Z 
 15%|█▌        | 1447/9500 [4:59:01<27:42:39, 12.39s/it]08/03/2024 02:56:31 - INFO - __main__ -   Step: 1447, LR: 1.747776040589167e-05, Loss: 707.6544189453125
2024-08-03T09:56:44.195178905Z 
 15%|█▌        | 1448/9500 [4:59:14<27:44:46, 12.41s/it]08/03/2024 02:56:44 - INFO - __main__ -   Step: 1448, LR: 1.747558986220439e-05, Loss: 671.2979736328125
2024-08-03T09:56:56.370716231Z 
 15%|█▌        | 1449/9500 [4:59:26<27:35:18, 12.34s/it]08/03/2024 02:56:56 - INFO - __main__ -   Step: 1449, LR: 1.7473419318517114e-05, Loss: 733.267822265625
2024-08-03T09:57:08.332591216Z 
 15%|█▌        | 1450/9500 [4:59:38<27:20:02, 12.22s/it]08/03/2024 02:57:08 - INFO - __main__ -   Step: 1450, LR: 1.7471248774829834e-05, Loss: 523.4720458984375
2024-08-03T09:57:21.189523978Z 
 15%|█▌        | 1451/9500 [4:59:51<27:45:18, 12.41s/it]08/03/2024 02:57:21 - INFO - __main__ -   Step: 1451, LR: 1.7469078231142557e-05, Loss: 687.2242431640625
2024-08-03T09:57:33.410237040Z 
 15%|█▌        | 1452/9500 [5:00:03<27:37:20, 12.36s/it]08/03/2024 02:57:33 - INFO - __main__ -   Step: 1452, LR: 1.7466907687455277e-05, Loss: 824.2346801757812
2024-08-03T09:57:45.536941663Z 
 15%|█▌        | 1453/9500 [5:00:15<27:27:55, 12.29s/it]08/03/2024 02:57:45 - INFO - __main__ -   Step: 1453, LR: 1.7464737143767997e-05, Loss: 677.3367309570312
2024-08-03T09:57:58.155845833Z 
 15%|█▌        | 1454/9500 [5:00:28<27:41:03, 12.39s/it]08/03/2024 02:57:58 - INFO - __main__ -   Step: 1454, LR: 1.746256660008072e-05, Loss: 569.0715942382812
2024-08-03T09:58:10.289215382Z 
 15%|█▌        | 1455/9500 [5:00:40<27:30:38, 12.31s/it]08/03/2024 02:58:10 - INFO - __main__ -   Step: 1455, LR: 1.746039605639344e-05, Loss: 712.1467895507812
2024-08-03T09:58:22.539302297Z 
 15%|█▌        | 1456/9500 [5:00:52<27:28:01, 12.29s/it]08/03/2024 02:58:22 - INFO - __main__ -   Step: 1456, LR: 1.7458225512706163e-05, Loss: 580.6412963867188
2024-08-03T09:58:35.134662782Z 
 15%|█▌        | 1457/9500 [5:01:05<27:39:59, 12.38s/it]08/03/2024 02:58:35 - INFO - __main__ -   Step: 1457, LR: 1.745605496901888e-05, Loss: 677.2811889648438
2024-08-03T09:58:47.904091878Z 
 15%|█▌        | 1458/9500 [5:01:17<27:55:18, 12.50s/it]08/03/2024 02:58:47 - INFO - __main__ -   Step: 1458, LR: 1.7453884425331603e-05, Loss: 684.301513671875
2024-08-03T09:59:00.244131465Z 
 15%|█▌        | 1459/9500 [5:01:30<27:48:41, 12.45s/it]08/03/2024 02:59:00 - INFO - __main__ -   Step: 1459, LR: 1.7451713881644323e-05, Loss: 945.4080810546875
2024-08-03T09:59:12.579971972Z 
 15%|█▌        | 1460/9500 [5:01:42<27:43:50, 12.42s/it]08/03/2024 02:59:12 - INFO - __main__ -   Step: 1460, LR: 1.7449543337957046e-05, Loss: 555.987548828125
2024-08-03T09:59:24.969313893Z 
 15%|█▌        | 1461/9500 [5:01:54<27:42:32, 12.41s/it]08/03/2024 02:59:24 - INFO - __main__ -   Step: 1461, LR: 1.7447372794269766e-05, Loss: 574.1699829101562
2024-08-03T09:59:37.242148698Z 
 15%|█▌        | 1462/9500 [5:02:07<27:36:51, 12.37s/it]08/03/2024 02:59:37 - INFO - __main__ -   Step: 1462, LR: 1.7445202250582486e-05, Loss: 754.3709716796875
2024-08-03T09:59:49.472429411Z 
 15%|█▌        | 1463/9500 [5:02:19<27:31:09, 12.33s/it]08/03/2024 02:59:49 - INFO - __main__ -   Step: 1463, LR: 1.744303170689521e-05, Loss: 742.0098876953125
2024-08-03T10:00:02.257300257Z 
 15%|█▌        | 1464/9500 [5:02:32<27:49:21, 12.46s/it]08/03/2024 03:00:02 - INFO - __main__ -   Step: 1464, LR: 1.744086116320793e-05, Loss: 675.1212158203125
2024-08-03T10:00:14.218984977Z 
 15%|█▌        | 1465/9500 [5:02:44<27:28:57, 12.31s/it]08/03/2024 03:00:14 - INFO - __main__ -   Step: 1465, LR: 1.7438690619520652e-05, Loss: 596.552978515625
2024-08-03T10:00:26.471771947Z 
 15%|█▌        | 1466/9500 [5:02:56<27:26:20, 12.30s/it]08/03/2024 03:00:26 - INFO - __main__ -   Step: 1466, LR: 1.7436520075833372e-05, Loss: 625.479736328125
2024-08-03T10:00:38.874784942Z 
 15%|█▌        | 1467/9500 [5:03:08<27:30:26, 12.33s/it]08/03/2024 03:00:38 - INFO - __main__ -   Step: 1467, LR: 1.7434349532146092e-05, Loss: 803.81884765625
2024-08-03T10:00:51.130008526Z 
 15%|█▌        | 1468/9500 [5:03:21<27:27:20, 12.31s/it]08/03/2024 03:00:51 - INFO - __main__ -   Step: 1468, LR: 1.7432178988458815e-05, Loss: 761.8168334960938
2024-08-03T10:01:03.254199278Z 
 15%|█▌        | 1469/9500 [5:03:33<27:19:50, 12.25s/it]08/03/2024 03:01:03 - INFO - __main__ -   Step: 1469, LR: 1.7430008444771535e-05, Loss: 645.7459716796875
2024-08-03T10:01:15.674903109Z 
 15%|█▌        | 1470/9500 [5:03:45<27:26:26, 12.30s/it]08/03/2024 03:01:15 - INFO - __main__ -   Step: 1470, LR: 1.7427837901084258e-05, Loss: 704.2200927734375
2024-08-03T10:01:27.896023905Z 
 15%|█▌        | 1471/9500 [5:03:57<27:22:58, 12.28s/it]08/03/2024 03:01:27 - INFO - __main__ -   Step: 1471, LR: 1.7425667357396975e-05, Loss: 580.17578125
2024-08-03T10:01:40.088858393Z 
 15%|█▌        | 1472/9500 [5:04:10<27:19:22, 12.25s/it]08/03/2024 03:01:40 - INFO - __main__ -   Step: 1472, LR: 1.7423496813709698e-05, Loss: 679.455810546875
2024-08-03T10:01:52.714567324Z 
 16%|█▌        | 1473/9500 [5:04:22<27:34:08, 12.36s/it]08/03/2024 03:01:52 - INFO - __main__ -   Step: 1473, LR: 1.7421326270022418e-05, Loss: 651.9754638671875
2024-08-03T10:02:05.042221748Z 
 16%|█▌        | 1474/9500 [5:04:34<27:32:27, 12.35s/it]08/03/2024 03:02:05 - INFO - __main__ -   Step: 1474, LR: 1.741915572633514e-05, Loss: 556.30224609375
2024-08-03T10:02:16.992577720Z 
 16%|█▌        | 1475/9500 [5:04:46<27:16:05, 12.23s/it]08/03/2024 03:02:16 - INFO - __main__ -   Step: 1475, LR: 1.741698518264786e-05, Loss: 720.2927856445312
2024-08-03T10:02:29.573933244Z 
 16%|█▌        | 1476/9500 [5:04:59<27:29:53, 12.34s/it]08/03/2024 03:02:29 - INFO - __main__ -   Step: 1476, LR: 1.741481463896058e-05, Loss: 561.9915771484375
2024-08-03T10:02:41.928525135Z 
 16%|█▌        | 1477/9500 [5:05:11<27:30:22, 12.34s/it]08/03/2024 03:02:41 - INFO - __main__ -   Step: 1477, LR: 1.7412644095273304e-05, Loss: 821.1104736328125
2024-08-03T10:02:54.214540446Z 
 16%|█▌        | 1478/9500 [5:05:24<27:27:54, 12.33s/it]08/03/2024 03:02:54 - INFO - __main__ -   Step: 1478, LR: 1.7410473551586024e-05, Loss: 711.3631591796875
2024-08-03T10:03:06.714715440Z 
 16%|█▌        | 1479/9500 [5:05:36<27:34:43, 12.38s/it]08/03/2024 03:03:06 - INFO - __main__ -   Step: 1479, LR: 1.7408303007898747e-05, Loss: 693.236328125
2024-08-03T10:03:18.859970726Z 
 16%|█▌        | 1480/9500 [5:05:48<27:25:10, 12.31s/it]08/03/2024 03:03:18 - INFO - __main__ -   Step: 1480, LR: 1.7406132464211467e-05, Loss: 762.174560546875
2024-08-03T10:03:30.856027927Z 
 16%|█▌        | 1481/9500 [5:06:00<27:12:28, 12.21s/it]08/03/2024 03:03:30 - INFO - __main__ -   Step: 1481, LR: 1.7403961920524187e-05, Loss: 620.8948974609375
2024-08-03T10:03:43.328238313Z 
 16%|█▌        | 1482/9500 [5:06:13<27:22:35, 12.29s/it]08/03/2024 03:03:43 - INFO - __main__ -   Step: 1482, LR: 1.740179137683691e-05, Loss: 678.1213989257812
2024-08-03T10:03:55.362802628Z 
 16%|█▌        | 1483/9500 [5:06:25<27:12:04, 12.21s/it]08/03/2024 03:03:55 - INFO - __main__ -   Step: 1483, LR: 1.739962083314963e-05, Loss: 688.8751831054688
2024-08-03T10:04:07.344944559Z 
 16%|█▌        | 1484/9500 [5:06:37<27:02:33, 12.14s/it]08/03/2024 03:04:07 - INFO - __main__ -   Step: 1484, LR: 1.7397450289462353e-05, Loss: 569.6700439453125
2024-08-03T10:04:19.773364472Z 
 16%|█▌        | 1485/9500 [5:06:49<27:13:43, 12.23s/it]08/03/2024 03:04:19 - INFO - __main__ -   Step: 1485, LR: 1.739527974577507e-05, Loss: 681.1307373046875
2024-08-03T10:04:31.778246686Z 
 16%|█▌        | 1486/9500 [5:07:01<27:04:29, 12.16s/it]08/03/2024 03:04:31 - INFO - __main__ -   Step: 1486, LR: 1.7393109202087793e-05, Loss: 696.3292236328125
2024-08-03T10:04:44.217265160Z 
 16%|█▌        | 1487/9500 [5:07:14<27:15:21, 12.25s/it]08/03/2024 03:04:44 - INFO - __main__ -   Step: 1487, LR: 1.7390938658400513e-05, Loss: 705.474609375
2024-08-03T10:04:56.651959975Z 
 16%|█▌        | 1488/9500 [5:07:26<27:22:45, 12.30s/it]08/03/2024 03:04:56 - INFO - __main__ -   Step: 1488, LR: 1.7388768114713236e-05, Loss: 543.4839477539062
2024-08-03T10:05:08.378297027Z 
 16%|█▌        | 1489/9500 [5:07:38<26:59:28, 12.13s/it]08/03/2024 03:05:08 - INFO - __main__ -   Step: 1489, LR: 1.7386597571025956e-05, Loss: 572.43115234375
2024-08-03T10:05:20.393444218Z 
 16%|█▌        | 1490/9500 [5:07:50<26:54:42, 12.10s/it]08/03/2024 03:05:20 - INFO - __main__ -   Step: 1490, LR: 1.7384427027338676e-05, Loss: 682.4736328125
2024-08-03T10:05:33.156881366Z 
 16%|█▌        | 1491/9500 [5:08:03<27:21:14, 12.30s/it]08/03/2024 03:05:33 - INFO - __main__ -   Step: 1491, LR: 1.73822564836514e-05, Loss: 691.0296020507812
2024-08-03T10:05:45.457663497Z 
 16%|█▌        | 1492/9500 [5:08:15<27:21:15, 12.30s/it]08/03/2024 03:05:45 - INFO - __main__ -   Step: 1492, LR: 1.738008593996412e-05, Loss: 805.8013916015625
2024-08-03T10:05:57.547073015Z 
 16%|█▌        | 1493/9500 [5:08:27<27:12:44, 12.23s/it]08/03/2024 03:05:57 - INFO - __main__ -   Step: 1493, LR: 1.7377915396276842e-05, Loss: 661.0321044921875
2024-08-03T10:06:10.261771336Z 
 16%|█▌        | 1494/9500 [5:08:40<27:31:44, 12.38s/it]08/03/2024 03:06:10 - INFO - __main__ -   Step: 1494, LR: 1.7375744852589562e-05, Loss: 660.128662109375
2024-08-03T10:06:22.751120136Z 
 16%|█▌        | 1495/9500 [5:08:52<27:35:58, 12.41s/it]08/03/2024 03:06:22 - INFO - __main__ -   Step: 1495, LR: 1.7373574308902282e-05, Loss: 728.6945190429688
2024-08-03T10:06:35.067068817Z 
 16%|█▌        | 1496/9500 [5:09:05<27:31:54, 12.38s/it]08/03/2024 03:06:35 - INFO - __main__ -   Step: 1496, LR: 1.7371403765215005e-05, Loss: 588.8226318359375
2024-08-03T10:06:47.721401138Z 
 16%|█▌        | 1497/9500 [5:09:17<27:42:34, 12.46s/it]08/03/2024 03:06:47 - INFO - __main__ -   Step: 1497, LR: 1.7369233221527725e-05, Loss: 794.2193603515625
2024-08-03T10:06:59.613095283Z 
 16%|█▌        | 1498/9500 [5:09:29<27:19:26, 12.29s/it]08/03/2024 03:06:59 - INFO - __main__ -   Step: 1498, LR: 1.7367062677840448e-05, Loss: 524.4412841796875
2024-08-03T10:07:11.624463249Z 
 16%|█▌        | 1499/9500 [5:09:41<27:07:58, 12.21s/it]08/03/2024 03:07:11 - INFO - __main__ -   Step: 1499, LR: 1.7364892134153165e-05, Loss: 635.0706787109375
2024-08-03T10:07:23.526248534Z 
 16%|█▌        | 1500/9500 [5:09:53<26:55:30, 12.12s/it]08/03/2024 03:07:23 - INFO - __main__ -   Step: 1500, LR: 1.7362721590465888e-05, Loss: 471.9384765625
2024-08-03T10:07:36.088710199Z 
 16%|█▌        | 1501/9500 [5:10:06<27:13:06, 12.25s/it]08/03/2024 03:07:36 - INFO - __main__ -   Step: 1501, LR: 1.7360551046778608e-05, Loss: 655.731689453125
2024-08-03T10:07:48.000324431Z 
 16%|█▌        | 1502/9500 [5:10:17<26:59:25, 12.15s/it]08/03/2024 03:07:48 - INFO - __main__ -   Step: 1502, LR: 1.735838050309133e-05, Loss: 546.575439453125
2024-08-03T10:07:59.926939646Z 
 16%|█▌        | 1503/9500 [5:10:29<26:50:19, 12.08s/it]08/03/2024 03:07:59 - INFO - __main__ -   Step: 1503, LR: 1.735620995940405e-05, Loss: 633.2445068359375
2024-08-03T10:08:12.680075825Z 
 16%|█▌        | 1504/9500 [5:10:42<27:16:58, 12.28s/it]08/03/2024 03:08:12 - INFO - __main__ -   Step: 1504, LR: 1.735403941571677e-05, Loss: 545.50341796875
2024-08-03T10:08:24.796690977Z 
 16%|█▌        | 1505/9500 [5:10:54<27:10:05, 12.23s/it]08/03/2024 03:08:24 - INFO - __main__ -   Step: 1505, LR: 1.7351868872029494e-05, Loss: 685.4370727539062
2024-08-03T10:08:37.145276600Z 
 16%|█▌        | 1506/9500 [5:11:07<27:14:29, 12.27s/it]08/03/2024 03:08:37 - INFO - __main__ -   Step: 1506, LR: 1.7349698328342214e-05, Loss: 684.009033203125
2024-08-03T10:08:49.682150991Z 
 16%|█▌        | 1507/9500 [5:11:19<27:25:01, 12.35s/it]08/03/2024 03:08:49 - INFO - __main__ -   Step: 1507, LR: 1.7347527784654937e-05, Loss: 675.4091796875
2024-08-03T10:09:01.855987941Z 
 16%|█▌        | 1508/9500 [5:11:31<27:17:51, 12.30s/it]08/03/2024 03:09:01 - INFO - __main__ -   Step: 1508, LR: 1.7345357240967657e-05, Loss: 548.80517578125
2024-08-03T10:09:14.272581290Z 
 16%|█▌        | 1509/9500 [5:11:44<27:22:26, 12.33s/it]08/03/2024 03:09:14 - INFO - __main__ -   Step: 1509, LR: 1.7343186697280377e-05, Loss: 766.9781494140625
2024-08-03T10:09:26.960722813Z 
 16%|█▌        | 1510/9500 [5:11:56<27:36:28, 12.44s/it]08/03/2024 03:09:26 - INFO - __main__ -   Step: 1510, LR: 1.73410161535931e-05, Loss: 773.6813354492188
2024-08-03T10:09:39.068123112Z 
 16%|█▌        | 1511/9500 [5:12:09<27:23:01, 12.34s/it]08/03/2024 03:09:39 - INFO - __main__ -   Step: 1511, LR: 1.733884560990582e-05, Loss: 741.9854736328125
2024-08-03T10:09:51.648210923Z 
 16%|█▌        | 1512/9500 [5:12:21<27:32:23, 12.41s/it]08/03/2024 03:09:51 - INFO - __main__ -   Step: 1512, LR: 1.7336675066218543e-05, Loss: 723.1345825195312
2024-08-03T10:10:04.160767775Z 
 16%|█▌        | 1513/9500 [5:12:34<27:36:14, 12.44s/it]08/03/2024 03:10:04 - INFO - __main__ -   Step: 1513, LR: 1.733450452253126e-05, Loss: 626.1683349609375
2024-08-03T10:10:16.421659021Z 
 16%|█▌        | 1514/9500 [5:12:46<27:28:47, 12.39s/it]08/03/2024 03:10:16 - INFO - __main__ -   Step: 1514, LR: 1.7332333978843983e-05, Loss: 713.7009887695312
2024-08-03T10:10:28.522763450Z 
 16%|█▌        | 1515/9500 [5:12:58<27:17:08, 12.30s/it]08/03/2024 03:10:28 - INFO - __main__ -   Step: 1515, LR: 1.7330163435156703e-05, Loss: 686.2371826171875
2024-08-03T10:10:41.339468050Z 
 16%|█▌        | 1516/9500 [5:13:11<27:37:30, 12.46s/it]08/03/2024 03:10:41 - INFO - __main__ -   Step: 1516, LR: 1.7327992891469426e-05, Loss: 708.1156005859375
2024-08-03T10:10:53.634749223Z 
 16%|█▌        | 1517/9500 [5:13:23<27:30:52, 12.41s/it]08/03/2024 03:10:53 - INFO - __main__ -   Step: 1517, LR: 1.7325822347782146e-05, Loss: 679.4169921875
2024-08-03T10:11:05.475598218Z 
 16%|█▌        | 1518/9500 [5:13:35<27:08:02, 12.24s/it]08/03/2024 03:11:05 - INFO - __main__ -   Step: 1518, LR: 1.7323651804094866e-05, Loss: 542.7488403320312
2024-08-03T10:11:18.085442076Z 
 16%|█▌        | 1519/9500 [5:13:48<27:22:41, 12.35s/it]08/03/2024 03:11:18 - INFO - __main__ -   Step: 1519, LR: 1.732148126040759e-05, Loss: 774.1172485351562
2024-08-03T10:11:30.116098782Z 
 16%|█▌        | 1520/9500 [5:14:00<27:09:44, 12.25s/it]08/03/2024 03:11:30 - INFO - __main__ -   Step: 1520, LR: 1.731931071672031e-05, Loss: 781.1763305664062
2024-08-03T10:11:42.524894311Z 
 16%|█▌        | 1521/9500 [5:14:12<27:15:44, 12.30s/it]08/03/2024 03:11:42 - INFO - __main__ -   Step: 1521, LR: 1.7317140173033032e-05, Loss: 692.749267578125
2024-08-03T10:11:55.135719023Z 
 16%|█▌        | 1522/9500 [5:14:25<27:27:54, 12.39s/it]08/03/2024 03:11:55 - INFO - __main__ -   Step: 1522, LR: 1.7314969629345752e-05, Loss: 630.1336059570312
2024-08-03T10:12:07.160491007Z 
 16%|█▌        | 1523/9500 [5:14:37<27:12:57, 12.28s/it]08/03/2024 03:12:07 - INFO - __main__ -   Step: 1523, LR: 1.7312799085658475e-05, Loss: 511.51751708984375
2024-08-03T10:12:19.219615975Z 
 16%|█▌        | 1524/9500 [5:14:49<27:03:53, 12.22s/it]08/03/2024 03:12:19 - INFO - __main__ -   Step: 1524, LR: 1.7310628541971195e-05, Loss: 622.9320068359375
2024-08-03T10:12:31.917573902Z 
 16%|█▌        | 1525/9500 [5:15:01<27:22:54, 12.36s/it]08/03/2024 03:12:31 - INFO - __main__ -   Step: 1525, LR: 1.7308457998283915e-05, Loss: 800.7330322265625
2024-08-03T10:12:44.198075749Z 
 16%|█▌        | 1526/9500 [5:15:14<27:19:30, 12.34s/it]08/03/2024 03:12:44 - INFO - __main__ -   Step: 1526, LR: 1.730628745459664e-05, Loss: 655.8123779296875
2024-08-03T10:12:56.272827809Z 
 16%|█▌        | 1527/9500 [5:15:26<27:08:52, 12.26s/it]08/03/2024 03:12:56 - INFO - __main__ -   Step: 1527, LR: 1.7304116910909355e-05, Loss: 630.2921752929688
2024-08-03T10:13:08.823446988Z 
 16%|█▌        | 1528/9500 [5:15:38<27:20:20, 12.35s/it]08/03/2024 03:13:08 - INFO - __main__ -   Step: 1528, LR: 1.7301946367222078e-05, Loss: 645.0950927734375
2024-08-03T10:13:20.893165070Z 
 16%|█▌        | 1529/9500 [5:15:50<27:09:08, 12.26s/it]08/03/2024 03:13:20 - INFO - __main__ -   Step: 1529, LR: 1.7299775823534798e-05, Loss: 758.5287475585938
2024-08-03T10:13:32.987828226Z 
 16%|█▌        | 1530/9500 [5:16:02<27:02:12, 12.21s/it]08/03/2024 03:13:32 - INFO - __main__ -   Step: 1530, LR: 1.729760527984752e-05, Loss: 726.76904296875
2024-08-03T10:13:45.829066913Z 
 16%|█▌        | 1531/9500 [5:16:15<27:27:04, 12.40s/it]08/03/2024 03:13:45 - INFO - __main__ -   Step: 1531, LR: 1.729543473616024e-05, Loss: 774.9310302734375
2024-08-03T10:13:58.037175698Z 
 16%|█▌        | 1532/9500 [5:16:27<27:19:10, 12.34s/it]08/03/2024 03:13:58 - INFO - __main__ -   Step: 1532, LR: 1.7293264192472964e-05, Loss: 701.860107421875
2024-08-03T10:14:10.160856865Z 
 16%|█▌        | 1533/9500 [5:16:40<27:10:13, 12.28s/it]08/03/2024 03:14:10 - INFO - __main__ -   Step: 1533, LR: 1.7291093648785684e-05, Loss: 499.4784240722656
2024-08-03T10:14:22.978649157Z 
 16%|█▌        | 1534/9500 [5:16:52<27:31:32, 12.44s/it]08/03/2024 03:14:22 - INFO - __main__ -   Step: 1534, LR: 1.7288923105098404e-05, Loss: 539.9815673828125
2024-08-03T10:14:35.014322799Z 
 16%|█▌        | 1535/9500 [5:17:04<27:15:15, 12.32s/it]08/03/2024 03:14:35 - INFO - __main__ -   Step: 1535, LR: 1.7286752561411127e-05, Loss: 584.604248046875
2024-08-03T10:14:47.057657360Z 
 16%|█▌        | 1536/9500 [5:17:16<27:04:06, 12.24s/it]08/03/2024 03:14:47 - INFO - __main__ -   Step: 1536, LR: 1.7284582017723847e-05, Loss: 631.816162109375
2024-08-03T10:14:59.692666075Z 
 16%|█▌        | 1537/9500 [5:17:29<27:19:48, 12.36s/it]08/03/2024 03:14:59 - INFO - __main__ -   Step: 1537, LR: 1.728241147403657e-05, Loss: 746.66064453125
2024-08-03T10:15:12.048088597Z 
 16%|█▌        | 1538/9500 [5:17:41<27:19:34, 12.36s/it]08/03/2024 03:15:12 - INFO - __main__ -   Step: 1538, LR: 1.728024093034929e-05, Loss: 739.4853515625
2024-08-03T10:15:24.420799241Z 
 16%|█▌        | 1539/9500 [5:17:54<27:20:03, 12.36s/it]08/03/2024 03:15:24 - INFO - __main__ -   Step: 1539, LR: 1.727807038666201e-05, Loss: 729.9967041015625
2024-08-03T10:15:36.921042914Z 
 16%|█▌        | 1540/9500 [5:18:06<27:25:24, 12.40s/it]08/03/2024 03:15:36 - INFO - __main__ -   Step: 1540, LR: 1.727589984297473e-05, Loss: 696.7826538085938
2024-08-03T10:15:49.341420526Z 
 16%|█▌        | 1541/9500 [5:18:19<27:25:55, 12.41s/it]08/03/2024 03:15:49 - INFO - __main__ -   Step: 1541, LR: 1.7273729299287453e-05, Loss: 783.609619140625
2024-08-03T10:16:01.489201561Z 
 16%|█▌        | 1542/9500 [5:18:31<27:15:20, 12.33s/it]08/03/2024 03:16:01 - INFO - __main__ -   Step: 1542, LR: 1.7271558755600173e-05, Loss: 678.1119384765625
2024-08-03T10:16:13.712803600Z 
 16%|█▌        | 1543/9500 [5:18:43<27:10:54, 12.30s/it]08/03/2024 03:16:13 - INFO - __main__ -   Step: 1543, LR: 1.7269388211912893e-05, Loss: 642.2684326171875
2024-08-03T10:16:26.339885856Z 
 16%|█▋        | 1544/9500 [5:18:56<27:23:47, 12.40s/it]08/03/2024 03:16:26 - INFO - __main__ -   Step: 1544, LR: 1.7267217668225616e-05, Loss: 639.3116455078125
2024-08-03T10:16:38.390297803Z 
 16%|█▋        | 1545/9500 [5:19:08<27:09:49, 12.29s/it]08/03/2024 03:16:38 - INFO - __main__ -   Step: 1545, LR: 1.7265047124538336e-05, Loss: 662.7499389648438
2024-08-03T10:16:50.641444203Z 
 16%|█▋        | 1546/9500 [5:19:20<27:07:57, 12.28s/it]08/03/2024 03:16:50 - INFO - __main__ -   Step: 1546, LR: 1.726287658085106e-05, Loss: 655.590576171875
2024-08-03T10:17:02.947086292Z 
 16%|█▋        | 1547/9500 [5:19:32<27:08:46, 12.29s/it]08/03/2024 03:17:02 - INFO - __main__ -   Step: 1547, LR: 1.726070603716378e-05, Loss: 505.6773376464844
2024-08-03T10:17:15.103725064Z 
 16%|█▋        | 1548/9500 [5:19:45<27:03:20, 12.25s/it]08/03/2024 03:17:15 - INFO - __main__ -   Step: 1548, LR: 1.72585354934765e-05, Loss: 819.5448608398438
2024-08-03T10:17:27.615727544Z 
 16%|█▋        | 1549/9500 [5:19:57<27:13:36, 12.33s/it]08/03/2024 03:17:27 - INFO - __main__ -   Step: 1549, LR: 1.7256364949789222e-05, Loss: 770.385009765625
2024-08-03T10:17:39.870363601Z 
 16%|█▋        | 1550/9500 [5:20:09<27:10:29, 12.31s/it]08/03/2024 03:17:39 - INFO - __main__ -   Step: 1550, LR: 1.7254194406101942e-05, Loss: 698.2547607421875
2024-08-03T10:17:51.978063042Z 
 16%|█▋        | 1551/9500 [5:20:21<27:02:25, 12.25s/it]08/03/2024 03:17:51 - INFO - __main__ -   Step: 1551, LR: 1.7252023862414666e-05, Loss: 652.878662109375
2024-08-03T10:18:04.152774740Z 
 16%|█▋        | 1552/9500 [5:20:34<26:59:22, 12.22s/it]08/03/2024 03:18:04 - INFO - __main__ -   Step: 1552, LR: 1.7249853318727385e-05, Loss: 764.1333618164062
2024-08-03T10:18:16.541903395Z 
 16%|█▋        | 1553/9500 [5:20:46<27:05:41, 12.27s/it]08/03/2024 03:18:16 - INFO - __main__ -   Step: 1553, LR: 1.7247682775040105e-05, Loss: 707.2199096679688
2024-08-03T10:18:28.952470463Z 
 16%|█▋        | 1554/9500 [5:20:58<27:10:56, 12.32s/it]08/03/2024 03:18:28 - INFO - __main__ -   Step: 1554, LR: 1.7245512231352825e-05, Loss: 688.8853759765625
2024-08-03T10:18:41.306427061Z 
 16%|█▋        | 1555/9500 [5:21:11<27:12:15, 12.33s/it]08/03/2024 03:18:41 - INFO - __main__ -   Step: 1555, LR: 1.724334168766555e-05, Loss: 727.2340087890625
2024-08-03T10:18:53.876961790Z 
 16%|█▋        | 1556/9500 [5:21:23<27:21:44, 12.40s/it]08/03/2024 03:18:53 - INFO - __main__ -   Step: 1556, LR: 1.7241171143978268e-05, Loss: 654.2667236328125
2024-08-03T10:19:05.913888992Z 
 16%|█▋        | 1557/9500 [5:21:35<27:07:07, 12.29s/it]08/03/2024 03:19:05 - INFO - __main__ -   Step: 1557, LR: 1.7239000600290988e-05, Loss: 600.5364990234375
2024-08-03T10:19:18.520873065Z 
 16%|█▋        | 1558/9500 [5:21:48<27:19:28, 12.39s/it]08/03/2024 03:19:18 - INFO - __main__ -   Step: 1558, LR: 1.723683005660371e-05, Loss: 770.1389770507812
2024-08-03T10:19:31.040726355Z 
 16%|█▋        | 1559/9500 [5:22:00<27:24:34, 12.43s/it]08/03/2024 03:19:31 - INFO - __main__ -   Step: 1559, LR: 1.723465951291643e-05, Loss: 603.6619873046875
2024-08-03T10:19:43.108583773Z 
 16%|█▋        | 1560/9500 [5:22:13<27:10:08, 12.32s/it]08/03/2024 03:19:43 - INFO - __main__ -   Step: 1560, LR: 1.7232488969229154e-05, Loss: 674.4265747070312
2024-08-03T10:19:55.210458675Z 
 16%|█▋        | 1561/9500 [5:22:25<27:01:20, 12.25s/it]08/03/2024 03:19:55 - INFO - __main__ -   Step: 1561, LR: 1.7230318425541874e-05, Loss: 563.5994873046875
2024-08-03T10:20:07.647380820Z 
 16%|█▋        | 1562/9500 [5:22:37<27:08:25, 12.31s/it]08/03/2024 03:20:07 - INFO - __main__ -   Step: 1562, LR: 1.7228147881854594e-05, Loss: 702.0326538085938
2024-08-03T10:20:19.556619030Z 
 16%|█▋        | 1563/9500 [5:22:49<26:52:22, 12.19s/it]08/03/2024 03:20:19 - INFO - __main__ -   Step: 1563, LR: 1.7225977338167317e-05, Loss: 558.280029296875
2024-08-03T10:20:31.779313379Z 
 16%|█▋        | 1564/9500 [5:23:01<26:53:30, 12.20s/it]08/03/2024 03:20:31 - INFO - __main__ -   Step: 1564, LR: 1.7223806794480037e-05, Loss: 770.1636962890625
2024-08-03T10:20:44.629720998Z 
 16%|█▋        | 1565/9500 [5:23:14<27:19:10, 12.39s/it]08/03/2024 03:20:44 - INFO - __main__ -   Step: 1565, LR: 1.722163625079276e-05, Loss: 788.5778198242188
2024-08-03T10:20:56.685368839Z 
 16%|█▋        | 1566/9500 [5:23:26<27:05:30, 12.29s/it]08/03/2024 03:20:56 - INFO - __main__ -   Step: 1566, LR: 1.721946570710548e-05, Loss: 745.260986328125
2024-08-03T10:21:08.878137171Z 
 16%|█▋        | 1567/9500 [5:23:38<27:01:20, 12.26s/it]08/03/2024 03:21:08 - INFO - __main__ -   Step: 1567, LR: 1.72172951634182e-05, Loss: 626.8641357421875
2024-08-03T10:21:21.918705230Z 
 17%|█▋        | 1568/9500 [5:23:51<27:31:59, 12.50s/it]08/03/2024 03:21:21 - INFO - __main__ -   Step: 1568, LR: 1.721512461973092e-05, Loss: 752.401611328125
2024-08-03T10:21:34.087028569Z 
 17%|█▋        | 1569/9500 [5:24:04<27:18:46, 12.40s/it]08/03/2024 03:21:34 - INFO - __main__ -   Step: 1569, LR: 1.7212954076043643e-05, Loss: 689.7506103515625
2024-08-03T10:21:46.504785435Z 
 17%|█▋        | 1570/9500 [5:24:16<27:19:21, 12.40s/it]08/03/2024 03:21:46 - INFO - __main__ -   Step: 1570, LR: 1.7210783532356363e-05, Loss: 768.5712890625
2024-08-03T10:21:58.891220537Z 
 17%|█▋        | 1571/9500 [5:24:28<27:18:27, 12.40s/it]08/03/2024 03:21:58 - INFO - __main__ -   Step: 1571, LR: 1.7208612988669083e-05, Loss: 658.229736328125
2024-08-03T10:22:10.750438198Z 
 17%|█▋        | 1572/9500 [5:24:40<26:56:52, 12.24s/it]08/03/2024 03:22:10 - INFO - __main__ -   Step: 1572, LR: 1.7206442444981806e-05, Loss: 469.20281982421875
2024-08-03T10:22:23.148736526Z 
 17%|█▋        | 1573/9500 [5:24:53<27:03:05, 12.29s/it]08/03/2024 03:22:23 - INFO - __main__ -   Step: 1573, LR: 1.7204271901294526e-05, Loss: 659.479736328125
2024-08-03T10:22:35.650260032Z 
 17%|█▋        | 1574/9500 [5:25:05<27:11:27, 12.35s/it]08/03/2024 03:22:35 - INFO - __main__ -   Step: 1574, LR: 1.720210135760725e-05, Loss: 675.1720581054688
2024-08-03T10:22:47.686900372Z 
 17%|█▋        | 1575/9500 [5:25:17<26:58:49, 12.26s/it]08/03/2024 03:22:47 - INFO - __main__ -   Step: 1575, LR: 1.719993081391997e-05, Loss: 492.3594970703125
2024-08-03T10:22:59.774604448Z 
 17%|█▋        | 1576/9500 [5:25:29<26:51:57, 12.21s/it]08/03/2024 03:22:59 - INFO - __main__ -   Step: 1576, LR: 1.719776027023269e-05, Loss: 690.1856079101562
2024-08-03T10:23:12.342827075Z 
 17%|█▋        | 1577/9500 [5:25:42<27:06:06, 12.31s/it]08/03/2024 03:23:12 - INFO - __main__ -   Step: 1577, LR: 1.7195589726545413e-05, Loss: 702.6761474609375
2024-08-03T10:23:24.504138406Z 
 17%|█▋        | 1578/9500 [5:25:54<26:59:50, 12.27s/it]08/03/2024 03:23:24 - INFO - __main__ -   Step: 1578, LR: 1.7193419182858132e-05, Loss: 682.2061767578125
2024-08-03T10:23:36.682397844Z 
 17%|█▋        | 1579/9500 [5:26:06<26:56:04, 12.24s/it]08/03/2024 03:23:36 - INFO - __main__ -   Step: 1579, LR: 1.7191248639170856e-05, Loss: 649.76171875
2024-08-03T10:23:49.371951235Z 
 17%|█▋        | 1580/9500 [5:26:19<27:13:35, 12.38s/it]08/03/2024 03:23:49 - INFO - __main__ -   Step: 1580, LR: 1.7189078095483575e-05, Loss: 806.866455078125
2024-08-03T10:24:01.431590310Z 
 17%|█▋        | 1581/9500 [5:26:31<27:00:52, 12.28s/it]08/03/2024 03:24:01 - INFO - __main__ -   Step: 1581, LR: 1.7186907551796295e-05, Loss: 469.0615539550781
2024-08-03T10:24:13.671496042Z 
 17%|█▋        | 1582/9500 [5:26:43<26:59:02, 12.27s/it]08/03/2024 03:24:13 - INFO - __main__ -   Step: 1582, LR: 1.7184737008109015e-05, Loss: 704.1376953125
2024-08-03T10:24:26.099085318Z 
 17%|█▋        | 1583/9500 [5:26:56<27:05:09, 12.32s/it]08/03/2024 03:24:26 - INFO - __main__ -   Step: 1583, LR: 1.718256646442174e-05, Loss: 732.950927734375
2024-08-03T10:24:38.265222938Z 
 17%|█▋        | 1584/9500 [5:27:08<26:58:59, 12.27s/it]08/03/2024 03:24:38 - INFO - __main__ -   Step: 1584, LR: 1.718039592073446e-05, Loss: 636.575927734375
2024-08-03T10:24:50.218059962Z 
 17%|█▋        | 1585/9500 [5:27:20<26:46:11, 12.18s/it]08/03/2024 03:24:50 - INFO - __main__ -   Step: 1585, LR: 1.7178225377047178e-05, Loss: 637.1651000976562
2024-08-03T10:25:02.825079601Z 
 17%|█▋        | 1586/9500 [5:27:32<27:03:02, 12.31s/it]08/03/2024 03:25:02 - INFO - __main__ -   Step: 1586, LR: 1.71760548333599e-05, Loss: 613.7230834960938
2024-08-03T10:25:15.161889764Z 
 17%|█▋        | 1587/9500 [5:27:45<27:04:05, 12.31s/it]08/03/2024 03:25:15 - INFO - __main__ -   Step: 1587, LR: 1.717388428967262e-05, Loss: 565.0137939453125
2024-08-03T10:25:27.401126712Z 
 17%|█▋        | 1588/9500 [5:27:57<27:00:54, 12.29s/it]08/03/2024 03:25:27 - INFO - __main__ -   Step: 1588, LR: 1.7171713745985345e-05, Loss: 809.514404296875
2024-08-03T10:25:39.936803784Z 
 17%|█▋        | 1589/9500 [5:28:09<27:10:20, 12.37s/it]08/03/2024 03:25:39 - INFO - __main__ -   Step: 1589, LR: 1.7169543202298064e-05, Loss: 756.559814453125
2024-08-03T10:25:52.323004602Z 
 17%|█▋        | 1590/9500 [5:28:22<27:10:58, 12.37s/it]08/03/2024 03:25:52 - INFO - __main__ -   Step: 1590, LR: 1.7167372658610784e-05, Loss: 549.759521484375
2024-08-03T10:26:04.702141096Z 
 17%|█▋        | 1591/9500 [5:28:34<27:11:04, 12.37s/it]08/03/2024 03:26:04 - INFO - __main__ -   Step: 1591, LR: 1.7165202114923508e-05, Loss: 646.5482177734375
2024-08-03T10:26:16.801136580Z 
 17%|█▋        | 1592/9500 [5:28:46<26:59:59, 12.29s/it]08/03/2024 03:26:16 - INFO - __main__ -   Step: 1592, LR: 1.7163031571236227e-05, Loss: 613.9622192382812
2024-08-03T10:26:29.467510069Z 
 17%|█▋        | 1593/9500 [5:28:59<27:14:37, 12.40s/it]08/03/2024 03:26:29 - INFO - __main__ -   Step: 1593, LR: 1.716086102754895e-05, Loss: 716.3577880859375
2024-08-03T10:26:41.604198185Z 
 17%|█▋        | 1594/9500 [5:29:11<27:03:50, 12.32s/it]08/03/2024 03:26:41 - INFO - __main__ -   Step: 1594, LR: 1.715869048386167e-05, Loss: 649.326904296875
2024-08-03T10:26:53.871942651Z 
 17%|█▋        | 1595/9500 [5:29:23<27:01:25, 12.31s/it]08/03/2024 03:26:53 - INFO - __main__ -   Step: 1595, LR: 1.715651994017439e-05, Loss: 761.0684204101562
2024-08-03T10:27:06.632042627Z 
 17%|█▋        | 1596/9500 [5:29:36<27:19:08, 12.44s/it]08/03/2024 03:27:06 - INFO - __main__ -   Step: 1596, LR: 1.715434939648711e-05, Loss: 659.800537109375
2024-08-03T10:27:19.072960815Z 
 17%|█▋        | 1597/9500 [5:29:49<27:18:50, 12.44s/it]08/03/2024 03:27:19 - INFO - __main__ -   Step: 1597, LR: 1.7152178852799834e-05, Loss: 601.838134765625
2024-08-03T10:27:31.124341420Z 
 17%|█▋        | 1598/9500 [5:30:01<27:03:12, 12.33s/it]08/03/2024 03:27:31 - INFO - __main__ -   Step: 1598, LR: 1.7150008309112553e-05, Loss: 492.4423522949219
2024-08-03T10:27:43.707430819Z 
 17%|█▋        | 1599/9500 [5:30:13<27:13:10, 12.40s/it]08/03/2024 03:27:43 - INFO - __main__ -   Step: 1599, LR: 1.7147837765425273e-05, Loss: 700.839111328125
2024-08-03T10:27:55.799710138Z 
 17%|█▋        | 1600/9500 [5:30:25<27:00:44, 12.31s/it]08/03/2024 03:27:55 - INFO - __main__ -   Step: 1600, LR: 1.7145667221737997e-05, Loss: 839.74755859375
2024-08-03T10:28:07.939415677Z 
 17%|█▋        | 1601/9500 [5:30:37<26:53:50, 12.26s/it]08/03/2024 03:28:07 - INFO - __main__ -   Step: 1601, LR: 1.7143496678050716e-05, Loss: 666.7081298828125
2024-08-03T10:28:20.180865071Z 
 17%|█▋        | 1602/9500 [5:30:50<26:52:56, 12.25s/it]08/03/2024 03:28:20 - INFO - __main__ -   Step: 1602, LR: 1.714132613436344e-05, Loss: 569.462158203125
2024-08-03T10:28:32.217069873Z 
 17%|█▋        | 1603/9500 [5:31:02<26:44:10, 12.19s/it]08/03/2024 03:28:32 - INFO - __main__ -   Step: 1603, LR: 1.713915559067616e-05, Loss: 503.2947082519531
2024-08-03T10:28:44.246700314Z 
 17%|█▋        | 1604/9500 [5:31:14<26:37:42, 12.14s/it]08/03/2024 03:28:44 - INFO - __main__ -   Step: 1604, LR: 1.713698504698888e-05, Loss: 540.118408203125
2024-08-03T10:28:57.019030971Z 
 17%|█▋        | 1605/9500 [5:31:26<27:02:26, 12.33s/it]08/03/2024 03:28:57 - INFO - __main__ -   Step: 1605, LR: 1.7134814503301603e-05, Loss: 661.2340698242188
2024-08-03T10:29:09.126979106Z 
 17%|█▋        | 1606/9500 [5:31:39<26:53:28, 12.26s/it]08/03/2024 03:29:09 - INFO - __main__ -   Step: 1606, LR: 1.7132643959614322e-05, Loss: 692.1529541015625
2024-08-03T10:29:21.309764632Z 
 17%|█▋        | 1607/9500 [5:31:51<26:50:04, 12.24s/it]08/03/2024 03:29:21 - INFO - __main__ -   Step: 1607, LR: 1.7130473415927046e-05, Loss: 532.6994018554688
2024-08-03T10:29:34.177298501Z 
 17%|█▋        | 1608/9500 [5:32:04<27:14:40, 12.43s/it]08/03/2024 03:29:34 - INFO - __main__ -   Step: 1608, LR: 1.7128302872239766e-05, Loss: 693.2659912109375
2024-08-03T10:29:46.256435762Z 
 17%|█▋        | 1609/9500 [5:32:16<27:00:42, 12.32s/it]08/03/2024 03:29:46 - INFO - __main__ -   Step: 1609, LR: 1.712613232855249e-05, Loss: 553.4736328125
2024-08-03T10:29:58.462188130Z 
 17%|█▋        | 1610/9500 [5:32:28<26:55:51, 12.29s/it]08/03/2024 03:29:58 - INFO - __main__ -   Step: 1610, LR: 1.7123961784865205e-05, Loss: 619.5764770507812
2024-08-03T10:30:11.111301016Z 
 17%|█▋        | 1611/9500 [5:32:41<27:09:54, 12.40s/it]08/03/2024 03:30:11 - INFO - __main__ -   Step: 1611, LR: 1.712179124117793e-05, Loss: 809.6649169921875
2024-08-03T10:30:23.182426779Z 
 17%|█▋        | 1612/9500 [5:32:53<26:56:52, 12.30s/it]08/03/2024 03:30:23 - INFO - __main__ -   Step: 1612, LR: 1.711962069749065e-05, Loss: 591.0020751953125
2024-08-03T10:30:35.450952007Z 
 17%|█▋        | 1613/9500 [5:33:05<26:55:28, 12.29s/it]08/03/2024 03:30:35 - INFO - __main__ -   Step: 1613, LR: 1.711745015380337e-05, Loss: 606.24169921875
2024-08-03T10:30:48.030553384Z 
 17%|█▋        | 1614/9500 [5:33:17<27:06:42, 12.38s/it]08/03/2024 03:30:48 - INFO - __main__ -   Step: 1614, LR: 1.711527961011609e-05, Loss: 894.807861328125
2024-08-03T10:31:00.127377381Z 
 17%|█▋        | 1615/9500 [5:33:30<26:55:28, 12.29s/it]08/03/2024 03:31:00 - INFO - __main__ -   Step: 1615, LR: 1.711310906642881e-05, Loss: 634.93017578125
2024-08-03T10:31:12.340395447Z 
 17%|█▋        | 1616/9500 [5:33:42<26:52:06, 12.27s/it]08/03/2024 03:31:12 - INFO - __main__ -   Step: 1616, LR: 1.7110938522741535e-05, Loss: 678.182373046875
2024-08-03T10:31:25.025675977Z 
 17%|█▋        | 1617/9500 [5:33:54<27:08:20, 12.39s/it]08/03/2024 03:31:25 - INFO - __main__ -   Step: 1617, LR: 1.7108767979054255e-05, Loss: 656.462890625
2024-08-03T10:31:37.353718918Z 
 17%|█▋        | 1618/9500 [5:34:07<27:05:31, 12.37s/it]08/03/2024 03:31:37 - INFO - __main__ -   Step: 1618, LR: 1.7106597435366978e-05, Loss: 729.5
2024-08-03T10:31:49.505032957Z 
 17%|█▋        | 1619/9500 [5:34:19<26:56:33, 12.31s/it]08/03/2024 03:31:49 - INFO - __main__ -   Step: 1619, LR: 1.7104426891679698e-05, Loss: 444.9744567871094
2024-08-03T10:32:02.087790203Z 
 17%|█▋        | 1620/9500 [5:34:32<27:07:13, 12.39s/it]08/03/2024 03:32:02 - INFO - __main__ -   Step: 1620, LR: 1.7102256347992418e-05, Loss: 540.651611328125
2024-08-03T10:32:14.514322160Z 
 17%|█▋        | 1621/9500 [5:34:44<27:08:26, 12.40s/it]08/03/2024 03:32:14 - INFO - __main__ -   Step: 1621, LR: 1.710008580430514e-05, Loss: 568.6256103515625
2024-08-03T10:32:26.566346671Z 
 17%|█▋        | 1622/9500 [5:34:56<26:54:29, 12.30s/it]08/03/2024 03:32:26 - INFO - __main__ -   Step: 1622, LR: 1.709791526061786e-05, Loss: 682.88427734375
2024-08-03T10:32:39.148771724Z 
 17%|█▋        | 1623/9500 [5:35:09<27:05:33, 12.38s/it]08/03/2024 03:32:39 - INFO - __main__ -   Step: 1623, LR: 1.7095744716930584e-05, Loss: 613.4357299804688
2024-08-03T10:32:51.515113247Z 
 17%|█▋        | 1624/9500 [5:35:21<27:04:44, 12.38s/it]08/03/2024 03:32:51 - INFO - __main__ -   Step: 1624, LR: 1.70935741732433e-05, Loss: 720.3392333984375
2024-08-03T10:33:03.692586885Z 
 17%|█▋        | 1625/9500 [5:35:33<26:56:33, 12.32s/it]08/03/2024 03:33:03 - INFO - __main__ -   Step: 1625, LR: 1.7091403629556024e-05, Loss: 770.2005615234375
2024-08-03T10:33:16.663084652Z 
 17%|█▋        | 1626/9500 [5:35:46<27:22:09, 12.51s/it]08/03/2024 03:33:16 - INFO - __main__ -   Step: 1626, LR: 1.7089233085868744e-05, Loss: 756.5623779296875
2024-08-03T10:33:28.943969812Z 
 17%|█▋        | 1627/9500 [5:35:58<27:12:50, 12.44s/it]08/03/2024 03:33:28 - INFO - __main__ -   Step: 1627, LR: 1.7087062542181467e-05, Loss: 661.8878173828125
2024-08-03T10:33:40.699741380Z 
 17%|█▋        | 1628/9500 [5:36:10<26:45:32, 12.24s/it]08/03/2024 03:33:40 - INFO - __main__ -   Step: 1628, LR: 1.7084891998494187e-05, Loss: 528.599853515625
2024-08-03T10:33:52.995922117Z 
 17%|█▋        | 1629/9500 [5:36:22<26:47:40, 12.26s/it]08/03/2024 03:33:52 - INFO - __main__ -   Step: 1629, LR: 1.7082721454806907e-05, Loss: 744.0762329101562
2024-08-03T10:34:05.907291149Z 
 17%|█▋        | 1630/9500 [5:36:35<27:13:17, 12.45s/it]08/03/2024 03:34:05 - INFO - __main__ -   Step: 1630, LR: 1.708055091111963e-05, Loss: 611.8494262695312
2024-08-03T10:34:18.201860526Z 
 17%|█▋        | 1631/9500 [5:36:48<27:06:52, 12.40s/it]08/03/2024 03:34:18 - INFO - __main__ -   Step: 1631, LR: 1.707838036743235e-05, Loss: 766.7467651367188
2024-08-03T10:34:30.315545177Z 
 17%|█▋        | 1632/9500 [5:37:00<26:55:13, 12.32s/it]08/03/2024 03:34:30 - INFO - __main__ -   Step: 1632, LR: 1.7076209823745073e-05, Loss: 658.9196166992188
2024-08-03T10:34:42.837823522Z 
 17%|█▋        | 1633/9500 [5:37:12<27:03:05, 12.38s/it]08/03/2024 03:34:42 - INFO - __main__ -   Step: 1633, LR: 1.7074039280057793e-05, Loss: 630.6007080078125
2024-08-03T10:34:54.946984571Z 
 17%|█▋        | 1634/9500 [5:37:24<26:52:15, 12.30s/it]08/03/2024 03:34:54 - INFO - __main__ -   Step: 1634, LR: 1.7071868736370513e-05, Loss: 616.559814453125
2024-08-03T10:35:07.178527453Z 
 17%|█▋        | 1635/9500 [5:37:37<26:49:26, 12.28s/it]08/03/2024 03:35:07 - INFO - __main__ -   Step: 1635, LR: 1.7069698192683236e-05, Loss: 729.2696533203125
2024-08-03T10:35:19.749239784Z 
 17%|█▋        | 1636/9500 [5:37:49<27:00:44, 12.37s/it]08/03/2024 03:35:19 - INFO - __main__ -   Step: 1636, LR: 1.7067527648995956e-05, Loss: 952.1729736328125
2024-08-03T10:35:31.945876464Z 
 17%|█▋        | 1637/9500 [5:38:01<26:53:52, 12.32s/it]08/03/2024 03:35:31 - INFO - __main__ -   Step: 1637, LR: 1.706535710530868e-05, Loss: 491.9582214355469
2024-08-03T10:35:44.147127782Z 
 17%|█▋        | 1638/9500 [5:38:14<26:49:13, 12.28s/it]08/03/2024 03:35:44 - INFO - __main__ -   Step: 1638, LR: 1.7063186561621395e-05, Loss: 514.2069702148438
2024-08-03T10:35:56.716738189Z 
 17%|█▋        | 1639/9500 [5:38:26<27:00:20, 12.37s/it]08/03/2024 03:35:56 - INFO - __main__ -   Step: 1639, LR: 1.706101601793412e-05, Loss: 747.3355712890625
2024-08-03T10:36:08.916773155Z 
 17%|█▋        | 1640/9500 [5:38:38<26:53:34, 12.32s/it]08/03/2024 03:36:08 - INFO - __main__ -   Step: 1640, LR: 1.705884547424684e-05, Loss: 724.9481201171875
2024-08-03T10:36:21.097880937Z 
 17%|█▋        | 1641/9500 [5:38:51<26:48:00, 12.28s/it]08/03/2024 03:36:21 - INFO - __main__ -   Step: 1641, LR: 1.7056674930559562e-05, Loss: 625.5240478515625
2024-08-03T10:36:33.627578197Z 
 17%|█▋        | 1642/9500 [5:39:03<26:57:44, 12.35s/it]08/03/2024 03:36:33 - INFO - __main__ -   Step: 1642, LR: 1.7054504386872282e-05, Loss: 647.47998046875
2024-08-03T10:36:46.044032548Z 
 17%|█▋        | 1643/9500 [5:39:15<27:00:04, 12.37s/it]08/03/2024 03:36:46 - INFO - __main__ -   Step: 1643, LR: 1.7052333843185e-05, Loss: 738.0215454101562
2024-08-03T10:36:58.058245963Z 
 17%|█▋        | 1644/9500 [5:39:27<26:45:48, 12.26s/it]08/03/2024 03:36:58 - INFO - __main__ -   Step: 1644, LR: 1.7050163299497725e-05, Loss: 612.304931640625
2024-08-03T10:37:10.613386903Z 
 17%|█▋        | 1645/9500 [5:39:40<26:57:01, 12.35s/it]08/03/2024 03:37:10 - INFO - __main__ -   Step: 1645, LR: 1.7047992755810445e-05, Loss: 629.4986572265625
2024-08-03T10:37:22.838521937Z 
 17%|█▋        | 1646/9500 [5:39:52<26:51:51, 12.31s/it]08/03/2024 03:37:22 - INFO - __main__ -   Step: 1646, LR: 1.7045822212123168e-05, Loss: 733.559326171875
2024-08-03T10:37:34.992922878Z 
 17%|█▋        | 1647/9500 [5:40:04<26:45:24, 12.27s/it]08/03/2024 03:37:34 - INFO - __main__ -   Step: 1647, LR: 1.7043651668435888e-05, Loss: 632.7841796875
2024-08-03T10:37:47.860310853Z 
 17%|█▋        | 1648/9500 [5:40:17<27:08:48, 12.45s/it]08/03/2024 03:37:47 - INFO - __main__ -   Step: 1648, LR: 1.7041481124748608e-05, Loss: 644.7545166015625
2024-08-03T10:38:00.208300366Z 
 17%|█▋        | 1649/9500 [5:40:30<27:04:44, 12.42s/it]08/03/2024 03:38:00 - INFO - __main__ -   Step: 1649, LR: 1.703931058106133e-05, Loss: 701.453125
2024-08-03T10:38:12.505239040Z 
 17%|█▋        | 1650/9500 [5:40:42<26:59:49, 12.38s/it]08/03/2024 03:38:12 - INFO - __main__ -   Step: 1650, LR: 1.703714003737405e-05, Loss: 664.822998046875
2024-08-03T10:38:25.245125840Z 
 17%|█▋        | 1651/9500 [5:40:55<27:13:41, 12.49s/it]08/03/2024 03:38:25 - INFO - __main__ -   Step: 1651, LR: 1.7034969493686774e-05, Loss: 721.4216918945312
2024-08-03T10:38:37.461537232Z 
 17%|█▋        | 1652/9500 [5:41:07<27:02:50, 12.41s/it]08/03/2024 03:38:37 - INFO - __main__ -   Step: 1652, LR: 1.703279894999949e-05, Loss: 758.4603881835938
2024-08-03T10:38:49.776717474Z 
 17%|█▋        | 1653/9500 [5:41:19<26:59:00, 12.38s/it]08/03/2024 03:38:49 - INFO - __main__ -   Step: 1653, LR: 1.7030628406312214e-05, Loss: 734.9664306640625
2024-08-03T10:39:02.458307585Z 
 17%|█▋        | 1654/9500 [5:41:32<27:10:40, 12.47s/it]08/03/2024 03:39:02 - INFO - __main__ -   Step: 1654, LR: 1.7028457862624934e-05, Loss: 639.7457275390625
2024-08-03T10:39:14.740131677Z 
 17%|█▋        | 1655/9500 [5:41:44<27:03:03, 12.41s/it]08/03/2024 03:39:14 - INFO - __main__ -   Step: 1655, LR: 1.7026287318937657e-05, Loss: 580.8663330078125
2024-08-03T10:39:26.834414058Z 
 17%|█▋        | 1656/9500 [5:41:56<26:50:20, 12.32s/it]08/03/2024 03:39:26 - INFO - __main__ -   Step: 1656, LR: 1.7024116775250377e-05, Loss: 701.8215942382812
2024-08-03T10:39:39.271765074Z 
 17%|█▋        | 1657/9500 [5:42:09<26:54:49, 12.35s/it]08/03/2024 03:39:39 - INFO - __main__ -   Step: 1657, LR: 1.7021946231563097e-05, Loss: 540.62109375
2024-08-03T10:39:51.205506973Z 
 17%|█▋        | 1658/9500 [5:42:21<26:38:10, 12.23s/it]08/03/2024 03:39:51 - INFO - __main__ -   Step: 1658, LR: 1.701977568787582e-05, Loss: 668.7083129882812
2024-08-03T10:40:03.239102934Z 
 17%|█▋        | 1659/9500 [5:42:33<26:30:21, 12.17s/it]08/03/2024 03:40:03 - INFO - __main__ -   Step: 1659, LR: 1.701760514418854e-05, Loss: 491.9122009277344
2024-08-03T10:40:15.925014009Z 
 17%|█▋        | 1660/9500 [5:42:45<26:50:23, 12.32s/it]08/03/2024 03:40:15 - INFO - __main__ -   Step: 1660, LR: 1.7015434600501263e-05, Loss: 666.6973876953125
2024-08-03T10:40:27.946131463Z 
 17%|█▋        | 1661/9500 [5:42:57<26:38:17, 12.23s/it]08/03/2024 03:40:27 - INFO - __main__ -   Step: 1661, LR: 1.7013264056813983e-05, Loss: 623.65087890625
2024-08-03T10:40:40.240207423Z 
 17%|█▋        | 1662/9500 [5:43:10<26:40:27, 12.25s/it]08/03/2024 03:40:40 - INFO - __main__ -   Step: 1662, LR: 1.7011093513126703e-05, Loss: 762.0001220703125
2024-08-03T10:40:52.749728413Z 
 18%|█▊        | 1663/9500 [5:43:22<26:50:22, 12.33s/it]08/03/2024 03:40:52 - INFO - __main__ -   Step: 1663, LR: 1.7008922969439426e-05, Loss: 698.8699951171875
2024-08-03T10:41:05.155679778Z 
 18%|█▊        | 1664/9500 [5:43:35<26:53:09, 12.35s/it]08/03/2024 03:41:05 - INFO - __main__ -   Step: 1664, LR: 1.7006752425752146e-05, Loss: 596.73291015625
2024-08-03T10:41:17.234490123Z 
 18%|█▊        | 1665/9500 [5:43:47<26:42:16, 12.27s/it]08/03/2024 03:41:17 - INFO - __main__ -   Step: 1665, LR: 1.700458188206487e-05, Loss: 554.6041870117188
2024-08-03T10:41:29.750055623Z 
 18%|█▊        | 1666/9500 [5:43:59<26:51:40, 12.34s/it]08/03/2024 03:41:29 - INFO - __main__ -   Step: 1666, LR: 1.7002411338377586e-05, Loss: 719.1754150390625
2024-08-03T10:41:42.093375951Z 
 18%|█▊        | 1667/9500 [5:44:12<26:51:26, 12.34s/it]08/03/2024 03:41:42 - INFO - __main__ -   Step: 1667, LR: 1.700024079469031e-05, Loss: 613.0177001953125
2024-08-03T10:41:54.254854572Z 
 18%|█▊        | 1668/9500 [5:44:24<26:44:07, 12.29s/it]08/03/2024 03:41:54 - INFO - __main__ -   Step: 1668, LR: 1.699807025100303e-05, Loss: 587.37255859375
2024-08-03T10:42:06.924480185Z 
 18%|█▊        | 1669/9500 [5:44:36<26:58:49, 12.40s/it]08/03/2024 03:42:06 - INFO - __main__ -   Step: 1669, LR: 1.6995899707315752e-05, Loss: 786.3838500976562
2024-08-03T10:42:19.282045535Z 
 18%|█▊        | 1670/9500 [5:44:49<26:56:49, 12.39s/it]08/03/2024 03:42:19 - INFO - __main__ -   Step: 1670, LR: 1.6993729163628472e-05, Loss: 755.1890869140625
2024-08-03T10:42:31.498526766Z 
 18%|█▊        | 1671/9500 [5:45:01<26:49:51, 12.34s/it]08/03/2024 03:42:31 - INFO - __main__ -   Step: 1671, LR: 1.699155861994119e-05, Loss: 556.3187255859375
2024-08-03T10:42:43.993183395Z 
 18%|█▊        | 1672/9500 [5:45:13<26:55:46, 12.38s/it]08/03/2024 03:42:43 - INFO - __main__ -   Step: 1672, LR: 1.6989388076253915e-05, Loss: 756.54052734375
2024-08-03T10:42:56.700225129Z 
 18%|█▊        | 1673/9500 [5:45:26<27:08:11, 12.48s/it]08/03/2024 03:42:56 - INFO - __main__ -   Step: 1673, LR: 1.6987217532566635e-05, Loss: 624.5565185546875
2024-08-03T10:43:08.874870253Z 
 18%|█▊        | 1674/9500 [5:45:38<26:55:58, 12.39s/it]08/03/2024 03:43:08 - INFO - __main__ -   Step: 1674, LR: 1.6985046988879358e-05, Loss: 604.1463012695312
2024-08-03T10:43:21.003436089Z 
 18%|█▊        | 1675/9500 [5:45:50<26:45:34, 12.31s/it]08/03/2024 03:43:21 - INFO - __main__ -   Step: 1675, LR: 1.6982876445192078e-05, Loss: 745.989990234375
2024-08-03T10:43:33.561374650Z 
 18%|█▊        | 1676/9500 [5:46:03<26:55:01, 12.39s/it]08/03/2024 03:43:33 - INFO - __main__ -   Step: 1676, LR: 1.6980705901504798e-05, Loss: 612.9553833007812
2024-08-03T10:43:45.818423860Z 
 18%|█▊        | 1677/9500 [5:46:15<26:49:48, 12.35s/it]08/03/2024 03:43:45 - INFO - __main__ -   Step: 1677, LR: 1.697853535781752e-05, Loss: 639.11328125
2024-08-03T10:43:57.707885378Z 
 18%|█▊        | 1678/9500 [5:46:27<26:31:43, 12.21s/it]08/03/2024 03:43:57 - INFO - __main__ -   Step: 1678, LR: 1.697636481413024e-05, Loss: 666.4923095703125
2024-08-03T10:44:10.218668321Z 
 18%|█▊        | 1679/9500 [5:46:40<26:43:17, 12.30s/it]08/03/2024 03:44:10 - INFO - __main__ -   Step: 1679, LR: 1.6974194270442964e-05, Loss: 627.479248046875
2024-08-03T10:44:22.243850323Z 
 18%|█▊        | 1680/9500 [5:46:52<26:32:21, 12.22s/it]08/03/2024 03:44:22 - INFO - __main__ -   Step: 1680, LR: 1.697202372675568e-05, Loss: 565.635009765625
2024-08-03T10:44:34.506850981Z 
 18%|█▊        | 1681/9500 [5:47:04<26:33:55, 12.23s/it]08/03/2024 03:44:34 - INFO - __main__ -   Step: 1681, LR: 1.6969853183068404e-05, Loss: 700.17578125
2024-08-03T10:44:47.085789119Z 
 18%|█▊        | 1682/9500 [5:47:17<26:47:19, 12.34s/it]08/03/2024 03:44:47 - INFO - __main__ -   Step: 1682, LR: 1.6967682639381124e-05, Loss: 647.9501342773438
2024-08-03T10:44:59.294951534Z 
 18%|█▊        | 1683/9500 [5:47:29<26:42:10, 12.30s/it]08/03/2024 03:44:59 - INFO - __main__ -   Step: 1683, LR: 1.6965512095693847e-05, Loss: 641.3749389648438
2024-08-03T10:45:11.699120622Z 
 18%|█▊        | 1684/9500 [5:47:41<26:46:08, 12.33s/it]08/03/2024 03:45:11 - INFO - __main__ -   Step: 1684, LR: 1.6963341552006567e-05, Loss: 631.005615234375
2024-08-03T10:45:24.552902682Z 
 18%|█▊        | 1685/9500 [5:47:54<27:06:24, 12.49s/it]08/03/2024 03:45:24 - INFO - __main__ -   Step: 1685, LR: 1.6961171008319287e-05, Loss: 623.7384643554688
2024-08-03T10:45:36.940180683Z 
 18%|█▊        | 1686/9500 [5:48:06<27:02:18, 12.46s/it]08/03/2024 03:45:36 - INFO - __main__ -   Step: 1686, LR: 1.695900046463201e-05, Loss: 800.130126953125
2024-08-03T10:45:49.054440240Z 
 18%|█▊        | 1687/9500 [5:48:18<26:48:42, 12.35s/it]08/03/2024 03:45:49 - INFO - __main__ -   Step: 1687, LR: 1.695682992094473e-05, Loss: 649.135009765625
2024-08-03T10:46:01.785173844Z 
 18%|█▊        | 1688/9500 [5:48:31<27:03:13, 12.47s/it]08/03/2024 03:46:01 - INFO - __main__ -   Step: 1688, LR: 1.6954659377257453e-05, Loss: 754.1017456054688
2024-08-03T10:46:13.889109686Z 
 18%|█▊        | 1689/9500 [5:48:43<26:48:49, 12.36s/it]08/03/2024 03:46:13 - INFO - __main__ -   Step: 1689, LR: 1.6952488833570173e-05, Loss: 681.274169921875
2024-08-03T10:46:26.329062475Z 
 18%|█▊        | 1690/9500 [5:48:56<26:51:48, 12.38s/it]08/03/2024 03:46:26 - INFO - __main__ -   Step: 1690, LR: 1.6950318289882893e-05, Loss: 843.4461669921875
2024-08-03T10:46:38.921859414Z 
 18%|█▊        | 1691/9500 [5:49:08<26:59:48, 12.45s/it]08/03/2024 03:46:38 - INFO - __main__ -   Step: 1691, LR: 1.6948147746195616e-05, Loss: 557.1890258789062
2024-08-03T10:46:51.017886719Z 
 18%|█▊        | 1692/9500 [5:49:20<26:45:56, 12.34s/it]08/03/2024 03:46:51 - INFO - __main__ -   Step: 1692, LR: 1.6945977202508336e-05, Loss: 645.583984375
2024-08-03T10:47:03.118053995Z 
 18%|█▊        | 1693/9500 [5:49:33<26:36:20, 12.27s/it]08/03/2024 03:47:03 - INFO - __main__ -   Step: 1693, LR: 1.6943806658821056e-05, Loss: 617.929931640625
2024-08-03T10:47:15.631800847Z 
 18%|█▊        | 1694/9500 [5:49:45<26:45:42, 12.34s/it]08/03/2024 03:47:15 - INFO - __main__ -   Step: 1694, LR: 1.6941636115133776e-05, Loss: 659.706298828125
2024-08-03T10:47:27.899018974Z 
 18%|█▊        | 1695/9500 [5:49:57<26:42:35, 12.32s/it]08/03/2024 03:47:27 - INFO - __main__ -   Step: 1695, LR: 1.69394655714465e-05, Loss: 805.6177978515625
2024-08-03T10:47:40.072588824Z 
 18%|█▊        | 1696/9500 [5:50:10<26:36:40, 12.28s/it]08/03/2024 03:47:40 - INFO - __main__ -   Step: 1696, LR: 1.693729502775922e-05, Loss: 608.1657104492188
2024-08-03T10:47:52.807655320Z 
 18%|█▊        | 1697/9500 [5:50:22<26:54:23, 12.41s/it]08/03/2024 03:47:52 - INFO - __main__ -   Step: 1697, LR: 1.6935124484071942e-05, Loss: 766.1079711914062
2024-08-03T10:48:04.675154133Z 
 18%|█▊        | 1698/9500 [5:50:34<26:32:52, 12.25s/it]08/03/2024 03:48:04 - INFO - __main__ -   Step: 1698, LR: 1.6932953940384662e-05, Loss: 508.054443359375
2024-08-03T10:48:16.454288832Z 
 18%|█▊        | 1699/9500 [5:50:46<26:14:18, 12.11s/it]08/03/2024 03:48:16 - INFO - __main__ -   Step: 1699, LR: 1.6930783396697382e-05, Loss: 456.18438720703125
2024-08-03T10:48:29.005824020Z 
 18%|█▊        | 1700/9500 [5:50:58<26:31:23, 12.24s/it]08/03/2024 03:48:29 - INFO - __main__ -   Step: 1700, LR: 1.6928612853010105e-05, Loss: 528.6864013671875
2024-08-03T10:48:41.261065790Z 
 18%|█▊        | 1701/9500 [5:51:11<26:31:43, 12.25s/it]08/03/2024 03:48:41 - INFO - __main__ -   Step: 1701, LR: 1.6926442309322825e-05, Loss: 657.572021484375
2024-08-03T10:48:53.418981094Z 
 18%|█▊        | 1702/9500 [5:51:23<26:28:06, 12.22s/it]08/03/2024 03:48:53 - INFO - __main__ -   Step: 1702, LR: 1.6924271765635548e-05, Loss: 601.27734375
2024-08-03T10:49:06.174726968Z 
 18%|█▊        | 1703/9500 [5:51:36<26:48:48, 12.38s/it]08/03/2024 03:49:06 - INFO - __main__ -   Step: 1703, LR: 1.6922101221948268e-05, Loss: 786.89013671875
2024-08-03T10:49:19.085171373Z 
 18%|█▊        | 1704/9500 [5:51:49<27:09:16, 12.54s/it]08/03/2024 03:49:19 - INFO - __main__ -   Step: 1704, LR: 1.6919930678260988e-05, Loss: 828.663818359375
2024-08-03T10:49:31.553704662Z 
 18%|█▊        | 1705/9500 [5:52:01<27:06:18, 12.52s/it]08/03/2024 03:49:31 - INFO - __main__ -   Step: 1705, LR: 1.691776013457371e-05, Loss: 617.92822265625
2024-08-03T10:49:44.169769158Z 
 18%|█▊        | 1706/9500 [5:52:14<27:09:54, 12.55s/it]08/03/2024 03:49:44 - INFO - __main__ -   Step: 1706, LR: 1.691558959088643e-05, Loss: 567.8348388671875
2024-08-03T10:49:56.713649770Z 
 18%|█▊        | 1707/9500 [5:52:26<27:09:33, 12.55s/it]08/03/2024 03:49:56 - INFO - __main__ -   Step: 1707, LR: 1.691341904719915e-05, Loss: 829.3050537109375
2024-08-03T10:50:08.714810843Z 
 18%|█▊        | 1708/9500 [5:52:38<26:48:06, 12.38s/it]08/03/2024 03:50:08 - INFO - __main__ -   Step: 1708, LR: 1.691124850351187e-05, Loss: 549.5408935546875
2024-08-03T10:50:21.000036980Z 
 18%|█▊        | 1709/9500 [5:52:50<26:44:06, 12.35s/it]08/03/2024 03:50:20 - INFO - __main__ -   Step: 1709, LR: 1.6909077959824594e-05, Loss: 589.24853515625
2024-08-03T10:50:32.905998202Z 
 18%|█▊        | 1710/9500 [5:53:02<26:26:28, 12.22s/it]08/03/2024 03:50:32 - INFO - __main__ -   Step: 1710, LR: 1.6906907416137314e-05, Loss: 558.9598388671875
2024-08-03T10:50:44.826215599Z 
 18%|█▊        | 1711/9500 [5:53:14<26:14:37, 12.13s/it]08/03/2024 03:50:44 - INFO - __main__ -   Step: 1711, LR: 1.6904736872450037e-05, Loss: 571.802001953125
2024-08-03T10:50:57.210853918Z 
 18%|█▊        | 1712/9500 [5:53:27<26:24:20, 12.21s/it]08/03/2024 03:50:57 - INFO - __main__ -   Step: 1712, LR: 1.6902566328762757e-05, Loss: 643.2566528320312
2024-08-03T10:51:09.350830860Z 
 18%|█▊        | 1713/9500 [5:53:39<26:21:34, 12.19s/it]08/03/2024 03:51:09 - INFO - __main__ -   Step: 1713, LR: 1.6900395785075477e-05, Loss: 556.31494140625
2024-08-03T10:51:21.632403430Z 
 18%|█▊        | 1714/9500 [5:53:51<26:25:04, 12.21s/it]08/03/2024 03:51:21 - INFO - __main__ -   Step: 1714, LR: 1.68982252413882e-05, Loss: 678.5369873046875
2024-08-03T10:51:34.401561546Z 
 18%|█▊        | 1715/9500 [5:54:04<26:46:27, 12.38s/it]08/03/2024 03:51:34 - INFO - __main__ -   Step: 1715, LR: 1.689605469770092e-05, Loss: 668.3214111328125
2024-08-03T10:51:47.343489405Z 
 18%|█▊        | 1716/9500 [5:54:17<27:08:04, 12.55s/it]08/03/2024 03:51:47 - INFO - __main__ -   Step: 1716, LR: 1.6893884154013643e-05, Loss: 629.147216796875
2024-08-03T10:51:59.618996944Z 
 18%|█▊        | 1717/9500 [5:54:29<26:57:12, 12.47s/it]08/03/2024 03:51:59 - INFO - __main__ -   Step: 1717, LR: 1.6891713610326363e-05, Loss: 708.68115234375
2024-08-03T10:52:11.692783281Z 
 18%|█▊        | 1718/9500 [5:54:41<26:41:41, 12.35s/it]08/03/2024 03:52:11 - INFO - __main__ -   Step: 1718, LR: 1.6889543066639086e-05, Loss: 549.8570556640625
2024-08-03T10:52:24.692224159Z 
 18%|█▊        | 1719/9500 [5:54:54<27:06:47, 12.54s/it]08/03/2024 03:52:24 - INFO - __main__ -   Step: 1719, LR: 1.6887372522951806e-05, Loss: 751.78173828125
2024-08-03T10:52:36.811820637Z 
 18%|█▊        | 1720/9500 [5:55:06<26:50:03, 12.42s/it]08/03/2024 03:52:36 - INFO - __main__ -   Step: 1720, LR: 1.6885201979264526e-05, Loss: 735.7119140625
2024-08-03T10:52:48.977025600Z 
 18%|█▊        | 1721/9500 [5:55:18<26:40:03, 12.34s/it]08/03/2024 03:52:48 - INFO - __main__ -   Step: 1721, LR: 1.6883031435577246e-05, Loss: 597.6664428710938
2024-08-03T10:53:01.520051160Z 
 18%|█▊        | 1722/9500 [5:55:31<26:47:42, 12.40s/it]08/03/2024 03:53:01 - INFO - __main__ -   Step: 1722, LR: 1.6880860891889966e-05, Loss: 620.8934936523438
2024-08-03T10:53:13.782150303Z 
 18%|█▊        | 1723/9500 [5:55:43<26:42:02, 12.36s/it]08/03/2024 03:53:13 - INFO - __main__ -   Step: 1723, LR: 1.687869034820269e-05, Loss: 685.5979614257812
2024-08-03T10:53:26.174592076Z 
 18%|█▊        | 1724/9500 [5:55:56<26:43:06, 12.37s/it]08/03/2024 03:53:26 - INFO - __main__ -   Step: 1724, LR: 1.687651980451541e-05, Loss: 699.179443359375
2024-08-03T10:53:38.699383940Z 
 18%|█▊        | 1725/9500 [5:56:08<26:48:56, 12.42s/it]08/03/2024 03:53:38 - INFO - __main__ -   Step: 1725, LR: 1.6874349260828132e-05, Loss: 659.1156005859375
2024-08-03T10:53:51.118573446Z 
 18%|█▊        | 1726/9500 [5:56:21<26:48:50, 12.42s/it]08/03/2024 03:53:51 - INFO - __main__ -   Step: 1726, LR: 1.6872178717140852e-05, Loss: 422.051025390625
2024-08-03T10:54:03.195984522Z 
 18%|█▊        | 1727/9500 [5:56:33<26:35:25, 12.32s/it]08/03/2024 03:54:03 - INFO - __main__ -   Step: 1727, LR: 1.6870008173453575e-05, Loss: 631.3172607421875
2024-08-03T10:54:15.712686434Z 
 18%|█▊        | 1728/9500 [5:56:45<26:43:03, 12.38s/it]08/03/2024 03:54:15 - INFO - __main__ -   Step: 1728, LR: 1.6867837629766295e-05, Loss: 727.0721435546875
2024-08-03T10:54:28.065127259Z 
 18%|█▊        | 1729/9500 [5:56:58<26:41:56, 12.37s/it]08/03/2024 03:54:28 - INFO - __main__ -   Step: 1729, LR: 1.6865667086079015e-05, Loss: 622.169677734375
2024-08-03T10:54:40.108738106Z 
 18%|█▊        | 1730/9500 [5:57:10<26:29:06, 12.27s/it]08/03/2024 03:54:40 - INFO - __main__ -   Step: 1730, LR: 1.6863496542391738e-05, Loss: 726.450927734375
2024-08-03T10:54:52.845128309Z 
 18%|█▊        | 1731/9500 [5:57:22<26:46:59, 12.41s/it]08/03/2024 03:54:52 - INFO - __main__ -   Step: 1731, LR: 1.6861325998704458e-05, Loss: 735.5357055664062
2024-08-03T10:55:05.166668631Z 
 18%|█▊        | 1732/9500 [5:57:35<26:43:18, 12.38s/it]08/03/2024 03:55:05 - INFO - __main__ -   Step: 1732, LR: 1.685915545501718e-05, Loss: 810.615966796875
2024-08-03T10:55:17.130110289Z 
 18%|█▊        | 1733/9500 [5:57:47<26:26:46, 12.26s/it]08/03/2024 03:55:17 - INFO - __main__ -   Step: 1733, LR: 1.68569849113299e-05, Loss: 601.769287109375
2024-08-03T10:55:30.185785735Z 
 18%|█▊        | 1734/9500 [5:58:00<26:57:33, 12.50s/it]08/03/2024 03:55:30 - INFO - __main__ -   Step: 1734, LR: 1.685481436764262e-05, Loss: 730.249267578125
2024-08-03T10:55:42.596851347Z 
 18%|█▊        | 1735/9500 [5:58:12<26:53:59, 12.47s/it]08/03/2024 03:55:42 - INFO - __main__ -   Step: 1735, LR: 1.685264382395534e-05, Loss: 805.05419921875
2024-08-03T10:55:54.570358284Z 
 18%|█▊        | 1736/9500 [5:58:24<26:34:27, 12.32s/it]08/03/2024 03:55:54 - INFO - __main__ -   Step: 1736, LR: 1.6850473280268064e-05, Loss: 619.1735229492188
2024-08-03T10:56:07.409651974Z 
 18%|█▊        | 1737/9500 [5:58:37<26:54:20, 12.48s/it]08/03/2024 03:56:07 - INFO - __main__ -   Step: 1737, LR: 1.6848302736580784e-05, Loss: 649.1846313476562
2024-08-03T10:56:19.418376706Z 
 18%|█▊        | 1738/9500 [5:58:49<26:35:57, 12.34s/it]08/03/2024 03:56:19 - INFO - __main__ -   Step: 1738, LR: 1.6846132192893504e-05, Loss: 567.939453125
2024-08-03T10:56:31.755950134Z 
 18%|█▊        | 1739/9500 [5:59:01<26:35:46, 12.34s/it]08/03/2024 03:56:31 - INFO - __main__ -   Step: 1739, LR: 1.6843961649206227e-05, Loss: 631.66796875
2024-08-03T10:56:44.708450425Z 
 18%|█▊        | 1740/9500 [5:59:14<26:59:25, 12.52s/it]08/03/2024 03:56:44 - INFO - __main__ -   Step: 1740, LR: 1.6841791105518947e-05, Loss: 592.8741455078125
2024-08-03T10:56:57.142479054Z 
 18%|█▊        | 1741/9500 [5:59:27<26:55:51, 12.50s/it]08/03/2024 03:56:57 - INFO - __main__ -   Step: 1741, LR: 1.683962056183167e-05, Loss: 648.034912109375
2024-08-03T10:57:09.123982458Z 
 18%|█▊        | 1742/9500 [5:59:39<26:35:43, 12.34s/it]08/03/2024 03:57:09 - INFO - __main__ -   Step: 1742, LR: 1.683745001814439e-05, Loss: 524.0341796875
2024-08-03T10:57:21.455948281Z 
 18%|█▊        | 1743/9500 [5:59:51<26:35:08, 12.34s/it]08/03/2024 03:57:21 - INFO - __main__ -   Step: 1743, LR: 1.683527947445711e-05, Loss: 582.0838012695312
2024-08-03T10:57:33.849458066Z 
 18%|█▊        | 1744/9500 [6:00:03<26:37:04, 12.35s/it]08/03/2024 03:57:33 - INFO - __main__ -   Step: 1744, LR: 1.6833108930769833e-05, Loss: 652.0579833984375
2024-08-03T10:57:46.233492221Z 
 18%|█▊        | 1745/9500 [6:00:16<26:38:00, 12.36s/it]08/03/2024 03:57:46 - INFO - __main__ -   Step: 1745, LR: 1.6830938387082553e-05, Loss: 508.738525390625
2024-08-03T10:57:58.885174559Z 
 18%|█▊        | 1746/9500 [6:00:28<26:48:58, 12.45s/it]08/03/2024 03:57:58 - INFO - __main__ -   Step: 1746, LR: 1.6828767843395276e-05, Loss: 640.3946533203125
2024-08-03T10:58:11.120753297Z 
 18%|█▊        | 1747/9500 [6:00:41<26:40:26, 12.39s/it]08/03/2024 03:58:11 - INFO - __main__ -   Step: 1747, LR: 1.6826597299707996e-05, Loss: 429.5877685546875
2024-08-03T10:58:23.450792378Z 
 18%|█▊        | 1748/9500 [6:00:53<26:38:04, 12.37s/it]08/03/2024 03:58:23 - INFO - __main__ -   Step: 1748, LR: 1.6824426756020716e-05, Loss: 674.968017578125
2024-08-03T10:58:36.018751967Z 
 18%|█▊        | 1749/9500 [6:01:05<26:45:34, 12.43s/it]08/03/2024 03:58:36 - INFO - __main__ -   Step: 1749, LR: 1.6822256212333436e-05, Loss: 697.809326171875
2024-08-03T10:58:48.251750022Z 
 18%|█▊        | 1750/9500 [6:01:18<26:37:47, 12.37s/it]08/03/2024 03:58:48 - INFO - __main__ -   Step: 1750, LR: 1.682008566864616e-05, Loss: 766.8316650390625
2024-08-03T10:59:00.441018875Z 
 18%|█▊        | 1751/9500 [6:01:30<26:30:35, 12.32s/it]08/03/2024 03:59:00 - INFO - __main__ -   Step: 1751, LR: 1.681791512495888e-05, Loss: 630.7243041992188
2024-08-03T10:59:12.998185642Z 
 18%|█▊        | 1752/9500 [6:01:42<26:39:44, 12.39s/it]08/03/2024 03:59:12 - INFO - __main__ -   Step: 1752, LR: 1.68157445812716e-05, Loss: 738.341796875
2024-08-03T10:59:25.331594427Z 
 18%|█▊        | 1753/9500 [6:01:55<26:37:23, 12.37s/it]08/03/2024 03:59:25 - INFO - __main__ -   Step: 1753, LR: 1.6813574037584322e-05, Loss: 673.8347778320312
2024-08-03T10:59:37.253321140Z 
 18%|█▊        | 1754/9500 [6:02:07<26:19:45, 12.24s/it]08/03/2024 03:59:37 - INFO - __main__ -   Step: 1754, LR: 1.6811403493897042e-05, Loss: 563.4111938476562
2024-08-03T10:59:49.790279136Z 
 18%|█▊        | 1755/9500 [6:02:19<26:31:11, 12.33s/it]08/03/2024 03:59:49 - INFO - __main__ -   Step: 1755, LR: 1.6809232950209765e-05, Loss: 614.3206787109375
2024-08-03T11:00:02.016483266Z 
 18%|█▊        | 1756/9500 [6:02:31<26:27:05, 12.30s/it]08/03/2024 04:00:02 - INFO - __main__ -   Step: 1756, LR: 1.6807062406522485e-05, Loss: 699.277099609375
2024-08-03T11:00:14.199091765Z 
 18%|█▊        | 1757/9500 [6:02:44<26:22:27, 12.26s/it]08/03/2024 04:00:14 - INFO - __main__ -   Step: 1757, LR: 1.6804891862835205e-05, Loss: 555.986328125
2024-08-03T11:00:26.536831230Z 
 19%|█▊        | 1758/9500 [6:02:56<26:25:10, 12.28s/it]08/03/2024 04:00:26 - INFO - __main__ -   Step: 1758, LR: 1.680272131914793e-05, Loss: 630.3194580078125
2024-08-03T11:00:39.465009163Z 
 19%|█▊        | 1759/9500 [6:03:09<26:49:51, 12.48s/it]08/03/2024 04:00:39 - INFO - __main__ -   Step: 1759, LR: 1.6800550775460648e-05, Loss: 733.1171875
2024-08-03T11:00:51.686514320Z 
 19%|█▊        | 1760/9500 [6:03:21<26:39:44, 12.40s/it]08/03/2024 04:00:51 - INFO - __main__ -   Step: 1760, LR: 1.679838023177337e-05, Loss: 765.8836669921875
2024-08-03T11:01:03.983424364Z 
 19%|█▊        | 1761/9500 [6:03:33<26:35:29, 12.37s/it]08/03/2024 04:01:03 - INFO - __main__ -   Step: 1761, LR: 1.679620968808609e-05, Loss: 527.8367919921875
2024-08-03T11:01:16.655115139Z 
 19%|█▊        | 1762/9500 [6:03:46<26:46:58, 12.46s/it]08/03/2024 04:01:16 - INFO - __main__ -   Step: 1762, LR: 1.679403914439881e-05, Loss: 543.4461669921875
2024-08-03T11:01:28.677244612Z 
 19%|█▊        | 1763/9500 [6:03:58<26:29:48, 12.33s/it]08/03/2024 04:01:28 - INFO - __main__ -   Step: 1763, LR: 1.679186860071153e-05, Loss: 535.2127685546875
2024-08-03T11:01:40.842039784Z 
 19%|█▊        | 1764/9500 [6:04:10<26:23:16, 12.28s/it]08/03/2024 04:01:40 - INFO - __main__ -   Step: 1764, LR: 1.6789698057024254e-05, Loss: 748.711669921875
2024-08-03T11:01:53.507806927Z 
 19%|█▊        | 1765/9500 [6:04:23<26:37:58, 12.40s/it]08/03/2024 04:01:53 - INFO - __main__ -   Step: 1765, LR: 1.6787527513336974e-05, Loss: 553.4446411132812
2024-08-03T11:02:05.674078720Z 
 19%|█▊        | 1766/9500 [6:04:35<26:28:54, 12.33s/it]08/03/2024 04:02:05 - INFO - __main__ -   Step: 1766, LR: 1.6785356969649694e-05, Loss: 649.3775024414062
2024-08-03T11:02:17.871923174Z 
 19%|█▊        | 1767/9500 [6:04:47<26:23:43, 12.29s/it]08/03/2024 04:02:17 - INFO - __main__ -   Step: 1767, LR: 1.6783186425962417e-05, Loss: 689.861328125
2024-08-03T11:02:30.605662818Z 
 19%|█▊        | 1768/9500 [6:05:00<26:40:45, 12.42s/it]08/03/2024 04:02:30 - INFO - __main__ -   Step: 1768, LR: 1.6781015882275137e-05, Loss: 721.876708984375
2024-08-03T11:02:42.802027288Z 
 19%|█▊        | 1769/9500 [6:05:12<26:31:49, 12.35s/it]08/03/2024 04:02:42 - INFO - __main__ -   Step: 1769, LR: 1.677884533858786e-05, Loss: 572.7432250976562
2024-08-03T11:02:54.840187405Z 
 19%|█▊        | 1770/9500 [6:05:24<26:19:25, 12.26s/it]08/03/2024 04:02:54 - INFO - __main__ -   Step: 1770, LR: 1.677667479490058e-05, Loss: 621.8338623046875
2024-08-03T11:03:07.617674450Z 
 19%|█▊        | 1771/9500 [6:05:37<26:39:13, 12.41s/it]08/03/2024 04:03:07 - INFO - __main__ -   Step: 1771, LR: 1.67745042512133e-05, Loss: 796.9847412109375
2024-08-03T11:03:19.871466091Z 
 19%|█▊        | 1772/9500 [6:05:49<26:32:48, 12.37s/it]08/03/2024 04:03:19 - INFO - __main__ -   Step: 1772, LR: 1.6772333707526023e-05, Loss: 725.683837890625
2024-08-03T11:03:31.983473122Z 
 19%|█▊        | 1773/9500 [6:06:01<26:22:46, 12.29s/it]08/03/2024 04:03:31 - INFO - __main__ -   Step: 1773, LR: 1.6770163163838743e-05, Loss: 712.50927734375
2024-08-03T11:03:45.171022472Z 
 19%|█▊        | 1774/9500 [6:06:15<26:57:13, 12.56s/it]08/03/2024 04:03:45 - INFO - __main__ -   Step: 1774, LR: 1.6767992620151467e-05, Loss: 712.2665405273438
2024-08-03T11:03:57.728786311Z 
 19%|█▊        | 1775/9500 [6:06:27<26:56:57, 12.56s/it]08/03/2024 04:03:57 - INFO - __main__ -   Step: 1775, LR: 1.6765822076464186e-05, Loss: 755.5386962890625
2024-08-03T11:04:09.909336883Z 
 19%|█▊        | 1776/9500 [6:06:39<26:42:08, 12.45s/it]08/03/2024 04:04:09 - INFO - __main__ -   Step: 1776, LR: 1.6763651532776906e-05, Loss: 739.6573486328125
2024-08-03T11:04:22.341484569Z 
 19%|█▊        | 1777/9500 [6:06:52<26:41:24, 12.44s/it]08/03/2024 04:04:22 - INFO - __main__ -   Step: 1777, LR: 1.6761480989089626e-05, Loss: 541.5279541015625
2024-08-03T11:04:34.581814105Z 
 19%|█▊        | 1778/9500 [6:07:04<26:33:26, 12.38s/it]08/03/2024 04:04:34 - INFO - __main__ -   Step: 1778, LR: 1.675931044540235e-05, Loss: 630.8338623046875
2024-08-03T11:04:47.014511290Z 
 19%|█▊        | 1779/9500 [6:07:16<26:35:14, 12.40s/it]08/03/2024 04:04:47 - INFO - __main__ -   Step: 1779, LR: 1.675713990171507e-05, Loss: 625.868896484375
2024-08-03T11:04:59.497317712Z 
 19%|█▊        | 1780/9500 [6:07:29<26:38:21, 12.42s/it]08/03/2024 04:04:59 - INFO - __main__ -   Step: 1780, LR: 1.675496935802779e-05, Loss: 640.255859375
2024-08-03T11:05:11.932484949Z 
 19%|█▊        | 1781/9500 [6:07:41<26:38:38, 12.43s/it]08/03/2024 04:05:11 - INFO - __main__ -   Step: 1781, LR: 1.6752798814340512e-05, Loss: 633.7619018554688
2024-08-03T11:05:24.049491718Z 
 19%|█▉        | 1782/9500 [6:07:53<26:26:29, 12.33s/it]08/03/2024 04:05:24 - INFO - __main__ -   Step: 1782, LR: 1.6750628270653232e-05, Loss: 749.606201171875
2024-08-03T11:05:36.611805730Z 
 19%|█▉        | 1783/9500 [6:08:06<26:35:07, 12.40s/it]08/03/2024 04:05:36 - INFO - __main__ -   Step: 1783, LR: 1.6748457726965956e-05, Loss: 616.8795776367188
2024-08-03T11:05:48.912347293Z 
 19%|█▉        | 1784/9500 [6:08:18<26:30:59, 12.37s/it]08/03/2024 04:05:48 - INFO - __main__ -   Step: 1784, LR: 1.6746287183278675e-05, Loss: 602.1952514648438
2024-08-03T11:06:00.968953298Z 
 19%|█▉        | 1785/9500 [6:08:30<26:18:37, 12.28s/it]08/03/2024 04:06:00 - INFO - __main__ -   Step: 1785, LR: 1.6744116639591395e-05, Loss: 489.99822998046875
2024-08-03T11:06:13.665230958Z 
 19%|█▉        | 1786/9500 [6:08:43<26:34:35, 12.40s/it]08/03/2024 04:06:13 - INFO - __main__ -   Step: 1786, LR: 1.674194609590412e-05, Loss: 663.0799560546875
2024-08-03T11:06:25.322724807Z 
 19%|█▉        | 1787/9500 [6:08:55<26:05:38, 12.18s/it]08/03/2024 04:06:25 - INFO - __main__ -   Step: 1787, LR: 1.673977555221684e-05, Loss: 396.960693359375
2024-08-03T11:06:37.110930143Z 
 19%|█▉        | 1788/9500 [6:09:07<25:50:21, 12.06s/it]08/03/2024 04:06:37 - INFO - __main__ -   Step: 1788, LR: 1.673760500852956e-05, Loss: 479.91558837890625
2024-08-03T11:06:49.905125140Z 
 19%|█▉        | 1789/9500 [6:09:19<26:18:24, 12.28s/it]08/03/2024 04:06:49 - INFO - __main__ -   Step: 1789, LR: 1.673543446484228e-05, Loss: 618.1992797851562
2024-08-03T11:07:02.037284659Z 
 19%|█▉        | 1790/9500 [6:09:31<26:12:23, 12.24s/it]08/03/2024 04:07:02 - INFO - __main__ -   Step: 1790, LR: 1.6733263921155e-05, Loss: 676.5592041015625
2024-08-03T11:07:14.559958215Z 
 19%|█▉        | 1791/9500 [6:09:44<26:23:15, 12.32s/it]08/03/2024 04:07:14 - INFO - __main__ -   Step: 1791, LR: 1.673109337746772e-05, Loss: 692.3189086914062
2024-08-03T11:07:26.918865106Z 
 19%|█▉        | 1792/9500 [6:09:56<26:24:26, 12.33s/it]08/03/2024 04:07:26 - INFO - __main__ -   Step: 1792, LR: 1.6728922833780444e-05, Loss: 559.936279296875
2024-08-03T11:07:38.869021761Z 
 19%|█▉        | 1793/9500 [6:10:08<26:09:27, 12.22s/it]08/03/2024 04:07:38 - INFO - __main__ -   Step: 1793, LR: 1.6726752290093164e-05, Loss: 725.8450317382812
2024-08-03T11:07:50.899810628Z 
 19%|█▉        | 1794/9500 [6:10:20<26:02:02, 12.16s/it]08/03/2024 04:07:50 - INFO - __main__ -   Step: 1794, LR: 1.6724581746405884e-05, Loss: 658.2994384765625
2024-08-03T11:08:03.720389604Z 
 19%|█▉        | 1795/9500 [6:10:33<26:27:10, 12.36s/it]08/03/2024 04:08:03 - INFO - __main__ -   Step: 1795, LR: 1.6722411202718607e-05, Loss: 752.5972290039062
2024-08-03T11:08:15.966233684Z 
 19%|█▉        | 1796/9500 [6:10:45<26:22:36, 12.33s/it]08/03/2024 04:08:15 - INFO - __main__ -   Step: 1796, LR: 1.6720240659031327e-05, Loss: 696.640869140625
2024-08-03T11:08:27.961465575Z 
 19%|█▉        | 1797/9500 [6:10:57<26:09:40, 12.23s/it]08/03/2024 04:08:27 - INFO - __main__ -   Step: 1797, LR: 1.671807011534405e-05, Loss: 575.6011962890625
2024-08-03T11:08:40.589521663Z 
 19%|█▉        | 1798/9500 [6:11:10<26:24:56, 12.35s/it]08/03/2024 04:08:40 - INFO - __main__ -   Step: 1798, LR: 1.671589957165677e-05, Loss: 682.857666015625
2024-08-03T11:08:52.851583249Z 
 19%|█▉        | 1799/9500 [6:11:22<26:21:27, 12.32s/it]08/03/2024 04:08:52 - INFO - __main__ -   Step: 1799, LR: 1.671372902796949e-05, Loss: 675.76171875
2024-08-03T11:09:05.027213989Z 
 19%|█▉        | 1800/9500 [6:11:34<26:15:38, 12.28s/it]08/03/2024 04:09:05 - INFO - __main__ -   Step: 1800, LR: 1.6711558484282214e-05, Loss: 586.0194091796875
2024-08-03T11:09:17.081857604Z 
 19%|█▉        | 1801/9500 [6:11:47<26:06:50, 12.21s/it]08/03/2024 04:09:17 - INFO - __main__ -   Step: 1801, LR: 1.6709387940594933e-05, Loss: 717.0482177734375
2024-08-03T11:09:29.834503264Z 
 19%|█▉        | 1802/9500 [6:11:59<26:27:30, 12.37s/it]08/03/2024 04:09:29 - INFO - __main__ -   Step: 1802, LR: 1.6707217396907657e-05, Loss: 560.7302856445312
2024-08-03T11:09:41.861736692Z 
 19%|█▉        | 1803/9500 [6:12:11<26:13:58, 12.27s/it]08/03/2024 04:09:41 - INFO - __main__ -   Step: 1803, LR: 1.6705046853220377e-05, Loss: 656.5637817382812
2024-08-03T11:09:54.128954443Z 
 19%|█▉        | 1804/9500 [6:12:24<26:13:40, 12.27s/it]08/03/2024 04:09:54 - INFO - __main__ -   Step: 1804, LR: 1.67028763095331e-05, Loss: 611.81298828125
2024-08-03T11:10:06.725821809Z 
 19%|█▉        | 1805/9500 [6:12:36<26:26:05, 12.37s/it]08/03/2024 04:10:06 - INFO - __main__ -   Step: 1805, LR: 1.6700705765845816e-05, Loss: 625.3792724609375
2024-08-03T11:10:19.000168542Z 
 19%|█▉        | 1806/9500 [6:12:48<26:22:19, 12.34s/it]08/03/2024 04:10:19 - INFO - __main__ -   Step: 1806, LR: 1.669853522215854e-05, Loss: 667.767333984375
2024-08-03T11:10:30.905967906Z 
 19%|█▉        | 1807/9500 [6:13:00<26:05:26, 12.21s/it]08/03/2024 04:10:30 - INFO - __main__ -   Step: 1807, LR: 1.669636467847126e-05, Loss: 538.6569213867188
2024-08-03T11:10:43.496482102Z 
 19%|█▉        | 1808/9500 [6:13:13<26:19:53, 12.32s/it]08/03/2024 04:10:43 - INFO - __main__ -   Step: 1808, LR: 1.669419413478398e-05, Loss: 563.235107421875
2024-08-03T11:10:55.618661098Z 
 19%|█▉        | 1809/9500 [6:13:25<26:11:56, 12.26s/it]08/03/2024 04:10:55 - INFO - __main__ -   Step: 1809, LR: 1.6692023591096703e-05, Loss: 781.7607421875
2024-08-03T11:11:07.596092695Z 
 19%|█▉        | 1810/9500 [6:13:37<26:00:45, 12.18s/it]08/03/2024 04:11:07 - INFO - __main__ -   Step: 1810, LR: 1.6689853047409422e-05, Loss: 520.2593994140625
2024-08-03T11:11:20.379668073Z 
 19%|█▉        | 1811/9500 [6:13:50<26:23:51, 12.36s/it]08/03/2024 04:11:20 - INFO - __main__ -   Step: 1811, LR: 1.6687682503722146e-05, Loss: 770.8490600585938
2024-08-03T11:11:32.525081938Z 
 19%|█▉        | 1812/9500 [6:14:02<26:15:24, 12.30s/it]08/03/2024 04:11:32 - INFO - __main__ -   Step: 1812, LR: 1.6685511960034865e-05, Loss: 615.1043701171875
2024-08-03T11:11:44.966943677Z 
 19%|█▉        | 1813/9500 [6:14:14<26:20:50, 12.34s/it]08/03/2024 04:11:44 - INFO - __main__ -   Step: 1813, LR: 1.668334141634759e-05, Loss: 654.5526123046875
2024-08-03T11:11:57.635907362Z 
 19%|█▉        | 1814/9500 [6:14:27<26:33:19, 12.44s/it]08/03/2024 04:11:57 - INFO - __main__ -   Step: 1814, LR: 1.668117087266031e-05, Loss: 652.7169189453125
2024-08-03T11:12:10.009840557Z 
 19%|█▉        | 1815/9500 [6:14:39<26:30:38, 12.42s/it]08/03/2024 04:12:10 - INFO - __main__ -   Step: 1815, LR: 1.667900032897303e-05, Loss: 548.4420776367188
2024-08-03T11:12:22.266090467Z 
 19%|█▉        | 1816/9500 [6:14:52<26:24:11, 12.37s/it]08/03/2024 04:12:22 - INFO - __main__ -   Step: 1816, LR: 1.6676829785285752e-05, Loss: 683.3788452148438
2024-08-03T11:12:34.916268037Z 
 19%|█▉        | 1817/9500 [6:15:04<26:34:40, 12.45s/it]08/03/2024 04:12:34 - INFO - __main__ -   Step: 1817, LR: 1.667465924159847e-05, Loss: 659.0052490234375
2024-08-03T11:12:47.174256425Z 
 19%|█▉        | 1818/9500 [6:15:17<26:27:01, 12.40s/it]08/03/2024 04:12:47 - INFO - __main__ -   Step: 1818, LR: 1.6672488697911195e-05, Loss: 691.4003295898438
2024-08-03T11:12:59.755080003Z 
 19%|█▉        | 1819/9500 [6:15:29<26:33:56, 12.45s/it]08/03/2024 04:12:59 - INFO - __main__ -   Step: 1819, LR: 1.667031815422391e-05, Loss: 619.7822265625
2024-08-03T11:13:12.593016927Z 
 19%|█▉        | 1820/9500 [6:15:42<26:48:35, 12.57s/it]08/03/2024 04:13:12 - INFO - __main__ -   Step: 1820, LR: 1.6668147610536635e-05, Loss: 641.5039672851562
2024-08-03T11:13:24.954359000Z 
 19%|█▉        | 1821/9500 [6:15:54<26:40:29, 12.51s/it]08/03/2024 04:13:24 - INFO - __main__ -   Step: 1821, LR: 1.6665977066849354e-05, Loss: 880.1793212890625
2024-08-03T11:13:37.236313232Z 
 19%|█▉        | 1822/9500 [6:16:07<26:31:41, 12.44s/it]08/03/2024 04:13:37 - INFO - __main__ -   Step: 1822, LR: 1.6663806523162078e-05, Loss: 671.226806640625
2024-08-03T11:13:49.851677792Z 
 19%|█▉        | 1823/9500 [6:16:19<26:38:17, 12.49s/it]08/03/2024 04:13:49 - INFO - __main__ -   Step: 1823, LR: 1.6661635979474798e-05, Loss: 785.5225219726562
2024-08-03T11:14:02.191647661Z 
 19%|█▉        | 1824/9500 [6:16:32<26:32:15, 12.45s/it]08/03/2024 04:14:02 - INFO - __main__ -   Step: 1824, LR: 1.6659465435787517e-05, Loss: 735.4022216796875
2024-08-03T11:14:14.349388376Z 
 19%|█▉        | 1825/9500 [6:16:44<26:20:58, 12.36s/it]08/03/2024 04:14:14 - INFO - __main__ -   Step: 1825, LR: 1.665729489210024e-05, Loss: 560.3953857421875
2024-08-03T11:14:26.856990472Z 
 19%|█▉        | 1826/9500 [6:16:56<26:26:28, 12.40s/it]08/03/2024 04:14:26 - INFO - __main__ -   Step: 1826, LR: 1.665512434841296e-05, Loss: 577.064453125
2024-08-03T11:14:39.100618977Z 
 19%|█▉        | 1827/9500 [6:17:09<26:20:05, 12.36s/it]08/03/2024 04:14:39 - INFO - __main__ -   Step: 1827, LR: 1.6652953804725684e-05, Loss: 823.6422119140625
2024-08-03T11:14:51.352673450Z 
 19%|█▉        | 1828/9500 [6:17:21<26:15:55, 12.32s/it]08/03/2024 04:14:51 - INFO - __main__ -   Step: 1828, LR: 1.6650783261038404e-05, Loss: 719.9774169921875
2024-08-03T11:15:03.841511788Z 
 19%|█▉        | 1829/9500 [6:17:33<26:22:00, 12.37s/it]08/03/2024 04:15:03 - INFO - __main__ -   Step: 1829, LR: 1.6648612717351124e-05, Loss: 572.6324462890625
2024-08-03T11:15:16.230089628Z 
 19%|█▉        | 1830/9500 [6:17:46<26:22:21, 12.38s/it]08/03/2024 04:15:16 - INFO - __main__ -   Step: 1830, LR: 1.6646442173663847e-05, Loss: 780.9613037109375
2024-08-03T11:15:28.155950948Z 
 19%|█▉        | 1831/9500 [6:17:58<26:04:48, 12.24s/it]08/03/2024 04:15:28 - INFO - __main__ -   Step: 1831, LR: 1.6644271629976567e-05, Loss: 530.1995849609375
2024-08-03T11:15:40.820977841Z 
 19%|█▉        | 1832/9500 [6:18:10<26:20:48, 12.37s/it]08/03/2024 04:15:40 - INFO - __main__ -   Step: 1832, LR: 1.664210108628929e-05, Loss: 751.8136596679688
2024-08-03T11:15:52.985634399Z 
 19%|█▉        | 1833/9500 [6:18:22<26:12:44, 12.31s/it]08/03/2024 04:15:52 - INFO - __main__ -   Step: 1833, LR: 1.6639930542602006e-05, Loss: 618.9881591796875
2024-08-03T11:16:05.063293280Z 
 19%|█▉        | 1834/9500 [6:18:35<26:03:43, 12.24s/it]08/03/2024 04:16:05 - INFO - __main__ -   Step: 1834, LR: 1.663775999891473e-05, Loss: 662.35693359375
2024-08-03T11:16:17.657589052Z 
 19%|█▉        | 1835/9500 [6:18:47<26:17:08, 12.35s/it]08/03/2024 04:16:17 - INFO - __main__ -   Step: 1835, LR: 1.663558945522745e-05, Loss: 852.0652465820312
2024-08-03T11:16:29.820069860Z 
 19%|█▉        | 1836/9500 [6:18:59<26:09:54, 12.29s/it]08/03/2024 04:16:29 - INFO - __main__ -   Step: 1836, LR: 1.6633418911540173e-05, Loss: 513.912353515625
2024-08-03T11:16:42.174507117Z 
 19%|█▉        | 1837/9500 [6:19:12<26:12:09, 12.31s/it]08/03/2024 04:16:42 - INFO - __main__ -   Step: 1837, LR: 1.6631248367852893e-05, Loss: 652.5986938476562
2024-08-03T11:16:54.664224619Z 
 19%|█▉        | 1838/9500 [6:19:24<26:18:50, 12.36s/it]08/03/2024 04:16:54 - INFO - __main__ -   Step: 1838, LR: 1.6629077824165612e-05, Loss: 560.1090087890625
2024-08-03T11:17:06.977601795Z 
 19%|█▉        | 1839/9500 [6:19:36<26:16:43, 12.35s/it]08/03/2024 04:17:06 - INFO - __main__ -   Step: 1839, LR: 1.6626907280478336e-05, Loss: 788.824462890625
2024-08-03T11:17:19.439534816Z 
 19%|█▉        | 1840/9500 [6:19:49<26:20:50, 12.38s/it]08/03/2024 04:17:19 - INFO - __main__ -   Step: 1840, LR: 1.6624736736791056e-05, Loss: 771.9689331054688
2024-08-03T11:17:32.197308420Z 
 19%|█▉        | 1841/9500 [6:20:02<26:35:00, 12.50s/it]08/03/2024 04:17:32 - INFO - __main__ -   Step: 1841, LR: 1.662256619310378e-05, Loss: 763.70654296875
2024-08-03T11:17:44.166824347Z 
 19%|█▉        | 1842/9500 [6:20:14<26:14:41, 12.34s/it]08/03/2024 04:17:44 - INFO - __main__ -   Step: 1842, LR: 1.66203956494165e-05, Loss: 520.8541259765625
2024-08-03T11:17:56.250841694Z 
 19%|█▉        | 1843/9500 [6:20:26<26:04:45, 12.26s/it]08/03/2024 04:17:56 - INFO - __main__ -   Step: 1843, LR: 1.661822510572922e-05, Loss: 663.427734375
2024-08-03T11:18:08.780441003Z 
 19%|█▉        | 1844/9500 [6:20:38<26:14:50, 12.34s/it]08/03/2024 04:18:08 - INFO - __main__ -   Step: 1844, LR: 1.6616054562041942e-05, Loss: 665.024658203125
2024-08-03T11:18:21.491921231Z 
 19%|█▉        | 1845/9500 [6:20:51<26:28:46, 12.45s/it]08/03/2024 04:18:21 - INFO - __main__ -   Step: 1845, LR: 1.6613884018354662e-05, Loss: 609.8199462890625
2024-08-03T11:18:33.715724671Z 
 19%|█▉        | 1846/9500 [6:21:03<26:19:48, 12.38s/it]08/03/2024 04:18:33 - INFO - __main__ -   Step: 1846, LR: 1.6611713474667385e-05, Loss: 825.8263549804688
2024-08-03T11:18:45.739072513Z 
 19%|█▉        | 1847/9500 [6:21:15<26:05:46, 12.28s/it]08/03/2024 04:18:45 - INFO - __main__ -   Step: 1847, LR: 1.66095429309801e-05, Loss: 683.555419921875
2024-08-03T11:18:58.642816840Z 
 19%|█▉        | 1848/9500 [6:21:28<26:29:37, 12.46s/it]08/03/2024 04:18:58 - INFO - __main__ -   Step: 1848, LR: 1.6607372387292825e-05, Loss: 901.1575927734375
2024-08-03T11:19:10.736524753Z 
 19%|█▉        | 1849/9500 [6:21:40<26:15:13, 12.35s/it]08/03/2024 04:19:10 - INFO - __main__ -   Step: 1849, LR: 1.6605201843605545e-05, Loss: 650.9624633789062
2024-08-03T11:19:22.866862929Z 
 19%|█▉        | 1850/9500 [6:21:52<26:06:29, 12.29s/it]08/03/2024 04:19:22 - INFO - __main__ -   Step: 1850, LR: 1.6603031299918268e-05, Loss: 598.1654052734375
2024-08-03T11:19:35.429240342Z 
 19%|█▉        | 1851/9500 [6:22:05<26:16:51, 12.37s/it]08/03/2024 04:19:35 - INFO - __main__ -   Step: 1851, LR: 1.6600860756230988e-05, Loss: 540.1458740234375
2024-08-03T11:19:47.581672650Z 
 19%|█▉        | 1852/9500 [6:22:17<26:08:21, 12.30s/it]08/03/2024 04:19:47 - INFO - __main__ -   Step: 1852, LR: 1.6598690212543708e-05, Loss: 754.1779174804688
2024-08-03T11:19:59.592580054Z 
 20%|█▉        | 1853/9500 [6:22:29<25:56:56, 12.22s/it]08/03/2024 04:19:59 - INFO - __main__ -   Step: 1853, LR: 1.659651966885643e-05, Loss: 606.0909423828125
2024-08-03T11:20:12.652445901Z 
 20%|█▉        | 1854/9500 [6:22:42<26:28:58, 12.47s/it]08/03/2024 04:20:12 - INFO - __main__ -   Step: 1854, LR: 1.659434912516915e-05, Loss: 717.5894775390625
2024-08-03T11:20:24.661580850Z 
 20%|█▉        | 1855/9500 [6:22:54<26:11:12, 12.33s/it]08/03/2024 04:20:24 - INFO - __main__ -   Step: 1855, LR: 1.6592178581481874e-05, Loss: 650.1459350585938
2024-08-03T11:20:36.718682977Z 
 20%|█▉        | 1856/9500 [6:23:06<26:00:31, 12.25s/it]08/03/2024 04:20:36 - INFO - __main__ -   Step: 1856, LR: 1.6590008037794594e-05, Loss: 575.7894287109375
2024-08-03T11:20:49.534603333Z 
 20%|█▉        | 1857/9500 [6:23:19<26:21:58, 12.42s/it]08/03/2024 04:20:49 - INFO - __main__ -   Step: 1857, LR: 1.6587837494107314e-05, Loss: 636.9642333984375
2024-08-03T11:21:01.620097221Z 
 20%|█▉        | 1858/9500 [6:23:31<26:09:02, 12.32s/it]08/03/2024 04:21:01 - INFO - __main__ -   Step: 1858, LR: 1.6585666950420037e-05, Loss: 553.166259765625
2024-08-03T11:21:13.479383352Z 
 20%|█▉        | 1859/9500 [6:23:43<25:51:14, 12.18s/it]08/03/2024 04:21:13 - INFO - __main__ -   Step: 1859, LR: 1.6583496406732757e-05, Loss: 519.5433349609375
2024-08-03T11:21:25.964344738Z 
 20%|█▉        | 1860/9500 [6:23:55<26:02:39, 12.27s/it]08/03/2024 04:21:25 - INFO - __main__ -   Step: 1860, LR: 1.6581325863045477e-05, Loss: 526.8828125
2024-08-03T11:21:38.382672702Z 
 20%|█▉        | 1861/9500 [6:24:08<26:08:01, 12.32s/it]08/03/2024 04:21:38 - INFO - __main__ -   Step: 1861, LR: 1.6579155319358197e-05, Loss: 874.8446044921875
2024-08-03T11:21:50.601931590Z 
 20%|█▉        | 1862/9500 [6:24:20<26:04:08, 12.29s/it]08/03/2024 04:21:50 - INFO - __main__ -   Step: 1862, LR: 1.657698477567092e-05, Loss: 776.2685546875
2024-08-03T11:22:03.082493012Z 
 20%|█▉        | 1863/9500 [6:24:33<26:11:19, 12.35s/it]08/03/2024 04:22:03 - INFO - __main__ -   Step: 1863, LR: 1.657481423198364e-05, Loss: 550.0364379882812
2024-08-03T11:22:15.202264264Z 
 20%|█▉        | 1864/9500 [6:24:45<26:02:30, 12.28s/it]08/03/2024 04:22:15 - INFO - __main__ -   Step: 1864, LR: 1.6572643688296363e-05, Loss: 678.3475341796875
2024-08-03T11:22:27.733991344Z 
 20%|█▉        | 1865/9500 [6:24:57<26:12:00, 12.35s/it]08/03/2024 04:22:27 - INFO - __main__ -   Step: 1865, LR: 1.6570473144609083e-05, Loss: 618.452392578125
2024-08-03T11:22:40.460573565Z 
 20%|█▉        | 1866/9500 [6:25:10<26:26:02, 12.47s/it]08/03/2024 04:22:40 - INFO - __main__ -   Step: 1866, LR: 1.6568302600921803e-05, Loss: 832.2891845703125
2024-08-03T11:22:52.503882038Z 
 20%|█▉        | 1867/9500 [6:25:22<26:09:42, 12.34s/it]08/03/2024 04:22:52 - INFO - __main__ -   Step: 1867, LR: 1.6566132057234526e-05, Loss: 596.6322021484375
2024-08-03T11:23:04.784240894Z 
 20%|█▉        | 1868/9500 [6:25:34<26:07:16, 12.32s/it]08/03/2024 04:23:04 - INFO - __main__ -   Step: 1868, LR: 1.6563961513547246e-05, Loss: 512.3157958984375
2024-08-03T11:23:17.327657316Z 
 20%|█▉        | 1869/9500 [6:25:47<26:15:32, 12.39s/it]08/03/2024 04:23:17 - INFO - __main__ -   Step: 1869, LR: 1.656179096985997e-05, Loss: 739.5477905273438
2024-08-03T11:23:29.474955354Z 
 20%|█▉        | 1870/9500 [6:25:59<26:06:09, 12.32s/it]08/03/2024 04:23:29 - INFO - __main__ -   Step: 1870, LR: 1.655962042617269e-05, Loss: 650.589111328125
2024-08-03T11:23:41.557304945Z 
 20%|█▉        | 1871/9500 [6:26:11<25:57:02, 12.25s/it]08/03/2024 04:23:41 - INFO - __main__ -   Step: 1871, LR: 1.655744988248541e-05, Loss: 649.300537109375
2024-08-03T11:23:54.327244524Z 
 20%|█▉        | 1872/9500 [6:26:24<26:16:50, 12.40s/it]08/03/2024 04:23:54 - INFO - __main__ -   Step: 1872, LR: 1.6555279338798132e-05, Loss: 614.0703125
2024-08-03T11:24:06.760140112Z 
 20%|█▉        | 1873/9500 [6:26:36<26:17:46, 12.41s/it]08/03/2024 04:24:06 - INFO - __main__ -   Step: 1873, LR: 1.6553108795110852e-05, Loss: 682.628662109375
2024-08-03T11:24:18.854820775Z 
 20%|█▉        | 1874/9500 [6:26:48<26:05:28, 12.32s/it]08/03/2024 04:24:18 - INFO - __main__ -   Step: 1874, LR: 1.6550938251423572e-05, Loss: 674.9581298828125
2024-08-03T11:24:31.628984106Z 
 20%|█▉        | 1875/9500 [6:27:01<26:22:41, 12.45s/it]08/03/2024 04:24:31 - INFO - __main__ -   Step: 1875, LR: 1.654876770773629e-05, Loss: 769.7406005859375
2024-08-03T11:24:44.261861091Z 
 20%|█▉        | 1876/9500 [6:27:14<26:29:18, 12.51s/it]08/03/2024 04:24:44 - INFO - __main__ -   Step: 1876, LR: 1.6546597164049015e-05, Loss: 690.548583984375
2024-08-03T11:24:56.362653153Z 
 20%|█▉        | 1877/9500 [6:27:26<26:13:35, 12.39s/it]08/03/2024 04:24:56 - INFO - __main__ -   Step: 1877, LR: 1.6544426620361735e-05, Loss: 515.7308959960938
2024-08-03T11:25:08.677097846Z 
 20%|█▉        | 1878/9500 [6:27:38<26:10:40, 12.36s/it]08/03/2024 04:25:08 - INFO - __main__ -   Step: 1878, LR: 1.6542256076674458e-05, Loss: 608.685791015625
2024-08-03T11:25:20.925096995Z 
 20%|█▉        | 1879/9500 [6:27:50<26:06:01, 12.33s/it]08/03/2024 04:25:20 - INFO - __main__ -   Step: 1879, LR: 1.6540085532987178e-05, Loss: 607.018310546875
2024-08-03T11:25:33.045592574Z 
 20%|█▉        | 1880/9500 [6:28:02<25:57:52, 12.27s/it]08/03/2024 04:25:33 - INFO - __main__ -   Step: 1880, LR: 1.6537914989299898e-05, Loss: 484.2084045410156
2024-08-03T11:25:45.739842540Z 
 20%|█▉        | 1881/9500 [6:28:15<26:13:56, 12.39s/it]08/03/2024 04:25:45 - INFO - __main__ -   Step: 1881, LR: 1.653574444561262e-05, Loss: 656.3365478515625
2024-08-03T11:25:58.313722421Z 
 20%|█▉        | 1882/9500 [6:28:28<26:20:34, 12.45s/it]08/03/2024 04:25:58 - INFO - __main__ -   Step: 1882, LR: 1.653357390192534e-05, Loss: 849.4747314453125
2024-08-03T11:26:10.345647082Z 
 20%|█▉        | 1883/9500 [6:28:40<26:04:28, 12.32s/it]08/03/2024 04:26:10 - INFO - __main__ -   Step: 1883, LR: 1.6531403358238064e-05, Loss: 575.40771484375
2024-08-03T11:26:23.159725149Z 
 20%|█▉        | 1884/9500 [6:28:53<26:22:57, 12.47s/it]08/03/2024 04:26:23 - INFO - __main__ -   Step: 1884, LR: 1.6529232814550784e-05, Loss: 520.5511474609375
2024-08-03T11:26:35.535008088Z 
 20%|█▉        | 1885/9500 [6:29:05<26:19:06, 12.44s/it]08/03/2024 04:26:35 - INFO - __main__ -   Step: 1885, LR: 1.6527062270863504e-05, Loss: 593.4451293945312
2024-08-03T11:26:47.897901980Z 
 20%|█▉        | 1886/9500 [6:29:17<26:15:53, 12.42s/it]08/03/2024 04:26:47 - INFO - __main__ -   Step: 1886, LR: 1.6524891727176227e-05, Loss: 742.352783203125
2024-08-03T11:27:00.031109301Z 
 20%|█▉        | 1887/9500 [6:29:29<26:04:50, 12.33s/it]08/03/2024 04:27:00 - INFO - __main__ -   Step: 1887, LR: 1.6522721183488947e-05, Loss: 503.38006591796875
2024-08-03T11:27:12.359144378Z 
 20%|█▉        | 1888/9500 [6:29:42<26:04:26, 12.33s/it]08/03/2024 04:27:12 - INFO - __main__ -   Step: 1888, LR: 1.6520550639801667e-05, Loss: 715.3856201171875
2024-08-03T11:27:24.580967671Z 
 20%|█▉        | 1889/9500 [6:29:54<26:00:03, 12.30s/it]08/03/2024 04:27:24 - INFO - __main__ -   Step: 1889, LR: 1.6518380096114387e-05, Loss: 652.1193237304688
2024-08-03T11:27:36.986365018Z 
 20%|█▉        | 1890/9500 [6:30:06<26:03:55, 12.33s/it]08/03/2024 04:27:36 - INFO - __main__ -   Step: 1890, LR: 1.651620955242711e-05, Loss: 799.2689208984375
2024-08-03T11:27:49.537664729Z 
 20%|█▉        | 1891/9500 [6:30:19<26:12:07, 12.40s/it]08/03/2024 04:27:49 - INFO - __main__ -   Step: 1891, LR: 1.651403900873983e-05, Loss: 679.3828125
2024-08-03T11:28:01.667413870Z 
 20%|█▉        | 1892/9500 [6:30:31<26:01:43, 12.32s/it]08/03/2024 04:28:01 - INFO - __main__ -   Step: 1892, LR: 1.6511868465052553e-05, Loss: 604.6927490234375
2024-08-03T11:28:13.969268761Z 
 20%|█▉        | 1893/9500 [6:30:43<26:00:59, 12.31s/it]08/03/2024 04:28:13 - INFO - __main__ -   Step: 1893, LR: 1.6509697921365273e-05, Loss: 906.9263916015625
2024-08-03T11:28:26.442408698Z 
 20%|█▉        | 1894/9500 [6:30:56<26:06:54, 12.36s/it]08/03/2024 04:28:26 - INFO - __main__ -   Step: 1894, LR: 1.6507527377677993e-05, Loss: 600.0864868164062
2024-08-03T11:28:38.530370736Z 
 20%|█▉        | 1895/9500 [6:31:08<25:56:19, 12.28s/it]08/03/2024 04:28:38 - INFO - __main__ -   Step: 1895, LR: 1.6505356833990716e-05, Loss: 593.709228515625
2024-08-03T11:28:50.511957582Z 
 20%|█▉        | 1896/9500 [6:31:20<25:44:50, 12.19s/it]08/03/2024 04:28:50 - INFO - __main__ -   Step: 1896, LR: 1.6503186290303436e-05, Loss: 557.6163940429688
2024-08-03T11:29:03.014870842Z 
 20%|█▉        | 1897/9500 [6:31:32<25:56:31, 12.28s/it]08/03/2024 04:29:03 - INFO - __main__ -   Step: 1897, LR: 1.650101574661616e-05, Loss: 633.8746337890625
2024-08-03T11:29:15.257022345Z 
 20%|█▉        | 1898/9500 [6:31:45<25:54:43, 12.27s/it]08/03/2024 04:29:15 - INFO - __main__ -   Step: 1898, LR: 1.649884520292888e-05, Loss: 659.5691528320312
2024-08-03T11:29:27.473887226Z 
 20%|█▉        | 1899/9500 [6:31:57<25:52:30, 12.25s/it]08/03/2024 04:29:27 - INFO - __main__ -   Step: 1899, LR: 1.64966746592416e-05, Loss: 766.9718017578125
2024-08-03T11:29:40.415060437Z 
 20%|██        | 1900/9500 [6:32:10<26:18:21, 12.46s/it]08/03/2024 04:29:40 - INFO - __main__ -   Step: 1900, LR: 1.6494504115554322e-05, Loss: 688.7095947265625
2024-08-03T11:29:52.403066237Z 
 20%|██        | 1901/9500 [6:32:22<26:00:12, 12.32s/it]08/03/2024 04:29:52 - INFO - __main__ -   Step: 1901, LR: 1.6492333571867042e-05, Loss: 536.66796875
2024-08-03T11:30:04.514170752Z 
 20%|██        | 1902/9500 [6:32:34<25:52:05, 12.26s/it]08/03/2024 04:30:04 - INFO - __main__ -   Step: 1902, LR: 1.6490163028179762e-05, Loss: 842.0040893554688
2024-08-03T11:30:17.329657446Z 
 20%|██        | 1903/9500 [6:32:47<26:13:06, 12.42s/it]08/03/2024 04:30:17 - INFO - __main__ -   Step: 1903, LR: 1.648799248449248e-05, Loss: 737.9508056640625
2024-08-03T11:30:29.565362202Z 
 20%|██        | 1904/9500 [6:32:59<26:05:45, 12.37s/it]08/03/2024 04:30:29 - INFO - __main__ -   Step: 1904, LR: 1.6485821940805205e-05, Loss: 536.487548828125
2024-08-03T11:30:42.042724474Z 
 20%|██        | 1905/9500 [6:33:11<26:09:42, 12.40s/it]08/03/2024 04:30:42 - INFO - __main__ -   Step: 1905, LR: 1.6483651397117925e-05, Loss: 674.70654296875
2024-08-03T11:30:54.694370525Z 
 20%|██        | 1906/9500 [6:33:24<26:19:02, 12.48s/it]08/03/2024 04:30:54 - INFO - __main__ -   Step: 1906, LR: 1.6481480853430648e-05, Loss: 540.6627197265625
2024-08-03T11:31:06.888451479Z 
 20%|██        | 1907/9500 [6:33:36<26:08:08, 12.39s/it]08/03/2024 04:31:06 - INFO - __main__ -   Step: 1907, LR: 1.6479310309743368e-05, Loss: 676.4935913085938
2024-08-03T11:31:18.914853236Z 
 20%|██        | 1908/9500 [6:33:48<25:54:04, 12.28s/it]08/03/2024 04:31:18 - INFO - __main__ -   Step: 1908, LR: 1.6477139766056088e-05, Loss: 680.490478515625
2024-08-03T11:31:31.362478345Z 
 20%|██        | 1909/9500 [6:34:01<26:00:09, 12.33s/it]08/03/2024 04:31:31 - INFO - __main__ -   Step: 1909, LR: 1.647496922236881e-05, Loss: 663.1959838867188
2024-08-03T11:31:43.890023692Z 
 20%|██        | 1910/9500 [6:34:13<26:07:23, 12.39s/it]08/03/2024 04:31:43 - INFO - __main__ -   Step: 1910, LR: 1.647279867868153e-05, Loss: 884.7333374023438
2024-08-03T11:31:56.238875597Z 
 20%|██        | 1911/9500 [6:34:26<26:05:35, 12.38s/it]08/03/2024 04:31:56 - INFO - __main__ -   Step: 1911, LR: 1.6470628134994254e-05, Loss: 673.0234985351562
2024-08-03T11:32:08.802073845Z 
 20%|██        | 1912/9500 [6:34:38<26:12:25, 12.43s/it]08/03/2024 04:32:08 - INFO - __main__ -   Step: 1912, LR: 1.6468457591306974e-05, Loss: 597.1320190429688
2024-08-03T11:32:21.250358187Z 
 20%|██        | 1913/9500 [6:34:51<26:12:45, 12.44s/it]08/03/2024 04:32:21 - INFO - __main__ -   Step: 1913, LR: 1.6466287047619697e-05, Loss: 763.3526000976562
2024-08-03T11:32:33.282479831Z 
 20%|██        | 1914/9500 [6:35:03<25:57:10, 12.32s/it]08/03/2024 04:32:33 - INFO - __main__ -   Step: 1914, LR: 1.6464116503932417e-05, Loss: 732.4044189453125
2024-08-03T11:32:45.897190962Z 
 20%|██        | 1915/9500 [6:35:15<26:08:18, 12.41s/it]08/03/2024 04:32:45 - INFO - __main__ -   Step: 1915, LR: 1.6461945960245137e-05, Loss: 656.43115234375
2024-08-03T11:32:57.899813475Z 
 20%|██        | 1916/9500 [6:35:27<25:52:48, 12.28s/it]08/03/2024 04:32:57 - INFO - __main__ -   Step: 1916, LR: 1.6459775416557857e-05, Loss: 541.353271484375
2024-08-03T11:33:09.885335156Z 
 20%|██        | 1917/9500 [6:35:39<25:41:15, 12.20s/it]08/03/2024 04:33:09 - INFO - __main__ -   Step: 1917, LR: 1.6457604872870577e-05, Loss: 554.572998046875
2024-08-03T11:33:22.800223210Z 
 20%|██        | 1918/9500 [6:35:52<26:08:19, 12.41s/it]08/03/2024 04:33:22 - INFO - __main__ -   Step: 1918, LR: 1.64554343291833e-05, Loss: 623.0739135742188
2024-08-03T11:33:35.087125682Z 
 20%|██        | 1919/9500 [6:36:05<26:03:26, 12.37s/it]08/03/2024 04:33:35 - INFO - __main__ -   Step: 1919, LR: 1.645326378549602e-05, Loss: 549.73388671875
2024-08-03T11:33:47.167686613Z 
 20%|██        | 1920/9500 [6:36:17<25:52:05, 12.29s/it]08/03/2024 04:33:47 - INFO - __main__ -   Step: 1920, LR: 1.6451093241808743e-05, Loss: 489.11651611328125
2024-08-03T11:34:00.071721223Z 
 20%|██        | 1921/9500 [6:36:30<26:15:18, 12.47s/it]08/03/2024 04:34:00 - INFO - __main__ -   Step: 1921, LR: 1.6448922698121463e-05, Loss: 721.063232421875
2024-08-03T11:34:12.218212601Z 
 20%|██        | 1922/9500 [6:36:42<26:02:48, 12.37s/it]08/03/2024 04:34:12 - INFO - __main__ -   Step: 1922, LR: 1.6446752154434186e-05, Loss: 662.5875854492188
2024-08-03T11:34:23.925749997Z 
 20%|██        | 1923/9500 [6:36:53<25:37:22, 12.17s/it]08/03/2024 04:34:23 - INFO - __main__ -   Step: 1923, LR: 1.6444581610746906e-05, Loss: 481.03668212890625
2024-08-03T11:34:36.311842009Z 
 20%|██        | 1924/9500 [6:37:06<25:45:11, 12.24s/it]08/03/2024 04:34:36 - INFO - __main__ -   Step: 1924, LR: 1.6442411067059626e-05, Loss: 549.7747802734375
2024-08-03T11:34:48.537284799Z 
 20%|██        | 1925/9500 [6:37:18<25:44:32, 12.23s/it]08/03/2024 04:34:48 - INFO - __main__ -   Step: 1925, LR: 1.644024052337235e-05, Loss: 603.408203125
2024-08-03T11:35:00.536380097Z 
 20%|██        | 1926/9500 [6:37:30<25:35:26, 12.16s/it]08/03/2024 04:35:00 - INFO - __main__ -   Step: 1926, LR: 1.643806997968507e-05, Loss: 578.26904296875
2024-08-03T11:35:13.688309159Z 
 20%|██        | 1927/9500 [6:37:43<26:12:40, 12.46s/it]08/03/2024 04:35:13 - INFO - __main__ -   Step: 1927, LR: 1.6435899435997792e-05, Loss: 664.8973388671875
2024-08-03T11:35:25.785062469Z 
 20%|██        | 1928/9500 [6:37:55<25:58:41, 12.35s/it]08/03/2024 04:35:25 - INFO - __main__ -   Step: 1928, LR: 1.6433728892310512e-05, Loss: 825.9119262695312
2024-08-03T11:35:38.131154397Z 
 20%|██        | 1929/9500 [6:38:08<25:58:18, 12.35s/it]08/03/2024 04:35:38 - INFO - __main__ -   Step: 1929, LR: 1.6431558348623232e-05, Loss: 672.05859375
2024-08-03T11:35:50.691159801Z 
 20%|██        | 1930/9500 [6:38:20<26:06:04, 12.41s/it]08/03/2024 04:35:50 - INFO - __main__ -   Step: 1930, LR: 1.6429387804935952e-05, Loss: 633.5166015625
2024-08-03T11:36:03.354498281Z 
 20%|██        | 1931/9500 [6:38:33<26:15:20, 12.49s/it]08/03/2024 04:36:03 - INFO - __main__ -   Step: 1931, LR: 1.6427217261248675e-05, Loss: 740.0986328125
2024-08-03T11:36:15.879051768Z 
 20%|██        | 1932/9500 [6:38:45<26:16:31, 12.50s/it]08/03/2024 04:36:15 - INFO - __main__ -   Step: 1932, LR: 1.6425046717561395e-05, Loss: 633.3181762695312
2024-08-03T11:36:28.135657711Z 
 20%|██        | 1933/9500 [6:38:58<26:07:09, 12.43s/it]08/03/2024 04:36:28 - INFO - __main__ -   Step: 1933, LR: 1.6422876173874115e-05, Loss: 670.1105346679688
2024-08-03T11:36:41.357486935Z 
 20%|██        | 1934/9500 [6:39:11<26:37:02, 12.66s/it]08/03/2024 04:36:41 - INFO - __main__ -   Step: 1934, LR: 1.6420705630186838e-05, Loss: 826.1036376953125
2024-08-03T11:36:53.165561736Z 
 20%|██        | 1935/9500 [6:39:23<26:04:25, 12.41s/it]08/03/2024 04:36:53 - INFO - __main__ -   Step: 1935, LR: 1.6418535086499558e-05, Loss: 494.8838806152344
2024-08-03T11:37:05.382548153Z 
 20%|██        | 1936/9500 [6:39:35<25:56:59, 12.35s/it]08/03/2024 04:37:05 - INFO - __main__ -   Step: 1936, LR: 1.641636454281228e-05, Loss: 633.859130859375
2024-08-03T11:37:17.865740092Z 
 20%|██        | 1937/9500 [6:39:47<26:01:48, 12.39s/it]08/03/2024 04:37:17 - INFO - __main__ -   Step: 1937, LR: 1.6414193999125e-05, Loss: 619.177734375
2024-08-03T11:37:29.908797031Z 
 20%|██        | 1938/9500 [6:39:59<25:48:28, 12.29s/it]08/03/2024 04:37:29 - INFO - __main__ -   Step: 1938, LR: 1.641202345543772e-05, Loss: 669.60107421875
2024-08-03T11:37:42.395351688Z 
 20%|██        | 1939/9500 [6:40:12<25:55:50, 12.35s/it]08/03/2024 04:37:42 - INFO - __main__ -   Step: 1939, LR: 1.6409852911750444e-05, Loss: 486.99609375
2024-08-03T11:37:54.883243680Z 
 20%|██        | 1940/9500 [6:40:24<26:00:59, 12.39s/it]08/03/2024 04:37:54 - INFO - __main__ -   Step: 1940, LR: 1.6407682368063164e-05, Loss: 636.718505859375
2024-08-03T11:38:07.012658200Z 
 20%|██        | 1941/9500 [6:40:36<25:50:58, 12.31s/it]08/03/2024 04:38:07 - INFO - __main__ -   Step: 1941, LR: 1.6405511824375887e-05, Loss: 499.69097900390625
2024-08-03T11:38:18.936796053Z 
 20%|██        | 1942/9500 [6:40:48<25:36:09, 12.19s/it]08/03/2024 04:38:18 - INFO - __main__ -   Step: 1942, LR: 1.6403341280688607e-05, Loss: 455.44219970703125
2024-08-03T11:38:31.703442918Z 
 20%|██        | 1943/9500 [6:41:01<25:57:33, 12.37s/it]08/03/2024 04:38:31 - INFO - __main__ -   Step: 1943, LR: 1.6401170737001327e-05, Loss: 715.9598999023438
2024-08-03T11:38:43.640407384Z 
 20%|██        | 1944/9500 [6:41:13<25:41:07, 12.24s/it]08/03/2024 04:38:43 - INFO - __main__ -   Step: 1944, LR: 1.6399000193314047e-05, Loss: 731.109619140625
2024-08-03T11:38:55.867212574Z 
 20%|██        | 1945/9500 [6:41:25<25:40:30, 12.23s/it]08/03/2024 04:38:55 - INFO - __main__ -   Step: 1945, LR: 1.639682964962677e-05, Loss: 615.3927001953125
2024-08-03T11:39:08.426576718Z 
 20%|██        | 1946/9500 [6:41:38<25:52:34, 12.33s/it]08/03/2024 04:39:08 - INFO - __main__ -   Step: 1946, LR: 1.639465910593949e-05, Loss: 745.5531005859375
2024-08-03T11:39:20.733389426Z 
 20%|██        | 1947/9500 [6:41:50<25:51:26, 12.32s/it]08/03/2024 04:39:20 - INFO - __main__ -   Step: 1947, LR: 1.639248856225221e-05, Loss: 612.4683837890625
2024-08-03T11:39:32.795422116Z 
 21%|██        | 1948/9500 [6:42:02<25:41:18, 12.25s/it]08/03/2024 04:39:32 - INFO - __main__ -   Step: 1948, LR: 1.6390318018564933e-05, Loss: 622.69189453125
2024-08-03T11:39:45.323882802Z 
 21%|██        | 1949/9500 [6:42:15<25:51:47, 12.33s/it]08/03/2024 04:39:45 - INFO - __main__ -   Step: 1949, LR: 1.6388147474877653e-05, Loss: 687.1426391601562
2024-08-03T11:39:57.456333040Z 
 21%|██        | 1950/9500 [6:42:27<25:44:07, 12.27s/it]08/03/2024 04:39:57 - INFO - __main__ -   Step: 1950, LR: 1.6385976931190376e-05, Loss: 568.638916015625
2024-08-03T11:40:09.586599950Z 
 21%|██        | 1951/9500 [6:42:39<25:38:35, 12.23s/it]08/03/2024 04:40:09 - INFO - __main__ -   Step: 1951, LR: 1.6383806387503096e-05, Loss: 579.90673828125
2024-08-03T11:40:22.052231637Z 
 21%|██        | 1952/9500 [6:42:51<25:47:19, 12.30s/it]08/03/2024 04:40:22 - INFO - __main__ -   Step: 1952, LR: 1.6381635843815816e-05, Loss: 714.4046630859375
2024-08-03T11:40:34.274062230Z 
 21%|██        | 1953/9500 [6:43:04<25:44:10, 12.28s/it]08/03/2024 04:40:34 - INFO - __main__ -   Step: 1953, LR: 1.637946530012854e-05, Loss: 791.90478515625
2024-08-03T11:40:46.377996484Z 
 21%|██        | 1954/9500 [6:43:16<25:37:27, 12.22s/it]08/03/2024 04:40:46 - INFO - __main__ -   Step: 1954, LR: 1.637729475644126e-05, Loss: 711.7830810546875
2024-08-03T11:40:59.063779182Z 
 21%|██        | 1955/9500 [6:43:29<25:54:39, 12.36s/it]08/03/2024 04:40:59 - INFO - __main__ -   Step: 1955, LR: 1.6375124212753982e-05, Loss: 672.978515625
2024-08-03T11:41:11.004087647Z 
 21%|██        | 1956/9500 [6:43:40<25:38:30, 12.24s/it]08/03/2024 04:41:11 - INFO - __main__ -   Step: 1956, LR: 1.6372953669066702e-05, Loss: 513.33935546875
2024-08-03T11:41:23.455733089Z 
 21%|██        | 1957/9500 [6:43:53<25:46:25, 12.30s/it]08/03/2024 04:41:23 - INFO - __main__ -   Step: 1957, LR: 1.6370783125379422e-05, Loss: 794.6525268554688
2024-08-03T11:41:35.898255238Z 
 21%|██        | 1958/9500 [6:44:05<25:51:33, 12.34s/it]08/03/2024 04:41:35 - INFO - __main__ -   Step: 1958, LR: 1.6368612581692142e-05, Loss: 509.97607421875
2024-08-03T11:41:47.968294327Z 
 21%|██        | 1959/9500 [6:44:17<25:41:02, 12.26s/it]08/03/2024 04:41:47 - INFO - __main__ -   Step: 1959, LR: 1.6366442038004865e-05, Loss: 691.5198974609375
2024-08-03T11:42:00.086593856Z 
 21%|██        | 1960/9500 [6:44:30<25:35:27, 12.22s/it]08/03/2024 04:42:00 - INFO - __main__ -   Step: 1960, LR: 1.6364271494317585e-05, Loss: 623.2407836914062
2024-08-03T11:42:12.455049990Z 
 21%|██        | 1961/9500 [6:44:42<25:40:53, 12.26s/it]08/03/2024 04:42:12 - INFO - __main__ -   Step: 1961, LR: 1.6362100950630305e-05, Loss: 494.9835205078125
2024-08-03T11:42:24.813598727Z 
 21%|██        | 1962/9500 [6:44:54<25:44:16, 12.29s/it]08/03/2024 04:42:24 - INFO - __main__ -   Step: 1962, LR: 1.6359930406943028e-05, Loss: 616.9525146484375
2024-08-03T11:42:36.821602693Z 
 21%|██        | 1963/9500 [6:45:06<25:33:22, 12.21s/it]08/03/2024 04:42:36 - INFO - __main__ -   Step: 1963, LR: 1.6357759863255748e-05, Loss: 844.2867431640625
2024-08-03T11:42:49.086670285Z 
 21%|██        | 1964/9500 [6:45:19<25:35:22, 12.22s/it]08/03/2024 04:42:49 - INFO - __main__ -   Step: 1964, LR: 1.635558931956847e-05, Loss: 636.1438598632812
2024-08-03T11:43:01.630185309Z 
 21%|██        | 1965/9500 [6:45:31<25:47:12, 12.32s/it]08/03/2024 04:43:01 - INFO - __main__ -   Step: 1965, LR: 1.635341877588119e-05, Loss: 737.419189453125
2024-08-03T11:43:13.849788114Z 
 21%|██        | 1966/9500 [6:45:43<25:43:12, 12.29s/it]08/03/2024 04:43:13 - INFO - __main__ -   Step: 1966, LR: 1.635124823219391e-05, Loss: 590.2025756835938
2024-08-03T11:43:26.596837305Z 
 21%|██        | 1967/9500 [6:45:56<26:00:12, 12.43s/it]08/03/2024 04:43:26 - INFO - __main__ -   Step: 1967, LR: 1.6349077688506634e-05, Loss: 766.4778442382812
2024-08-03T11:43:39.108260500Z 
 21%|██        | 1968/9500 [6:46:09<26:03:09, 12.45s/it]08/03/2024 04:43:39 - INFO - __main__ -   Step: 1968, LR: 1.6346907144819354e-05, Loss: 615.023193359375
2024-08-03T11:43:50.854349697Z 
 21%|██        | 1969/9500 [6:46:20<25:36:23, 12.24s/it]08/03/2024 04:43:50 - INFO - __main__ -   Step: 1969, LR: 1.6344736601132078e-05, Loss: 537.6824951171875
2024-08-03T11:44:03.378856277Z 
 21%|██        | 1970/9500 [6:46:33<25:46:52, 12.33s/it]08/03/2024 04:44:03 - INFO - __main__ -   Step: 1970, LR: 1.6342566057444797e-05, Loss: 597.656494140625
2024-08-03T11:44:15.836688117Z 
 21%|██        | 1971/9500 [6:46:45<25:51:38, 12.37s/it]08/03/2024 04:44:15 - INFO - __main__ -   Step: 1971, LR: 1.6340395513757517e-05, Loss: 790.332763671875
2024-08-03T11:44:28.040399608Z 
 21%|██        | 1972/9500 [6:46:57<25:45:21, 12.32s/it]08/03/2024 04:44:28 - INFO - __main__ -   Step: 1972, LR: 1.6338224970070237e-05, Loss: 670.2026977539062
2024-08-03T11:44:40.141115296Z 
 21%|██        | 1973/9500 [6:47:10<25:37:00, 12.25s/it]08/03/2024 04:44:40 - INFO - __main__ -   Step: 1973, LR: 1.633605442638296e-05, Loss: 704.5816040039062
2024-08-03T11:44:52.512661697Z 
 21%|██        | 1974/9500 [6:47:22<25:41:18, 12.29s/it]08/03/2024 04:44:52 - INFO - __main__ -   Step: 1974, LR: 1.633388388269568e-05, Loss: 751.1497802734375
2024-08-03T11:45:05.018101595Z 
 21%|██        | 1975/9500 [6:47:34<25:49:16, 12.35s/it]08/03/2024 04:45:05 - INFO - __main__ -   Step: 1975, LR: 1.63317133390084e-05, Loss: 665.909912109375
2024-08-03T11:45:17.098522660Z 
 21%|██        | 1976/9500 [6:47:47<25:38:49, 12.27s/it]08/03/2024 04:45:17 - INFO - __main__ -   Step: 1976, LR: 1.6329542795321123e-05, Loss: 794.2047119140625
2024-08-03T11:45:29.890719844Z 
 21%|██        | 1977/9500 [6:47:59<25:58:12, 12.43s/it]08/03/2024 04:45:29 - INFO - __main__ -   Step: 1977, LR: 1.6327372251633843e-05, Loss: 725.1983642578125
2024-08-03T11:45:42.716999362Z 
 21%|██        | 1978/9500 [6:48:12<26:13:00, 12.55s/it]08/03/2024 04:45:42 - INFO - __main__ -   Step: 1978, LR: 1.6325201707946566e-05, Loss: 810.8748779296875
2024-08-03T11:45:54.931307043Z 
 21%|██        | 1979/9500 [6:48:24<26:00:16, 12.45s/it]08/03/2024 04:45:54 - INFO - __main__ -   Step: 1979, LR: 1.6323031164259286e-05, Loss: 550.1421508789062
2024-08-03T11:46:07.509738681Z 
 21%|██        | 1980/9500 [6:48:37<26:04:59, 12.49s/it]08/03/2024 04:46:07 - INFO - __main__ -   Step: 1980, LR: 1.6320860620572006e-05, Loss: 719.5441284179688
2024-08-03T11:46:19.528437765Z 
 21%|██        | 1981/9500 [6:48:49<25:47:11, 12.35s/it]08/03/2024 04:46:19 - INFO - __main__ -   Step: 1981, LR: 1.631869007688473e-05, Loss: 567.7930908203125
2024-08-03T11:46:31.650924935Z 
 21%|██        | 1982/9500 [6:49:01<25:38:33, 12.28s/it]08/03/2024 04:46:31 - INFO - __main__ -   Step: 1982, LR: 1.631651953319745e-05, Loss: 642.462158203125
2024-08-03T11:46:44.098553395Z 
 21%|██        | 1983/9500 [6:49:14<25:44:42, 12.33s/it]08/03/2024 04:46:44 - INFO - __main__ -   Step: 1983, LR: 1.6314348989510173e-05, Loss: 570.6748046875
2024-08-03T11:46:56.415782372Z 
 21%|██        | 1984/9500 [6:49:26<25:44:01, 12.33s/it]08/03/2024 04:46:56 - INFO - __main__ -   Step: 1984, LR: 1.6312178445822892e-05, Loss: 685.7847290039062
2024-08-03T11:47:08.730257658Z 
 21%|██        | 1985/9500 [6:49:38<25:43:24, 12.32s/it]08/03/2024 04:47:08 - INFO - __main__ -   Step: 1985, LR: 1.6310007902135612e-05, Loss: 609.4637451171875
2024-08-03T11:47:21.286029720Z 
 21%|██        | 1986/9500 [6:49:51<25:51:57, 12.39s/it]08/03/2024 04:47:21 - INFO - __main__ -   Step: 1986, LR: 1.6307837358448332e-05, Loss: 547.9765625
2024-08-03T11:47:33.267052540Z 
 21%|██        | 1987/9500 [6:50:03<25:36:17, 12.27s/it]08/03/2024 04:47:33 - INFO - __main__ -   Step: 1987, LR: 1.6305666814761055e-05, Loss: 605.8741455078125
2024-08-03T11:47:45.685247697Z 
 21%|██        | 1988/9500 [6:50:15<25:41:41, 12.31s/it]08/03/2024 04:47:45 - INFO - __main__ -   Step: 1988, LR: 1.6303496271073775e-05, Loss: 804.6740112304688
2024-08-03T11:47:58.672018608Z 
 21%|██        | 1989/9500 [6:50:28<26:06:44, 12.52s/it]08/03/2024 04:47:58 - INFO - __main__ -   Step: 1989, LR: 1.6301325727386495e-05, Loss: 676.6729736328125
2024-08-03T11:48:11.348575500Z 
 21%|██        | 1990/9500 [6:50:41<26:12:35, 12.56s/it]08/03/2024 04:48:11 - INFO - __main__ -   Step: 1990, LR: 1.629915518369922e-05, Loss: 655.7344970703125
2024-08-03T11:48:23.550968583Z 
 21%|██        | 1991/9500 [6:50:53<25:58:48, 12.46s/it]08/03/2024 04:48:23 - INFO - __main__ -   Step: 1991, LR: 1.6296984640011938e-05, Loss: 584.7686157226562
2024-08-03T11:48:35.935797037Z 
 21%|██        | 1992/9500 [6:51:05<25:55:56, 12.43s/it]08/03/2024 04:48:35 - INFO - __main__ -   Step: 1992, LR: 1.629481409632466e-05, Loss: 627.751220703125
2024-08-03T11:48:48.219560488Z 
 21%|██        | 1993/9500 [6:51:18<25:50:04, 12.39s/it]08/03/2024 04:48:48 - INFO - __main__ -   Step: 1993, LR: 1.629264355263738e-05, Loss: 702.9979858398438
2024-08-03T11:49:00.312781101Z 
 21%|██        | 1994/9500 [6:51:30<25:38:46, 12.30s/it]08/03/2024 04:49:00 - INFO - __main__ -   Step: 1994, LR: 1.62904730089501e-05, Loss: 515.2573852539062
2024-08-03T11:49:13.141347993Z 
 21%|██        | 1995/9500 [6:51:43<25:58:23, 12.46s/it]08/03/2024 04:49:13 - INFO - __main__ -   Step: 1995, LR: 1.6288302465262824e-05, Loss: 431.60333251953125
2024-08-03T11:49:25.319071553Z 
 21%|██        | 1996/9500 [6:51:55<25:47:37, 12.37s/it]08/03/2024 04:49:25 - INFO - __main__ -   Step: 1996, LR: 1.6286131921575544e-05, Loss: 631.64306640625
2024-08-03T11:49:37.298350925Z 
 21%|██        | 1997/9500 [6:52:07<25:32:35, 12.26s/it]08/03/2024 04:49:37 - INFO - __main__ -   Step: 1997, LR: 1.6283961377888268e-05, Loss: 769.385009765625
2024-08-03T11:49:49.736965446Z 
 21%|██        | 1998/9500 [6:52:19<25:39:15, 12.31s/it]08/03/2024 04:49:49 - INFO - __main__ -   Step: 1998, LR: 1.6281790834200987e-05, Loss: 643.1478271484375
2024-08-03T11:50:01.898186964Z 
 21%|██        | 1999/9500 [6:52:31<25:33:26, 12.27s/it]08/03/2024 04:50:01 - INFO - __main__ -   Step: 1999, LR: 1.627962029051371e-05, Loss: 748.210693359375
2024-08-03T11:50:13.910760914Z 
 21%|██        | 2000/9500 [6:52:43<25:23:44, 12.19s/it]08/03/2024 04:50:13 - INFO - __main__ -   Step: 2000, LR: 1.6277449746826427e-05, Loss: 661.005126953125
2024-08-03T11:50:26.318181410Z 
 21%|██        | 2001/9500 [6:52:56<25:31:41, 12.26s/it]08/03/2024 04:50:26 - INFO - __main__ -   Step: 2001, LR: 1.627527920313915e-05, Loss: 576.2919311523438
2024-08-03T11:50:38.137528647Z 
 21%|██        | 2002/9500 [6:53:08<25:15:08, 12.12s/it]08/03/2024 04:50:38 - INFO - __main__ -   Step: 2002, LR: 1.627310865945187e-05, Loss: 607.1363525390625
2024-08-03T11:50:50.697227594Z 
 21%|██        | 2003/9500 [6:53:20<25:31:15, 12.25s/it]08/03/2024 04:50:50 - INFO - __main__ -   Step: 2003, LR: 1.627093811576459e-05, Loss: 639.705810546875
2024-08-03T11:51:03.207463714Z 
 21%|██        | 2004/9500 [6:53:33<25:40:37, 12.33s/it]08/03/2024 04:51:03 - INFO - __main__ -   Step: 2004, LR: 1.6268767572077313e-05, Loss: 789.4840087890625
2024-08-03T11:51:15.336295001Z 
 21%|██        | 2005/9500 [6:53:45<25:32:48, 12.27s/it]08/03/2024 04:51:15 - INFO - __main__ -   Step: 2005, LR: 1.6266597028390033e-05, Loss: 811.5408935546875
2024-08-03T11:51:27.796921340Z 
 21%|██        | 2006/9500 [6:53:57<25:39:44, 12.33s/it]08/03/2024 04:51:27 - INFO - __main__ -   Step: 2006, LR: 1.6264426484702757e-05, Loss: 621.0379638671875
2024-08-03T11:51:40.269967411Z 
 21%|██        | 2007/9500 [6:54:10<25:44:58, 12.37s/it]08/03/2024 04:51:40 - INFO - __main__ -   Step: 2007, LR: 1.6262255941015476e-05, Loss: 567.9885864257812
2024-08-03T11:51:52.800127550Z 
 21%|██        | 2008/9500 [6:54:22<25:50:43, 12.42s/it]08/03/2024 04:51:52 - INFO - __main__ -   Step: 2008, LR: 1.62600853973282e-05, Loss: 723.6574096679688
2024-08-03T11:52:05.056941309Z 
 21%|██        | 2009/9500 [6:54:34<25:44:26, 12.37s/it]08/03/2024 04:52:05 - INFO - __main__ -   Step: 2009, LR: 1.625791485364092e-05, Loss: 614.7901611328125
2024-08-03T11:52:17.304105309Z 
 21%|██        | 2010/9500 [6:54:47<25:39:36, 12.33s/it]08/03/2024 04:52:17 - INFO - __main__ -   Step: 2010, LR: 1.625574430995364e-05, Loss: 602.892333984375
2024-08-03T11:52:29.513962261Z 
 21%|██        | 2011/9500 [6:54:59<25:34:47, 12.30s/it]08/03/2024 04:52:29 - INFO - __main__ -   Step: 2011, LR: 1.6253573766266363e-05, Loss: 680.5186767578125
2024-08-03T11:52:41.766209751Z 
 21%|██        | 2012/9500 [6:55:11<25:32:56, 12.28s/it]08/03/2024 04:52:41 - INFO - __main__ -   Step: 2012, LR: 1.6251403222579083e-05, Loss: 501.78521728515625
2024-08-03T11:52:54.312325687Z 
 21%|██        | 2013/9500 [6:55:24<25:42:34, 12.36s/it]08/03/2024 04:52:54 - INFO - __main__ -   Step: 2013, LR: 1.6249232678891806e-05, Loss: 667.7783813476562
2024-08-03T11:53:06.677448960Z 
 21%|██        | 2014/9500 [6:55:36<25:42:29, 12.36s/it]08/03/2024 04:53:06 - INFO - __main__ -   Step: 2014, LR: 1.6247062135204522e-05, Loss: 633.7554931640625
2024-08-03T11:53:18.776571755Z 
 21%|██        | 2015/9500 [6:55:48<25:32:24, 12.28s/it]08/03/2024 04:53:18 - INFO - __main__ -   Step: 2015, LR: 1.6244891591517246e-05, Loss: 569.1068115234375
2024-08-03T11:53:30.718779224Z 
 21%|██        | 2016/9500 [6:56:00<25:19:25, 12.18s/it]08/03/2024 04:53:30 - INFO - __main__ -   Step: 2016, LR: 1.6242721047829965e-05, Loss: 547.82568359375
2024-08-03T11:53:43.358605709Z 
 21%|██        | 2017/9500 [6:56:13<25:36:22, 12.32s/it]08/03/2024 04:53:43 - INFO - __main__ -   Step: 2017, LR: 1.624055050414269e-05, Loss: 752.8279418945312
2024-08-03T11:53:55.573525328Z 
 21%|██        | 2018/9500 [6:56:25<25:32:16, 12.29s/it]08/03/2024 04:53:55 - INFO - __main__ -   Step: 2018, LR: 1.623837996045541e-05, Loss: 573.6608276367188
2024-08-03T11:54:07.918263177Z 
 21%|██▏       | 2019/9500 [6:56:37<25:34:12, 12.30s/it]08/03/2024 04:54:07 - INFO - __main__ -   Step: 2019, LR: 1.623620941676813e-05, Loss: 736.8812255859375
2024-08-03T11:54:20.371352670Z 
 21%|██▏       | 2020/9500 [6:56:50<25:39:31, 12.35s/it]08/03/2024 04:54:20 - INFO - __main__ -   Step: 2020, LR: 1.623403887308085e-05, Loss: 641.5888061523438
2024-08-03T11:54:32.317787783Z 
 21%|██▏       | 2021/9500 [6:57:02<25:24:16, 12.23s/it]08/03/2024 04:54:32 - INFO - __main__ -   Step: 2021, LR: 1.623186832939357e-05, Loss: 553.9107666015625
2024-08-03T11:54:44.111746385Z 
 21%|██▏       | 2022/9500 [6:57:14<25:07:49, 12.10s/it]08/03/2024 04:54:44 - INFO - __main__ -   Step: 2022, LR: 1.6229697785706295e-05, Loss: 564.0631103515625
2024-08-03T11:54:56.610296145Z 
 21%|██▏       | 2023/9500 [6:57:26<25:22:35, 12.22s/it]08/03/2024 04:54:56 - INFO - __main__ -   Step: 2023, LR: 1.6227527242019015e-05, Loss: 670.5603637695312
2024-08-03T11:55:08.754946064Z 
 21%|██▏       | 2024/9500 [6:57:38<25:19:39, 12.20s/it]08/03/2024 04:55:08 - INFO - __main__ -   Step: 2024, LR: 1.6225356698331734e-05, Loss: 743.2913818359375
2024-08-03T11:55:20.742748990Z 
 21%|██▏       | 2025/9500 [6:57:50<25:11:39, 12.13s/it]08/03/2024 04:55:20 - INFO - __main__ -   Step: 2025, LR: 1.6223186154644458e-05, Loss: 560.7141723632812
2024-08-03T11:55:33.087777884Z 
 21%|██▏       | 2026/9500 [6:58:03<25:19:20, 12.20s/it]08/03/2024 04:55:33 - INFO - __main__ -   Step: 2026, LR: 1.6221015610957178e-05, Loss: 613.2821044921875
2024-08-03T11:55:45.043514593Z 
 21%|██▏       | 2027/9500 [6:58:14<25:10:07, 12.12s/it]08/03/2024 04:55:45 - INFO - __main__ -   Step: 2027, LR: 1.6218845067269897e-05, Loss: 551.3900756835938
2024-08-03T11:55:57.375242076Z 
 21%|██▏       | 2028/9500 [6:58:27<25:17:40, 12.19s/it]08/03/2024 04:55:57 - INFO - __main__ -   Step: 2028, LR: 1.6216674523582617e-05, Loss: 545.419677734375
2024-08-03T11:56:09.727785756Z 
 21%|██▏       | 2029/9500 [6:58:39<25:23:38, 12.24s/it]08/03/2024 04:56:09 - INFO - __main__ -   Step: 2029, LR: 1.621450397989534e-05, Loss: 530.5090942382812
2024-08-03T11:56:21.830020592Z 
 21%|██▏       | 2030/9500 [6:58:51<25:18:25, 12.20s/it]08/03/2024 04:56:21 - INFO - __main__ -   Step: 2030, LR: 1.621233343620806e-05, Loss: 583.5447387695312
2024-08-03T11:56:33.795369415Z 
 21%|██▏       | 2031/9500 [6:59:03<25:09:36, 12.13s/it]08/03/2024 04:56:33 - INFO - __main__ -   Step: 2031, LR: 1.6210162892520784e-05, Loss: 622.6697998046875
2024-08-03T11:56:46.166919446Z 
 21%|██▏       | 2032/9500 [6:59:16<25:18:31, 12.20s/it]08/03/2024 04:56:46 - INFO - __main__ -   Step: 2032, LR: 1.6207992348833504e-05, Loss: 621.0106811523438
2024-08-03T11:56:58.354944156Z 
 21%|██▏       | 2033/9500 [6:59:28<25:17:52, 12.20s/it]08/03/2024 04:56:58 - INFO - __main__ -   Step: 2033, LR: 1.6205821805146223e-05, Loss: 580.4520263671875
2024-08-03T11:57:10.785132245Z 
 21%|██▏       | 2034/9500 [6:59:40<25:26:23, 12.27s/it]08/03/2024 04:57:10 - INFO - __main__ -   Step: 2034, LR: 1.6203651261458947e-05, Loss: 753.87451171875
2024-08-03T11:57:23.337208365Z 
 21%|██▏       | 2035/9500 [6:59:53<25:36:49, 12.35s/it]08/03/2024 04:57:23 - INFO - __main__ -   Step: 2035, LR: 1.6201480717771667e-05, Loss: 607.7301025390625
2024-08-03T11:57:35.573668050Z 
 21%|██▏       | 2036/9500 [7:00:05<25:32:17, 12.32s/it]08/03/2024 04:57:35 - INFO - __main__ -   Step: 2036, LR: 1.619931017408439e-05, Loss: 549.7439575195312
2024-08-03T11:57:47.632855851Z 
 21%|██▏       | 2037/9500 [7:00:17<25:22:27, 12.24s/it]08/03/2024 04:57:47 - INFO - __main__ -   Step: 2037, LR: 1.619713963039711e-05, Loss: 627.3689575195312
2024-08-03T11:58:00.059984876Z 
 21%|██▏       | 2038/9500 [7:00:29<25:29:14, 12.30s/it]08/03/2024 04:58:00 - INFO - __main__ -   Step: 2038, LR: 1.619496908670983e-05, Loss: 598.020263671875
2024-08-03T11:58:12.312731458Z 
 21%|██▏       | 2039/9500 [7:00:42<25:27:24, 12.28s/it]08/03/2024 04:58:12 - INFO - __main__ -   Step: 2039, LR: 1.6192798543022553e-05, Loss: 456.34234619140625
2024-08-03T11:58:24.631709387Z 
 21%|██▏       | 2040/9500 [7:00:54<25:28:32, 12.29s/it]08/03/2024 04:58:24 - INFO - __main__ -   Step: 2040, LR: 1.6190627999335273e-05, Loss: 746.999755859375
2024-08-03T11:58:37.214515294Z 
 21%|██▏       | 2041/9500 [7:01:07<25:39:06, 12.38s/it]08/03/2024 04:58:37 - INFO - __main__ -   Step: 2041, LR: 1.6188457455647993e-05, Loss: 713.7515869140625
2024-08-03T11:58:49.328810502Z 
 21%|██▏       | 2042/9500 [7:01:19<25:28:57, 12.30s/it]08/03/2024 04:58:49 - INFO - __main__ -   Step: 2042, LR: 1.6186286911960712e-05, Loss: 695.8719482421875
2024-08-03T11:59:01.238755789Z 
 22%|██▏       | 2043/9500 [7:01:31<25:14:12, 12.18s/it]08/03/2024 04:59:01 - INFO - __main__ -   Step: 2043, LR: 1.6184116368273436e-05, Loss: 648.5845947265625
2024-08-03T11:59:13.868486778Z 
 22%|██▏       | 2044/9500 [7:01:43<25:30:38, 12.32s/it]08/03/2024 04:59:13 - INFO - __main__ -   Step: 2044, LR: 1.6181945824586156e-05, Loss: 580.322021484375
2024-08-03T11:59:26.196884719Z 
 22%|██▏       | 2045/9500 [7:01:56<25:30:50, 12.32s/it]08/03/2024 04:59:26 - INFO - __main__ -   Step: 2045, LR: 1.617977528089888e-05, Loss: 555.75
2024-08-03T11:59:38.567625483Z 
 22%|██▏       | 2046/9500 [7:02:08<25:32:30, 12.34s/it]08/03/2024 04:59:38 - INFO - __main__ -   Step: 2046, LR: 1.61776047372116e-05, Loss: 809.9517822265625
2024-08-03T11:59:51.034552558Z 
 22%|██▏       | 2047/9500 [7:02:20<25:37:11, 12.38s/it]08/03/2024 04:59:51 - INFO - __main__ -   Step: 2047, LR: 1.617543419352432e-05, Loss: 644.77734375
2024-08-03T12:00:03.114084992Z 
 22%|██▏       | 2048/9500 [7:02:33<25:25:58, 12.29s/it]08/03/2024 05:00:03 - INFO - __main__ -   Step: 2048, LR: 1.6173263649837042e-05, Loss: 574.947021484375
2024-08-03T12:00:15.373002272Z 
 22%|██▏       | 2049/9500 [7:02:45<25:24:44, 12.28s/it]08/03/2024 05:00:15 - INFO - __main__ -   Step: 2049, LR: 1.617109310614976e-05, Loss: 725.2333374023438
2024-08-03T12:00:27.887140395Z 
 22%|██▏       | 2050/9500 [7:02:57<25:33:19, 12.35s/it]08/03/2024 05:00:27 - INFO - __main__ -   Step: 2050, LR: 1.6168922562462485e-05, Loss: 686.8663330078125
2024-08-03T12:00:40.061939188Z 
 22%|██▏       | 2051/9500 [7:03:09<25:26:37, 12.30s/it]08/03/2024 05:00:40 - INFO - __main__ -   Step: 2051, LR: 1.6166752018775205e-05, Loss: 783.6724853515625
2024-08-03T12:00:52.062095835Z 
 22%|██▏       | 2052/9500 [7:03:21<25:15:23, 12.21s/it]08/03/2024 05:00:52 - INFO - __main__ -   Step: 2052, LR: 1.6164581475087925e-05, Loss: 603.5413818359375
2024-08-03T12:01:04.748499847Z 
 22%|██▏       | 2053/9500 [7:03:34<25:33:00, 12.35s/it]08/03/2024 05:01:04 - INFO - __main__ -   Step: 2053, LR: 1.6162410931400648e-05, Loss: 612.6917724609375
2024-08-03T12:01:16.536529943Z 
 22%|██▏       | 2054/9500 [7:03:46<25:11:49, 12.18s/it]08/03/2024 05:01:16 - INFO - __main__ -   Step: 2054, LR: 1.6160240387713368e-05, Loss: 549.885498046875
2024-08-03T12:01:28.480540404Z 
 22%|██▏       | 2055/9500 [7:03:58<25:02:45, 12.11s/it]08/03/2024 05:01:28 - INFO - __main__ -   Step: 2055, LR: 1.6158069844026088e-05, Loss: 550.0249633789062
2024-08-03T12:01:41.288415596Z 
 22%|██▏       | 2056/9500 [7:04:11<25:28:29, 12.32s/it]08/03/2024 05:01:41 - INFO - __main__ -   Step: 2056, LR: 1.6155899300338807e-05, Loss: 569.5369873046875
2024-08-03T12:01:53.224399847Z 
 22%|██▏       | 2057/9500 [7:04:23<25:13:59, 12.20s/it]08/03/2024 05:01:53 - INFO - __main__ -   Step: 2057, LR: 1.615372875665153e-05, Loss: 474.9454345703125
2024-08-03T12:02:05.369634346Z 
 22%|██▏       | 2058/9500 [7:04:35<25:11:34, 12.19s/it]08/03/2024 05:02:05 - INFO - __main__ -   Step: 2058, LR: 1.615155821296425e-05, Loss: 657.8250732421875
2024-08-03T12:02:17.567877129Z 
 22%|██▏       | 2059/9500 [7:04:47<25:11:48, 12.19s/it]08/03/2024 05:02:17 - INFO - __main__ -   Step: 2059, LR: 1.6149387669276974e-05, Loss: 673.4678344726562
2024-08-03T12:02:30.352064916Z 
 22%|██▏       | 2060/9500 [7:05:00<25:33:41, 12.37s/it]08/03/2024 05:02:30 - INFO - __main__ -   Step: 2060, LR: 1.6147217125589694e-05, Loss: 771.9443969726562
2024-08-03T12:02:42.375093311Z 
 22%|██▏       | 2061/9500 [7:05:12<25:20:37, 12.26s/it]08/03/2024 05:02:42 - INFO - __main__ -   Step: 2061, LR: 1.6145046581902414e-05, Loss: 653.33984375
2024-08-03T12:02:54.497334891Z 
 22%|██▏       | 2062/9500 [7:05:24<25:15:07, 12.22s/it]08/03/2024 05:02:54 - INFO - __main__ -   Step: 2062, LR: 1.6142876038215137e-05, Loss: 653.200927734375
2024-08-03T12:03:07.356760392Z 
 22%|██▏       | 2063/9500 [7:05:37<25:38:37, 12.41s/it]08/03/2024 05:03:07 - INFO - __main__ -   Step: 2063, LR: 1.6140705494527857e-05, Loss: 596.294921875
2024-08-03T12:03:19.136103007Z 
 22%|██▏       | 2064/9500 [7:05:49<25:14:50, 12.22s/it]08/03/2024 05:03:19 - INFO - __main__ -   Step: 2064, LR: 1.613853495084058e-05, Loss: 518.517578125
2024-08-03T12:03:31.086826452Z 
 22%|██▏       | 2065/9500 [7:06:01<25:04:31, 12.14s/it]08/03/2024 05:03:31 - INFO - __main__ -   Step: 2065, LR: 1.61363644071533e-05, Loss: 496.0123291015625
2024-08-03T12:03:43.524489665Z 
 22%|██▏       | 2066/9500 [7:06:13<25:15:19, 12.23s/it]08/03/2024 05:03:43 - INFO - __main__ -   Step: 2066, LR: 1.613419386346602e-05, Loss: 698.3241577148438
2024-08-03T12:03:55.524644592Z 
 22%|██▏       | 2067/9500 [7:06:25<25:06:34, 12.16s/it]08/03/2024 05:03:55 - INFO - __main__ -   Step: 2067, LR: 1.6132023319778743e-05, Loss: 512.17236328125
2024-08-03T12:04:07.805333109Z 
 22%|██▏       | 2068/9500 [7:06:37<25:10:48, 12.20s/it]08/03/2024 05:04:07 - INFO - __main__ -   Step: 2068, LR: 1.6129852776091463e-05, Loss: 570.6806640625
2024-08-03T12:04:20.334175751Z 
 22%|██▏       | 2069/9500 [7:06:50<25:22:56, 12.30s/it]08/03/2024 05:04:20 - INFO - __main__ -   Step: 2069, LR: 1.6127682232404183e-05, Loss: 584.08056640625
2024-08-03T12:04:32.437370188Z 
 22%|██▏       | 2070/9500 [7:07:02<25:15:32, 12.24s/it]08/03/2024 05:04:32 - INFO - __main__ -   Step: 2070, LR: 1.6125511688716902e-05, Loss: 819.136474609375
2024-08-03T12:04:44.365028908Z 
 22%|██▏       | 2071/9500 [7:07:14<25:03:47, 12.15s/it]08/03/2024 05:04:44 - INFO - __main__ -   Step: 2071, LR: 1.6123341145029626e-05, Loss: 573.0010986328125
2024-08-03T12:04:56.631674315Z 
 22%|██▏       | 2072/9500 [7:07:26<25:08:06, 12.18s/it]08/03/2024 05:04:56 - INFO - __main__ -   Step: 2072, LR: 1.6121170601342346e-05, Loss: 564.9755249023438
2024-08-03T12:05:08.677716407Z 
 22%|██▏       | 2073/9500 [7:07:38<25:02:51, 12.14s/it]08/03/2024 05:05:08 - INFO - __main__ -   Step: 2073, LR: 1.611900005765507e-05, Loss: 513.0689086914062
2024-08-03T12:05:20.982027153Z 
 22%|██▏       | 2074/9500 [7:07:50<25:08:43, 12.19s/it]08/03/2024 05:05:20 - INFO - __main__ -   Step: 2074, LR: 1.611682951396779e-05, Loss: 768.2212524414062
2024-08-03T12:05:33.356556235Z 
 22%|██▏       | 2075/9500 [7:08:03<25:15:21, 12.25s/it]08/03/2024 05:05:33 - INFO - __main__ -   Step: 2075, LR: 1.611465897028051e-05, Loss: 551.2652587890625
2024-08-03T12:05:45.701414114Z 
 22%|██▏       | 2076/9500 [7:08:15<25:18:51, 12.28s/it]08/03/2024 05:05:45 - INFO - __main__ -   Step: 2076, LR: 1.6112488426593232e-05, Loss: 635.7034912109375
2024-08-03T12:05:57.820752399Z 
 22%|██▏       | 2077/9500 [7:08:27<25:12:52, 12.23s/it]08/03/2024 05:05:57 - INFO - __main__ -   Step: 2077, LR: 1.6110317882905952e-05, Loss: 750.7672119140625
2024-08-03T12:06:10.223487720Z 
 22%|██▏       | 2078/9500 [7:08:40<25:19:07, 12.28s/it]08/03/2024 05:06:10 - INFO - __main__ -   Step: 2078, LR: 1.6108147339218675e-05, Loss: 494.1845703125
2024-08-03T12:06:22.698549136Z 
 22%|██▏       | 2079/9500 [7:08:52<25:26:07, 12.34s/it]08/03/2024 05:06:22 - INFO - __main__ -   Step: 2079, LR: 1.6105976795531395e-05, Loss: 625.8994140625
2024-08-03T12:06:34.909755140Z 
 22%|██▏       | 2080/9500 [7:09:04<25:21:11, 12.30s/it]08/03/2024 05:06:34 - INFO - __main__ -   Step: 2080, LR: 1.6103806251844115e-05, Loss: 717.4046020507812
2024-08-03T12:06:47.525305152Z 
 22%|██▏       | 2081/9500 [7:09:17<25:32:39, 12.40s/it]08/03/2024 05:06:47 - INFO - __main__ -   Step: 2081, LR: 1.6101635708156838e-05, Loss: 659.947998046875
2024-08-03T12:07:00.040502541Z 
 22%|██▏       | 2082/9500 [7:09:29<25:36:51, 12.43s/it]08/03/2024 05:07:00 - INFO - __main__ -   Step: 2082, LR: 1.6099465164469558e-05, Loss: 567.4102172851562
2024-08-03T12:07:12.329043166Z 
 22%|██▏       | 2083/9500 [7:09:42<25:31:25, 12.39s/it]08/03/2024 05:07:12 - INFO - __main__ -   Step: 2083, LR: 1.6097294620782278e-05, Loss: 602.7384033203125
2024-08-03T12:07:25.010132717Z 
 22%|██▏       | 2084/9500 [7:09:54<25:42:03, 12.48s/it]08/03/2024 05:07:25 - INFO - __main__ -   Step: 2084, LR: 1.6095124077094998e-05, Loss: 677.372802734375
2024-08-03T12:07:37.215345689Z 
 22%|██▏       | 2085/9500 [7:10:07<25:31:47, 12.39s/it]08/03/2024 05:07:37 - INFO - __main__ -   Step: 2085, LR: 1.609295353340772e-05, Loss: 596.8770751953125
2024-08-03T12:07:49.211456455Z 
 22%|██▏       | 2086/9500 [7:10:19<25:16:49, 12.28s/it]08/03/2024 05:07:49 - INFO - __main__ -   Step: 2086, LR: 1.609078298972044e-05, Loss: 604.1216430664062
2024-08-03T12:08:01.786855026Z 
 22%|██▏       | 2087/9500 [7:10:31<25:27:44, 12.37s/it]08/03/2024 05:08:01 - INFO - __main__ -   Step: 2087, LR: 1.6088612446033164e-05, Loss: 596.1461791992188
2024-08-03T12:08:13.621581636Z 
 22%|██▏       | 2088/9500 [7:10:43<25:07:52, 12.21s/it]08/03/2024 05:08:13 - INFO - __main__ -   Step: 2088, LR: 1.6086441902345884e-05, Loss: 455.9210510253906
2024-08-03T12:08:25.725246557Z 
 22%|██▏       | 2089/9500 [7:10:55<25:03:52, 12.18s/it]08/03/2024 05:08:25 - INFO - __main__ -   Step: 2089, LR: 1.6084271358658604e-05, Loss: 599.6893310546875
2024-08-03T12:08:38.355538106Z 
 22%|██▏       | 2090/9500 [7:11:08<25:20:30, 12.31s/it]08/03/2024 05:08:38 - INFO - __main__ -   Step: 2090, LR: 1.6082100814971327e-05, Loss: 655.9322509765625
2024-08-03T12:08:50.549434216Z 
 22%|██▏       | 2091/9500 [7:11:20<25:15:56, 12.28s/it]08/03/2024 05:08:50 - INFO - __main__ -   Step: 2091, LR: 1.6079930271284047e-05, Loss: 636.326416015625
2024-08-03T12:09:02.914503245Z 
 22%|██▏       | 2092/9500 [7:11:32<25:19:00, 12.30s/it]08/03/2024 05:09:02 - INFO - __main__ -   Step: 2092, LR: 1.607775972759677e-05, Loss: 696.2335815429688
2024-08-03T12:09:15.726766378Z 
 22%|██▏       | 2093/9500 [7:11:45<25:37:40, 12.46s/it]08/03/2024 05:09:15 - INFO - __main__ -   Step: 2093, LR: 1.607558918390949e-05, Loss: 654.6015014648438
2024-08-03T12:09:27.811171556Z 
 22%|██▏       | 2094/9500 [7:11:57<25:23:43, 12.34s/it]08/03/2024 05:09:27 - INFO - __main__ -   Step: 2094, LR: 1.607341864022221e-05, Loss: 626.4652099609375
2024-08-03T12:09:40.370627232Z 
 22%|██▏       | 2095/9500 [7:12:10<25:31:27, 12.41s/it]08/03/2024 05:09:40 - INFO - __main__ -   Step: 2095, LR: 1.6071248096534933e-05, Loss: 761.8540649414062
2024-08-03T12:09:52.691276044Z 
 22%|██▏       | 2096/9500 [7:12:22<25:27:59, 12.38s/it]08/03/2024 05:09:52 - INFO - __main__ -   Step: 2096, LR: 1.6069077552847653e-05, Loss: 607.911865234375
2024-08-03T12:10:04.783539460Z 
 22%|██▏       | 2097/9500 [7:12:34<25:17:02, 12.30s/it]08/03/2024 05:10:04 - INFO - __main__ -   Step: 2097, LR: 1.6066907009160373e-05, Loss: 601.956298828125
2024-08-03T12:10:16.866157765Z 
 22%|██▏       | 2098/9500 [7:12:46<25:08:58, 12.23s/it]08/03/2024 05:10:16 - INFO - __main__ -   Step: 2098, LR: 1.6064736465473093e-05, Loss: 640.920654296875
2024-08-03T12:10:29.268765519Z 
 22%|██▏       | 2099/9500 [7:12:59<25:15:04, 12.28s/it]08/03/2024 05:10:29 - INFO - __main__ -   Step: 2099, LR: 1.6062565921785816e-05, Loss: 654.1055297851562
2024-08-03T12:10:41.603588904Z 
 22%|██▏       | 2100/9500 [7:13:11<25:16:48, 12.30s/it]08/03/2024 05:10:41 - INFO - __main__ -   Step: 2100, LR: 1.6060395378098536e-05, Loss: 557.985595703125
2024-08-03T12:10:53.791945734Z 
 22%|██▏       | 2101/9500 [7:13:23<25:12:31, 12.27s/it]08/03/2024 05:10:53 - INFO - __main__ -   Step: 2101, LR: 1.605822483441126e-05, Loss: 737.31494140625
2024-08-03T12:11:06.146140500Z 
 22%|██▏       | 2102/9500 [7:13:36<25:15:36, 12.29s/it]08/03/2024 05:11:06 - INFO - __main__ -   Step: 2102, LR: 1.605605429072398e-05, Loss: 696.5487060546875
2024-08-03T12:11:18.857999655Z 
 22%|██▏       | 2103/9500 [7:13:48<25:30:56, 12.42s/it]08/03/2024 05:11:18 - INFO - __main__ -   Step: 2103, LR: 1.60538837470367e-05, Loss: 804.138427734375
2024-08-03T12:11:31.559931173Z 
 22%|██▏       | 2104/9500 [7:14:01<25:41:13, 12.50s/it]08/03/2024 05:11:31 - INFO - __main__ -   Step: 2104, LR: 1.6051713203349422e-05, Loss: 607.126953125
2024-08-03T12:11:43.670728046Z 
 22%|██▏       | 2105/9500 [7:14:13<25:26:30, 12.39s/it]08/03/2024 05:11:43 - INFO - __main__ -   Step: 2105, LR: 1.6049542659662142e-05, Loss: 729.1298217773438
2024-08-03T12:11:56.337713335Z 
 22%|██▏       | 2106/9500 [7:14:26<25:36:42, 12.47s/it]08/03/2024 05:11:56 - INFO - __main__ -   Step: 2106, LR: 1.6047372115974865e-05, Loss: 628.5665283203125
2024-08-03T12:12:08.492255466Z 
 22%|██▏       | 2107/9500 [7:14:38<25:24:50, 12.38s/it]08/03/2024 05:12:08 - INFO - __main__ -   Step: 2107, LR: 1.6045201572287585e-05, Loss: 597.3895874023438
2024-08-03T12:12:20.620346886Z 
 22%|██▏       | 2108/9500 [7:14:50<25:15:30, 12.30s/it]08/03/2024 05:12:20 - INFO - __main__ -   Step: 2108, LR: 1.6043031028600308e-05, Loss: 613.2603149414062
2024-08-03T12:12:33.069945441Z 
 22%|██▏       | 2109/9500 [7:15:03<25:20:46, 12.35s/it]08/03/2024 05:12:33 - INFO - __main__ -   Step: 2109, LR: 1.6040860484913028e-05, Loss: 525.3958740234375
2024-08-03T12:12:45.209305233Z 
 22%|██▏       | 2110/9500 [7:15:15<25:12:56, 12.28s/it]08/03/2024 05:12:45 - INFO - __main__ -   Step: 2110, LR: 1.6038689941225748e-05, Loss: 535.257080078125
2024-08-03T12:12:57.594908305Z 
 22%|██▏       | 2111/9500 [7:15:27<25:16:31, 12.31s/it]08/03/2024 05:12:57 - INFO - __main__ -   Step: 2111, LR: 1.6036519397538468e-05, Loss: 670.0643920898438
2024-08-03T12:13:09.985273469Z 
 22%|██▏       | 2112/9500 [7:15:39<25:19:06, 12.34s/it]08/03/2024 05:13:09 - INFO - __main__ -   Step: 2112, LR: 1.6034348853851188e-05, Loss: 646.7947387695312
2024-08-03T12:13:22.310704851Z 
 22%|██▏       | 2113/9500 [7:15:52<25:18:28, 12.33s/it]08/03/2024 05:13:22 - INFO - __main__ -   Step: 2113, LR: 1.603217831016391e-05, Loss: 520.1421508789062
2024-08-03T12:13:34.890012322Z 
 22%|██▏       | 2114/9500 [7:16:04<25:27:20, 12.41s/it]08/03/2024 05:13:34 - INFO - __main__ -   Step: 2114, LR: 1.603000776647663e-05, Loss: 782.933349609375
2024-08-03T12:13:47.151835260Z 
 22%|██▏       | 2115/9500 [7:16:17<25:21:45, 12.36s/it]08/03/2024 05:13:47 - INFO - __main__ -   Step: 2115, LR: 1.6027837222789354e-05, Loss: 486.19366455078125
2024-08-03T12:13:59.426703398Z 
 22%|██▏       | 2116/9500 [7:16:29<25:18:16, 12.34s/it]08/03/2024 05:13:59 - INFO - __main__ -   Step: 2116, LR: 1.6025666679102074e-05, Loss: 661.9835205078125
2024-08-03T12:14:11.632235934Z 
 22%|██▏       | 2117/9500 [7:16:41<25:13:13, 12.30s/it]08/03/2024 05:14:11 - INFO - __main__ -   Step: 2117, LR: 1.6023496135414797e-05, Loss: 652.3546142578125
2024-08-03T12:14:24.046149192Z 
 22%|██▏       | 2118/9500 [7:16:53<25:17:18, 12.33s/it]08/03/2024 05:14:24 - INFO - __main__ -   Step: 2118, LR: 1.6021325591727517e-05, Loss: 549.656494140625
2024-08-03T12:14:36.128547211Z 
 22%|██▏       | 2119/9500 [7:17:06<25:07:51, 12.26s/it]08/03/2024 05:14:36 - INFO - __main__ -   Step: 2119, LR: 1.6019155048040237e-05, Loss: 445.7073974609375
2024-08-03T12:14:48.047597316Z 
 22%|██▏       | 2120/9500 [7:17:17<24:55:10, 12.16s/it]08/03/2024 05:14:48 - INFO - __main__ -   Step: 2120, LR: 1.601698450435296e-05, Loss: 534.0565795898438
2024-08-03T12:15:00.488166331Z 
 22%|██▏       | 2121/9500 [7:17:30<25:05:29, 12.24s/it]08/03/2024 05:15:00 - INFO - __main__ -   Step: 2121, LR: 1.601481396066568e-05, Loss: 558.3841552734375
2024-08-03T12:15:12.734888726Z 
 22%|██▏       | 2122/9500 [7:17:42<25:05:28, 12.24s/it]08/03/2024 05:15:12 - INFO - __main__ -   Step: 2122, LR: 1.6012643416978403e-05, Loss: 648.1430053710938
2024-08-03T12:15:25.090345477Z 
 22%|██▏       | 2123/9500 [7:17:55<25:09:25, 12.28s/it]08/03/2024 05:15:25 - INFO - __main__ -   Step: 2123, LR: 1.6010472873291123e-05, Loss: 688.660888671875
2024-08-03T12:15:37.873165701Z 
 22%|██▏       | 2124/9500 [7:18:07<25:27:52, 12.43s/it]08/03/2024 05:15:37 - INFO - __main__ -   Step: 2124, LR: 1.6008302329603843e-05, Loss: 565.6192626953125
2024-08-03T12:15:50.147552501Z 
 22%|██▏       | 2125/9500 [7:18:20<25:21:59, 12.38s/it]08/03/2024 05:15:50 - INFO - __main__ -   Step: 2125, LR: 1.6006131785916563e-05, Loss: 497.8543395996094
2024-08-03T12:16:02.346831171Z 
 22%|██▏       | 2126/9500 [7:18:32<25:15:01, 12.33s/it]08/03/2024 05:16:02 - INFO - __main__ -   Step: 2126, LR: 1.6003961242229286e-05, Loss: 556.38427734375
2024-08-03T12:16:14.555909694Z 
 22%|██▏       | 2127/9500 [7:18:44<25:10:28, 12.29s/it]08/03/2024 05:16:14 - INFO - __main__ -   Step: 2127, LR: 1.6001790698542006e-05, Loss: 526.21875
2024-08-03T12:16:27.170036390Z 
 22%|██▏       | 2128/9500 [7:18:57<25:22:08, 12.39s/it]08/03/2024 05:16:27 - INFO - __main__ -   Step: 2128, LR: 1.5999620154854726e-05, Loss: 621.7055053710938
2024-08-03T12:16:39.021218679Z 
 22%|██▏       | 2129/9500 [7:19:08<25:02:08, 12.23s/it]08/03/2024 05:16:39 - INFO - __main__ -   Step: 2129, LR: 1.599744961116745e-05, Loss: 585.8612060546875
2024-08-03T12:16:51.755363507Z 
 22%|██▏       | 2130/9500 [7:19:21<25:20:36, 12.38s/it]08/03/2024 05:16:51 - INFO - __main__ -   Step: 2130, LR: 1.599527906748017e-05, Loss: 755.65185546875
2024-08-03T12:17:04.236973940Z 
 22%|██▏       | 2131/9500 [7:19:34<25:24:09, 12.41s/it]08/03/2024 05:17:04 - INFO - __main__ -   Step: 2131, LR: 1.5993108523792892e-05, Loss: 497.02984619140625
2024-08-03T12:17:16.371506486Z 
 22%|██▏       | 2132/9500 [7:19:46<25:13:48, 12.33s/it]08/03/2024 05:17:16 - INFO - __main__ -   Step: 2132, LR: 1.5990937980105612e-05, Loss: 654.583984375
2024-08-03T12:17:28.919102835Z 
 22%|██▏       | 2133/9500 [7:19:58<25:21:42, 12.39s/it]08/03/2024 05:17:28 - INFO - __main__ -   Step: 2133, LR: 1.5988767436418332e-05, Loss: 767.401123046875
2024-08-03T12:17:41.437671687Z 
 22%|██▏       | 2134/9500 [7:20:11<25:26:06, 12.43s/it]08/03/2024 05:17:41 - INFO - __main__ -   Step: 2134, LR: 1.5986596892731055e-05, Loss: 776.013427734375
2024-08-03T12:17:53.894488859Z 
 22%|██▏       | 2135/9500 [7:20:23<25:26:50, 12.44s/it]08/03/2024 05:17:53 - INFO - __main__ -   Step: 2135, LR: 1.5984426349043775e-05, Loss: 560.1528930664062
2024-08-03T12:18:06.449952330Z 
 22%|██▏       | 2136/9500 [7:20:36<25:30:56, 12.47s/it]08/03/2024 05:18:06 - INFO - __main__ -   Step: 2136, LR: 1.59822558053565e-05, Loss: 593.68603515625
2024-08-03T12:18:18.509107580Z 
 22%|██▏       | 2137/9500 [7:20:48<25:15:28, 12.35s/it]08/03/2024 05:18:18 - INFO - __main__ -   Step: 2137, LR: 1.5980085261669218e-05, Loss: 582.8533935546875
2024-08-03T12:18:30.649232695Z 
 23%|██▎       | 2138/9500 [7:21:00<25:07:34, 12.29s/it]08/03/2024 05:18:30 - INFO - __main__ -   Step: 2138, LR: 1.5977914717981938e-05, Loss: 669.8233032226562
2024-08-03T12:18:43.305784074Z 
 23%|██▎       | 2139/9500 [7:21:13<25:20:58, 12.40s/it]08/03/2024 05:18:43 - INFO - __main__ -   Step: 2139, LR: 1.5975744174294658e-05, Loss: 748.8984985351562
2024-08-03T12:18:55.683850536Z 
 23%|██▎       | 2140/9500 [7:21:25<25:20:03, 12.39s/it]08/03/2024 05:18:55 - INFO - __main__ -   Step: 2140, LR: 1.597357363060738e-05, Loss: 658.39453125
2024-08-03T12:19:07.597930761Z 
 23%|██▎       | 2141/9500 [7:21:37<25:02:16, 12.25s/it]08/03/2024 05:19:07 - INFO - __main__ -   Step: 2141, LR: 1.59714030869201e-05, Loss: 464.5201721191406
2024-08-03T12:19:20.007690143Z 
 23%|██▎       | 2142/9500 [7:21:49<25:08:00, 12.30s/it]08/03/2024 05:19:20 - INFO - __main__ -   Step: 2142, LR: 1.596923254323282e-05, Loss: 626.6307373046875
2024-08-03T12:19:32.089447026Z 
 23%|██▎       | 2143/9500 [7:22:02<24:59:52, 12.23s/it]08/03/2024 05:19:32 - INFO - __main__ -   Step: 2143, LR: 1.5967061999545544e-05, Loss: 762.2612915039062
2024-08-03T12:19:44.345176755Z 
 23%|██▎       | 2144/9500 [7:22:14<25:00:32, 12.24s/it]08/03/2024 05:19:44 - INFO - __main__ -   Step: 2144, LR: 1.5964891455858264e-05, Loss: 659.84765625
2024-08-03T12:19:56.336808703Z 
 23%|██▎       | 2145/9500 [7:22:26<24:51:13, 12.16s/it]08/03/2024 05:19:56 - INFO - __main__ -   Step: 2145, LR: 1.5962720912170987e-05, Loss: 607.6453857421875
2024-08-03T12:20:08.704068648Z 
 23%|██▎       | 2146/9500 [7:22:38<24:58:27, 12.23s/it]08/03/2024 05:20:08 - INFO - __main__ -   Step: 2146, LR: 1.5960550368483707e-05, Loss: 646.8848876953125
2024-08-03T12:20:20.634981652Z 
 23%|██▎       | 2147/9500 [7:22:50<24:47:25, 12.14s/it]08/03/2024 05:20:20 - INFO - __main__ -   Step: 2147, LR: 1.5958379824796427e-05, Loss: 554.0164794921875
2024-08-03T12:20:32.917229104Z 
 23%|██▎       | 2148/9500 [7:23:02<24:52:32, 12.18s/it]08/03/2024 05:20:32 - INFO - __main__ -   Step: 2148, LR: 1.595620928110915e-05, Loss: 645.1239013671875
2024-08-03T12:20:45.173161339Z 
 23%|██▎       | 2149/9500 [7:23:15<24:55:06, 12.20s/it]08/03/2024 05:20:45 - INFO - __main__ -   Step: 2149, LR: 1.595403873742187e-05, Loss: 533.02880859375
2024-08-03T12:20:57.549168373Z 
 23%|██▎       | 2150/9500 [7:23:27<25:01:15, 12.26s/it]08/03/2024 05:20:57 - INFO - __main__ -   Step: 2150, LR: 1.5951868193734593e-05, Loss: 658.980712890625
2024-08-03T12:21:09.998731825Z 
 23%|██▎       | 2151/9500 [7:23:39<25:08:11, 12.31s/it]08/03/2024 05:21:09 - INFO - __main__ -   Step: 2151, LR: 1.5949697650047313e-05, Loss: 580.6454467773438
2024-08-03T12:21:22.611468084Z 
 23%|██▎       | 2152/9500 [7:23:52<25:18:58, 12.40s/it]08/03/2024 05:21:22 - INFO - __main__ -   Step: 2152, LR: 1.5947527106360033e-05, Loss: 774.7318725585938
2024-08-03T12:21:34.862525397Z 
 23%|██▎       | 2153/9500 [7:24:04<25:13:10, 12.36s/it]08/03/2024 05:21:34 - INFO - __main__ -   Step: 2153, LR: 1.5945356562672753e-05, Loss: 731.23974609375
2024-08-03T12:21:46.874288110Z 
 23%|██▎       | 2154/9500 [7:24:16<25:00:16, 12.25s/it]08/03/2024 05:21:46 - INFO - __main__ -   Step: 2154, LR: 1.5943186018985476e-05, Loss: 655.2467041015625
2024-08-03T12:21:59.328473998Z 
 23%|██▎       | 2155/9500 [7:24:29<25:07:26, 12.31s/it]08/03/2024 05:21:59 - INFO - __main__ -   Step: 2155, LR: 1.5941015475298196e-05, Loss: 733.5064086914062
2024-08-03T12:22:11.789852308Z 
 23%|██▎       | 2156/9500 [7:24:41<25:12:37, 12.36s/it]08/03/2024 05:22:11 - INFO - __main__ -   Step: 2156, LR: 1.5938844931610916e-05, Loss: 711.500732421875
2024-08-03T12:22:23.942046201Z 
 23%|██▎       | 2157/9500 [7:24:53<25:04:52, 12.30s/it]08/03/2024 05:22:23 - INFO - __main__ -   Step: 2157, LR: 1.593667438792364e-05, Loss: 646.3858642578125
2024-08-03T12:22:36.325814353Z 
 23%|██▎       | 2158/9500 [7:25:06<25:07:52, 12.32s/it]08/03/2024 05:22:36 - INFO - __main__ -   Step: 2158, LR: 1.593450384423636e-05, Loss: 596.4776000976562
2024-08-03T12:22:48.518226717Z 
 23%|██▎       | 2159/9500 [7:25:18<25:02:53, 12.28s/it]08/03/2024 05:22:48 - INFO - __main__ -   Step: 2159, LR: 1.5932333300549082e-05, Loss: 716.0686645507812
2024-08-03T12:23:00.640119092Z 
 23%|██▎       | 2160/9500 [7:25:30<24:56:45, 12.24s/it]08/03/2024 05:23:00 - INFO - __main__ -   Step: 2160, LR: 1.5930162756861802e-05, Loss: 637.814453125
2024-08-03T12:23:13.047216429Z 
 23%|██▎       | 2161/9500 [7:25:42<25:02:51, 12.29s/it]08/03/2024 05:23:13 - INFO - __main__ -   Step: 2161, LR: 1.5927992213174522e-05, Loss: 693.09033203125
2024-08-03T12:23:25.054848988Z 
 23%|██▎       | 2162/9500 [7:25:54<24:52:25, 12.20s/it]08/03/2024 05:23:25 - INFO - __main__ -   Step: 2162, LR: 1.5925821669487245e-05, Loss: 687.7648315429688
2024-08-03T12:23:37.342451034Z 
 23%|██▎       | 2163/9500 [7:26:07<24:55:19, 12.23s/it]08/03/2024 05:23:37 - INFO - __main__ -   Step: 2163, LR: 1.5923651125799965e-05, Loss: 682.52978515625
2024-08-03T12:23:49.655178133Z 
 23%|██▎       | 2164/9500 [7:26:19<24:58:13, 12.25s/it]08/03/2024 05:23:49 - INFO - __main__ -   Step: 2164, LR: 1.592148058211269e-05, Loss: 503.95257568359375
2024-08-03T12:24:01.707098997Z 
 23%|██▎       | 2165/9500 [7:26:31<24:50:36, 12.19s/it]08/03/2024 05:24:01 - INFO - __main__ -   Step: 2165, LR: 1.5919310038425408e-05, Loss: 614.4696655273438
2024-08-03T12:24:13.513199405Z 
 23%|██▎       | 2166/9500 [7:26:43<24:36:12, 12.08s/it]08/03/2024 05:24:13 - INFO - __main__ -   Step: 2166, LR: 1.5917139494738128e-05, Loss: 448.62506103515625
2024-08-03T12:24:26.175963306Z 
 23%|██▎       | 2167/9500 [7:26:56<24:57:29, 12.25s/it]08/03/2024 05:24:26 - INFO - __main__ -   Step: 2167, LR: 1.5914968951050848e-05, Loss: 653.8980712890625
2024-08-03T12:24:38.404895049Z 
 23%|██▎       | 2168/9500 [7:27:08<24:56:24, 12.25s/it]08/03/2024 05:24:38 - INFO - __main__ -   Step: 2168, LR: 1.591279840736357e-05, Loss: 623.0396728515625
2024-08-03T12:24:50.416713755Z 
 23%|██▎       | 2169/9500 [7:27:20<24:47:38, 12.18s/it]08/03/2024 05:24:50 - INFO - __main__ -   Step: 2169, LR: 1.591062786367629e-05, Loss: 604.7554321289062
2024-08-03T12:25:03.347953956Z 
 23%|██▎       | 2170/9500 [7:27:33<25:15:07, 12.40s/it]08/03/2024 05:25:03 - INFO - __main__ -   Step: 2170, LR: 1.590845731998901e-05, Loss: 599.61328125
2024-08-03T12:25:15.493333419Z 
 23%|██▎       | 2171/9500 [7:27:45<25:05:31, 12.33s/it]08/03/2024 05:25:15 - INFO - __main__ -   Step: 2171, LR: 1.5906286776301734e-05, Loss: 585.4552001953125
2024-08-03T12:25:27.884602628Z 
 23%|██▎       | 2172/9500 [7:27:57<25:07:44, 12.35s/it]08/03/2024 05:25:27 - INFO - __main__ -   Step: 2172, LR: 1.5904116232614454e-05, Loss: 636.1881103515625
2024-08-03T12:25:40.312203361Z 
 23%|██▎       | 2173/9500 [7:28:10<25:10:31, 12.37s/it]08/03/2024 05:25:40 - INFO - __main__ -   Step: 2173, LR: 1.5901945688927177e-05, Loss: 571.9495849609375
2024-08-03T12:25:52.700493365Z 
 23%|██▎       | 2174/9500 [7:28:22<25:11:02, 12.38s/it]08/03/2024 05:25:52 - INFO - __main__ -   Step: 2174, LR: 1.5899775145239897e-05, Loss: 586.52734375
2024-08-03T12:26:04.750598313Z 
 23%|██▎       | 2175/9500 [7:28:34<24:58:54, 12.28s/it]08/03/2024 05:26:04 - INFO - __main__ -   Step: 2175, LR: 1.5897604601552617e-05, Loss: 633.2459716796875
2024-08-03T12:26:17.306282055Z 
 23%|██▎       | 2176/9500 [7:28:47<25:08:53, 12.36s/it]08/03/2024 05:26:17 - INFO - __main__ -   Step: 2176, LR: 1.589543405786534e-05, Loss: 550.3414916992188
2024-08-03T12:26:29.548149460Z 
 23%|██▎       | 2177/9500 [7:28:59<25:04:18, 12.33s/it]08/03/2024 05:26:29 - INFO - __main__ -   Step: 2177, LR: 1.589326351417806e-05, Loss: 636.4725952148438
2024-08-03T12:26:41.521590349Z 
 23%|██▎       | 2178/9500 [7:29:11<24:51:14, 12.22s/it]08/03/2024 05:26:41 - INFO - __main__ -   Step: 2178, LR: 1.5891092970490783e-05, Loss: 563.02880859375
2024-08-03T12:26:54.235272658Z 
 23%|██▎       | 2179/9500 [7:29:24<25:09:05, 12.37s/it]08/03/2024 05:26:54 - INFO - __main__ -   Step: 2179, LR: 1.5888922426803503e-05, Loss: 589.9640502929688
2024-08-03T12:27:06.341680717Z 
 23%|██▎       | 2180/9500 [7:29:36<24:59:18, 12.29s/it]08/03/2024 05:27:06 - INFO - __main__ -   Step: 2180, LR: 1.5886751883116223e-05, Loss: 551.3598022460938
2024-08-03T12:27:18.302769112Z 
 23%|██▎       | 2181/9500 [7:29:48<24:47:05, 12.19s/it]08/03/2024 05:27:18 - INFO - __main__ -   Step: 2181, LR: 1.5884581339428943e-05, Loss: 649.110107421875
2024-08-03T12:27:30.566243267Z 
 23%|██▎       | 2182/9500 [7:30:00<24:49:33, 12.21s/it]08/03/2024 05:27:30 - INFO - __main__ -   Step: 2182, LR: 1.5882410795741666e-05, Loss: 455.3828125
2024-08-03T12:27:42.472511553Z 
 23%|██▎       | 2183/9500 [7:30:12<24:38:07, 12.12s/it]08/03/2024 05:27:42 - INFO - __main__ -   Step: 2183, LR: 1.5880240252054386e-05, Loss: 568.9097900390625
2024-08-03T12:27:54.615905694Z 
 23%|██▎       | 2184/9500 [7:30:24<24:38:45, 12.13s/it]08/03/2024 05:27:54 - INFO - __main__ -   Step: 2184, LR: 1.5878069708367106e-05, Loss: 557.6929931640625
2024-08-03T12:28:06.887580315Z 
 23%|██▎       | 2185/9500 [7:30:36<24:43:50, 12.17s/it]08/03/2024 05:28:06 - INFO - __main__ -   Step: 2185, LR: 1.587589916467983e-05, Loss: 686.7825317382812
2024-08-03T12:28:19.367919266Z 
 23%|██▎       | 2186/9500 [7:30:49<24:54:56, 12.26s/it]08/03/2024 05:28:19 - INFO - __main__ -   Step: 2186, LR: 1.587372862099255e-05, Loss: 547.3431396484375
2024-08-03T12:28:31.410595153Z 
 23%|██▎       | 2187/9500 [7:31:01<24:46:40, 12.20s/it]08/03/2024 05:28:31 - INFO - __main__ -   Step: 2187, LR: 1.5871558077305272e-05, Loss: 674.001953125
2024-08-03T12:28:43.484781289Z 
 23%|██▎       | 2188/9500 [7:31:13<24:41:57, 12.16s/it]08/03/2024 05:28:43 - INFO - __main__ -   Step: 2188, LR: 1.5869387533617992e-05, Loss: 613.6165161132812
2024-08-03T12:28:55.912321255Z 
 23%|██▎       | 2189/9500 [7:31:25<24:51:30, 12.24s/it]08/03/2024 05:28:55 - INFO - __main__ -   Step: 2189, LR: 1.5867216989930712e-05, Loss: 666.7764282226562
2024-08-03T12:29:07.854398836Z 
 23%|██▎       | 2190/9500 [7:31:37<24:40:24, 12.15s/it]08/03/2024 05:29:07 - INFO - __main__ -   Step: 2190, LR: 1.5865046446243435e-05, Loss: 511.0210266113281
2024-08-03T12:29:19.993800405Z 
 23%|██▎       | 2191/9500 [7:31:49<24:39:46, 12.15s/it]08/03/2024 05:29:19 - INFO - __main__ -   Step: 2191, LR: 1.5862875902556155e-05, Loss: 684.2421264648438
2024-08-03T12:29:32.701521932Z 
 23%|██▎       | 2192/9500 [7:32:02<25:00:01, 12.32s/it]08/03/2024 05:29:32 - INFO - __main__ -   Step: 2192, LR: 1.586070535886888e-05, Loss: 751.493896484375
2024-08-03T12:29:44.935710937Z 
 23%|██▎       | 2193/9500 [7:32:14<24:56:51, 12.29s/it]08/03/2024 05:29:44 - INFO - __main__ -   Step: 2193, LR: 1.58585348151816e-05, Loss: 468.70947265625
2024-08-03T12:29:57.620092227Z 
 23%|██▎       | 2194/9500 [7:32:27<25:11:01, 12.41s/it]08/03/2024 05:29:57 - INFO - __main__ -   Step: 2194, LR: 1.5856364271494318e-05, Loss: 585.8934936523438
2024-08-03T12:30:10.189271500Z 
 23%|██▎       | 2195/9500 [7:32:40<25:16:39, 12.46s/it]08/03/2024 05:30:10 - INFO - __main__ -   Step: 2195, LR: 1.5854193727807038e-05, Loss: 717.0686645507812
2024-08-03T12:30:22.589890710Z 
 23%|██▎       | 2196/9500 [7:32:52<25:14:21, 12.44s/it]08/03/2024 05:30:22 - INFO - __main__ -   Step: 2196, LR: 1.585202318411976e-05, Loss: 620.651611328125
2024-08-03T12:30:34.710142337Z 
 23%|██▎       | 2197/9500 [7:33:04<25:02:30, 12.34s/it]08/03/2024 05:30:34 - INFO - __main__ -   Step: 2197, LR: 1.584985264043248e-05, Loss: 544.32177734375
2024-08-03T12:30:47.371358982Z 
 23%|██▎       | 2198/9500 [7:33:17<25:13:51, 12.44s/it]08/03/2024 05:30:47 - INFO - __main__ -   Step: 2198, LR: 1.58476820967452e-05, Loss: 539.073486328125
2024-08-03T12:30:59.653859456Z 
 23%|██▎       | 2199/9500 [7:33:29<25:07:56, 12.39s/it]08/03/2024 05:30:59 - INFO - __main__ -   Step: 2199, LR: 1.5845511553057924e-05, Loss: 630.51220703125
2024-08-03T12:31:11.880302699Z 
 23%|██▎       | 2200/9500 [7:33:41<25:01:40, 12.34s/it]08/03/2024 05:31:11 - INFO - __main__ -   Step: 2200, LR: 1.5843341009370644e-05, Loss: 531.9747314453125
2024-08-03T12:31:24.283132871Z 
 23%|██▎       | 2201/9500 [7:33:54<25:03:39, 12.36s/it]08/03/2024 05:31:24 - INFO - __main__ -   Step: 2201, LR: 1.5841170465683368e-05, Loss: 612.664306640625
2024-08-03T12:31:36.567991611Z 
 23%|██▎       | 2202/9500 [7:34:06<25:00:42, 12.34s/it]08/03/2024 05:31:36 - INFO - __main__ -   Step: 2202, LR: 1.5838999921996087e-05, Loss: 667.0511474609375
2024-08-03T12:31:48.663091202Z 
 23%|██▎       | 2203/9500 [7:34:18<24:51:38, 12.27s/it]08/03/2024 05:31:48 - INFO - __main__ -   Step: 2203, LR: 1.583682937830881e-05, Loss: 525.8350219726562
2024-08-03T12:32:01.427628444Z 
 23%|██▎       | 2204/9500 [7:34:31<25:09:39, 12.41s/it]08/03/2024 05:32:01 - INFO - __main__ -   Step: 2204, LR: 1.583465883462153e-05, Loss: 692.166259765625
2024-08-03T12:32:13.512251591Z 
 23%|██▎       | 2205/9500 [7:34:43<24:57:23, 12.32s/it]08/03/2024 05:32:13 - INFO - __main__ -   Step: 2205, LR: 1.583248829093425e-05, Loss: 426.47998046875
2024-08-03T12:32:25.883888848Z 
 23%|██▎       | 2206/9500 [7:34:55<24:59:14, 12.33s/it]08/03/2024 05:32:25 - INFO - __main__ -   Step: 2206, LR: 1.5830317747246974e-05, Loss: 581.147216796875
2024-08-03T12:32:38.663289446Z 
 23%|██▎       | 2207/9500 [7:35:08<25:15:18, 12.47s/it]08/03/2024 05:32:38 - INFO - __main__ -   Step: 2207, LR: 1.5828147203559693e-05, Loss: 616.7203369140625
2024-08-03T12:32:50.858494708Z 
 23%|██▎       | 2208/9500 [7:35:20<25:05:12, 12.39s/it]08/03/2024 05:32:50 - INFO - __main__ -   Step: 2208, LR: 1.5825976659872413e-05, Loss: 594.2469482421875
2024-08-03T12:33:03.382788688Z 
 23%|██▎       | 2209/9500 [7:35:33<25:10:04, 12.43s/it]08/03/2024 05:33:03 - INFO - __main__ -   Step: 2209, LR: 1.5823806116185133e-05, Loss: 725.8841552734375
2024-08-03T12:33:16.042776723Z 
 23%|██▎       | 2210/9500 [7:35:45<25:18:22, 12.50s/it]08/03/2024 05:33:16 - INFO - __main__ -   Step: 2210, LR: 1.5821635572497856e-05, Loss: 776.1488037109375
2024-08-03T12:33:28.249295355Z 
 23%|██▎       | 2211/9500 [7:35:58<25:07:34, 12.41s/it]08/03/2024 05:33:28 - INFO - __main__ -   Step: 2211, LR: 1.5819465028810576e-05, Loss: 729.42919921875
2024-08-03T12:33:40.548935435Z 
 23%|██▎       | 2212/9500 [7:36:10<25:03:21, 12.38s/it]08/03/2024 05:33:40 - INFO - __main__ -   Step: 2212, LR: 1.58172944851233e-05, Loss: 638.9686279296875
2024-08-03T12:33:53.207136964Z 
 23%|██▎       | 2213/9500 [7:36:23<25:13:23, 12.46s/it]08/03/2024 05:33:53 - INFO - __main__ -   Step: 2213, LR: 1.581512394143602e-05, Loss: 609.1105346679688
2024-08-03T12:34:05.178553359Z 
 23%|██▎       | 2214/9500 [7:36:35<24:55:18, 12.31s/it]08/03/2024 05:34:05 - INFO - __main__ -   Step: 2214, LR: 1.581295339774874e-05, Loss: 490.37213134765625
2024-08-03T12:34:17.149850779Z 
 23%|██▎       | 2215/9500 [7:36:47<24:42:40, 12.21s/it]08/03/2024 05:34:17 - INFO - __main__ -   Step: 2215, LR: 1.5810782854061463e-05, Loss: 519.73095703125
2024-08-03T12:34:29.825828247Z 
 23%|██▎       | 2216/9500 [7:36:59<24:59:23, 12.35s/it]08/03/2024 05:34:29 - INFO - __main__ -   Step: 2216, LR: 1.5808612310374182e-05, Loss: 679.0833129882812
2024-08-03T12:34:42.001950768Z 
 23%|██▎       | 2217/9500 [7:37:11<24:52:48, 12.30s/it]08/03/2024 05:34:42 - INFO - __main__ -   Step: 2217, LR: 1.5806441766686906e-05, Loss: 515.8345336914062
2024-08-03T12:34:54.711113007Z 
 23%|██▎       | 2218/9500 [7:37:24<25:07:34, 12.42s/it]08/03/2024 05:34:54 - INFO - __main__ -   Step: 2218, LR: 1.5804271222999626e-05, Loss: 643.9004516601562
2024-08-03T12:35:07.230999944Z 
 23%|██▎       | 2219/9500 [7:37:37<25:10:56, 12.45s/it]08/03/2024 05:35:07 - INFO - __main__ -   Step: 2219, LR: 1.5802100679312345e-05, Loss: 610.99169921875
2024-08-03T12:35:19.733440396Z 
 23%|██▎       | 2220/9500 [7:37:49<25:12:36, 12.47s/it]08/03/2024 05:35:19 - INFO - __main__ -   Step: 2220, LR: 1.579993013562507e-05, Loss: 639.51513671875
2024-08-03T12:35:31.910138575Z 
 23%|██▎       | 2221/9500 [7:38:01<25:01:51, 12.38s/it]08/03/2024 05:35:31 - INFO - __main__ -   Step: 2221, LR: 1.579775959193779e-05, Loss: 559.2247924804688
2024-08-03T12:35:44.285903363Z 
 23%|██▎       | 2222/9500 [7:38:14<25:01:30, 12.38s/it]08/03/2024 05:35:44 - INFO - __main__ -   Step: 2222, LR: 1.579558904825051e-05, Loss: 517.083251953125
2024-08-03T12:35:56.603225987Z 
 23%|██▎       | 2223/9500 [7:38:26<24:59:04, 12.36s/it]08/03/2024 05:35:56 - INFO - __main__ -   Step: 2223, LR: 1.5793418504563228e-05, Loss: 744.749267578125
2024-08-03T12:36:08.571271327Z 
 23%|██▎       | 2224/9500 [7:38:38<24:44:36, 12.24s/it]08/03/2024 05:36:08 - INFO - __main__ -   Step: 2224, LR: 1.579124796087595e-05, Loss: 564.8712158203125
2024-08-03T12:36:20.768930361Z 
 23%|██▎       | 2225/9500 [7:38:50<24:42:46, 12.23s/it]08/03/2024 05:36:20 - INFO - __main__ -   Step: 2225, LR: 1.578907741718867e-05, Loss: 673.1410522460938
2024-08-03T12:36:32.960685434Z 
 23%|██▎       | 2226/9500 [7:39:02<24:41:12, 12.22s/it]08/03/2024 05:36:32 - INFO - __main__ -   Step: 2226, LR: 1.5786906873501395e-05, Loss: 724.7282104492188
2024-08-03T12:36:45.176117635Z 
 23%|██▎       | 2227/9500 [7:39:15<24:40:55, 12.22s/it]08/03/2024 05:36:45 - INFO - __main__ -   Step: 2227, LR: 1.5784736329814114e-05, Loss: 586.3253173828125
2024-08-03T12:36:57.705747362Z 
 23%|██▎       | 2228/9500 [7:39:27<24:52:05, 12.31s/it]08/03/2024 05:36:57 - INFO - __main__ -   Step: 2228, LR: 1.5782565786126834e-05, Loss: 584.2026977539062
2024-08-03T12:37:10.116144115Z 
 23%|██▎       | 2229/9500 [7:39:40<24:55:28, 12.34s/it]08/03/2024 05:37:10 - INFO - __main__ -   Step: 2229, LR: 1.5780395242439558e-05, Loss: 852.2080078125
2024-08-03T12:37:22.309038394Z 
 23%|██▎       | 2230/9500 [7:39:52<24:49:54, 12.30s/it]08/03/2024 05:37:22 - INFO - __main__ -   Step: 2230, LR: 1.5778224698752277e-05, Loss: 660.30810546875
2024-08-03T12:37:34.165719478Z 
 23%|██▎       | 2231/9500 [7:40:04<24:33:43, 12.16s/it]08/03/2024 05:37:34 - INFO - __main__ -   Step: 2231, LR: 1.5776054155065e-05, Loss: 472.9713439941406
2024-08-03T12:37:46.632102533Z 
 23%|██▎       | 2232/9500 [7:40:16<24:44:29, 12.25s/it]08/03/2024 05:37:46 - INFO - __main__ -   Step: 2232, LR: 1.577388361137772e-05, Loss: 665.8203125
2024-08-03T12:37:59.179971441Z 
 24%|██▎       | 2233/9500 [7:40:29<24:54:55, 12.34s/it]08/03/2024 05:37:59 - INFO - __main__ -   Step: 2233, LR: 1.577171306769044e-05, Loss: 773.6468505859375
2024-08-03T12:38:11.400221953Z 
 24%|██▎       | 2234/9500 [7:40:41<24:50:16, 12.31s/it]08/03/2024 05:38:11 - INFO - __main__ -   Step: 2234, LR: 1.5769542524003164e-05, Loss: 739.9112548828125
2024-08-03T12:38:23.906177071Z 
 24%|██▎       | 2235/9500 [7:40:53<24:57:19, 12.37s/it]08/03/2024 05:38:23 - INFO - __main__ -   Step: 2235, LR: 1.5767371980315884e-05, Loss: 544.7308959960938
2024-08-03T12:38:36.097817219Z 
 24%|██▎       | 2236/9500 [7:41:06<24:50:47, 12.31s/it]08/03/2024 05:38:36 - INFO - __main__ -   Step: 2236, LR: 1.5765201436628603e-05, Loss: 667.8046875
2024-08-03T12:38:48.142648062Z 
 24%|██▎       | 2237/9500 [7:41:18<24:40:49, 12.23s/it]08/03/2024 05:38:48 - INFO - __main__ -   Step: 2237, LR: 1.5763030892941323e-05, Loss: 681.8824462890625
2024-08-03T12:39:00.942815940Z 
 24%|██▎       | 2238/9500 [7:41:30<25:01:11, 12.40s/it]08/03/2024 05:39:00 - INFO - __main__ -   Step: 2238, LR: 1.5760860349254047e-05, Loss: 674.2327880859375
2024-08-03T12:39:13.418958031Z 
 24%|██▎       | 2239/9500 [7:41:43<25:03:37, 12.43s/it]08/03/2024 05:39:13 - INFO - __main__ -   Step: 2239, LR: 1.5758689805566766e-05, Loss: 855.7130737304688
2024-08-03T12:39:25.644750936Z 
 24%|██▎       | 2240/9500 [7:41:55<24:56:12, 12.37s/it]08/03/2024 05:39:25 - INFO - __main__ -   Step: 2240, LR: 1.575651926187949e-05, Loss: 705.0435180664062
2024-08-03T12:39:38.250673380Z 
 24%|██▎       | 2241/9500 [7:42:08<25:04:43, 12.44s/it]08/03/2024 05:39:38 - INFO - __main__ -   Step: 2241, LR: 1.575434871819221e-05, Loss: 533.9752807617188
2024-08-03T12:39:50.847094324Z 
 24%|██▎       | 2242/9500 [7:42:20<25:10:17, 12.49s/it]08/03/2024 05:39:50 - INFO - __main__ -   Step: 2242, LR: 1.575217817450493e-05, Loss: 701.390380859375
2024-08-03T12:40:02.940867916Z 
 24%|██▎       | 2243/9500 [7:42:32<24:55:52, 12.37s/it]08/03/2024 05:40:02 - INFO - __main__ -   Step: 2243, LR: 1.5750007630817653e-05, Loss: 664.48876953125
2024-08-03T12:40:15.360754899Z 
 24%|██▎       | 2244/9500 [7:42:45<24:57:33, 12.38s/it]08/03/2024 05:40:15 - INFO - __main__ -   Step: 2244, LR: 1.5747837087130373e-05, Loss: 619.1005249023438
2024-08-03T12:40:27.278842900Z 
 24%|██▎       | 2245/9500 [7:42:57<24:40:29, 12.24s/it]08/03/2024 05:40:27 - INFO - __main__ -   Step: 2245, LR: 1.5745666543443096e-05, Loss: 615.1058349609375
2024-08-03T12:40:39.708355278Z 
 24%|██▎       | 2246/9500 [7:43:09<24:47:00, 12.30s/it]08/03/2024 05:40:39 - INFO - __main__ -   Step: 2246, LR: 1.5743495999755816e-05, Loss: 735.0369873046875
2024-08-03T12:40:51.967582879Z 
 24%|██▎       | 2247/9500 [7:43:21<24:45:20, 12.29s/it]08/03/2024 05:40:51 - INFO - __main__ -   Step: 2247, LR: 1.5741325456068536e-05, Loss: 598.62890625
2024-08-03T12:41:04.064729433Z 
 24%|██▎       | 2248/9500 [7:43:34<24:38:14, 12.23s/it]08/03/2024 05:41:04 - INFO - __main__ -   Step: 2248, LR: 1.573915491238126e-05, Loss: 700.11181640625
2024-08-03T12:41:15.995594686Z 
 24%|██▎       | 2249/9500 [7:43:45<24:27:10, 12.14s/it]08/03/2024 05:41:15 - INFO - __main__ -   Step: 2249, LR: 1.573698436869398e-05, Loss: 574.7498779296875
2024-08-03T12:41:28.710340338Z 
 24%|██▎       | 2250/9500 [7:43:58<24:47:47, 12.31s/it]08/03/2024 05:41:28 - INFO - __main__ -   Step: 2250, LR: 1.57348138250067e-05, Loss: 714.3379516601562
2024-08-03T12:41:40.825006319Z 
 24%|██▎       | 2251/9500 [7:44:10<24:40:24, 12.25s/it]08/03/2024 05:41:40 - INFO - __main__ -   Step: 2251, LR: 1.573264328131942e-05, Loss: 613.4052734375
2024-08-03T12:41:53.093400477Z 
 24%|██▎       | 2252/9500 [7:44:23<24:40:44, 12.26s/it]08/03/2024 05:41:53 - INFO - __main__ -   Step: 2252, LR: 1.573047273763214e-05, Loss: 735.8221435546875
2024-08-03T12:42:05.715424018Z 
 24%|██▎       | 2253/9500 [7:44:35<24:53:44, 12.37s/it]08/03/2024 05:42:05 - INFO - __main__ -   Step: 2253, LR: 1.572830219394486e-05, Loss: 520.1171875
2024-08-03T12:42:17.764487499Z 
 24%|██▎       | 2254/9500 [7:44:47<24:42:00, 12.27s/it]08/03/2024 05:42:17 - INFO - __main__ -   Step: 2254, LR: 1.5726131650257585e-05, Loss: 741.6275024414062
2024-08-03T12:42:29.811623453Z 
 24%|██▎       | 2255/9500 [7:44:59<24:33:40, 12.20s/it]08/03/2024 05:42:29 - INFO - __main__ -   Step: 2255, LR: 1.5723961106570305e-05, Loss: 724.503173828125
2024-08-03T12:42:42.391553298Z 
 24%|██▎       | 2256/9500 [7:45:12<24:47:04, 12.32s/it]08/03/2024 05:42:42 - INFO - __main__ -   Step: 2256, LR: 1.5721790562883024e-05, Loss: 732.3842163085938
2024-08-03T12:42:54.503954314Z 
 24%|██▍       | 2257/9500 [7:45:24<24:39:27, 12.26s/it]08/03/2024 05:42:54 - INFO - __main__ -   Step: 2257, LR: 1.5719620019195748e-05, Loss: 623.5751342773438
2024-08-03T12:43:06.692398861Z 
 24%|██▍       | 2258/9500 [7:45:36<24:36:49, 12.24s/it]08/03/2024 05:43:06 - INFO - __main__ -   Step: 2258, LR: 1.5717449475508468e-05, Loss: 664.6699829101562
2024-08-03T12:43:19.248132442Z 
 24%|██▍       | 2259/9500 [7:45:49<24:48:12, 12.33s/it]08/03/2024 05:43:19 - INFO - __main__ -   Step: 2259, LR: 1.571527893182119e-05, Loss: 658.6834716796875
2024-08-03T12:43:31.575299148Z 
 24%|██▍       | 2260/9500 [7:46:01<24:47:50, 12.33s/it]08/03/2024 05:43:31 - INFO - __main__ -   Step: 2260, LR: 1.571310838813391e-05, Loss: 553.5386962890625
2024-08-03T12:43:43.671872354Z 
 24%|██▍       | 2261/9500 [7:46:13<24:39:11, 12.26s/it]08/03/2024 05:43:43 - INFO - __main__ -   Step: 2261, LR: 1.571093784444663e-05, Loss: 741.3423461914062
2024-08-03T12:43:56.184567610Z 
 24%|██▍       | 2262/9500 [7:46:26<24:48:07, 12.34s/it]08/03/2024 05:43:56 - INFO - __main__ -   Step: 2262, LR: 1.5708767300759354e-05, Loss: 669.62646484375
2024-08-03T12:44:08.359099041Z 
 24%|██▍       | 2263/9500 [7:46:38<24:42:04, 12.29s/it]08/03/2024 05:44:08 - INFO - __main__ -   Step: 2263, LR: 1.5706596757072074e-05, Loss: 755.4334716796875
2024-08-03T12:44:21.062338801Z 
 24%|██▍       | 2264/9500 [7:46:50<24:56:54, 12.41s/it]08/03/2024 05:44:21 - INFO - __main__ -   Step: 2264, LR: 1.5704426213384794e-05, Loss: 755.509033203125
2024-08-03T12:44:33.607072159Z 
 24%|██▍       | 2265/9500 [7:47:03<25:01:30, 12.45s/it]08/03/2024 05:44:33 - INFO - __main__ -   Step: 2265, LR: 1.5702255669697513e-05, Loss: 675.2333984375
2024-08-03T12:44:45.605828846Z 
 24%|██▍       | 2266/9500 [7:47:15<24:44:53, 12.32s/it]08/03/2024 05:44:45 - INFO - __main__ -   Step: 2266, LR: 1.5700085126010237e-05, Loss: 573.7127685546875
2024-08-03T12:44:57.757336645Z 
 24%|██▍       | 2267/9500 [7:47:27<24:38:45, 12.27s/it]08/03/2024 05:44:57 - INFO - __main__ -   Step: 2267, LR: 1.5697914582322957e-05, Loss: 564.0013427734375
2024-08-03T12:45:10.365399222Z 
 24%|██▍       | 2268/9500 [7:47:40<24:50:52, 12.37s/it]08/03/2024 05:45:10 - INFO - __main__ -   Step: 2268, LR: 1.569574403863568e-05, Loss: 641.9525146484375
2024-08-03T12:45:22.515739067Z 
 24%|██▍       | 2269/9500 [7:47:52<24:42:46, 12.30s/it]08/03/2024 05:45:22 - INFO - __main__ -   Step: 2269, LR: 1.56935734949484e-05, Loss: 700.9931030273438
2024-08-03T12:45:34.532961409Z 
 24%|██▍       | 2270/9500 [7:48:04<24:32:12, 12.22s/it]08/03/2024 05:45:34 - INFO - __main__ -   Step: 2270, LR: 1.569140295126112e-05, Loss: 563.0003662109375
2024-08-03T12:45:47.114112735Z 
 24%|██▍       | 2271/9500 [7:48:17<24:45:09, 12.33s/it]08/03/2024 05:45:47 - INFO - __main__ -   Step: 2271, LR: 1.5689232407573843e-05, Loss: 643.4840087890625
2024-08-03T12:45:59.333125100Z 
 24%|██▍       | 2272/9500 [7:48:29<24:41:03, 12.29s/it]08/03/2024 05:45:59 - INFO - __main__ -   Step: 2272, LR: 1.5687061863886563e-05, Loss: 606.0635986328125
2024-08-03T12:46:12.047174320Z 
 24%|██▍       | 2273/9500 [7:48:41<24:56:01, 12.42s/it]08/03/2024 05:46:12 - INFO - __main__ -   Step: 2273, LR: 1.5684891320199286e-05, Loss: 746.709716796875
2024-08-03T12:46:24.445189235Z 
 24%|██▍       | 2274/9500 [7:48:54<24:55:00, 12.41s/it]08/03/2024 05:46:24 - INFO - __main__ -   Step: 2274, LR: 1.5682720776512006e-05, Loss: 562.5108032226562
2024-08-03T12:46:37.070488441Z 
 24%|██▍       | 2275/9500 [7:49:07<25:02:26, 12.48s/it]08/03/2024 05:46:37 - INFO - __main__ -   Step: 2275, LR: 1.5680550232824726e-05, Loss: 771.3138427734375
2024-08-03T12:46:49.361171970Z 
 24%|██▍       | 2276/9500 [7:49:19<24:55:30, 12.42s/it]08/03/2024 05:46:49 - INFO - __main__ -   Step: 2276, LR: 1.567837968913745e-05, Loss: 820.5234375
2024-08-03T12:47:01.930395787Z 
 24%|██▍       | 2277/9500 [7:49:31<25:00:39, 12.47s/it]08/03/2024 05:47:01 - INFO - __main__ -   Step: 2277, LR: 1.567620914545017e-05, Loss: 646.9158325195312
2024-08-03T12:47:14.432317477Z 
 24%|██▍       | 2278/9500 [7:49:44<25:01:44, 12.48s/it]08/03/2024 05:47:14 - INFO - __main__ -   Step: 2278, LR: 1.567403860176289e-05, Loss: 630.7894287109375
2024-08-03T12:47:26.890618505Z 
 24%|██▍       | 2279/9500 [7:49:56<25:00:54, 12.47s/it]08/03/2024 05:47:26 - INFO - __main__ -   Step: 2279, LR: 1.567186805807561e-05, Loss: 682.3095092773438
2024-08-03T12:47:38.756430261Z 
 24%|██▍       | 2280/9500 [7:50:08<24:38:50, 12.29s/it]08/03/2024 05:47:38 - INFO - __main__ -   Step: 2280, LR: 1.5669697514388332e-05, Loss: 506.5931396484375
2024-08-03T12:47:51.073191794Z 
 24%|██▍       | 2281/9500 [7:50:21<24:39:36, 12.30s/it]08/03/2024 05:47:51 - INFO - __main__ -   Step: 2281, LR: 1.566752697070105e-05, Loss: 710.464111328125
2024-08-03T12:48:03.227203835Z 
 24%|██▍       | 2282/9500 [7:50:33<24:34:13, 12.25s/it]08/03/2024 05:48:03 - INFO - __main__ -   Step: 2282, LR: 1.5665356427013775e-05, Loss: 824.426025390625
2024-08-03T12:48:15.379526172Z 
 24%|██▍       | 2283/9500 [7:50:45<24:30:19, 12.22s/it]08/03/2024 05:48:15 - INFO - __main__ -   Step: 2283, LR: 1.5663185883326495e-05, Loss: 585.46337890625
2024-08-03T12:48:28.342171710Z 
 24%|██▍       | 2284/9500 [7:50:58<24:56:46, 12.45s/it]08/03/2024 05:48:28 - INFO - __main__ -   Step: 2284, LR: 1.5661015339639215e-05, Loss: 681.7277221679688
2024-08-03T12:48:40.640462453Z 
 24%|██▍       | 2285/9500 [7:51:10<24:51:15, 12.40s/it]08/03/2024 05:48:40 - INFO - __main__ -   Step: 2285, LR: 1.5658844795951938e-05, Loss: 667.6763305664062
2024-08-03T12:48:52.861862072Z 
 24%|██▍       | 2286/9500 [7:51:22<24:44:33, 12.35s/it]08/03/2024 05:48:52 - INFO - __main__ -   Step: 2286, LR: 1.5656674252264658e-05, Loss: 634.4527587890625
2024-08-03T12:49:05.730464808Z 
 24%|██▍       | 2287/9500 [7:51:35<25:03:09, 12.50s/it]08/03/2024 05:49:05 - INFO - __main__ -   Step: 2287, LR: 1.565450370857738e-05, Loss: 795.4267578125
2024-08-03T12:49:18.250241763Z 
 24%|██▍       | 2288/9500 [7:51:48<25:03:31, 12.51s/it]08/03/2024 05:49:18 - INFO - __main__ -   Step: 2288, LR: 1.56523331648901e-05, Loss: 809.98974609375
2024-08-03T12:49:30.710462337Z 
 24%|██▍       | 2289/9500 [7:52:00<25:01:35, 12.49s/it]08/03/2024 05:49:30 - INFO - __main__ -   Step: 2289, LR: 1.565016262120282e-05, Loss: 677.0950927734375
2024-08-03T12:49:43.401664194Z 
 24%|██▍       | 2290/9500 [7:52:13<25:08:28, 12.55s/it]08/03/2024 05:49:43 - INFO - __main__ -   Step: 2290, LR: 1.5647992077515544e-05, Loss: 668.43408203125
2024-08-03T12:49:55.781110316Z 
 24%|██▍       | 2291/9500 [7:52:25<25:02:00, 12.50s/it]08/03/2024 05:49:55 - INFO - __main__ -   Step: 2291, LR: 1.5645821533828264e-05, Loss: 675.795166015625
2024-08-03T12:50:08.033581087Z 
 24%|██▍       | 2292/9500 [7:52:37<24:52:50, 12.43s/it]08/03/2024 05:50:08 - INFO - __main__ -   Step: 2292, LR: 1.5643650990140984e-05, Loss: 678.8717041015625
2024-08-03T12:50:20.444314282Z 
 24%|██▍       | 2293/9500 [7:52:50<24:52:03, 12.42s/it]08/03/2024 05:50:20 - INFO - __main__ -   Step: 2293, LR: 1.5641480446453704e-05, Loss: 801.4462890625
2024-08-03T12:50:32.838156245Z 
 24%|██▍       | 2294/9500 [7:53:02<24:50:50, 12.41s/it]08/03/2024 05:50:32 - INFO - __main__ -   Step: 2294, LR: 1.5639309902766427e-05, Loss: 710.4166259765625
2024-08-03T12:50:45.311954754Z 
 24%|██▍       | 2295/9500 [7:53:15<24:52:47, 12.43s/it]08/03/2024 05:50:45 - INFO - __main__ -   Step: 2295, LR: 1.5637139359079147e-05, Loss: 727.5465087890625
2024-08-03T12:50:57.623269145Z 
 24%|██▍       | 2296/9500 [7:53:27<24:48:16, 12.40s/it]08/03/2024 05:50:57 - INFO - __main__ -   Step: 2296, LR: 1.563496881539187e-05, Loss: 566.9078369140625
2024-08-03T12:51:09.899859702Z 
 24%|██▍       | 2297/9500 [7:53:39<24:43:47, 12.36s/it]08/03/2024 05:51:09 - INFO - __main__ -   Step: 2297, LR: 1.563279827170459e-05, Loss: 640.4542236328125
2024-08-03T12:51:21.958172321Z 
 24%|██▍       | 2298/9500 [7:53:51<24:32:44, 12.27s/it]08/03/2024 05:51:21 - INFO - __main__ -   Step: 2298, LR: 1.563062772801731e-05, Loss: 591.5409545898438
2024-08-03T12:51:34.748920652Z 
 24%|██▍       | 2299/9500 [7:54:04<24:51:18, 12.43s/it]08/03/2024 05:51:34 - INFO - __main__ -   Step: 2299, LR: 1.5628457184330033e-05, Loss: 560.8870239257812
2024-08-03T12:51:46.970482840Z 
 24%|██▍       | 2300/9500 [7:54:16<24:43:44, 12.36s/it]08/03/2024 05:51:46 - INFO - __main__ -   Step: 2300, LR: 1.5626286640642753e-05, Loss: 622.7313232421875
2024-08-03T12:51:59.520204836Z 
 24%|██▍       | 2301/9500 [7:54:29<24:50:12, 12.42s/it]08/03/2024 05:51:59 - INFO - __main__ -   Step: 2301, LR: 1.5624116096955476e-05, Loss: 679.5137329101562
2024-08-03T12:52:12.073146786Z 
 24%|██▍       | 2302/9500 [7:54:42<24:54:46, 12.46s/it]08/03/2024 05:52:12 - INFO - __main__ -   Step: 2302, LR: 1.5621945553268196e-05, Loss: 566.021728515625
2024-08-03T12:52:24.236746715Z 
 24%|██▍       | 2303/9500 [7:54:54<24:43:54, 12.37s/it]08/03/2024 05:52:24 - INFO - __main__ -   Step: 2303, LR: 1.561977500958092e-05, Loss: 568.3541259765625
2024-08-03T12:52:36.286427413Z 
 24%|██▍       | 2304/9500 [7:55:06<24:32:07, 12.27s/it]08/03/2024 05:52:36 - INFO - __main__ -   Step: 2304, LR: 1.561760446589364e-05, Loss: 573.191162109375
2024-08-03T12:52:49.300388842Z 
 24%|██▍       | 2305/9500 [7:55:19<24:58:31, 12.50s/it]08/03/2024 05:52:49 - INFO - __main__ -   Step: 2305, LR: 1.561543392220636e-05, Loss: 766.5396118164062
2024-08-03T12:53:01.724309935Z 
 24%|██▍       | 2306/9500 [7:55:31<24:55:42, 12.47s/it]08/03/2024 05:53:01 - INFO - __main__ -   Step: 2306, LR: 1.561326337851908e-05, Loss: 640.5958251953125
2024-08-03T12:53:14.025237317Z 
 24%|██▍       | 2307/9500 [7:55:43<24:49:16, 12.42s/it]08/03/2024 05:53:14 - INFO - __main__ -   Step: 2307, LR: 1.56110928348318e-05, Loss: 601.920654296875
2024-08-03T12:53:26.745798494Z 
 24%|██▍       | 2308/9500 [7:55:56<24:59:45, 12.51s/it]08/03/2024 05:53:26 - INFO - __main__ -   Step: 2308, LR: 1.5608922291144522e-05, Loss: 735.0938720703125
2024-08-03T12:53:38.783281935Z 
 24%|██▍       | 2309/9500 [7:56:08<24:42:30, 12.37s/it]08/03/2024 05:53:38 - INFO - __main__ -   Step: 2309, LR: 1.5606751747457242e-05, Loss: 531.1507568359375
2024-08-03T12:53:51.577520995Z 
 24%|██▍       | 2310/9500 [7:56:21<24:57:31, 12.50s/it]08/03/2024 05:53:51 - INFO - __main__ -   Step: 2310, LR: 1.5604581203769965e-05, Loss: 684.6904296875
2024-08-03T12:54:04.328508900Z 
 24%|██▍       | 2311/9500 [7:56:34<25:06:26, 12.57s/it]08/03/2024 05:54:04 - INFO - __main__ -   Step: 2311, LR: 1.5602410660082685e-05, Loss: 514.4815673828125
2024-08-03T12:54:16.483051602Z 
 24%|██▍       | 2312/9500 [7:56:46<24:51:13, 12.45s/it]08/03/2024 05:54:16 - INFO - __main__ -   Step: 2312, LR: 1.5600240116395408e-05, Loss: 563.225830078125
2024-08-03T12:54:28.759846775Z 
 24%|██▍       | 2313/9500 [7:56:58<24:44:53, 12.40s/it]08/03/2024 05:54:28 - INFO - __main__ -   Step: 2313, LR: 1.5598069572708128e-05, Loss: 579.655517578125
2024-08-03T12:54:41.104087538Z 
 24%|██▍       | 2314/9500 [7:57:11<24:42:49, 12.38s/it]08/03/2024 05:54:41 - INFO - __main__ -   Step: 2314, LR: 1.5595899029020848e-05, Loss: 665.6444702148438
2024-08-03T12:54:53.184420602Z 
 24%|██▍       | 2315/9500 [7:57:23<24:31:47, 12.29s/it]08/03/2024 05:54:53 - INFO - __main__ -   Step: 2315, LR: 1.559372848533357e-05, Loss: 755.64453125
2024-08-03T12:55:05.324249667Z 
 24%|██▍       | 2316/9500 [7:57:35<24:26:10, 12.25s/it]08/03/2024 05:55:05 - INFO - __main__ -   Step: 2316, LR: 1.559155794164629e-05, Loss: 646.3802490234375
2024-08-03T12:55:17.920190359Z 
 24%|██▍       | 2317/9500 [7:57:47<24:38:34, 12.35s/it]08/03/2024 05:55:17 - INFO - __main__ -   Step: 2317, LR: 1.5589387397959014e-05, Loss: 667.924560546875
2024-08-03T12:55:30.893199093Z 
 24%|██▍       | 2318/9500 [7:58:00<25:00:43, 12.54s/it]08/03/2024 05:55:30 - INFO - __main__ -   Step: 2318, LR: 1.5587216854271734e-05, Loss: 642.5641479492188
2024-08-03T12:55:42.919995751Z 
 24%|██▍       | 2319/9500 [7:58:12<24:42:10, 12.38s/it]08/03/2024 05:55:42 - INFO - __main__ -   Step: 2319, LR: 1.5585046310584454e-05, Loss: 506.32171630859375
2024-08-03T12:55:55.930893821Z 
 24%|██▍       | 2320/9500 [7:58:25<25:04:28, 12.57s/it]08/03/2024 05:55:55 - INFO - __main__ -   Step: 2320, LR: 1.5582875766897174e-05, Loss: 747.554443359375
2024-08-03T12:56:08.529479068Z 
 24%|██▍       | 2321/9500 [7:58:38<25:05:12, 12.58s/it]08/03/2024 05:56:08 - INFO - __main__ -   Step: 2321, LR: 1.5580705223209897e-05, Loss: 572.6895751953125
2024-08-03T12:56:20.816120093Z 
 24%|██▍       | 2322/9500 [7:58:50<24:54:28, 12.49s/it]08/03/2024 05:56:20 - INFO - __main__ -   Step: 2322, LR: 1.5578534679522617e-05, Loss: 694.4609375
2024-08-03T12:56:32.807533057Z 
 24%|██▍       | 2323/9500 [7:59:02<24:36:17, 12.34s/it]08/03/2024 05:56:32 - INFO - __main__ -   Step: 2323, LR: 1.5576364135835337e-05, Loss: 755.247314453125
2024-08-03T12:56:45.314462369Z 
 24%|██▍       | 2324/9500 [7:59:15<24:41:59, 12.39s/it]08/03/2024 05:56:45 - INFO - __main__ -   Step: 2324, LR: 1.557419359214806e-05, Loss: 699.8831787109375
2024-08-03T12:56:57.357439213Z 
 24%|██▍       | 2325/9500 [7:59:27<24:29:18, 12.29s/it]08/03/2024 05:56:57 - INFO - __main__ -   Step: 2325, LR: 1.557202304846078e-05, Loss: 623.681640625
2024-08-03T12:57:09.614648108Z 
 24%|██▍       | 2326/9500 [7:59:39<24:28:02, 12.28s/it]08/03/2024 05:57:09 - INFO - __main__ -   Step: 2326, LR: 1.5569852504773503e-05, Loss: 624.0921630859375
2024-08-03T12:57:22.154618471Z 
 24%|██▍       | 2327/9500 [7:59:52<24:37:14, 12.36s/it]08/03/2024 05:57:22 - INFO - __main__ -   Step: 2327, LR: 1.5567681961086223e-05, Loss: 625.3853759765625
2024-08-03T12:57:34.194157338Z 
 25%|██▍       | 2328/9500 [8:00:04<24:25:39, 12.26s/it]08/03/2024 05:57:34 - INFO - __main__ -   Step: 2328, LR: 1.5565511417398943e-05, Loss: 673.0870361328125
2024-08-03T12:57:46.419627819Z 
 25%|██▍       | 2329/9500 [8:00:16<24:24:09, 12.25s/it]08/03/2024 05:57:46 - INFO - __main__ -   Step: 2329, LR: 1.5563340873711666e-05, Loss: 589.297607421875
2024-08-03T12:57:58.867311529Z 
 25%|██▍       | 2330/9500 [8:00:28<24:31:00, 12.31s/it]08/03/2024 05:57:58 - INFO - __main__ -   Step: 2330, LR: 1.5561170330024386e-05, Loss: 638.7235107421875
2024-08-03T12:58:10.943335893Z 
 25%|██▍       | 2331/9500 [8:00:40<24:22:25, 12.24s/it]08/03/2024 05:58:10 - INFO - __main__ -   Step: 2331, LR: 1.555899978633711e-05, Loss: 622.075927734375
2024-08-03T12:58:23.523599499Z 
 25%|██▍       | 2332/9500 [8:00:53<24:34:26, 12.34s/it]08/03/2024 05:58:23 - INFO - __main__ -   Step: 2332, LR: 1.555682924264983e-05, Loss: 716.0874633789062
2024-08-03T12:58:36.025212890Z 
 25%|██▍       | 2333/9500 [8:01:05<24:39:57, 12.39s/it]08/03/2024 05:58:36 - INFO - __main__ -   Step: 2333, LR: 1.555465869896255e-05, Loss: 668.2811279296875
2024-08-03T12:58:48.279519226Z 
 25%|██▍       | 2334/9500 [8:01:18<24:34:53, 12.35s/it]08/03/2024 05:58:48 - INFO - __main__ -   Step: 2334, LR: 1.555248815527527e-05, Loss: 589.2998657226562
2024-08-03T12:59:00.613428097Z 
 25%|██▍       | 2335/9500 [8:01:30<24:34:08, 12.34s/it]08/03/2024 05:59:00 - INFO - __main__ -   Step: 2335, LR: 1.5550317611587992e-05, Loss: 713.9151611328125
2024-08-03T12:59:12.910100050Z 
 25%|██▍       | 2336/9500 [8:01:42<24:32:13, 12.33s/it]08/03/2024 05:59:12 - INFO - __main__ -   Step: 2336, LR: 1.5548147067900712e-05, Loss: 537.227294921875
2024-08-03T12:59:24.991236311Z 
 25%|██▍       | 2337/9500 [8:01:54<24:23:05, 12.26s/it]08/03/2024 05:59:24 - INFO - __main__ -   Step: 2337, LR: 1.5545976524213432e-05, Loss: 533.856689453125
2024-08-03T12:59:37.246156916Z 
 25%|██▍       | 2338/9500 [8:02:07<24:22:52, 12.26s/it]08/03/2024 05:59:37 - INFO - __main__ -   Step: 2338, LR: 1.5543805980526155e-05, Loss: 522.1153564453125
2024-08-03T12:59:49.691369886Z 
 25%|██▍       | 2339/9500 [8:02:19<24:29:28, 12.31s/it]08/03/2024 05:59:49 - INFO - __main__ -   Step: 2339, LR: 1.5541635436838875e-05, Loss: 607.8052368164062
2024-08-03T13:00:02.008838946Z 
 25%|██▍       | 2340/9500 [8:02:31<24:29:27, 12.31s/it]08/03/2024 06:00:02 - INFO - __main__ -   Step: 2340, LR: 1.5539464893151598e-05, Loss: 535.544921875
2024-08-03T13:00:14.285343871Z 
 25%|██▍       | 2341/9500 [8:02:44<24:27:55, 12.30s/it]08/03/2024 06:00:14 - INFO - __main__ -   Step: 2341, LR: 1.5537294349464318e-05, Loss: 632.9140625
2024-08-03T13:00:27.075940277Z 
 25%|██▍       | 2342/9500 [8:02:57<24:45:09, 12.45s/it]08/03/2024 06:00:27 - INFO - __main__ -   Step: 2342, LR: 1.5535123805777038e-05, Loss: 612.4263305664062
2024-08-03T13:00:39.409654940Z 
 25%|██▍       | 2343/9500 [8:03:09<24:40:49, 12.41s/it]08/03/2024 06:00:39 - INFO - __main__ -   Step: 2343, LR: 1.553295326208976e-05, Loss: 871.4132690429688
2024-08-03T13:00:51.515893488Z 
 25%|██▍       | 2344/9500 [8:03:21<24:29:36, 12.32s/it]08/03/2024 06:00:51 - INFO - __main__ -   Step: 2344, LR: 1.553078271840248e-05, Loss: 651.0821533203125
2024-08-03T13:01:04.252227615Z 
 25%|██▍       | 2345/9500 [8:03:34<24:44:13, 12.45s/it]08/03/2024 06:01:04 - INFO - __main__ -   Step: 2345, LR: 1.5528612174715204e-05, Loss: 809.8539428710938
2024-08-03T13:01:16.323137323Z 
 25%|██▍       | 2346/9500 [8:03:46<24:30:34, 12.33s/it]08/03/2024 06:01:16 - INFO - __main__ -   Step: 2346, LR: 1.5526441631027924e-05, Loss: 536.0520629882812
2024-08-03T13:01:28.466884160Z 
 25%|██▍       | 2347/9500 [8:03:58<24:23:35, 12.28s/it]08/03/2024 06:01:28 - INFO - __main__ -   Step: 2347, LR: 1.5524271087340644e-05, Loss: 668.40185546875
2024-08-03T13:01:41.093491864Z 
 25%|██▍       | 2348/9500 [8:04:11<24:35:53, 12.38s/it]08/03/2024 06:01:41 - INFO - __main__ -   Step: 2348, LR: 1.5522100543653364e-05, Loss: 592.1640625
2024-08-03T13:01:53.220725363Z 
 25%|██▍       | 2349/9500 [8:04:23<24:26:35, 12.31s/it]08/03/2024 06:01:53 - INFO - __main__ -   Step: 2349, LR: 1.5519929999966087e-05, Loss: 583.4049072265625
2024-08-03T13:02:05.506008246Z 
 25%|██▍       | 2350/9500 [8:04:35<24:25:40, 12.30s/it]08/03/2024 06:02:05 - INFO - __main__ -   Step: 2350, LR: 1.5517759456278807e-05, Loss: 788.4694213867188
2024-08-03T13:02:17.822916709Z 
 25%|██▍       | 2351/9500 [8:04:47<24:26:06, 12.30s/it]08/03/2024 06:02:17 - INFO - __main__ -   Step: 2351, LR: 1.5515588912591527e-05, Loss: 576.6475830078125
2024-08-03T13:02:29.923203410Z 
 25%|██▍       | 2352/9500 [8:04:59<24:18:34, 12.24s/it]08/03/2024 06:02:29 - INFO - __main__ -   Step: 2352, LR: 1.551341836890425e-05, Loss: 620.9992065429688
2024-08-03T13:02:41.981609974Z 
 25%|██▍       | 2353/9500 [8:05:11<24:11:45, 12.19s/it]08/03/2024 06:02:41 - INFO - __main__ -   Step: 2353, LR: 1.551124782521697e-05, Loss: 610.1679077148438
2024-08-03T13:02:54.739523886Z 
 25%|██▍       | 2354/9500 [8:05:24<24:31:56, 12.36s/it]08/03/2024 06:02:54 - INFO - __main__ -   Step: 2354, LR: 1.5509077281529693e-05, Loss: 565.2422485351562
2024-08-03T13:03:06.727053315Z 
 25%|██▍       | 2355/9500 [8:05:36<24:18:28, 12.25s/it]08/03/2024 06:03:06 - INFO - __main__ -   Step: 2355, LR: 1.5506906737842413e-05, Loss: 648.4100341796875
2024-08-03T13:03:18.926667230Z 
 25%|██▍       | 2356/9500 [8:05:48<24:16:33, 12.23s/it]08/03/2024 06:03:18 - INFO - __main__ -   Step: 2356, LR: 1.5504736194155133e-05, Loss: 484.0859069824219
2024-08-03T13:03:31.609978391Z 
 25%|██▍       | 2357/9500 [8:06:01<24:32:25, 12.37s/it]08/03/2024 06:03:31 - INFO - __main__ -   Step: 2357, LR: 1.5502565650467856e-05, Loss: 594.3461303710938
2024-08-03T13:03:43.798687078Z 
 25%|██▍       | 2358/9500 [8:06:13<24:25:49, 12.31s/it]08/03/2024 06:03:43 - INFO - __main__ -   Step: 2358, LR: 1.5500395106780576e-05, Loss: 628.7755126953125
2024-08-03T13:03:56.020671963Z 
 25%|██▍       | 2359/9500 [8:06:25<24:22:17, 12.29s/it]08/03/2024 06:03:56 - INFO - __main__ -   Step: 2359, LR: 1.54982245630933e-05, Loss: 508.696533203125
2024-08-03T13:04:08.348938788Z 
 25%|██▍       | 2360/9500 [8:06:38<24:23:35, 12.30s/it]08/03/2024 06:04:08 - INFO - __main__ -   Step: 2360, LR: 1.549605401940602e-05, Loss: 627.786376953125
2024-08-03T13:04:20.688851938Z 
 25%|██▍       | 2361/9500 [8:06:50<24:24:50, 12.31s/it]08/03/2024 06:04:20 - INFO - __main__ -   Step: 2361, LR: 1.549388347571874e-05, Loss: 753.0284423828125
2024-08-03T13:04:33.031601585Z 
 25%|██▍       | 2362/9500 [8:07:02<24:25:46, 12.32s/it]08/03/2024 06:04:33 - INFO - __main__ -   Step: 2362, LR: 1.549171293203146e-05, Loss: 724.92626953125
2024-08-03T13:04:45.085775720Z 
 25%|██▍       | 2363/9500 [8:07:15<24:16:02, 12.24s/it]08/03/2024 06:04:45 - INFO - __main__ -   Step: 2363, LR: 1.5489542388344182e-05, Loss: 558.3438720703125
2024-08-03T13:04:57.575153657Z 
 25%|██▍       | 2364/9500 [8:07:27<24:24:42, 12.32s/it]08/03/2024 06:04:57 - INFO - __main__ -   Step: 2364, LR: 1.5487371844656902e-05, Loss: 638.958251953125
2024-08-03T13:05:09.694159513Z 
 25%|██▍       | 2365/9500 [8:07:39<24:17:29, 12.26s/it]08/03/2024 06:05:09 - INFO - __main__ -   Step: 2365, LR: 1.5485201300969622e-05, Loss: 525.5969848632812
2024-08-03T13:05:21.787676614Z 
 25%|██▍       | 2366/9500 [8:07:51<24:11:29, 12.21s/it]08/03/2024 06:05:21 - INFO - __main__ -   Step: 2366, LR: 1.5483030757282345e-05, Loss: 663.8653564453125
2024-08-03T13:05:34.423105079Z 
 25%|██▍       | 2367/9500 [8:08:04<24:26:32, 12.34s/it]08/03/2024 06:05:34 - INFO - __main__ -   Step: 2367, LR: 1.5480860213595065e-05, Loss: 561.5457763671875
2024-08-03T13:05:46.573473458Z 
 25%|██▍       | 2368/9500 [8:08:16<24:19:42, 12.28s/it]08/03/2024 06:05:46 - INFO - __main__ -   Step: 2368, LR: 1.547868966990779e-05, Loss: 688.9605712890625
2024-08-03T13:05:58.689744354Z 
 25%|██▍       | 2369/9500 [8:08:28<24:13:39, 12.23s/it]08/03/2024 06:05:58 - INFO - __main__ -   Step: 2369, LR: 1.5476519126220508e-05, Loss: 622.690673828125
2024-08-03T13:06:11.255593369Z 
 25%|██▍       | 2370/9500 [8:08:41<24:25:23, 12.33s/it]08/03/2024 06:06:11 - INFO - __main__ -   Step: 2370, LR: 1.5474348582533228e-05, Loss: 564.7797241210938
2024-08-03T13:06:23.388544889Z 
 25%|██▍       | 2371/9500 [8:08:53<24:18:07, 12.27s/it]08/03/2024 06:06:23 - INFO - __main__ -   Step: 2371, LR: 1.547217803884595e-05, Loss: 562.9929809570312
2024-08-03T13:06:35.890321162Z 
 25%|██▍       | 2372/9500 [8:09:05<24:26:06, 12.34s/it]08/03/2024 06:06:35 - INFO - __main__ -   Step: 2372, LR: 1.547000749515867e-05, Loss: 715.4405517578125
2024-08-03T13:06:48.312818945Z 
 25%|██▍       | 2373/9500 [8:09:18<24:28:47, 12.37s/it]08/03/2024 06:06:48 - INFO - __main__ -   Step: 2373, LR: 1.5467836951471394e-05, Loss: 469.2309265136719
2024-08-03T13:07:00.524310638Z 
 25%|██▍       | 2374/9500 [8:09:30<24:23:06, 12.32s/it]08/03/2024 06:07:00 - INFO - __main__ -   Step: 2374, LR: 1.5465666407784114e-05, Loss: 683.3665161132812
2024-08-03T13:07:12.954670440Z 
 25%|██▌       | 2375/9500 [8:09:42<24:26:52, 12.35s/it]08/03/2024 06:07:12 - INFO - __main__ -   Step: 2375, LR: 1.5463495864096834e-05, Loss: 588.7008056640625
2024-08-03T13:07:25.534571144Z 
 25%|██▌       | 2376/9500 [8:09:55<24:34:45, 12.42s/it]08/03/2024 06:07:25 - INFO - __main__ -   Step: 2376, LR: 1.5461325320409554e-05, Loss: 562.518798828125
2024-08-03T13:07:37.734818198Z 
 25%|██▌       | 2377/9500 [8:10:07<24:26:41, 12.35s/it]08/03/2024 06:07:37 - INFO - __main__ -   Step: 2377, LR: 1.5459154776722277e-05, Loss: 519.0601806640625
2024-08-03T13:07:49.843654270Z 
 25%|██▌       | 2378/9500 [8:10:19<24:17:44, 12.28s/it]08/03/2024 06:07:49 - INFO - __main__ -   Step: 2378, LR: 1.5456984233034997e-05, Loss: 617.715576171875
2024-08-03T13:08:02.430449750Z 
 25%|██▌       | 2379/9500 [8:10:32<24:28:25, 12.37s/it]08/03/2024 06:08:02 - INFO - __main__ -   Step: 2379, LR: 1.5454813689347717e-05, Loss: 728.0825805664062
2024-08-03T13:08:14.792547754Z 
 25%|██▌       | 2380/9500 [8:10:44<24:27:49, 12.37s/it]08/03/2024 06:08:14 - INFO - __main__ -   Step: 2380, LR: 1.545264314566044e-05, Loss: 738.646240234375
2024-08-03T13:08:26.902246032Z 
 25%|██▌       | 2381/9500 [8:10:56<24:18:23, 12.29s/it]08/03/2024 06:08:26 - INFO - __main__ -   Step: 2381, LR: 1.545047260197316e-05, Loss: 720.9180908203125
2024-08-03T13:08:39.534996539Z 
 25%|██▌       | 2382/9500 [8:11:09<24:30:19, 12.39s/it]08/03/2024 06:08:39 - INFO - __main__ -   Step: 2382, LR: 1.5448302058285883e-05, Loss: 742.8497314453125
2024-08-03T13:08:51.938297554Z 
 25%|██▌       | 2383/9500 [8:11:21<24:30:27, 12.40s/it]08/03/2024 06:08:51 - INFO - __main__ -   Step: 2383, LR: 1.5446131514598603e-05, Loss: 563.1181640625
2024-08-03T13:09:04.832679503Z 
 25%|██▌       | 2384/9500 [8:11:34<24:47:57, 12.55s/it]08/03/2024 06:09:04 - INFO - __main__ -   Step: 2384, LR: 1.5443960970911323e-05, Loss: 664.282470703125
2024-08-03T13:09:17.290559371Z 
 25%|██▌       | 2385/9500 [8:11:47<24:44:36, 12.52s/it]08/03/2024 06:09:17 - INFO - __main__ -   Step: 2385, LR: 1.5441790427224046e-05, Loss: 528.2281494140625
2024-08-03T13:09:29.525451227Z 
 25%|██▌       | 2386/9500 [8:11:59<24:34:17, 12.43s/it]08/03/2024 06:09:29 - INFO - __main__ -   Step: 2386, LR: 1.5439619883536766e-05, Loss: 668.9539794921875
2024-08-03T13:09:41.589688728Z 
 25%|██▌       | 2387/9500 [8:12:11<24:20:55, 12.32s/it]08/03/2024 06:09:41 - INFO - __main__ -   Step: 2387, LR: 1.543744933984949e-05, Loss: 799.66796875
2024-08-03T13:09:54.166060452Z 
 25%|██▌       | 2388/9500 [8:12:24<24:29:42, 12.40s/it]08/03/2024 06:09:54 - INFO - __main__ -   Step: 2388, LR: 1.543527879616221e-05, Loss: 577.2763671875
2024-08-03T13:10:06.593939761Z 
 25%|██▌       | 2389/9500 [8:12:36<24:30:31, 12.41s/it]08/03/2024 06:10:06 - INFO - __main__ -   Step: 2389, LR: 1.543310825247493e-05, Loss: 614.3224487304688
2024-08-03T13:10:19.090806724Z 
 25%|██▌       | 2390/9500 [8:12:49<24:33:29, 12.43s/it]08/03/2024 06:10:19 - INFO - __main__ -   Step: 2390, LR: 1.543093770878765e-05, Loss: 635.3442993164062
2024-08-03T13:10:31.704459314Z 
 25%|██▌       | 2391/9500 [8:13:01<24:39:38, 12.49s/it]08/03/2024 06:10:31 - INFO - __main__ -   Step: 2391, LR: 1.5428767165100372e-05, Loss: 487.451416015625
2024-08-03T13:10:43.681345978Z 
 25%|██▌       | 2392/9500 [8:13:13<24:21:16, 12.33s/it]08/03/2024 06:10:43 - INFO - __main__ -   Step: 2392, LR: 1.5426596621413092e-05, Loss: 638.9114990234375
2024-08-03T13:10:55.813445153Z 
 25%|██▌       | 2393/9500 [8:13:25<24:13:51, 12.27s/it]08/03/2024 06:10:55 - INFO - __main__ -   Step: 2393, LR: 1.5424426077725812e-05, Loss: 559.0716552734375
2024-08-03T13:11:08.457606124Z 
 25%|██▌       | 2394/9500 [8:13:38<24:26:48, 12.39s/it]08/03/2024 06:11:08 - INFO - __main__ -   Step: 2394, LR: 1.5422255534038535e-05, Loss: 598.9420166015625
2024-08-03T13:11:20.753038227Z 
 25%|██▌       | 2395/9500 [8:13:50<24:23:24, 12.36s/it]08/03/2024 06:11:20 - INFO - __main__ -   Step: 2395, LR: 1.5420084990351255e-05, Loss: 764.2447509765625
2024-08-03T13:11:33.011127691Z 
 25%|██▌       | 2396/9500 [8:14:02<24:19:38, 12.33s/it]08/03/2024 06:11:33 - INFO - __main__ -   Step: 2396, LR: 1.541791444666398e-05, Loss: 751.9669189453125
2024-08-03T13:11:45.381723025Z 
 25%|██▌       | 2397/9500 [8:14:15<24:20:57, 12.34s/it]08/03/2024 06:11:45 - INFO - __main__ -   Step: 2397, LR: 1.5415743902976698e-05, Loss: 568.01611328125
2024-08-03T13:11:58.076880568Z 
 25%|██▌       | 2398/9500 [8:14:28<24:33:19, 12.45s/it]08/03/2024 06:11:58 - INFO - __main__ -   Step: 2398, LR: 1.541357335928942e-05, Loss: 693.3133544921875
2024-08-03T13:12:10.103161915Z 
 25%|██▌       | 2399/9500 [8:14:40<24:18:11, 12.32s/it]08/03/2024 06:12:10 - INFO - __main__ -   Step: 2399, LR: 1.541140281560214e-05, Loss: 626.01806640625
2024-08-03T13:12:22.738473860Z 
 25%|██▌       | 2400/9500 [8:14:52<24:29:07, 12.42s/it]08/03/2024 06:12:22 - INFO - __main__ -   Step: 2400, LR: 1.540923227191486e-05, Loss: 557.4130249023438
2024-08-03T13:12:34.688612043Z 
 25%|██▌       | 2401/9500 [8:15:04<24:12:25, 12.28s/it]08/03/2024 06:12:34 - INFO - __main__ -   Step: 2401, LR: 1.5407061728227585e-05, Loss: 713.2553100585938
2024-08-03T13:12:46.921583263Z 
 25%|██▌       | 2402/9500 [8:15:16<24:10:41, 12.26s/it]08/03/2024 06:12:46 - INFO - __main__ -   Step: 2402, LR: 1.5404891184540304e-05, Loss: 701.3657836914062
2024-08-03T13:12:59.532062543Z 
 25%|██▌       | 2403/9500 [8:15:29<24:22:50, 12.37s/it]08/03/2024 06:12:59 - INFO - __main__ -   Step: 2403, LR: 1.5402720640853024e-05, Loss: 594.8360595703125
2024-08-03T13:13:12.267646507Z 
 25%|██▌       | 2404/9500 [8:15:42<24:35:41, 12.48s/it]08/03/2024 06:13:12 - INFO - __main__ -   Step: 2404, LR: 1.5400550097165744e-05, Loss: 691.5489501953125
2024-08-03T13:13:24.476752200Z 
 25%|██▌       | 2405/9500 [8:15:54<24:25:57, 12.40s/it]08/03/2024 06:13:24 - INFO - __main__ -   Step: 2405, LR: 1.5398379553478467e-05, Loss: 578.7471313476562
2024-08-03T13:13:36.368582083Z 
 25%|██▌       | 2406/9500 [8:16:06<24:07:49, 12.25s/it]08/03/2024 06:13:36 - INFO - __main__ -   Step: 2406, LR: 1.5396209009791187e-05, Loss: 488.23895263671875
2024-08-03T13:13:49.028927078Z 
 25%|██▌       | 2407/9500 [8:16:18<24:22:20, 12.37s/it]08/03/2024 06:13:49 - INFO - __main__ -   Step: 2407, LR: 1.539403846610391e-05, Loss: 631.0968017578125
2024-08-03T13:14:01.298044112Z 
 25%|██▌       | 2408/9500 [8:16:31<24:18:32, 12.34s/it]08/03/2024 06:14:01 - INFO - __main__ -   Step: 2408, LR: 1.539186792241663e-05, Loss: 554.8994750976562
2024-08-03T13:14:13.480083431Z 
 25%|██▌       | 2409/9500 [8:16:43<24:12:44, 12.29s/it]08/03/2024 06:14:13 - INFO - __main__ -   Step: 2409, LR: 1.538969737872935e-05, Loss: 649.9456176757812
2024-08-03T13:14:26.010142952Z 
 25%|██▌       | 2410/9500 [8:16:55<24:20:58, 12.36s/it]08/03/2024 06:14:26 - INFO - __main__ -   Step: 2410, LR: 1.5387526835042073e-05, Loss: 604.6728515625
2024-08-03T13:14:37.878900287Z 
 25%|██▌       | 2411/9500 [8:17:07<24:03:13, 12.22s/it]08/03/2024 06:14:37 - INFO - __main__ -   Step: 2411, LR: 1.5385356291354793e-05, Loss: 481.18634033203125
2024-08-03T13:14:50.018044812Z 
 25%|██▌       | 2412/9500 [8:17:19<24:00:19, 12.19s/it]08/03/2024 06:14:50 - INFO - __main__ -   Step: 2412, LR: 1.5383185747667517e-05, Loss: 574.1163940429688
2024-08-03T13:15:02.430674594Z 
 25%|██▌       | 2413/9500 [8:17:32<24:07:56, 12.26s/it]08/03/2024 06:15:02 - INFO - __main__ -   Step: 2413, LR: 1.5381015203980236e-05, Loss: 509.9451904296875
2024-08-03T13:15:14.509405600Z 
 25%|██▌       | 2414/9500 [8:17:44<24:01:21, 12.20s/it]08/03/2024 06:15:14 - INFO - __main__ -   Step: 2414, LR: 1.5378844660292956e-05, Loss: 604.0821533203125
2024-08-03T13:15:26.545439848Z 
 25%|██▌       | 2415/9500 [8:17:56<23:55:11, 12.15s/it]08/03/2024 06:15:26 - INFO - __main__ -   Step: 2415, LR: 1.537667411660568e-05, Loss: 651.5733642578125
2024-08-03T13:15:38.824456540Z 
 25%|██▌       | 2416/9500 [8:18:08<23:59:24, 12.19s/it]08/03/2024 06:15:38 - INFO - __main__ -   Step: 2416, LR: 1.53745035729184e-05, Loss: 419.91680908203125
2024-08-03T13:15:50.938570138Z 
 25%|██▌       | 2417/9500 [8:18:20<23:56:28, 12.17s/it]08/03/2024 06:15:50 - INFO - __main__ -   Step: 2417, LR: 1.537233302923112e-05, Loss: 625.9248046875
2024-08-03T13:16:03.255113356Z 
 25%|██▌       | 2418/9500 [8:18:33<24:01:31, 12.21s/it]08/03/2024 06:16:03 - INFO - __main__ -   Step: 2418, LR: 1.537016248554384e-05, Loss: 579.0455932617188
2024-08-03T13:16:15.599320997Z 
 25%|██▌       | 2419/9500 [8:18:45<24:05:57, 12.25s/it]08/03/2024 06:16:15 - INFO - __main__ -   Step: 2419, LR: 1.5367991941856562e-05, Loss: 666.3994750976562
2024-08-03T13:16:27.797382584Z 
 25%|██▌       | 2420/9500 [8:18:57<24:03:50, 12.24s/it]08/03/2024 06:16:27 - INFO - __main__ -   Step: 2420, LR: 1.5365821398169282e-05, Loss: 539.13623046875
2024-08-03T13:16:40.082603413Z 
 25%|██▌       | 2421/9500 [8:19:10<24:05:20, 12.25s/it]08/03/2024 06:16:40 - INFO - __main__ -   Step: 2421, LR: 1.5363650854482006e-05, Loss: 724.1141357421875
2024-08-03T13:16:52.562683387Z 
 25%|██▌       | 2422/9500 [8:19:22<24:13:18, 12.32s/it]08/03/2024 06:16:52 - INFO - __main__ -   Step: 2422, LR: 1.5361480310794725e-05, Loss: 584.0263061523438
2024-08-03T13:17:04.720346116Z 
 26%|██▌       | 2423/9500 [8:19:34<24:07:22, 12.27s/it]08/03/2024 06:17:04 - INFO - __main__ -   Step: 2423, LR: 1.5359309767107445e-05, Loss: 697.764892578125
2024-08-03T13:17:17.197920271Z 
 26%|██▌       | 2424/9500 [8:19:47<24:14:25, 12.33s/it]08/03/2024 06:17:17 - INFO - __main__ -   Step: 2424, LR: 1.535713922342017e-05, Loss: 788.2230224609375
2024-08-03T13:17:29.869450330Z 
 26%|██▌       | 2425/9500 [8:19:59<24:26:15, 12.43s/it]08/03/2024 06:17:29 - INFO - __main__ -   Step: 2425, LR: 1.535496867973289e-05, Loss: 589.6064453125
2024-08-03T13:17:41.814586732Z 
 26%|██▌       | 2426/9500 [8:20:11<24:08:43, 12.29s/it]08/03/2024 06:17:41 - INFO - __main__ -   Step: 2426, LR: 1.535279813604561e-05, Loss: 548.714599609375
2024-08-03T13:17:53.892488420Z 
 26%|██▌       | 2427/9500 [8:20:23<24:01:07, 12.22s/it]08/03/2024 06:17:53 - INFO - __main__ -   Step: 2427, LR: 1.535062759235833e-05, Loss: 679.7640380859375
2024-08-03T13:18:06.482041583Z 
 26%|██▌       | 2428/9500 [8:20:36<24:13:47, 12.33s/it]08/03/2024 06:18:06 - INFO - __main__ -   Step: 2428, LR: 1.534845704867105e-05, Loss: 709.4940795898438
2024-08-03T13:18:18.535811486Z 
 26%|██▌       | 2429/9500 [8:20:48<24:03:40, 12.25s/it]08/03/2024 06:18:18 - INFO - __main__ -   Step: 2429, LR: 1.5346286504983775e-05, Loss: 633.9662475585938
2024-08-03T13:18:30.858650990Z 
 26%|██▌       | 2430/9500 [8:21:00<24:06:01, 12.27s/it]08/03/2024 06:18:30 - INFO - __main__ -   Step: 2430, LR: 1.5344115961296495e-05, Loss: 563.2109375
2024-08-03T13:18:43.374632590Z 
 26%|██▌       | 2431/9500 [8:21:13<24:14:28, 12.35s/it]08/03/2024 06:18:43 - INFO - __main__ -   Step: 2431, LR: 1.5341945417609214e-05, Loss: 682.1871948242188
2024-08-03T13:18:55.801344278Z 
 26%|██▌       | 2432/9500 [8:21:25<24:17:08, 12.37s/it]08/03/2024 06:18:55 - INFO - __main__ -   Step: 2432, LR: 1.5339774873921934e-05, Loss: 729.1530151367188
2024-08-03T13:19:08.001073767Z 
 26%|██▌       | 2433/9500 [8:21:37<24:10:55, 12.32s/it]08/03/2024 06:19:08 - INFO - __main__ -   Step: 2433, LR: 1.5337604330234658e-05, Loss: 734.4097290039062
2024-08-03T13:19:20.535104293Z 
 26%|██▌       | 2434/9500 [8:21:50<24:18:19, 12.38s/it]08/03/2024 06:19:20 - INFO - __main__ -   Step: 2434, LR: 1.5335433786547377e-05, Loss: 754.59716796875
2024-08-03T13:19:33.088144565Z 
 26%|██▌       | 2435/9500 [8:22:03<24:24:07, 12.43s/it]08/03/2024 06:19:33 - INFO - __main__ -   Step: 2435, LR: 1.53332632428601e-05, Loss: 699.18310546875
2024-08-03T13:19:45.205101611Z 
 26%|██▌       | 2436/9500 [8:22:15<24:12:42, 12.34s/it]08/03/2024 06:19:45 - INFO - __main__ -   Step: 2436, LR: 1.533109269917282e-05, Loss: 534.80615234375
2024-08-03T13:19:57.732372299Z 
 26%|██▌       | 2437/9500 [8:22:27<24:19:09, 12.40s/it]08/03/2024 06:19:57 - INFO - __main__ -   Step: 2437, LR: 1.532892215548554e-05, Loss: 608.77294921875
2024-08-03T13:20:10.027712598Z 
 26%|██▌       | 2438/9500 [8:22:39<24:15:24, 12.37s/it]08/03/2024 06:20:10 - INFO - __main__ -   Step: 2438, LR: 1.5326751611798264e-05, Loss: 784.1841430664062
2024-08-03T13:20:22.160637628Z 
 26%|██▌       | 2439/9500 [8:22:52<24:07:00, 12.30s/it]08/03/2024 06:20:22 - INFO - __main__ -   Step: 2439, LR: 1.5324581068110983e-05, Loss: 777.52099609375
2024-08-03T13:20:34.801352879Z 
 26%|██▌       | 2440/9500 [8:23:04<24:18:58, 12.40s/it]08/03/2024 06:20:34 - INFO - __main__ -   Step: 2440, LR: 1.5322410524423707e-05, Loss: 588.2205810546875
2024-08-03T13:20:46.861604465Z 
 26%|██▌       | 2441/9500 [8:23:16<24:06:48, 12.30s/it]08/03/2024 06:20:46 - INFO - __main__ -   Step: 2441, LR: 1.5320239980736427e-05, Loss: 557.161865234375
2024-08-03T13:20:59.116848619Z 
 26%|██▌       | 2442/9500 [8:23:29<24:05:06, 12.28s/it]08/03/2024 06:20:59 - INFO - __main__ -   Step: 2442, LR: 1.5318069437049146e-05, Loss: 656.39208984375
2024-08-03T13:21:11.534116456Z 
 26%|██▌       | 2443/9500 [8:23:41<24:09:33, 12.32s/it]08/03/2024 06:21:11 - INFO - __main__ -   Step: 2443, LR: 1.531589889336187e-05, Loss: 558.0692138671875
2024-08-03T13:21:23.492533930Z 
 26%|██▌       | 2444/9500 [8:23:53<23:56:27, 12.21s/it]08/03/2024 06:21:23 - INFO - __main__ -   Step: 2444, LR: 1.531372834967459e-05, Loss: 545.2886352539062
2024-08-03T13:21:35.433926829Z 
 26%|██▌       | 2445/9500 [8:24:05<23:46:36, 12.13s/it]08/03/2024 06:21:35 - INFO - __main__ -   Step: 2445, LR: 1.531155780598731e-05, Loss: 630.4739990234375
2024-08-03T13:21:47.759412963Z 
 26%|██▌       | 2446/9500 [8:24:17<23:53:12, 12.19s/it]08/03/2024 06:21:47 - INFO - __main__ -   Step: 2446, LR: 1.530938726230003e-05, Loss: 558.6063232421875
2024-08-03T13:22:00.439906492Z 
 26%|██▌       | 2447/9500 [8:24:30<24:10:17, 12.34s/it]08/03/2024 06:22:00 - INFO - __main__ -   Step: 2447, LR: 1.5307216718612753e-05, Loss: 707.0482177734375
2024-08-03T13:22:12.496483408Z 
 26%|██▌       | 2448/9500 [8:24:42<24:00:09, 12.25s/it]08/03/2024 06:22:12 - INFO - __main__ -   Step: 2448, LR: 1.5305046174925472e-05, Loss: 530.20654296875
2024-08-03T13:22:24.608641100Z 
 26%|██▌       | 2449/9500 [8:24:54<23:54:58, 12.21s/it]08/03/2024 06:22:24 - INFO - __main__ -   Step: 2449, LR: 1.5302875631238196e-05, Loss: 524.239013671875
2024-08-03T13:22:37.513233646Z 
 26%|██▌       | 2450/9500 [8:25:07<24:19:14, 12.42s/it]08/03/2024 06:22:37 - INFO - __main__ -   Step: 2450, LR: 1.5300705087550916e-05, Loss: 707.8402099609375
2024-08-03T13:22:49.725400540Z 
 26%|██▌       | 2451/9500 [8:25:19<24:11:43, 12.36s/it]08/03/2024 06:22:49 - INFO - __main__ -   Step: 2451, LR: 1.5298534543863635e-05, Loss: 714.0206298828125
2024-08-03T13:23:01.834863888Z 
 26%|██▌       | 2452/9500 [8:25:31<24:02:48, 12.28s/it]08/03/2024 06:23:01 - INFO - __main__ -   Step: 2452, LR: 1.529636400017636e-05, Loss: 534.1978759765625
2024-08-03T13:23:14.201854919Z 
 26%|██▌       | 2453/9500 [8:25:44<24:05:34, 12.31s/it]08/03/2024 06:23:14 - INFO - __main__ -   Step: 2453, LR: 1.529419345648908e-05, Loss: 589.650634765625
2024-08-03T13:23:26.402226029Z 
 26%|██▌       | 2454/9500 [8:25:56<24:01:34, 12.28s/it]08/03/2024 06:23:26 - INFO - __main__ -   Step: 2454, LR: 1.5292022912801802e-05, Loss: 550.0469970703125
2024-08-03T13:23:38.589678684Z 
 26%|██▌       | 2455/9500 [8:26:08<23:58:15, 12.25s/it]08/03/2024 06:23:38 - INFO - __main__ -   Step: 2455, LR: 1.528985236911452e-05, Loss: 479.7483825683594
2024-08-03T13:23:51.081863882Z 
 26%|██▌       | 2456/9500 [8:26:21<24:06:36, 12.32s/it]08/03/2024 06:23:51 - INFO - __main__ -   Step: 2456, LR: 1.528768182542724e-05, Loss: 561.3554077148438
2024-08-03T13:24:03.247515626Z 
 26%|██▌       | 2457/9500 [8:26:33<24:00:53, 12.28s/it]08/03/2024 06:24:03 - INFO - __main__ -   Step: 2457, LR: 1.5285511281739965e-05, Loss: 662.3762817382812
2024-08-03T13:24:15.491127664Z 
 26%|██▌       | 2458/9500 [8:26:45<23:59:36, 12.27s/it]08/03/2024 06:24:15 - INFO - __main__ -   Step: 2458, LR: 1.5283340738052685e-05, Loss: 856.87255859375
2024-08-03T13:24:28.290454848Z 
 26%|██▌       | 2459/9500 [8:26:58<24:18:10, 12.43s/it]08/03/2024 06:24:28 - INFO - __main__ -   Step: 2459, LR: 1.5281170194365405e-05, Loss: 622.1060791015625
2024-08-03T13:24:40.540744350Z 
 26%|██▌       | 2460/9500 [8:27:10<24:11:47, 12.37s/it]08/03/2024 06:24:40 - INFO - __main__ -   Step: 2460, LR: 1.5278999650678124e-05, Loss: 607.078125
2024-08-03T13:24:52.882437451Z 
 26%|██▌       | 2461/9500 [8:27:22<24:10:28, 12.36s/it]08/03/2024 06:24:52 - INFO - __main__ -   Step: 2461, LR: 1.5276829106990848e-05, Loss: 710.9185791015625
2024-08-03T13:25:05.691486853Z 
 26%|██▌       | 2462/9500 [8:27:35<24:25:56, 12.50s/it]08/03/2024 06:25:05 - INFO - __main__ -   Step: 2462, LR: 1.5274658563303567e-05, Loss: 692.4127807617188
2024-08-03T13:25:17.533660434Z 
 26%|██▌       | 2463/9500 [8:27:47<24:02:40, 12.30s/it]08/03/2024 06:25:17 - INFO - __main__ -   Step: 2463, LR: 1.527248801961629e-05, Loss: 642.33740234375
2024-08-03T13:25:29.730477818Z 
 26%|██▌       | 2464/9500 [8:27:59<23:58:48, 12.27s/it]08/03/2024 06:25:29 - INFO - __main__ -   Step: 2464, LR: 1.527031747592901e-05, Loss: 585.8340454101562
2024-08-03T13:25:42.253145023Z 
 26%|██▌       | 2465/9500 [8:28:12<24:07:30, 12.35s/it]08/03/2024 06:25:42 - INFO - __main__ -   Step: 2465, LR: 1.526814693224173e-05, Loss: 704.1722412109375
2024-08-03T13:25:54.255202334Z 
 26%|██▌       | 2466/9500 [8:28:24<23:55:13, 12.24s/it]08/03/2024 06:25:54 - INFO - __main__ -   Step: 2466, LR: 1.5265976388554454e-05, Loss: 669.539794921875
2024-08-03T13:26:06.411932405Z 
 26%|██▌       | 2467/9500 [8:28:36<23:52:00, 12.22s/it]08/03/2024 06:26:06 - INFO - __main__ -   Step: 2467, LR: 1.5263805844867174e-05, Loss: 628.345458984375
2024-08-03T13:26:18.733820154Z 
 26%|██▌       | 2468/9500 [8:28:48<23:55:30, 12.25s/it]08/03/2024 06:26:18 - INFO - __main__ -   Step: 2468, LR: 1.5261635301179897e-05, Loss: 567.4418334960938
2024-08-03T13:26:30.973279621Z 
 26%|██▌       | 2469/9500 [8:29:00<23:54:58, 12.25s/it]08/03/2024 06:26:30 - INFO - __main__ -   Step: 2469, LR: 1.5259464757492617e-05, Loss: 624.5623779296875
2024-08-03T13:26:43.404633956Z 
 26%|██▌       | 2470/9500 [8:29:13<24:01:18, 12.30s/it]08/03/2024 06:26:43 - INFO - __main__ -   Step: 2470, LR: 1.5257294213805338e-05, Loss: 665.8134765625
2024-08-03T13:26:55.635670077Z 
 26%|██▌       | 2471/9500 [8:29:25<23:58:38, 12.28s/it]08/03/2024 06:26:55 - INFO - __main__ -   Step: 2471, LR: 1.525512367011806e-05, Loss: 476.7911376953125
2024-08-03T13:27:07.306421540Z 
 26%|██▌       | 2472/9500 [8:29:37<23:37:00, 12.10s/it]08/03/2024 06:27:07 - INFO - __main__ -   Step: 2472, LR: 1.5252953126430781e-05, Loss: 456.997802734375
2024-08-03T13:27:19.315651542Z 
 26%|██▌       | 2473/9500 [8:29:49<23:33:42, 12.07s/it]08/03/2024 06:27:19 - INFO - __main__ -   Step: 2473, LR: 1.52507825827435e-05, Loss: 585.3231201171875
2024-08-03T13:27:31.800428799Z 
 26%|██▌       | 2474/9500 [8:30:01<23:48:02, 12.19s/it]08/03/2024 06:27:31 - INFO - __main__ -   Step: 2474, LR: 1.5248612039056221e-05, Loss: 588.45458984375
2024-08-03T13:27:44.012400530Z 
 26%|██▌       | 2475/9500 [8:30:13<23:48:26, 12.20s/it]08/03/2024 06:27:44 - INFO - __main__ -   Step: 2475, LR: 1.5246441495368943e-05, Loss: 652.471923828125
2024-08-03T13:27:56.087505588Z 
 26%|██▌       | 2476/9500 [8:30:26<23:43:50, 12.16s/it]08/03/2024 06:27:56 - INFO - __main__ -   Step: 2476, LR: 1.5244270951681663e-05, Loss: 586.4849853515625
2024-08-03T13:28:08.744600159Z 
 26%|██▌       | 2477/9500 [8:30:38<24:01:00, 12.31s/it]08/03/2024 06:28:08 - INFO - __main__ -   Step: 2477, LR: 1.5242100407994384e-05, Loss: 605.5387573242188
2024-08-03T13:28:21.071766986Z 
 26%|██▌       | 2478/9500 [8:30:51<24:01:21, 12.32s/it]08/03/2024 06:28:21 - INFO - __main__ -   Step: 2478, LR: 1.5239929864307106e-05, Loss: 684.1351318359375
2024-08-03T13:28:33.328647401Z 
 26%|██▌       | 2479/9500 [8:31:03<23:59:04, 12.30s/it]08/03/2024 06:28:33 - INFO - __main__ -   Step: 2479, LR: 1.5237759320619827e-05, Loss: 616.5628662109375
2024-08-03T13:28:45.883905880Z 
 26%|██▌       | 2480/9500 [8:31:15<24:07:54, 12.38s/it]08/03/2024 06:28:45 - INFO - __main__ -   Step: 2480, LR: 1.5235588776932549e-05, Loss: 773.8936767578125
2024-08-03T13:28:58.124883846Z 
 26%|██▌       | 2481/9500 [8:31:28<24:02:59, 12.34s/it]08/03/2024 06:28:58 - INFO - __main__ -   Step: 2481, LR: 1.523341823324527e-05, Loss: 596.9706420898438
2024-08-03T13:29:10.716720003Z 
 26%|██▌       | 2482/9500 [8:31:40<24:11:47, 12.41s/it]08/03/2024 06:29:10 - INFO - __main__ -   Step: 2482, LR: 1.523124768955799e-05, Loss: 847.9669189453125
2024-08-03T13:29:23.122818809Z 
 26%|██▌       | 2483/9500 [8:31:53<24:11:22, 12.41s/it]08/03/2024 06:29:23 - INFO - __main__ -   Step: 2483, LR: 1.5229077145870712e-05, Loss: 662.1539306640625
2024-08-03T13:29:35.347133188Z 
 26%|██▌       | 2484/9500 [8:32:05<24:04:39, 12.35s/it]08/03/2024 06:29:35 - INFO - __main__ -   Step: 2484, LR: 1.5226906602183433e-05, Loss: 576.52978515625
2024-08-03T13:29:47.381475852Z 
 26%|██▌       | 2485/9500 [8:32:17<23:53:13, 12.26s/it]08/03/2024 06:29:47 - INFO - __main__ -   Step: 2485, LR: 1.5224736058496155e-05, Loss: 409.7569580078125
2024-08-03T13:29:59.926372524Z 
 26%|██▌       | 2486/9500 [8:32:29<24:03:03, 12.34s/it]08/03/2024 06:29:59 - INFO - __main__ -   Step: 2486, LR: 1.5222565514808876e-05, Loss: 622.68017578125
2024-08-03T13:30:12.059029974Z 
 26%|██▌       | 2487/9500 [8:32:41<23:55:25, 12.28s/it]08/03/2024 06:30:12 - INFO - __main__ -   Step: 2487, LR: 1.5220394971121595e-05, Loss: 613.9921264648438
2024-08-03T13:30:24.473243023Z 
 26%|██▌       | 2488/9500 [8:32:54<23:59:53, 12.32s/it]08/03/2024 06:30:24 - INFO - __main__ -   Step: 2488, LR: 1.5218224427434316e-05, Loss: 607.2859497070312
2024-08-03T13:30:36.624433910Z 
 26%|██▌       | 2489/9500 [8:33:06<23:53:44, 12.27s/it]08/03/2024 06:30:36 - INFO - __main__ -   Step: 2489, LR: 1.5216053883747038e-05, Loss: 697.7830810546875
2024-08-03T13:30:49.259282292Z 
 26%|██▌       | 2490/9500 [8:33:19<24:06:20, 12.38s/it]08/03/2024 06:30:49 - INFO - __main__ -   Step: 2490, LR: 1.521388334005976e-05, Loss: 715.9534301757812
2024-08-03T13:31:01.624774004Z 
 26%|██▌       | 2491/9500 [8:33:31<24:05:38, 12.38s/it]08/03/2024 06:31:01 - INFO - __main__ -   Step: 2491, LR: 1.521171279637248e-05, Loss: 621.126953125
2024-08-03T13:31:13.885388060Z 
 26%|██▌       | 2492/9500 [8:33:43<24:01:24, 12.34s/it]08/03/2024 06:31:13 - INFO - __main__ -   Step: 2492, LR: 1.52095422526852e-05, Loss: 687.5803833007812
2024-08-03T13:31:26.315963922Z 
 26%|██▌       | 2493/9500 [8:33:56<24:04:20, 12.37s/it]08/03/2024 06:31:26 - INFO - __main__ -   Step: 2493, LR: 1.5207371708997922e-05, Loss: 536.0350341796875
2024-08-03T13:31:38.573256178Z 
 26%|██▋       | 2494/9500 [8:34:08<24:00:16, 12.33s/it]08/03/2024 06:31:38 - INFO - __main__ -   Step: 2494, LR: 1.5205201165310644e-05, Loss: 622.7791748046875
2024-08-03T13:31:50.773773971Z 
 26%|██▋       | 2495/9500 [8:34:20<23:55:22, 12.29s/it]08/03/2024 06:31:50 - INFO - __main__ -   Step: 2495, LR: 1.5203030621623365e-05, Loss: 669.8741455078125
2024-08-03T13:32:03.392853930Z 
 26%|██▋       | 2496/9500 [8:34:33<24:06:32, 12.39s/it]08/03/2024 06:32:03 - INFO - __main__ -   Step: 2496, LR: 1.5200860077936085e-05, Loss: 640.613525390625
2024-08-03T13:32:15.604411174Z 
 26%|██▋       | 2497/9500 [8:34:45<24:00:01, 12.34s/it]08/03/2024 06:32:15 - INFO - __main__ -   Step: 2497, LR: 1.5198689534248807e-05, Loss: 619.9395751953125
2024-08-03T13:32:27.955924767Z 
 26%|██▋       | 2498/9500 [8:34:57<24:00:17, 12.34s/it]08/03/2024 06:32:27 - INFO - __main__ -   Step: 2498, LR: 1.5196518990561528e-05, Loss: 646.5443115234375
2024-08-03T13:32:40.648461722Z 
 26%|██▋       | 2499/9500 [8:35:10<24:12:21, 12.45s/it]08/03/2024 06:32:40 - INFO - __main__ -   Step: 2499, LR: 1.519434844687425e-05, Loss: 788.1686401367188
2024-08-03T13:32:52.641824023Z 
 26%|██▋       | 2500/9500 [8:35:22<23:56:16, 12.31s/it]08/03/2024 06:32:52 - INFO - __main__ -   Step: 2500, LR: 1.5192177903186972e-05, Loss: 499.597900390625
2024-08-03T13:33:04.792694231Z 
 26%|██▋       | 2501/9500 [8:35:34<23:50:29, 12.26s/it]08/03/2024 06:33:04 - INFO - __main__ -   Step: 2501, LR: 1.519000735949969e-05, Loss: 761.21533203125
2024-08-03T13:33:17.350840075Z 
 26%|██▋       | 2502/9500 [8:35:47<24:00:35, 12.35s/it]08/03/2024 06:33:17 - INFO - __main__ -   Step: 2502, LR: 1.5187836815812411e-05, Loss: 729.5447998046875
2024-08-03T13:33:29.629881670Z 
 26%|██▋       | 2503/9500 [8:35:59<23:57:51, 12.33s/it]08/03/2024 06:33:29 - INFO - __main__ -   Step: 2503, LR: 1.5185666272125133e-05, Loss: 690.9343872070312
2024-08-03T13:33:41.705990313Z 
 26%|██▋       | 2504/9500 [8:36:11<23:48:46, 12.25s/it]08/03/2024 06:33:41 - INFO - __main__ -   Step: 2504, LR: 1.5183495728437854e-05, Loss: 592.49169921875
2024-08-03T13:33:54.493008042Z 
 26%|██▋       | 2505/9500 [8:36:24<24:07:14, 12.41s/it]08/03/2024 06:33:54 - INFO - __main__ -   Step: 2505, LR: 1.5181325184750574e-05, Loss: 769.044677734375
2024-08-03T13:34:06.652665759Z 
 26%|██▋       | 2506/9500 [8:36:36<23:58:08, 12.34s/it]08/03/2024 06:34:06 - INFO - __main__ -   Step: 2506, LR: 1.5179154641063296e-05, Loss: 604.520263671875
2024-08-03T13:34:18.749513694Z 
 26%|██▋       | 2507/9500 [8:36:48<23:49:31, 12.27s/it]08/03/2024 06:34:18 - INFO - __main__ -   Step: 2507, LR: 1.5176984097376017e-05, Loss: 615.7721557617188
2024-08-03T13:34:31.229705345Z 
 26%|██▋       | 2508/9500 [8:37:01<23:56:48, 12.33s/it]08/03/2024 06:34:31 - INFO - __main__ -   Step: 2508, LR: 1.5174813553688739e-05, Loss: 583.59619140625
2024-08-03T13:34:43.726275108Z 
 26%|██▋       | 2509/9500 [8:37:13<24:02:27, 12.38s/it]08/03/2024 06:34:43 - INFO - __main__ -   Step: 2509, LR: 1.517264301000146e-05, Loss: 768.6595458984375
2024-08-03T13:34:55.798414023Z 
 26%|██▋       | 2510/9500 [8:37:25<23:51:29, 12.29s/it]08/03/2024 06:34:55 - INFO - __main__ -   Step: 2510, LR: 1.517047246631418e-05, Loss: 709.8726806640625
2024-08-03T13:35:08.261048327Z 
 26%|██▋       | 2511/9500 [8:37:38<23:57:24, 12.34s/it]08/03/2024 06:35:08 - INFO - __main__ -   Step: 2511, LR: 1.5168301922626902e-05, Loss: 621.1900634765625
2024-08-03T13:35:20.477933759Z 
 26%|██▋       | 2512/9500 [8:37:50<23:52:53, 12.30s/it]08/03/2024 06:35:20 - INFO - __main__ -   Step: 2512, LR: 1.5166131378939623e-05, Loss: 748.7191772460938
2024-08-03T13:35:32.694186435Z 
 26%|██▋       | 2513/9500 [8:38:02<23:49:39, 12.28s/it]08/03/2024 06:35:32 - INFO - __main__ -   Step: 2513, LR: 1.5163960835252345e-05, Loss: 652.628173828125
2024-08-03T13:35:45.635967462Z 
 26%|██▋       | 2514/9500 [8:38:15<24:12:40, 12.48s/it]08/03/2024 06:35:45 - INFO - __main__ -   Step: 2514, LR: 1.5161790291565063e-05, Loss: 706.267822265625
2024-08-03T13:35:57.870614549Z 
 26%|██▋       | 2515/9500 [8:38:27<24:03:58, 12.40s/it]08/03/2024 06:35:57 - INFO - __main__ -   Step: 2515, LR: 1.5159619747877785e-05, Loss: 634.7564697265625
2024-08-03T13:36:09.871602531Z 
 26%|██▋       | 2516/9500 [8:38:39<23:49:45, 12.28s/it]08/03/2024 06:36:09 - INFO - __main__ -   Step: 2516, LR: 1.5157449204190506e-05, Loss: 439.164794921875
2024-08-03T13:36:22.400055212Z 
 26%|██▋       | 2517/9500 [8:38:52<23:58:07, 12.36s/it]08/03/2024 06:36:22 - INFO - __main__ -   Step: 2517, LR: 1.5155278660503228e-05, Loss: 543.3060913085938
2024-08-03T13:36:34.704095966Z 
 27%|██▋       | 2518/9500 [8:39:04<23:56:03, 12.34s/it]08/03/2024 06:36:34 - INFO - __main__ -   Step: 2518, LR: 1.515310811681595e-05, Loss: 695.6668090820312
2024-08-03T13:36:46.605703268Z 
 27%|██▋       | 2519/9500 [8:39:16<23:40:32, 12.21s/it]08/03/2024 06:36:46 - INFO - __main__ -   Step: 2519, LR: 1.515093757312867e-05, Loss: 510.8304443359375
2024-08-03T13:36:59.150452537Z 
 27%|██▋       | 2520/9500 [8:39:29<23:52:01, 12.31s/it]08/03/2024 06:36:59 - INFO - __main__ -   Step: 2520, LR: 1.514876702944139e-05, Loss: 589.3238525390625
2024-08-03T13:37:11.049856601Z 
 27%|██▋       | 2521/9500 [8:39:40<23:37:31, 12.19s/it]08/03/2024 06:37:11 - INFO - __main__ -   Step: 2521, LR: 1.5146596485754112e-05, Loss: 634.187255859375
2024-08-03T13:37:23.224527495Z 
 27%|██▋       | 2522/9500 [8:39:53<23:36:53, 12.18s/it]08/03/2024 06:37:23 - INFO - __main__ -   Step: 2522, LR: 1.5144425942066834e-05, Loss: 766.107421875
2024-08-03T13:37:35.607766251Z 
 27%|██▋       | 2523/9500 [8:40:05<23:43:39, 12.24s/it]08/03/2024 06:37:35 - INFO - __main__ -   Step: 2523, LR: 1.5142255398379556e-05, Loss: 550.001220703125
2024-08-03T13:37:47.715890390Z 
 27%|██▋       | 2524/9500 [8:40:17<23:38:45, 12.20s/it]08/03/2024 06:37:47 - INFO - __main__ -   Step: 2524, LR: 1.5140084854692277e-05, Loss: 610.6240234375
2024-08-03T13:37:59.889502706Z 
 27%|██▋       | 2525/9500 [8:40:29<23:37:32, 12.19s/it]08/03/2024 06:37:59 - INFO - __main__ -   Step: 2525, LR: 1.5137914311004997e-05, Loss: 555.5599365234375
2024-08-03T13:38:12.341428054Z 
 27%|██▋       | 2526/9500 [8:40:42<23:46:20, 12.27s/it]08/03/2024 06:38:12 - INFO - __main__ -   Step: 2526, LR: 1.5135743767317718e-05, Loss: 727.4161376953125
2024-08-03T13:38:24.697164652Z 
 27%|██▋       | 2527/9500 [8:40:54<23:49:04, 12.30s/it]08/03/2024 06:38:24 - INFO - __main__ -   Step: 2527, LR: 1.513357322363044e-05, Loss: 691.4398193359375
2024-08-03T13:38:36.917061244Z 
 27%|██▋       | 2528/9500 [8:41:06<23:46:11, 12.27s/it]08/03/2024 06:38:36 - INFO - __main__ -   Step: 2528, LR: 1.5131402679943158e-05, Loss: 669.4028930664062
2024-08-03T13:38:49.463437301Z 
 27%|██▋       | 2529/9500 [8:41:19<23:55:29, 12.36s/it]08/03/2024 06:38:49 - INFO - __main__ -   Step: 2529, LR: 1.512923213625588e-05, Loss: 587.67578125
2024-08-03T13:39:01.739487362Z 
 27%|██▋       | 2530/9500 [8:41:31<23:52:32, 12.33s/it]08/03/2024 06:39:01 - INFO - __main__ -   Step: 2530, LR: 1.5127061592568601e-05, Loss: 585.844482421875
2024-08-03T13:39:13.674450740Z 
 27%|██▋       | 2531/9500 [8:41:43<23:38:29, 12.21s/it]08/03/2024 06:39:13 - INFO - __main__ -   Step: 2531, LR: 1.5124891048881323e-05, Loss: 535.822998046875
2024-08-03T13:39:26.144502229Z 
 27%|██▋       | 2532/9500 [8:41:56<23:47:15, 12.29s/it]08/03/2024 06:39:26 - INFO - __main__ -   Step: 2532, LR: 1.5122720505194044e-05, Loss: 683.0311279296875
2024-08-03T13:39:38.562341384Z 
 27%|██▋       | 2533/9500 [8:42:08<23:51:30, 12.33s/it]08/03/2024 06:39:38 - INFO - __main__ -   Step: 2533, LR: 1.5120549961506766e-05, Loss: 642.974365234375
2024-08-03T13:39:50.957221350Z 
 27%|██▋       | 2534/9500 [8:42:20<23:53:37, 12.35s/it]08/03/2024 06:39:50 - INFO - __main__ -   Step: 2534, LR: 1.5118379417819486e-05, Loss: 723.0507202148438
2024-08-03T13:40:02.959253211Z 
 27%|██▋       | 2535/9500 [8:42:32<23:41:22, 12.24s/it]08/03/2024 06:40:02 - INFO - __main__ -   Step: 2535, LR: 1.5116208874132207e-05, Loss: 477.0430908203125
2024-08-03T13:40:15.462668885Z 
 27%|██▋       | 2536/9500 [8:42:45<23:50:11, 12.32s/it]08/03/2024 06:40:15 - INFO - __main__ -   Step: 2536, LR: 1.5114038330444929e-05, Loss: 681.5445556640625
2024-08-03T13:40:27.830595611Z 
 27%|██▋       | 2537/9500 [8:42:57<23:51:34, 12.34s/it]08/03/2024 06:40:27 - INFO - __main__ -   Step: 2537, LR: 1.511186778675765e-05, Loss: 760.1897583007812
2024-08-03T13:40:40.233833717Z 
 27%|██▋       | 2538/9500 [8:43:10<23:53:41, 12.36s/it]08/03/2024 06:40:40 - INFO - __main__ -   Step: 2538, LR: 1.5109697243070372e-05, Loss: 615.904296875
2024-08-03T13:40:53.082314278Z 
 27%|██▋       | 2539/9500 [8:43:23<24:10:39, 12.50s/it]08/03/2024 06:40:53 - INFO - __main__ -   Step: 2539, LR: 1.5107526699383092e-05, Loss: 465.19097900390625
2024-08-03T13:41:05.250965106Z 
 27%|██▋       | 2540/9500 [8:43:35<23:58:46, 12.40s/it]08/03/2024 06:41:05 - INFO - __main__ -   Step: 2540, LR: 1.5105356155695814e-05, Loss: 591.4989013671875
2024-08-03T13:41:17.423459638Z 
 27%|██▋       | 2541/9500 [8:43:47<23:50:32, 12.33s/it]08/03/2024 06:41:17 - INFO - __main__ -   Step: 2541, LR: 1.5103185612008535e-05, Loss: 629.1173095703125
2024-08-03T13:41:30.064797032Z 
 27%|██▋       | 2542/9500 [8:44:00<24:01:01, 12.43s/it]08/03/2024 06:41:30 - INFO - __main__ -   Step: 2542, LR: 1.5101015068321255e-05, Loss: 523.3916015625
2024-08-03T13:41:42.415489440Z 
 27%|██▋       | 2543/9500 [8:44:12<23:58:10, 12.40s/it]08/03/2024 06:41:42 - INFO - __main__ -   Step: 2543, LR: 1.5098844524633975e-05, Loss: 693.8939208984375
2024-08-03T13:41:54.359145370Z 
 27%|██▋       | 2544/9500 [8:44:24<23:41:59, 12.27s/it]08/03/2024 06:41:54 - INFO - __main__ -   Step: 2544, LR: 1.5096673980946696e-05, Loss: 696.757568359375
2024-08-03T13:42:07.003959650Z 
 27%|██▋       | 2545/9500 [8:44:36<23:54:58, 12.38s/it]08/03/2024 06:42:07 - INFO - __main__ -   Step: 2545, LR: 1.5094503437259418e-05, Loss: 567.5040283203125
2024-08-03T13:42:19.118342155Z 
 27%|██▋       | 2546/9500 [8:44:49<23:45:33, 12.30s/it]08/03/2024 06:42:19 - INFO - __main__ -   Step: 2546, LR: 1.509233289357214e-05, Loss: 528.8269653320312
2024-08-03T13:42:31.608984644Z 
 27%|██▋       | 2547/9500 [8:45:01<23:51:59, 12.36s/it]08/03/2024 06:42:31 - INFO - __main__ -   Step: 2547, LR: 1.5090162349884861e-05, Loss: 620.3355712890625
2024-08-03T13:42:44.047337682Z 
 27%|██▋       | 2548/9500 [8:45:13<23:54:36, 12.38s/it]08/03/2024 06:42:44 - INFO - __main__ -   Step: 2548, LR: 1.5087991806197581e-05, Loss: 618.84130859375
2024-08-03T13:42:56.022975540Z 
 27%|██▋       | 2549/9500 [8:45:25<23:40:17, 12.26s/it]08/03/2024 06:42:56 - INFO - __main__ -   Step: 2549, LR: 1.5085821262510303e-05, Loss: 527.5752563476562
2024-08-03T13:43:08.024007793Z 
 27%|██▋       | 2550/9500 [8:45:37<23:31:05, 12.18s/it]08/03/2024 06:43:08 - INFO - __main__ -   Step: 2550, LR: 1.5083650718823024e-05, Loss: 599.0921630859375
2024-08-03T13:43:20.234399926Z 
 27%|██▋       | 2551/9500 [8:45:50<23:31:52, 12.19s/it]08/03/2024 06:43:20 - INFO - __main__ -   Step: 2551, LR: 1.5081480175135746e-05, Loss: 545.0037841796875
2024-08-03T13:43:32.382177264Z 
 27%|██▋       | 2552/9500 [8:46:02<23:30:11, 12.18s/it]08/03/2024 06:43:32 - INFO - __main__ -   Step: 2552, LR: 1.5079309631448467e-05, Loss: 574.6771240234375
2024-08-03T13:43:44.583850917Z 
 27%|██▋       | 2553/9500 [8:46:14<23:30:48, 12.18s/it]08/03/2024 06:43:44 - INFO - __main__ -   Step: 2553, LR: 1.5077139087761187e-05, Loss: 606.6600341796875
2024-08-03T13:43:56.744139322Z 
 27%|██▋       | 2554/9500 [8:46:26<23:29:44, 12.18s/it]08/03/2024 06:43:56 - INFO - __main__ -   Step: 2554, LR: 1.5074968544073909e-05, Loss: 524.7296752929688
2024-08-03T13:44:08.854886715Z 
 27%|██▋       | 2555/9500 [8:46:38<23:27:14, 12.16s/it]08/03/2024 06:44:08 - INFO - __main__ -   Step: 2555, LR: 1.507279800038663e-05, Loss: 671.97412109375
2024-08-03T13:44:21.234159349Z 
 27%|██▋       | 2556/9500 [8:46:51<23:34:43, 12.22s/it]08/03/2024 06:44:21 - INFO - __main__ -   Step: 2556, LR: 1.507062745669935e-05, Loss: 802.8265380859375
2024-08-03T13:44:33.710012674Z 
 27%|██▋       | 2557/9500 [8:47:03<23:43:16, 12.30s/it]08/03/2024 06:44:33 - INFO - __main__ -   Step: 2557, LR: 1.506845691301207e-05, Loss: 579.517578125
2024-08-03T13:44:45.772210993Z 
 27%|██▋       | 2558/9500 [8:47:15<23:34:48, 12.23s/it]08/03/2024 06:44:45 - INFO - __main__ -   Step: 2558, LR: 1.5066286369324791e-05, Loss: 590.2900390625
2024-08-03T13:44:58.073521049Z 
 27%|██▋       | 2559/9500 [8:47:28<23:37:09, 12.25s/it]08/03/2024 06:44:58 - INFO - __main__ -   Step: 2559, LR: 1.5064115825637513e-05, Loss: 672.2434692382812
2024-08-03T13:45:10.810838730Z 
 27%|██▋       | 2560/9500 [8:47:40<23:53:50, 12.40s/it]08/03/2024 06:45:10 - INFO - __main__ -   Step: 2560, LR: 1.5061945281950235e-05, Loss: 692.5462646484375
2024-08-03T13:45:22.948170309Z 
 27%|██▋       | 2561/9500 [8:47:52<23:44:39, 12.32s/it]08/03/2024 06:45:22 - INFO - __main__ -   Step: 2561, LR: 1.5059774738262956e-05, Loss: 725.4212036132812
2024-08-03T13:45:35.095658341Z 
 27%|██▋       | 2562/9500 [8:48:05<23:38:30, 12.27s/it]08/03/2024 06:45:35 - INFO - __main__ -   Step: 2562, LR: 1.5057604194575676e-05, Loss: 702.0125732421875
2024-08-03T13:45:47.676568541Z 
 27%|██▋       | 2563/9500 [8:48:17<23:49:10, 12.36s/it]08/03/2024 06:45:47 - INFO - __main__ -   Step: 2563, LR: 1.5055433650888398e-05, Loss: 790.7986450195312
2024-08-03T13:45:59.781183195Z 
 27%|██▋       | 2564/9500 [8:48:29<23:40:04, 12.28s/it]08/03/2024 06:45:59 - INFO - __main__ -   Step: 2564, LR: 1.5053263107201119e-05, Loss: 554.8260498046875
2024-08-03T13:46:11.869438122Z 
 27%|██▋       | 2565/9500 [8:48:41<23:33:03, 12.23s/it]08/03/2024 06:46:11 - INFO - __main__ -   Step: 2565, LR: 1.505109256351384e-05, Loss: 598.2628784179688
2024-08-03T13:46:24.539235025Z 
 27%|██▋       | 2566/9500 [8:48:54<23:48:16, 12.36s/it]08/03/2024 06:46:24 - INFO - __main__ -   Step: 2566, LR: 1.5048922019826562e-05, Loss: 609.6015625
2024-08-03T13:46:36.744163754Z 
 27%|██▋       | 2567/9500 [8:49:06<23:42:43, 12.31s/it]08/03/2024 06:46:36 - INFO - __main__ -   Step: 2567, LR: 1.5046751476139284e-05, Loss: 752.194091796875
2024-08-03T13:46:48.979038286Z 
 27%|██▋       | 2568/9500 [8:49:18<23:39:49, 12.29s/it]08/03/2024 06:46:48 - INFO - __main__ -   Step: 2568, LR: 1.5044580932452004e-05, Loss: 672.8167114257812
2024-08-03T13:47:01.300239285Z 
 27%|██▋       | 2569/9500 [8:49:31<23:40:43, 12.30s/it]08/03/2024 06:47:01 - INFO - __main__ -   Step: 2569, LR: 1.5042410388764725e-05, Loss: 592.0985107421875
2024-08-03T13:47:13.763578649Z 
 27%|██▋       | 2570/9500 [8:49:43<23:46:13, 12.35s/it]08/03/2024 06:47:13 - INFO - __main__ -   Step: 2570, LR: 1.5040239845077445e-05, Loss: 537.6541137695312
2024-08-03T13:47:25.966450730Z 
 27%|██▋       | 2571/9500 [8:49:55<23:40:58, 12.30s/it]08/03/2024 06:47:25 - INFO - __main__ -   Step: 2571, LR: 1.5038069301390165e-05, Loss: 812.9559326171875
2024-08-03T13:47:38.755391529Z 
 27%|██▋       | 2572/9500 [8:50:08<23:57:33, 12.45s/it]08/03/2024 06:47:38 - INFO - __main__ -   Step: 2572, LR: 1.5035898757702887e-05, Loss: 641.5267333984375
2024-08-03T13:47:50.883219569Z 
 27%|██▋       | 2573/9500 [8:50:20<23:46:11, 12.35s/it]08/03/2024 06:47:50 - INFO - __main__ -   Step: 2573, LR: 1.5033728214015608e-05, Loss: 788.4677734375
2024-08-03T13:48:03.453561680Z 
 27%|██▋       | 2574/9500 [8:50:33<23:53:29, 12.42s/it]08/03/2024 06:48:03 - INFO - __main__ -   Step: 2574, LR: 1.503155767032833e-05, Loss: 613.5256958007812
2024-08-03T13:48:15.580367909Z 
 27%|██▋       | 2575/9500 [8:50:45<23:43:12, 12.33s/it]08/03/2024 06:48:15 - INFO - __main__ -   Step: 2575, LR: 1.5029387126641051e-05, Loss: 649.1163940429688
2024-08-03T13:48:28.068645425Z 
 27%|██▋       | 2576/9500 [8:50:58<23:48:26, 12.38s/it]08/03/2024 06:48:28 - INFO - __main__ -   Step: 2576, LR: 1.5027216582953773e-05, Loss: 483.6044921875
2024-08-03T13:48:40.485649417Z 
 27%|██▋       | 2577/9500 [8:51:10<23:49:34, 12.39s/it]08/03/2024 06:48:40 - INFO - __main__ -   Step: 2577, LR: 1.5025046039266493e-05, Loss: 604.1044921875
2024-08-03T13:48:52.676581762Z 
 27%|██▋       | 2578/9500 [8:51:22<23:42:28, 12.33s/it]08/03/2024 06:48:52 - INFO - __main__ -   Step: 2578, LR: 1.5022875495579214e-05, Loss: 773.8129272460938
2024-08-03T13:49:05.005313913Z 
 27%|██▋       | 2579/9500 [8:51:34<23:42:12, 12.33s/it]08/03/2024 06:49:05 - INFO - __main__ -   Step: 2579, LR: 1.5020704951891936e-05, Loss: 496.8904113769531
2024-08-03T13:49:17.269716210Z 
 27%|██▋       | 2580/9500 [8:51:47<23:39:44, 12.31s/it]08/03/2024 06:49:17 - INFO - __main__ -   Step: 2580, LR: 1.5018534408204657e-05, Loss: 629.43359375
2024-08-03T13:49:29.375601569Z 
 27%|██▋       | 2581/9500 [8:51:59<23:32:30, 12.25s/it]08/03/2024 06:49:29 - INFO - __main__ -   Step: 2581, LR: 1.5016363864517379e-05, Loss: 582.9765014648438
2024-08-03T13:49:42.032758522Z 
 27%|██▋       | 2582/9500 [8:52:11<23:46:25, 12.37s/it]08/03/2024 06:49:42 - INFO - __main__ -   Step: 2582, LR: 1.5014193320830099e-05, Loss: 652.0650634765625
2024-08-03T13:49:54.307465856Z 
 27%|██▋       | 2583/9500 [8:52:24<23:42:52, 12.34s/it]08/03/2024 06:49:54 - INFO - __main__ -   Step: 2583, LR: 1.501202277714282e-05, Loss: 626.01171875
2024-08-03T13:50:06.742900251Z 
 27%|██▋       | 2584/9500 [8:52:36<23:45:53, 12.37s/it]08/03/2024 06:50:06 - INFO - __main__ -   Step: 2584, LR: 1.500985223345554e-05, Loss: 621.3224487304688
2024-08-03T13:50:19.123447516Z 
 27%|██▋       | 2585/9500 [8:52:49<23:46:02, 12.37s/it]08/03/2024 06:50:19 - INFO - __main__ -   Step: 2585, LR: 1.5007681689768262e-05, Loss: 498.88323974609375
2024-08-03T13:50:31.255028309Z 
 27%|██▋       | 2586/9500 [8:53:01<23:37:27, 12.30s/it]08/03/2024 06:50:31 - INFO - __main__ -   Step: 2586, LR: 1.5005511146080982e-05, Loss: 633.4198608398438
2024-08-03T13:50:43.172027207Z 
 27%|██▋       | 2587/9500 [8:53:13<23:24:00, 12.19s/it]08/03/2024 06:50:43 - INFO - __main__ -   Step: 2587, LR: 1.5003340602393703e-05, Loss: 481.3345031738281
2024-08-03T13:50:55.623210552Z 
 27%|██▋       | 2588/9500 [8:53:25<23:32:57, 12.27s/it]08/03/2024 06:50:55 - INFO - __main__ -   Step: 2588, LR: 1.5001170058706425e-05, Loss: 465.2490539550781
2024-08-03T13:51:07.864038594Z 
 27%|██▋       | 2589/9500 [8:53:37<23:31:54, 12.26s/it]08/03/2024 06:51:07 - INFO - __main__ -   Step: 2589, LR: 1.4998999515019146e-05, Loss: 545.6444702148438
2024-08-03T13:51:20.255421086Z 
 27%|██▋       | 2590/9500 [8:53:50<23:36:18, 12.30s/it]08/03/2024 06:51:20 - INFO - __main__ -   Step: 2590, LR: 1.4996828971331868e-05, Loss: 622.6211547851562
2024-08-03T13:51:32.859404942Z 
 27%|██▋       | 2591/9500 [8:54:02<23:46:40, 12.39s/it]08/03/2024 06:51:32 - INFO - __main__ -   Step: 2591, LR: 1.4994658427644588e-05, Loss: 584.7835693359375
2024-08-03T13:51:44.781364750Z 
 27%|██▋       | 2592/9500 [8:54:14<23:30:19, 12.25s/it]08/03/2024 06:51:44 - INFO - __main__ -   Step: 2592, LR: 1.499248788395731e-05, Loss: 534.9852294921875
2024-08-03T13:51:56.761968442Z 
 27%|██▋       | 2593/9500 [8:54:26<23:20:50, 12.17s/it]08/03/2024 06:51:56 - INFO - __main__ -   Step: 2593, LR: 1.499031734027003e-05, Loss: 751.3414306640625
2024-08-03T13:52:09.136613514Z 
 27%|██▋       | 2594/9500 [8:54:39<23:27:44, 12.23s/it]08/03/2024 06:52:09 - INFO - __main__ -   Step: 2594, LR: 1.4988146796582752e-05, Loss: 594.449951171875
2024-08-03T13:52:21.381038519Z 
 27%|██▋       | 2595/9500 [8:54:51<23:28:00, 12.23s/it]08/03/2024 06:52:21 - INFO - __main__ -   Step: 2595, LR: 1.4985976252895474e-05, Loss: 683.4155883789062
2024-08-03T13:52:33.696876760Z 
 27%|██▋       | 2596/9500 [8:55:03<23:30:36, 12.26s/it]08/03/2024 06:52:33 - INFO - __main__ -   Step: 2596, LR: 1.4983805709208194e-05, Loss: 690.0665283203125
2024-08-03T13:52:46.389558335Z 
 27%|██▋       | 2597/9500 [8:55:16<23:45:21, 12.39s/it]08/03/2024 06:52:46 - INFO - __main__ -   Step: 2597, LR: 1.4981635165520915e-05, Loss: 767.145751953125
2024-08-03T13:52:58.699260590Z 
 27%|██▋       | 2598/9500 [8:55:28<23:42:25, 12.37s/it]08/03/2024 06:52:58 - INFO - __main__ -   Step: 2598, LR: 1.4979464621833635e-05, Loss: 656.344970703125
2024-08-03T13:53:10.849645758Z 
 27%|██▋       | 2599/9500 [8:55:40<23:34:47, 12.30s/it]08/03/2024 06:53:10 - INFO - __main__ -   Step: 2599, LR: 1.4977294078146357e-05, Loss: 573.2157592773438
2024-08-03T13:53:23.291428564Z 
 27%|██▋       | 2600/9500 [8:55:53<23:39:27, 12.34s/it]08/03/2024 06:53:23 - INFO - __main__ -   Step: 2600, LR: 1.4975123534459077e-05, Loss: 568.5587768554688
2024-08-03T13:53:35.054894206Z 
 27%|██▋       | 2601/9500 [8:56:04<23:19:15, 12.17s/it]08/03/2024 06:53:35 - INFO - __main__ -   Step: 2601, LR: 1.4972952990771798e-05, Loss: 452.45953369140625
2024-08-03T13:53:46.965805720Z 
 27%|██▋       | 2602/9500 [8:56:16<23:10:07, 12.09s/it]08/03/2024 06:53:46 - INFO - __main__ -   Step: 2602, LR: 1.497078244708452e-05, Loss: 589.3401489257812
2024-08-03T13:53:59.582858035Z 
 27%|██▋       | 2603/9500 [8:56:29<23:28:03, 12.25s/it]08/03/2024 06:53:59 - INFO - __main__ -   Step: 2603, LR: 1.4968611903397241e-05, Loss: 785.96728515625
2024-08-03T13:54:11.861391347Z 
 27%|██▋       | 2604/9500 [8:56:41<23:28:52, 12.26s/it]08/03/2024 06:54:11 - INFO - __main__ -   Step: 2604, LR: 1.4966441359709963e-05, Loss: 620.2838134765625
2024-08-03T13:54:23.948884343Z 
 27%|██▋       | 2605/9500 [8:56:53<23:22:47, 12.21s/it]08/03/2024 06:54:23 - INFO - __main__ -   Step: 2605, LR: 1.4964270816022683e-05, Loss: 598.7686767578125
2024-08-03T13:54:36.831450686Z 
 27%|██▋       | 2606/9500 [8:57:06<23:45:52, 12.41s/it]08/03/2024 06:54:36 - INFO - __main__ -   Step: 2606, LR: 1.4962100272335404e-05, Loss: 651.525390625
2024-08-03T13:54:49.385271474Z 
 27%|██▋       | 2607/9500 [8:57:19<23:50:34, 12.45s/it]08/03/2024 06:54:49 - INFO - __main__ -   Step: 2607, LR: 1.4959929728648126e-05, Loss: 773.7010498046875
2024-08-03T13:55:01.579656608Z 
 27%|██▋       | 2608/9500 [8:57:31<23:41:31, 12.38s/it]08/03/2024 06:55:01 - INFO - __main__ -   Step: 2608, LR: 1.4957759184960847e-05, Loss: 745.4985961914062
2024-08-03T13:55:13.908720923Z 
 27%|██▋       | 2609/9500 [8:57:43<23:39:43, 12.36s/it]08/03/2024 06:55:13 - INFO - __main__ -   Step: 2609, LR: 1.4955588641273569e-05, Loss: 696.2694091796875
2024-08-03T13:55:26.299349764Z 
 27%|██▋       | 2610/9500 [8:57:56<23:40:31, 12.37s/it]08/03/2024 06:55:26 - INFO - __main__ -   Step: 2610, LR: 1.495341809758629e-05, Loss: 668.5640258789062
2024-08-03T13:55:38.779810462Z 
 27%|██▋       | 2611/9500 [8:58:08<23:44:06, 12.40s/it]08/03/2024 06:55:38 - INFO - __main__ -   Step: 2611, LR: 1.495124755389901e-05, Loss: 643.5520629882812
2024-08-03T13:55:51.294367840Z 
 27%|██▋       | 2612/9500 [8:58:21<23:47:44, 12.44s/it]08/03/2024 06:55:51 - INFO - __main__ -   Step: 2612, LR: 1.494907701021173e-05, Loss: 384.5445251464844
2024-08-03T13:56:03.379127139Z 
 28%|██▊       | 2613/9500 [8:58:33<23:35:24, 12.33s/it]08/03/2024 06:56:03 - INFO - __main__ -   Step: 2613, LR: 1.4946906466524452e-05, Loss: 480.99566650390625
2024-08-03T13:56:15.371784297Z 
 28%|██▊       | 2614/9500 [8:58:45<23:23:32, 12.23s/it]08/03/2024 06:56:15 - INFO - __main__ -   Step: 2614, LR: 1.4944735922837172e-05, Loss: 701.0304565429688
2024-08-03T13:56:27.762782225Z 
 28%|██▊       | 2615/9500 [8:58:57<23:28:53, 12.28s/it]08/03/2024 06:56:27 - INFO - __main__ -   Step: 2615, LR: 1.4942565379149893e-05, Loss: 500.8715515136719
2024-08-03T13:56:39.817354145Z 
 28%|██▊       | 2616/9500 [8:59:09<23:21:00, 12.21s/it]08/03/2024 06:56:39 - INFO - __main__ -   Step: 2616, LR: 1.4940394835462615e-05, Loss: 517.1420288085938
2024-08-03T13:56:51.783042357Z 
 28%|██▊       | 2617/9500 [8:59:21<23:12:21, 12.14s/it]08/03/2024 06:56:51 - INFO - __main__ -   Step: 2617, LR: 1.4938224291775336e-05, Loss: 618.5924682617188
2024-08-03T13:57:04.398850097Z 
 28%|██▊       | 2618/9500 [8:59:34<23:28:37, 12.28s/it]08/03/2024 06:57:04 - INFO - __main__ -   Step: 2618, LR: 1.4936053748088058e-05, Loss: 774.9224853515625
2024-08-03T13:57:16.856101961Z 
 28%|██▊       | 2619/9500 [8:59:46<23:34:29, 12.33s/it]08/03/2024 06:57:16 - INFO - __main__ -   Step: 2619, LR: 1.493388320440078e-05, Loss: 877.9525146484375
2024-08-03T13:57:29.120724053Z 
 28%|██▊       | 2620/9500 [8:59:59<23:31:54, 12.31s/it]08/03/2024 06:57:29 - INFO - __main__ -   Step: 2620, LR: 1.49317126607135e-05, Loss: 542.2084350585938
2024-08-03T13:57:41.660939377Z 
 28%|██▊       | 2621/9500 [9:00:11<23:39:30, 12.38s/it]08/03/2024 06:57:41 - INFO - __main__ -   Step: 2621, LR: 1.4929542117026221e-05, Loss: 760.2047119140625
2024-08-03T13:57:54.184440949Z 
 28%|██▊       | 2622/9500 [9:00:24<23:44:11, 12.42s/it]08/03/2024 06:57:54 - INFO - __main__ -   Step: 2622, LR: 1.4927371573338942e-05, Loss: 699.43896484375
2024-08-03T13:58:06.652718886Z 
 28%|██▊       | 2623/9500 [9:00:36<23:45:30, 12.44s/it]08/03/2024 06:58:06 - INFO - __main__ -   Step: 2623, LR: 1.4925201029651664e-05, Loss: 746.6057739257812
2024-08-03T13:58:18.731309507Z 
 28%|██▊       | 2624/9500 [9:00:48<23:32:58, 12.33s/it]08/03/2024 06:58:18 - INFO - __main__ -   Step: 2624, LR: 1.4923030485964386e-05, Loss: 484.228759765625
2024-08-03T13:58:31.448879452Z 
 28%|██▊       | 2625/9500 [9:01:01<23:46:06, 12.45s/it]08/03/2024 06:58:31 - INFO - __main__ -   Step: 2625, LR: 1.4920859942277105e-05, Loss: 612.5343627929688
2024-08-03T13:58:43.743420431Z 
 28%|██▊       | 2626/9500 [9:01:13<23:40:41, 12.40s/it]08/03/2024 06:58:43 - INFO - __main__ -   Step: 2626, LR: 1.4918689398589825e-05, Loss: 740.0984497070312
2024-08-03T13:58:56.244859767Z 
 28%|██▊       | 2627/9500 [9:01:26<23:43:56, 12.43s/it]08/03/2024 06:58:56 - INFO - __main__ -   Step: 2627, LR: 1.4916518854902547e-05, Loss: 637.8642578125
2024-08-03T13:59:08.982381343Z 
 28%|██▊       | 2628/9500 [9:01:38<23:54:16, 12.52s/it]08/03/2024 06:59:08 - INFO - __main__ -   Step: 2628, LR: 1.4914348311215268e-05, Loss: 870.7750244140625
2024-08-03T13:59:20.968787638Z 
 28%|██▊       | 2629/9500 [9:01:50<23:35:39, 12.36s/it]08/03/2024 06:59:20 - INFO - __main__ -   Step: 2629, LR: 1.4912177767527988e-05, Loss: 553.2857666015625
2024-08-03T13:59:33.310427271Z 
 28%|██▊       | 2630/9500 [9:02:03<23:34:43, 12.36s/it]08/03/2024 06:59:33 - INFO - __main__ -   Step: 2630, LR: 1.491000722384071e-05, Loss: 571.8125610351562
2024-08-03T13:59:46.177783536Z 
 28%|██▊       | 2631/9500 [9:02:16<23:52:06, 12.51s/it]08/03/2024 06:59:46 - INFO - __main__ -   Step: 2631, LR: 1.4907836680153431e-05, Loss: 633.7682495117188
2024-08-03T13:59:58.480666660Z 
 28%|██▊       | 2632/9500 [9:02:28<23:44:48, 12.45s/it]08/03/2024 06:59:58 - INFO - __main__ -   Step: 2632, LR: 1.4905666136466153e-05, Loss: 518.4071655273438
2024-08-03T14:00:10.624656880Z 
 28%|██▊       | 2633/9500 [9:02:40<23:34:10, 12.36s/it]08/03/2024 07:00:10 - INFO - __main__ -   Step: 2633, LR: 1.4903495592778875e-05, Loss: 628.7584228515625
2024-08-03T14:00:23.100367946Z 
 28%|██▊       | 2634/9500 [9:02:53<23:38:04, 12.39s/it]08/03/2024 07:00:23 - INFO - __main__ -   Step: 2634, LR: 1.4901325049091594e-05, Loss: 680.397216796875
2024-08-03T14:00:35.186672984Z 
 28%|██▊       | 2635/9500 [9:03:05<23:27:21, 12.30s/it]08/03/2024 07:00:35 - INFO - __main__ -   Step: 2635, LR: 1.4899154505404316e-05, Loss: 775.6510009765625
2024-08-03T14:00:47.539125945Z 
 28%|██▊       | 2636/9500 [9:03:17<23:28:57, 12.32s/it]08/03/2024 07:00:47 - INFO - __main__ -   Step: 2636, LR: 1.4896983961717038e-05, Loss: 664.951904296875
2024-08-03T14:00:59.972950088Z 
 28%|██▊       | 2637/9500 [9:03:29<23:32:48, 12.35s/it]08/03/2024 07:00:59 - INFO - __main__ -   Step: 2637, LR: 1.4894813418029759e-05, Loss: 428.9917297363281
2024-08-03T14:01:12.147573611Z 
 28%|██▊       | 2638/9500 [9:03:42<23:26:30, 12.30s/it]08/03/2024 07:01:12 - INFO - __main__ -   Step: 2638, LR: 1.489264287434248e-05, Loss: 630.7808227539062
2024-08-03T14:01:24.545574157Z 
 28%|██▊       | 2639/9500 [9:03:54<23:29:44, 12.33s/it]08/03/2024 07:01:24 - INFO - __main__ -   Step: 2639, LR: 1.48904723306552e-05, Loss: 596.604736328125
2024-08-03T14:01:37.081748298Z 
 28%|██▊       | 2640/9500 [9:04:07<23:36:38, 12.39s/it]08/03/2024 07:01:37 - INFO - __main__ -   Step: 2640, LR: 1.488830178696792e-05, Loss: 783.479248046875
2024-08-03T14:01:49.307388705Z 
 28%|██▊       | 2641/9500 [9:04:19<23:30:47, 12.34s/it]08/03/2024 07:01:49 - INFO - __main__ -   Step: 2641, LR: 1.4886131243280642e-05, Loss: 600.5381469726562
2024-08-03T14:02:01.480614490Z 
 28%|██▊       | 2642/9500 [9:04:31<23:24:49, 12.29s/it]08/03/2024 07:02:01 - INFO - __main__ -   Step: 2642, LR: 1.4883960699593363e-05, Loss: 599.2379150390625
2024-08-03T14:02:13.886495897Z 
 28%|██▊       | 2643/9500 [9:04:43<23:28:35, 12.33s/it]08/03/2024 07:02:13 - INFO - __main__ -   Step: 2643, LR: 1.4881790155906083e-05, Loss: 623.516357421875
2024-08-03T14:02:26.021994617Z 
 28%|██▊       | 2644/9500 [9:04:55<23:21:51, 12.27s/it]08/03/2024 07:02:26 - INFO - __main__ -   Step: 2644, LR: 1.4879619612218805e-05, Loss: 730.0924072265625
2024-08-03T14:02:37.951020986Z 
 28%|██▊       | 2645/9500 [9:05:07<23:10:02, 12.17s/it]08/03/2024 07:02:37 - INFO - __main__ -   Step: 2645, LR: 1.4877449068531526e-05, Loss: 653.1990966796875
2024-08-03T14:02:50.397325311Z 
 28%|██▊       | 2646/9500 [9:05:20<23:19:24, 12.25s/it]08/03/2024 07:02:50 - INFO - __main__ -   Step: 2646, LR: 1.4875278524844248e-05, Loss: 636.32666015625
2024-08-03T14:03:02.859672897Z 
 28%|██▊       | 2647/9500 [9:05:32<23:26:28, 12.31s/it]08/03/2024 07:03:02 - INFO - __main__ -   Step: 2647, LR: 1.487310798115697e-05, Loss: 727.6333618164062
2024-08-03T14:03:14.865383377Z 
 28%|██▊       | 2648/9500 [9:05:44<23:15:42, 12.22s/it]08/03/2024 07:03:14 - INFO - __main__ -   Step: 2648, LR: 1.487093743746969e-05, Loss: 593.7029418945312
2024-08-03T14:03:27.442536793Z 
 28%|██▊       | 2649/9500 [9:05:57<23:27:40, 12.33s/it]08/03/2024 07:03:27 - INFO - __main__ -   Step: 2649, LR: 1.4868766893782411e-05, Loss: 477.01666259765625
2024-08-03T14:03:39.381708626Z 
 28%|██▊       | 2650/9500 [9:06:09<23:14:08, 12.21s/it]08/03/2024 07:03:39 - INFO - __main__ -   Step: 2650, LR: 1.4866596350095133e-05, Loss: 503.9341735839844
2024-08-03T14:03:51.797939570Z 
 28%|██▊       | 2651/9500 [9:06:21<23:20:56, 12.27s/it]08/03/2024 07:03:51 - INFO - __main__ -   Step: 2651, LR: 1.4864425806407854e-05, Loss: 748.9476318359375
2024-08-03T14:04:04.547729867Z 
 28%|██▊       | 2652/9500 [9:06:34<23:37:02, 12.42s/it]08/03/2024 07:04:04 - INFO - __main__ -   Step: 2652, LR: 1.4862255262720576e-05, Loss: 600.9708251953125
2024-08-03T14:04:17.073662679Z 
 28%|██▊       | 2653/9500 [9:06:47<23:40:38, 12.45s/it]08/03/2024 07:04:17 - INFO - __main__ -   Step: 2653, LR: 1.4860084719033296e-05, Loss: 584.516357421875
2024-08-03T14:04:29.059566366Z 
 28%|██▊       | 2654/9500 [9:06:58<23:24:34, 12.31s/it]08/03/2024 07:04:29 - INFO - __main__ -   Step: 2654, LR: 1.4857914175346015e-05, Loss: 551.417236328125
2024-08-03T14:04:41.838467765Z 
 28%|██▊       | 2655/9500 [9:07:11<23:40:25, 12.45s/it]08/03/2024 07:04:41 - INFO - __main__ -   Step: 2655, LR: 1.4855743631658737e-05, Loss: 570.866943359375
2024-08-03T14:04:54.173644978Z 
 28%|██▊       | 2656/9500 [9:07:24<23:36:11, 12.42s/it]08/03/2024 07:04:54 - INFO - __main__ -   Step: 2656, LR: 1.4853573087971459e-05, Loss: 768.6090087890625
2024-08-03T14:05:06.279564549Z 
 28%|██▊       | 2657/9500 [9:07:36<23:25:27, 12.32s/it]08/03/2024 07:05:06 - INFO - __main__ -   Step: 2657, LR: 1.4851402544284178e-05, Loss: 680.7029418945312
2024-08-03T14:05:18.711133110Z 
 28%|██▊       | 2658/9500 [9:07:48<23:28:57, 12.36s/it]08/03/2024 07:05:18 - INFO - __main__ -   Step: 2658, LR: 1.48492320005969e-05, Loss: 734.8933715820312
2024-08-03T14:05:31.350342027Z 
 28%|██▊       | 2659/9500 [9:08:01<23:38:27, 12.44s/it]08/03/2024 07:05:31 - INFO - __main__ -   Step: 2659, LR: 1.4847061456909622e-05, Loss: 741.0640258789062
2024-08-03T14:05:43.441213737Z 
 28%|██▊       | 2660/9500 [9:08:13<23:26:17, 12.34s/it]08/03/2024 07:05:43 - INFO - __main__ -   Step: 2660, LR: 1.4844890913222343e-05, Loss: 662.6848754882812
2024-08-03T14:05:55.446328517Z 
 28%|██▊       | 2661/9500 [9:08:25<23:14:46, 12.24s/it]08/03/2024 07:05:55 - INFO - __main__ -   Step: 2661, LR: 1.4842720369535065e-05, Loss: 630.3858032226562
2024-08-03T14:06:07.998370102Z 
 28%|██▊       | 2662/9500 [9:08:37<23:25:21, 12.33s/it]08/03/2024 07:06:07 - INFO - __main__ -   Step: 2662, LR: 1.4840549825847785e-05, Loss: 582.0107421875
2024-08-03T14:06:20.451930018Z 
 28%|██▊       | 2663/9500 [9:08:50<23:29:19, 12.37s/it]08/03/2024 07:06:20 - INFO - __main__ -   Step: 2663, LR: 1.4838379282160506e-05, Loss: 670.845947265625
2024-08-03T14:06:32.895770691Z 
 28%|██▊       | 2664/9500 [9:09:02<23:31:42, 12.39s/it]08/03/2024 07:06:32 - INFO - __main__ -   Step: 2664, LR: 1.4836208738473228e-05, Loss: 582.7666625976562
2024-08-03T14:06:45.496872099Z 
 28%|██▊       | 2665/9500 [9:09:15<23:38:41, 12.45s/it]08/03/2024 07:06:45 - INFO - __main__ -   Step: 2665, LR: 1.483403819478595e-05, Loss: 702.7785034179688
2024-08-03T14:06:57.878604322Z 
 28%|██▊       | 2666/9500 [9:09:27<23:36:01, 12.43s/it]08/03/2024 07:06:57 - INFO - __main__ -   Step: 2666, LR: 1.483186765109867e-05, Loss: 660.274658203125
2024-08-03T14:07:10.040644410Z 
 28%|██▊       | 2667/9500 [9:09:39<23:26:35, 12.35s/it]08/03/2024 07:07:10 - INFO - __main__ -   Step: 2667, LR: 1.4829697107411392e-05, Loss: 735.3682861328125
2024-08-03T14:07:22.727938566Z 
 28%|██▊       | 2668/9500 [9:09:52<23:37:52, 12.45s/it]08/03/2024 07:07:22 - INFO - __main__ -   Step: 2668, LR: 1.482752656372411e-05, Loss: 617.0693359375
2024-08-03T14:07:34.875472729Z 
 28%|██▊       | 2669/9500 [9:10:04<23:27:15, 12.36s/it]08/03/2024 07:07:34 - INFO - __main__ -   Step: 2669, LR: 1.4825356020036832e-05, Loss: 522.8035888671875
2024-08-03T14:07:47.349062763Z 
 28%|██▊       | 2670/9500 [9:10:17<23:30:54, 12.39s/it]08/03/2024 07:07:47 - INFO - __main__ -   Step: 2670, LR: 1.4823185476349554e-05, Loss: 731.0921630859375
2024-08-03T14:08:00.031999934Z 
 28%|██▊       | 2671/9500 [9:10:29<23:40:33, 12.48s/it]08/03/2024 07:08:00 - INFO - __main__ -   Step: 2671, LR: 1.4821014932662273e-05, Loss: 711.4384765625
2024-08-03T14:08:12.101250497Z 
 28%|██▊       | 2672/9500 [9:10:42<23:26:16, 12.36s/it]08/03/2024 07:08:12 - INFO - __main__ -   Step: 2672, LR: 1.4818844388974995e-05, Loss: 548.1527099609375
2024-08-03T14:08:24.385193560Z 
 28%|██▊       | 2673/9500 [9:10:54<23:23:33, 12.34s/it]08/03/2024 07:08:24 - INFO - __main__ -   Step: 2673, LR: 1.4816673845287717e-05, Loss: 718.491455078125
2024-08-03T14:08:37.097089152Z 
 28%|██▊       | 2674/9500 [9:11:07<23:36:12, 12.45s/it]08/03/2024 07:08:37 - INFO - __main__ -   Step: 2674, LR: 1.4814503301600438e-05, Loss: 631.1940307617188
2024-08-03T14:08:49.279710924Z 
 28%|██▊       | 2675/9500 [9:11:19<23:26:55, 12.37s/it]08/03/2024 07:08:49 - INFO - __main__ -   Step: 2675, LR: 1.481233275791316e-05, Loss: 572.0874633789062
2024-08-03T14:09:01.248790409Z 
 28%|██▊       | 2676/9500 [9:11:31<23:13:05, 12.25s/it]08/03/2024 07:09:01 - INFO - __main__ -   Step: 2676, LR: 1.4810162214225881e-05, Loss: 577.068603515625
2024-08-03T14:09:13.897845118Z 
 28%|██▊       | 2677/9500 [9:11:43<23:26:33, 12.37s/it]08/03/2024 07:09:13 - INFO - __main__ -   Step: 2677, LR: 1.4807991670538601e-05, Loss: 575.9053955078125
2024-08-03T14:09:25.950980032Z 
 28%|██▊       | 2678/9500 [9:11:55<23:15:34, 12.27s/it]08/03/2024 07:09:25 - INFO - __main__ -   Step: 2678, LR: 1.4805821126851323e-05, Loss: 565.9381103515625
2024-08-03T14:09:38.030085857Z 
 28%|██▊       | 2679/9500 [9:12:07<23:08:42, 12.22s/it]08/03/2024 07:09:38 - INFO - __main__ -   Step: 2679, LR: 1.4803650583164044e-05, Loss: 751.2459716796875
2024-08-03T14:09:50.962859419Z 
 28%|██▊       | 2680/9500 [9:12:20<23:32:57, 12.43s/it]08/03/2024 07:09:50 - INFO - __main__ -   Step: 2680, LR: 1.4801480039476766e-05, Loss: 655.18212890625
2024-08-03T14:10:03.192546838Z 
 28%|██▊       | 2681/9500 [9:12:33<23:25:54, 12.37s/it]08/03/2024 07:10:03 - INFO - __main__ -   Step: 2681, LR: 1.4799309495789484e-05, Loss: 597.9866333007812
2024-08-03T14:10:15.240362361Z 
 28%|██▊       | 2682/9500 [9:12:45<23:14:42, 12.27s/it]08/03/2024 07:10:15 - INFO - __main__ -   Step: 2682, LR: 1.4797138952102206e-05, Loss: 517.3142700195312
2024-08-03T14:10:27.648516104Z 
 28%|██▊       | 2683/9500 [9:12:57<23:19:04, 12.31s/it]08/03/2024 07:10:27 - INFO - __main__ -   Step: 2683, LR: 1.4794968408414927e-05, Loss: 719.33740234375
2024-08-03T14:10:39.810310633Z 
 28%|██▊       | 2684/9500 [9:13:09<23:13:41, 12.27s/it]08/03/2024 07:10:39 - INFO - __main__ -   Step: 2684, LR: 1.4792797864727649e-05, Loss: 584.0213623046875
2024-08-03T14:10:51.720315543Z 
 28%|██▊       | 2685/9500 [9:13:21<23:01:15, 12.16s/it]08/03/2024 07:10:51 - INFO - __main__ -   Step: 2685, LR: 1.479062732104037e-05, Loss: 683.997802734375
2024-08-03T14:11:04.450011298Z 
 28%|██▊       | 2686/9500 [9:13:34<23:20:27, 12.33s/it]08/03/2024 07:11:04 - INFO - __main__ -   Step: 2686, LR: 1.478845677735309e-05, Loss: 676.7186279296875
2024-08-03T14:11:17.008964568Z 
 28%|██▊       | 2687/9500 [9:13:46<23:27:59, 12.40s/it]08/03/2024 07:11:17 - INFO - __main__ -   Step: 2687, LR: 1.4786286233665812e-05, Loss: 769.9896240234375
2024-08-03T14:11:28.865227837Z 
 28%|██▊       | 2688/9500 [9:13:58<23:09:16, 12.24s/it]08/03/2024 07:11:28 - INFO - __main__ -   Step: 2688, LR: 1.4784115689978533e-05, Loss: 554.9007568359375
2024-08-03T14:11:41.488796452Z 
 28%|██▊       | 2689/9500 [9:14:11<23:22:14, 12.35s/it]08/03/2024 07:11:41 - INFO - __main__ -   Step: 2689, LR: 1.4781945146291255e-05, Loss: 685.856689453125
2024-08-03T14:11:53.417844861Z 
 28%|██▊       | 2690/9500 [9:14:23<23:07:36, 12.23s/it]08/03/2024 07:11:53 - INFO - __main__ -   Step: 2690, LR: 1.4779774602603976e-05, Loss: 574.4420166015625
2024-08-03T14:12:05.558865101Z 
 28%|██▊       | 2691/9500 [9:14:35<23:04:31, 12.20s/it]08/03/2024 07:12:05 - INFO - __main__ -   Step: 2691, LR: 1.4777604058916696e-05, Loss: 561.5313720703125
2024-08-03T14:12:18.042803263Z 
 28%|██▊       | 2692/9500 [9:14:47<23:13:59, 12.29s/it]08/03/2024 07:12:18 - INFO - __main__ -   Step: 2692, LR: 1.4775433515229418e-05, Loss: 614.7566528320312
2024-08-03T14:12:30.022330071Z 
 28%|██▊       | 2693/9500 [9:14:59<23:03:21, 12.19s/it]08/03/2024 07:12:30 - INFO - __main__ -   Step: 2693, LR: 1.477326297154214e-05, Loss: 573.10205078125
2024-08-03T14:12:42.108964517Z 
 28%|██▊       | 2694/9500 [9:15:12<22:59:31, 12.16s/it]08/03/2024 07:12:42 - INFO - __main__ -   Step: 2694, LR: 1.4771092427854861e-05, Loss: 742.7086181640625
2024-08-03T14:12:54.702147699Z 
 28%|██▊       | 2695/9500 [9:15:24<23:14:00, 12.29s/it]08/03/2024 07:12:54 - INFO - __main__ -   Step: 2695, LR: 1.4768921884167579e-05, Loss: 532.5564575195312
2024-08-03T14:13:07.239002242Z 
 28%|██▊       | 2696/9500 [9:15:37<23:22:09, 12.36s/it]08/03/2024 07:13:07 - INFO - __main__ -   Step: 2696, LR: 1.47667513404803e-05, Loss: 495.0445861816406
2024-08-03T14:13:19.302928958Z 
 28%|██▊       | 2697/9500 [9:15:49<23:11:43, 12.27s/it]08/03/2024 07:13:19 - INFO - __main__ -   Step: 2697, LR: 1.4764580796793022e-05, Loss: 613.6275634765625
2024-08-03T14:13:32.259647333Z 
 28%|██▊       | 2698/9500 [9:16:02<23:34:42, 12.48s/it]08/03/2024 07:13:32 - INFO - __main__ -   Step: 2698, LR: 1.4762410253105744e-05, Loss: 631.110107421875
2024-08-03T14:13:44.222476418Z 
 28%|██▊       | 2699/9500 [9:16:14<23:16:57, 12.32s/it]08/03/2024 07:13:44 - INFO - __main__ -   Step: 2699, LR: 1.4760239709418465e-05, Loss: 558.0366821289062
2024-08-03T14:13:56.393917461Z 
 28%|██▊       | 2700/9500 [9:16:26<23:11:33, 12.28s/it]08/03/2024 07:13:56 - INFO - __main__ -   Step: 2700, LR: 1.4758069165731185e-05, Loss: 552.5697021484375
2024-08-03T14:14:08.736499066Z 
 28%|██▊       | 2701/9500 [9:16:38<23:13:31, 12.30s/it]08/03/2024 07:14:08 - INFO - __main__ -   Step: 2701, LR: 1.4755898622043907e-05, Loss: 631.9276123046875
2024-08-03T14:14:21.274882726Z 
 28%|██▊       | 2702/9500 [9:16:51<23:21:29, 12.37s/it]08/03/2024 07:14:21 - INFO - __main__ -   Step: 2702, LR: 1.4753728078356628e-05, Loss: 718.1727294921875
2024-08-03T14:14:33.399014402Z 
 28%|██▊       | 2703/9500 [9:17:03<23:12:57, 12.30s/it]08/03/2024 07:14:33 - INFO - __main__ -   Step: 2703, LR: 1.475155753466935e-05, Loss: 628.8232421875
2024-08-03T14:14:45.585967021Z 
 28%|██▊       | 2704/9500 [9:17:15<23:09:02, 12.26s/it]08/03/2024 07:14:45 - INFO - __main__ -   Step: 2704, LR: 1.4749386990982071e-05, Loss: 713.6456909179688
2024-08-03T14:14:58.104986760Z 
 28%|██▊       | 2705/9500 [9:17:28<23:17:30, 12.34s/it]08/03/2024 07:14:58 - INFO - __main__ -   Step: 2705, LR: 1.4747216447294791e-05, Loss: 658.9716186523438
2024-08-03T14:15:10.172430373Z 
 28%|██▊       | 2706/9500 [9:17:40<23:08:02, 12.26s/it]08/03/2024 07:15:10 - INFO - __main__ -   Step: 2706, LR: 1.4745045903607513e-05, Loss: 723.495849609375
2024-08-03T14:15:22.591714262Z 
 28%|██▊       | 2707/9500 [9:17:52<23:13:19, 12.31s/it]08/03/2024 07:15:22 - INFO - __main__ -   Step: 2707, LR: 1.4742875359920234e-05, Loss: 696.3844604492188
2024-08-03T14:15:35.088722648Z 
 29%|██▊       | 2708/9500 [9:18:05<23:19:33, 12.36s/it]08/03/2024 07:15:35 - INFO - __main__ -   Step: 2708, LR: 1.4740704816232956e-05, Loss: 772.6253662109375
2024-08-03T14:15:47.343998621Z 
 29%|██▊       | 2709/9500 [9:18:17<23:15:41, 12.33s/it]08/03/2024 07:15:47 - INFO - __main__ -   Step: 2709, LR: 1.4738534272545674e-05, Loss: 719.772705078125
2024-08-03T14:15:59.453549633Z 
 29%|██▊       | 2710/9500 [9:18:29<23:07:57, 12.26s/it]08/03/2024 07:15:59 - INFO - __main__ -   Step: 2710, LR: 1.4736363728858396e-05, Loss: 669.412353515625
2024-08-03T14:16:11.943027379Z 
 29%|██▊       | 2711/9500 [9:18:41<23:15:22, 12.33s/it]08/03/2024 07:16:11 - INFO - __main__ -   Step: 2711, LR: 1.4734193185171117e-05, Loss: 695.6776123046875
2024-08-03T14:16:24.120565515Z 
 29%|██▊       | 2712/9500 [9:18:54<23:09:56, 12.29s/it]08/03/2024 07:16:24 - INFO - __main__ -   Step: 2712, LR: 1.4732022641483839e-05, Loss: 616.771484375
2024-08-03T14:16:35.936176707Z 
 29%|██▊       | 2713/9500 [9:19:05<22:53:46, 12.14s/it]08/03/2024 07:16:35 - INFO - __main__ -   Step: 2713, LR: 1.472985209779656e-05, Loss: 511.7686462402344
2024-08-03T14:16:48.618335225Z 
 29%|██▊       | 2714/9500 [9:19:18<23:11:48, 12.31s/it]08/03/2024 07:16:48 - INFO - __main__ -   Step: 2714, LR: 1.472768155410928e-05, Loss: 699.6390991210938
2024-08-03T14:17:00.802801148Z 
 29%|██▊       | 2715/9500 [9:19:30<23:07:28, 12.27s/it]08/03/2024 07:17:00 - INFO - __main__ -   Step: 2715, LR: 1.4725511010422002e-05, Loss: 546.8927001953125
2024-08-03T14:17:12.848448972Z 
 29%|██▊       | 2716/9500 [9:19:42<22:59:40, 12.20s/it]08/03/2024 07:17:12 - INFO - __main__ -   Step: 2716, LR: 1.4723340466734723e-05, Loss: 664.8847045898438
2024-08-03T14:17:25.386169783Z 
 29%|██▊       | 2717/9500 [9:19:55<23:10:51, 12.30s/it]08/03/2024 07:17:25 - INFO - __main__ -   Step: 2717, LR: 1.4721169923047445e-05, Loss: 638.9113159179688
2024-08-03T14:17:37.419009252Z 
 29%|██▊       | 2718/9500 [9:20:07<23:01:29, 12.22s/it]08/03/2024 07:17:37 - INFO - __main__ -   Step: 2718, LR: 1.4718999379360166e-05, Loss: 584.1171264648438
2024-08-03T14:17:49.548933498Z 
 29%|██▊       | 2719/9500 [9:20:19<22:58:08, 12.19s/it]08/03/2024 07:17:49 - INFO - __main__ -   Step: 2719, LR: 1.4716828835672888e-05, Loss: 482.07135009765625
2024-08-03T14:18:02.085144022Z 
 29%|██▊       | 2720/9500 [9:20:32<23:09:32, 12.30s/it]08/03/2024 07:18:02 - INFO - __main__ -   Step: 2720, LR: 1.4714658291985608e-05, Loss: 677.7196044921875
2024-08-03T14:18:13.987628006Z 
 29%|██▊       | 2721/9500 [9:20:43<22:55:58, 12.18s/it]08/03/2024 07:18:13 - INFO - __main__ -   Step: 2721, LR: 1.471248774829833e-05, Loss: 573.73974609375
2024-08-03T14:18:25.834729537Z 
 29%|██▊       | 2722/9500 [9:20:55<22:44:32, 12.08s/it]08/03/2024 07:18:25 - INFO - __main__ -   Step: 2722, LR: 1.4710317204611051e-05, Loss: 498.1040954589844
2024-08-03T14:18:38.095341341Z 
 29%|██▊       | 2723/9500 [9:21:08<22:50:28, 12.13s/it]08/03/2024 07:18:38 - INFO - __main__ -   Step: 2723, LR: 1.470814666092377e-05, Loss: 616.7325439453125
2024-08-03T14:18:50.037413342Z 
 29%|██▊       | 2724/9500 [9:21:19<22:43:47, 12.08s/it]08/03/2024 07:18:50 - INFO - __main__ -   Step: 2724, LR: 1.470597611723649e-05, Loss: 526.328857421875
2024-08-03T14:19:02.188697674Z 
 29%|██▊       | 2725/9500 [9:21:32<22:46:09, 12.10s/it]08/03/2024 07:19:02 - INFO - __main__ -   Step: 2725, LR: 1.4703805573549212e-05, Loss: 670.9995727539062
2024-08-03T14:19:15.265762077Z 
 29%|██▊       | 2726/9500 [9:21:45<23:19:05, 12.39s/it]08/03/2024 07:19:15 - INFO - __main__ -   Step: 2726, LR: 1.4701635029861934e-05, Loss: 650.3421630859375
2024-08-03T14:19:27.319556746Z 
 29%|██▊       | 2727/9500 [9:21:57<23:07:24, 12.29s/it]08/03/2024 07:19:27 - INFO - __main__ -   Step: 2727, LR: 1.4699464486174655e-05, Loss: 479.00750732421875
2024-08-03T14:19:39.320845924Z 
 29%|██▊       | 2728/9500 [9:22:09<22:57:24, 12.20s/it]08/03/2024 07:19:39 - INFO - __main__ -   Step: 2728, LR: 1.4697293942487377e-05, Loss: 535.3141479492188
2024-08-03T14:19:52.013639344Z 
 29%|██▊       | 2729/9500 [9:22:21<23:13:45, 12.35s/it]08/03/2024 07:19:52 - INFO - __main__ -   Step: 2729, LR: 1.4695123398800097e-05, Loss: 672.248779296875
2024-08-03T14:20:03.989842370Z 
 29%|██▊       | 2730/9500 [9:22:33<23:00:53, 12.24s/it]08/03/2024 07:20:03 - INFO - __main__ -   Step: 2730, LR: 1.4692952855112818e-05, Loss: 493.849609375
2024-08-03T14:20:15.967078808Z 
 29%|██▊       | 2731/9500 [9:22:45<22:51:50, 12.16s/it]08/03/2024 07:20:15 - INFO - __main__ -   Step: 2731, LR: 1.469078231142554e-05, Loss: 621.42431640625
2024-08-03T14:20:28.786815261Z 
 29%|██▉       | 2732/9500 [9:22:58<23:13:57, 12.36s/it]08/03/2024 07:20:28 - INFO - __main__ -   Step: 2732, LR: 1.4688611767738262e-05, Loss: 750.617919921875
2024-08-03T14:20:40.826081258Z 
 29%|██▉       | 2733/9500 [9:23:10<23:02:59, 12.26s/it]08/03/2024 07:20:40 - INFO - __main__ -   Step: 2733, LR: 1.4686441224050983e-05, Loss: 641.332275390625
2024-08-03T14:20:53.147422437Z 
 29%|██▉       | 2734/9500 [9:23:23<23:04:45, 12.28s/it]08/03/2024 07:20:53 - INFO - __main__ -   Step: 2734, LR: 1.4684270680363703e-05, Loss: 757.014404296875
2024-08-03T14:21:05.687470676Z 
 29%|██▉       | 2735/9500 [9:23:35<23:13:22, 12.36s/it]08/03/2024 07:21:05 - INFO - __main__ -   Step: 2735, LR: 1.4682100136676424e-05, Loss: 558.9329223632812
2024-08-03T14:21:17.847630391Z 
 29%|██▉       | 2736/9500 [9:23:47<23:06:27, 12.30s/it]08/03/2024 07:21:17 - INFO - __main__ -   Step: 2736, LR: 1.4679929592989146e-05, Loss: 481.9510498046875
2024-08-03T14:21:30.077005124Z 
 29%|██▉       | 2737/9500 [9:24:00<23:03:55, 12.28s/it]08/03/2024 07:21:30 - INFO - __main__ -   Step: 2737, LR: 1.4677759049301866e-05, Loss: 628.1666259765625
2024-08-03T14:21:42.651674834Z 
 29%|██▉       | 2738/9500 [9:24:12<23:13:45, 12.37s/it]08/03/2024 07:21:42 - INFO - __main__ -   Step: 2738, LR: 1.4675588505614586e-05, Loss: 660.4972534179688
2024-08-03T14:21:54.637970850Z 
 29%|██▉       | 2739/9500 [9:24:24<23:00:40, 12.25s/it]08/03/2024 07:21:54 - INFO - __main__ -   Step: 2739, LR: 1.4673417961927307e-05, Loss: 476.7606201171875
2024-08-03T14:22:06.534057816Z 
 29%|██▉       | 2740/9500 [9:24:36<22:48:25, 12.15s/it]08/03/2024 07:22:06 - INFO - __main__ -   Step: 2740, LR: 1.4671247418240029e-05, Loss: 575.5357666015625
2024-08-03T14:22:19.124716520Z 
 29%|██▉       | 2741/9500 [9:24:49<23:03:15, 12.28s/it]08/03/2024 07:22:19 - INFO - __main__ -   Step: 2741, LR: 1.466907687455275e-05, Loss: 458.31488037109375
2024-08-03T14:22:31.361801872Z 
 29%|██▉       | 2742/9500 [9:25:01<23:01:36, 12.27s/it]08/03/2024 07:22:31 - INFO - __main__ -   Step: 2742, LR: 1.4666906330865472e-05, Loss: 761.1447143554688
2024-08-03T14:22:43.257507023Z 
 29%|██▉       | 2743/9500 [9:25:13<22:48:53, 12.16s/it]08/03/2024 07:22:43 - INFO - __main__ -   Step: 2743, LR: 1.4664735787178192e-05, Loss: 507.16876220703125
2024-08-03T14:22:55.809521050Z 
 29%|██▉       | 2744/9500 [9:25:25<23:02:05, 12.27s/it]08/03/2024 07:22:55 - INFO - __main__ -   Step: 2744, LR: 1.4662565243490913e-05, Loss: 639.8326416015625
2024-08-03T14:23:08.298254072Z 
 29%|██▉       | 2745/9500 [9:25:38<23:09:07, 12.34s/it]08/03/2024 07:23:08 - INFO - __main__ -   Step: 2745, LR: 1.4660394699803635e-05, Loss: 645.7056884765625
2024-08-03T14:23:20.413075693Z 
 29%|██▉       | 2746/9500 [9:25:50<23:01:21, 12.27s/it]08/03/2024 07:23:20 - INFO - __main__ -   Step: 2746, LR: 1.4658224156116357e-05, Loss: 519.0142822265625
2024-08-03T14:23:32.610876982Z 
 29%|██▉       | 2747/9500 [9:26:02<22:58:40, 12.25s/it]08/03/2024 07:23:32 - INFO - __main__ -   Step: 2747, LR: 1.4656053612429078e-05, Loss: 606.5885620117188
2024-08-03T14:23:45.132266603Z 
 29%|██▉       | 2748/9500 [9:26:15<23:07:39, 12.33s/it]08/03/2024 07:23:45 - INFO - __main__ -   Step: 2748, LR: 1.4653883068741798e-05, Loss: 560.2393798828125
2024-08-03T14:23:57.045036136Z 
 29%|██▉       | 2749/9500 [9:26:26<22:53:19, 12.21s/it]08/03/2024 07:23:57 - INFO - __main__ -   Step: 2749, LR: 1.465171252505452e-05, Loss: 533.1339111328125
2024-08-03T14:24:09.375632481Z 
 29%|██▉       | 2750/9500 [9:26:39<22:57:20, 12.24s/it]08/03/2024 07:24:09 - INFO - __main__ -   Step: 2750, LR: 1.4649541981367241e-05, Loss: 760.3387451171875
2024-08-03T14:24:21.686360047Z 
 29%|██▉       | 2751/9500 [9:26:51<22:59:24, 12.26s/it]08/03/2024 07:24:21 - INFO - __main__ -   Step: 2751, LR: 1.4647371437679961e-05, Loss: 484.73980712890625
2024-08-03T14:24:33.553697591Z 
 29%|██▉       | 2752/9500 [9:27:03<22:45:51, 12.14s/it]08/03/2024 07:24:33 - INFO - __main__ -   Step: 2752, LR: 1.464520089399268e-05, Loss: 561.1253051757812
2024-08-03T14:24:45.719232378Z 
 29%|██▉       | 2753/9500 [9:27:15<22:46:22, 12.15s/it]08/03/2024 07:24:45 - INFO - __main__ -   Step: 2753, LR: 1.4643030350305402e-05, Loss: 468.9063415527344
2024-08-03T14:24:58.253452497Z 
 29%|██▉       | 2754/9500 [9:27:28<22:59:05, 12.27s/it]08/03/2024 07:24:58 - INFO - __main__ -   Step: 2754, LR: 1.4640859806618124e-05, Loss: 588.506591796875
2024-08-03T14:25:10.253490042Z 
 29%|██▉       | 2755/9500 [9:27:40<22:49:54, 12.19s/it]08/03/2024 07:25:10 - INFO - __main__ -   Step: 2755, LR: 1.4638689262930846e-05, Loss: 551.8762817382812
2024-08-03T14:25:22.220548671Z 
 29%|██▉       | 2756/9500 [9:27:52<22:42:19, 12.12s/it]08/03/2024 07:25:22 - INFO - __main__ -   Step: 2756, LR: 1.4636518719243567e-05, Loss: 805.9967041015625
2024-08-03T14:25:35.054139812Z 
 29%|██▉       | 2757/9500 [9:28:04<23:06:09, 12.33s/it]08/03/2024 07:25:35 - INFO - __main__ -   Step: 2757, LR: 1.4634348175556287e-05, Loss: 731.1448974609375
2024-08-03T14:25:47.361466724Z 
 29%|██▉       | 2758/9500 [9:28:17<23:05:04, 12.33s/it]08/03/2024 07:25:47 - INFO - __main__ -   Step: 2758, LR: 1.4632177631869009e-05, Loss: 543.1014404296875
2024-08-03T14:25:59.258608039Z 
 29%|██▉       | 2759/9500 [9:28:29<22:50:22, 12.20s/it]08/03/2024 07:25:59 - INFO - __main__ -   Step: 2759, LR: 1.463000708818173e-05, Loss: 511.4066162109375
2024-08-03T14:26:11.863212152Z 
 29%|██▉       | 2760/9500 [9:28:41<23:03:54, 12.32s/it]08/03/2024 07:26:11 - INFO - __main__ -   Step: 2760, LR: 1.4627836544494452e-05, Loss: 635.12109375
2024-08-03T14:26:24.034930203Z 
 29%|██▉       | 2761/9500 [9:28:53<22:58:43, 12.28s/it]08/03/2024 07:26:24 - INFO - __main__ -   Step: 2761, LR: 1.4625666000807173e-05, Loss: 565.4404907226562
2024-08-03T14:26:36.333463730Z 
 29%|██▉       | 2762/9500 [9:29:06<22:59:18, 12.28s/it]08/03/2024 07:26:36 - INFO - __main__ -   Step: 2762, LR: 1.4623495457119895e-05, Loss: 717.8170166015625
2024-08-03T14:26:48.763770832Z 
 29%|██▉       | 2763/9500 [9:29:18<23:04:04, 12.33s/it]08/03/2024 07:26:48 - INFO - __main__ -   Step: 2763, LR: 1.4621324913432615e-05, Loss: 631.957275390625
2024-08-03T14:27:00.894804565Z 
 29%|██▉       | 2764/9500 [9:29:30<22:57:17, 12.27s/it]08/03/2024 07:27:00 - INFO - __main__ -   Step: 2764, LR: 1.4619154369745336e-05, Loss: 704.3082275390625
2024-08-03T14:27:13.465896272Z 
 29%|██▉       | 2765/9500 [9:29:43<23:07:16, 12.36s/it]08/03/2024 07:27:13 - INFO - __main__ -   Step: 2765, LR: 1.4616983826058056e-05, Loss: 673.5400390625
2024-08-03T14:27:26.250499070Z 
 29%|██▉       | 2766/9500 [9:29:56<23:21:23, 12.49s/it]08/03/2024 07:27:26 - INFO - __main__ -   Step: 2766, LR: 1.4614813282370776e-05, Loss: 576.8673095703125
2024-08-03T14:27:38.312650119Z 
 29%|██▉       | 2767/9500 [9:30:08<23:06:55, 12.36s/it]08/03/2024 07:27:38 - INFO - __main__ -   Step: 2767, LR: 1.4612642738683497e-05, Loss: 738.4683837890625
2024-08-03T14:27:50.400866915Z 
 29%|██▉       | 2768/9500 [9:30:20<22:57:36, 12.28s/it]08/03/2024 07:27:50 - INFO - __main__ -   Step: 2768, LR: 1.4610472194996219e-05, Loss: 713.2405395507812
2024-08-03T14:28:03.313988660Z 
 29%|██▉       | 2769/9500 [9:30:33<23:18:45, 12.47s/it]08/03/2024 07:28:03 - INFO - __main__ -   Step: 2769, LR: 1.460830165130894e-05, Loss: 807.60107421875
2024-08-03T14:28:15.606734836Z 
 29%|██▉       | 2770/9500 [9:30:45<23:12:38, 12.42s/it]08/03/2024 07:28:15 - INFO - __main__ -   Step: 2770, LR: 1.4606131107621662e-05, Loss: 653.8473510742188
2024-08-03T14:28:27.695486592Z 
 29%|██▉       | 2771/9500 [9:30:57<23:01:25, 12.32s/it]08/03/2024 07:28:27 - INFO - __main__ -   Step: 2771, LR: 1.4603960563934384e-05, Loss: 492.73291015625
2024-08-03T14:28:40.321085607Z 
 29%|██▉       | 2772/9500 [9:31:10<23:11:34, 12.41s/it]08/03/2024 07:28:40 - INFO - __main__ -   Step: 2772, LR: 1.4601790020247104e-05, Loss: 616.1279296875
2024-08-03T14:28:52.589234601Z 
 29%|██▉       | 2773/9500 [9:31:22<23:06:36, 12.37s/it]08/03/2024 07:28:52 - INFO - __main__ -   Step: 2773, LR: 1.4599619476559825e-05, Loss: 685.7914428710938
2024-08-03T14:29:04.827856990Z 
 29%|██▉       | 2774/9500 [9:31:34<23:02:01, 12.33s/it]08/03/2024 07:29:04 - INFO - __main__ -   Step: 2774, LR: 1.4597448932872547e-05, Loss: 638.8501586914062
2024-08-03T14:29:17.513208570Z 
 29%|██▉       | 2775/9500 [9:31:47<23:13:51, 12.44s/it]08/03/2024 07:29:17 - INFO - __main__ -   Step: 2775, LR: 1.4595278389185268e-05, Loss: 758.0167846679688
2024-08-03T14:29:29.954927978Z 
 29%|██▉       | 2776/9500 [9:31:59<23:13:50, 12.44s/it]08/03/2024 07:29:29 - INFO - __main__ -   Step: 2776, LR: 1.459310784549799e-05, Loss: 687.014404296875
2024-08-03T14:29:42.453740461Z 
 29%|██▉       | 2777/9500 [9:32:12<23:15:42, 12.46s/it]08/03/2024 07:29:42 - INFO - __main__ -   Step: 2777, LR: 1.459093730181071e-05, Loss: 737.9013671875
2024-08-03T14:29:55.012630267Z 
 29%|██▉       | 2778/9500 [9:32:24<23:18:55, 12.49s/it]08/03/2024 07:29:55 - INFO - __main__ -   Step: 2778, LR: 1.4588766758123431e-05, Loss: 568.6044921875
2024-08-03T14:30:07.859996089Z 
 29%|██▉       | 2779/9500 [9:32:37<23:30:51, 12.60s/it]08/03/2024 07:30:07 - INFO - __main__ -   Step: 2779, LR: 1.4586596214436151e-05, Loss: 812.9290771484375
2024-08-03T14:30:19.888872394Z 
 29%|██▉       | 2780/9500 [9:32:49<23:11:37, 12.43s/it]08/03/2024 07:30:19 - INFO - __main__ -   Step: 2780, LR: 1.4584425670748873e-05, Loss: 575.5137939453125
2024-08-03T14:30:32.211251175Z 
 29%|██▉       | 2781/9500 [9:33:02<23:07:57, 12.39s/it]08/03/2024 07:30:32 - INFO - __main__ -   Step: 2781, LR: 1.4582255127061593e-05, Loss: 553.9320068359375
2024-08-03T14:30:44.364801248Z 
 29%|██▉       | 2782/9500 [9:33:14<22:59:40, 12.32s/it]08/03/2024 07:30:44 - INFO - __main__ -   Step: 2782, LR: 1.4580084583374314e-05, Loss: 591.3458862304688
2024-08-03T14:30:56.391263367Z 
 29%|██▉       | 2783/9500 [9:33:26<22:49:31, 12.23s/it]08/03/2024 07:30:56 - INFO - __main__ -   Step: 2783, LR: 1.4577914039687036e-05, Loss: 686.8209838867188
2024-08-03T14:31:09.134795339Z 
 29%|██▉       | 2784/9500 [9:33:39<23:06:26, 12.39s/it]08/03/2024 07:31:09 - INFO - __main__ -   Step: 2784, LR: 1.4575743495999757e-05, Loss: 636.5811157226562
2024-08-03T14:31:21.467224071Z 
 29%|██▉       | 2785/9500 [9:33:51<23:04:25, 12.37s/it]08/03/2024 07:31:21 - INFO - __main__ -   Step: 2785, LR: 1.4573572952312479e-05, Loss: 715.1844482421875
2024-08-03T14:31:33.608727329Z 
 29%|██▉       | 2786/9500 [9:34:03<22:56:33, 12.30s/it]08/03/2024 07:31:33 - INFO - __main__ -   Step: 2786, LR: 1.4571402408625199e-05, Loss: 636.8353271484375
2024-08-03T14:31:46.019223721Z 
 29%|██▉       | 2787/9500 [9:34:15<22:59:59, 12.33s/it]08/03/2024 07:31:46 - INFO - __main__ -   Step: 2787, LR: 1.456923186493792e-05, Loss: 462.4500732421875
2024-08-03T14:31:58.468735470Z 
 29%|██▉       | 2788/9500 [9:34:28<23:03:39, 12.37s/it]08/03/2024 07:31:58 - INFO - __main__ -   Step: 2788, LR: 1.4567061321250642e-05, Loss: 621.638671875
2024-08-03T14:32:10.511382081Z 
 29%|██▉       | 2789/9500 [9:34:40<22:52:31, 12.27s/it]08/03/2024 07:32:10 - INFO - __main__ -   Step: 2789, LR: 1.4564890777563363e-05, Loss: 783.747802734375
2024-08-03T14:32:22.450536428Z 
 29%|██▉       | 2790/9500 [9:34:52<22:41:09, 12.17s/it]08/03/2024 07:32:22 - INFO - __main__ -   Step: 2790, LR: 1.4562720233876085e-05, Loss: 581.079833984375
2024-08-03T14:32:35.321287520Z 
 29%|██▉       | 2791/9500 [9:35:05<23:04:26, 12.38s/it]08/03/2024 07:32:35 - INFO - __main__ -   Step: 2791, LR: 1.4560549690188805e-05, Loss: 680.2633056640625
2024-08-03T14:32:47.638372657Z 
 29%|██▉       | 2792/9500 [9:35:17<23:02:04, 12.36s/it]08/03/2024 07:32:47 - INFO - __main__ -   Step: 2792, LR: 1.4558379146501526e-05, Loss: 670.18896484375
2024-08-03T14:33:00.043735652Z 
 29%|██▉       | 2793/9500 [9:35:29<23:03:19, 12.38s/it]08/03/2024 07:33:00 - INFO - __main__ -   Step: 2793, LR: 1.4556208602814246e-05, Loss: 771.7211303710938
2024-08-03T14:33:12.895905606Z 
 29%|██▉       | 2794/9500 [9:35:42<23:19:07, 12.52s/it]08/03/2024 07:33:12 - INFO - __main__ -   Step: 2794, LR: 1.4554038059126968e-05, Loss: 702.017822265625
2024-08-03T14:33:25.030376270Z 
 29%|██▉       | 2795/9500 [9:35:54<23:06:02, 12.40s/it]08/03/2024 07:33:25 - INFO - __main__ -   Step: 2795, LR: 1.4551867515439688e-05, Loss: 662.42236328125
2024-08-03T14:33:37.173693316Z 
 29%|██▉       | 2796/9500 [9:36:07<22:57:07, 12.33s/it]08/03/2024 07:33:37 - INFO - __main__ -   Step: 2796, LR: 1.4549696971752409e-05, Loss: 580.3701171875
2024-08-03T14:33:50.097004717Z 
 29%|██▉       | 2797/9500 [9:36:20<23:16:57, 12.50s/it]08/03/2024 07:33:50 - INFO - __main__ -   Step: 2797, LR: 1.454752642806513e-05, Loss: 688.0579833984375
2024-08-03T14:34:02.258188436Z 
 29%|██▉       | 2798/9500 [9:36:32<23:05:15, 12.40s/it]08/03/2024 07:34:02 - INFO - __main__ -   Step: 2798, LR: 1.4545355884377852e-05, Loss: 624.6738891601562
2024-08-03T14:34:14.366750065Z 
 29%|██▉       | 2799/9500 [9:36:44<22:55:13, 12.31s/it]08/03/2024 07:34:14 - INFO - __main__ -   Step: 2799, LR: 1.4543185340690574e-05, Loss: 710.8851928710938
2024-08-03T14:34:26.819113604Z 
 29%|██▉       | 2800/9500 [9:36:56<22:59:40, 12.36s/it]08/03/2024 07:34:26 - INFO - __main__ -   Step: 2800, LR: 1.4541014797003294e-05, Loss: 605.5315551757812
2024-08-03T14:34:38.943363675Z 
 29%|██▉       | 2801/9500 [9:37:08<22:51:43, 12.29s/it]08/03/2024 07:34:38 - INFO - __main__ -   Step: 2801, LR: 1.4538844253316015e-05, Loss: 620.0276489257812
2024-08-03T14:34:51.008179486Z 
 29%|██▉       | 2802/9500 [9:37:20<22:44:07, 12.22s/it]08/03/2024 07:34:51 - INFO - __main__ -   Step: 2802, LR: 1.4536673709628737e-05, Loss: 618.9071044921875
2024-08-03T14:35:03.499566571Z 
 30%|██▉       | 2803/9500 [9:37:33<22:53:00, 12.30s/it]08/03/2024 07:35:03 - INFO - __main__ -   Step: 2803, LR: 1.4534503165941458e-05, Loss: 595.8056640625
2024-08-03T14:35:15.719980894Z 
 30%|██▉       | 2804/9500 [9:37:45<22:50:06, 12.28s/it]08/03/2024 07:35:15 - INFO - __main__ -   Step: 2804, LR: 1.453233262225418e-05, Loss: 656.4241333007812
2024-08-03T14:35:28.046933904Z 
 30%|██▉       | 2805/9500 [9:37:57<22:51:34, 12.29s/it]08/03/2024 07:35:28 - INFO - __main__ -   Step: 2805, LR: 1.4530162078566901e-05, Loss: 465.04644775390625
2024-08-03T14:35:40.608470642Z 
 30%|██▉       | 2806/9500 [9:38:10<23:00:23, 12.37s/it]08/03/2024 07:35:40 - INFO - __main__ -   Step: 2806, LR: 1.4527991534879621e-05, Loss: 553.5035400390625
2024-08-03T14:35:52.820473318Z 
 30%|██▉       | 2807/9500 [9:38:22<22:54:48, 12.32s/it]08/03/2024 07:35:52 - INFO - __main__ -   Step: 2807, LR: 1.4525820991192341e-05, Loss: 687.2078857421875
2024-08-03T14:36:05.023998875Z 
 30%|██▉       | 2808/9500 [9:38:34<22:50:33, 12.29s/it]08/03/2024 07:36:05 - INFO - __main__ -   Step: 2808, LR: 1.4523650447505063e-05, Loss: 663.6195678710938
2024-08-03T14:36:17.637090016Z 
 30%|██▉       | 2809/9500 [9:38:47<23:01:12, 12.39s/it]08/03/2024 07:36:17 - INFO - __main__ -   Step: 2809, LR: 1.4521479903817783e-05, Loss: 700.9523315429688
2024-08-03T14:36:29.919278471Z 
 30%|██▉       | 2810/9500 [9:38:59<22:57:31, 12.35s/it]08/03/2024 07:36:29 - INFO - __main__ -   Step: 2810, LR: 1.4519309360130504e-05, Loss: 838.0087890625
2024-08-03T14:36:42.011132688Z 
 30%|██▉       | 2811/9500 [9:39:11<22:48:32, 12.28s/it]08/03/2024 07:36:42 - INFO - __main__ -   Step: 2811, LR: 1.4517138816443226e-05, Loss: 538.958740234375
2024-08-03T14:36:54.774546420Z 
 30%|██▉       | 2812/9500 [9:39:24<23:04:39, 12.42s/it]08/03/2024 07:36:54 - INFO - __main__ -   Step: 2812, LR: 1.4514968272755947e-05, Loss: 668.1597900390625
2024-08-03T14:37:06.776428552Z 
 30%|██▉       | 2813/9500 [9:39:36<22:50:23, 12.30s/it]08/03/2024 07:37:06 - INFO - __main__ -   Step: 2813, LR: 1.4512797729068669e-05, Loss: 678.6060791015625
2024-08-03T14:37:19.240663253Z 
 30%|██▉       | 2814/9500 [9:39:49<22:55:48, 12.35s/it]08/03/2024 07:37:19 - INFO - __main__ -   Step: 2814, LR: 1.451062718538139e-05, Loss: 761.7491455078125
2024-08-03T14:37:31.978469784Z 
 30%|██▉       | 2815/9500 [9:40:01<23:08:41, 12.46s/it]08/03/2024 07:37:31 - INFO - __main__ -   Step: 2815, LR: 1.450845664169411e-05, Loss: 566.5964965820312
2024-08-03T14:37:44.324691069Z 
 30%|██▉       | 2816/9500 [9:40:14<23:04:32, 12.43s/it]08/03/2024 07:37:44 - INFO - __main__ -   Step: 2816, LR: 1.4506286098006832e-05, Loss: 715.2977294921875
2024-08-03T14:37:56.525707461Z 
 30%|██▉       | 2817/9500 [9:40:26<22:56:44, 12.36s/it]08/03/2024 07:37:56 - INFO - __main__ -   Step: 2817, LR: 1.4504115554319553e-05, Loss: 618.0699462890625
2024-08-03T14:38:09.068487929Z 
 30%|██▉       | 2818/9500 [9:40:39<23:02:37, 12.42s/it]08/03/2024 07:38:09 - INFO - __main__ -   Step: 2818, LR: 1.4501945010632275e-05, Loss: 542.265625
2024-08-03T14:38:21.233875809Z 
 30%|██▉       | 2819/9500 [9:40:51<22:54:04, 12.34s/it]08/03/2024 07:38:21 - INFO - __main__ -   Step: 2819, LR: 1.4499774466944997e-05, Loss: 702.40869140625
2024-08-03T14:38:33.555139378Z 
 30%|██▉       | 2820/9500 [9:41:03<22:53:14, 12.33s/it]08/03/2024 07:38:33 - INFO - __main__ -   Step: 2820, LR: 1.4497603923257716e-05, Loss: 662.997314453125
2024-08-03T14:38:46.111673180Z 
 30%|██▉       | 2821/9500 [9:41:16<23:00:27, 12.40s/it]08/03/2024 07:38:46 - INFO - __main__ -   Step: 2821, LR: 1.4495433379570436e-05, Loss: 690.609375
2024-08-03T14:38:58.371271899Z 
 30%|██▉       | 2822/9500 [9:41:28<22:55:31, 12.36s/it]08/03/2024 07:38:58 - INFO - __main__ -   Step: 2822, LR: 1.4493262835883158e-05, Loss: 728.03173828125
2024-08-03T14:39:10.566605038Z 
 30%|██▉       | 2823/9500 [9:41:40<22:49:50, 12.31s/it]08/03/2024 07:39:10 - INFO - __main__ -   Step: 2823, LR: 1.449109229219588e-05, Loss: 521.7084350585938
2024-08-03T14:39:23.024144950Z 
 30%|██▉       | 2824/9500 [9:41:52<22:54:35, 12.35s/it]08/03/2024 07:39:23 - INFO - __main__ -   Step: 2824, LR: 1.44889217485086e-05, Loss: 558.7944946289062
2024-08-03T14:39:34.959639238Z 
 30%|██▉       | 2825/9500 [9:42:04<22:40:24, 12.23s/it]08/03/2024 07:39:34 - INFO - __main__ -   Step: 2825, LR: 1.448675120482132e-05, Loss: 527.268310546875
2024-08-03T14:39:47.068080647Z 
 30%|██▉       | 2826/9500 [9:42:17<22:36:12, 12.19s/it]08/03/2024 07:39:47 - INFO - __main__ -   Step: 2826, LR: 1.4484580661134042e-05, Loss: 633.872802734375
2024-08-03T14:39:59.641519827Z 
 30%|██▉       | 2827/9500 [9:42:29<22:48:43, 12.31s/it]08/03/2024 07:39:59 - INFO - __main__ -   Step: 2827, LR: 1.4482410117446764e-05, Loss: 579.115478515625
2024-08-03T14:40:11.946956186Z 
 30%|██▉       | 2828/9500 [9:42:41<22:48:28, 12.31s/it]08/03/2024 07:40:11 - INFO - __main__ -   Step: 2828, LR: 1.4480239573759485e-05, Loss: 493.3840637207031
2024-08-03T14:40:24.213842257Z 
 30%|██▉       | 2829/9500 [9:42:54<22:46:56, 12.29s/it]08/03/2024 07:40:24 - INFO - __main__ -   Step: 2829, LR: 1.4478069030072205e-05, Loss: 686.567626953125
2024-08-03T14:40:36.504814850Z 
 30%|██▉       | 2830/9500 [9:43:06<22:46:37, 12.29s/it]08/03/2024 07:40:36 - INFO - __main__ -   Step: 2830, LR: 1.4475898486384927e-05, Loss: 570.9586181640625
2024-08-03T14:40:48.863408683Z 
 30%|██▉       | 2831/9500 [9:43:18<22:48:35, 12.31s/it]08/03/2024 07:40:48 - INFO - __main__ -   Step: 2831, LR: 1.4473727942697648e-05, Loss: 691.2103271484375
2024-08-03T14:41:00.970600342Z 
 30%|██▉       | 2832/9500 [9:43:30<22:41:31, 12.25s/it]08/03/2024 07:41:00 - INFO - __main__ -   Step: 2832, LR: 1.447155739901037e-05, Loss: 697.57275390625
2024-08-03T14:41:13.150119489Z 
 30%|██▉       | 2833/9500 [9:43:43<22:38:55, 12.23s/it]08/03/2024 07:41:13 - INFO - __main__ -   Step: 2833, LR: 1.4469386855323092e-05, Loss: 619.5003051757812
2024-08-03T14:41:25.712951881Z 
 30%|██▉       | 2834/9500 [9:43:55<22:49:49, 12.33s/it]08/03/2024 07:41:25 - INFO - __main__ -   Step: 2834, LR: 1.4467216311635811e-05, Loss: 589.879638671875
2024-08-03T14:41:37.787368232Z 
 30%|██▉       | 2835/9500 [9:44:07<22:41:06, 12.25s/it]08/03/2024 07:41:37 - INFO - __main__ -   Step: 2835, LR: 1.4465045767948531e-05, Loss: 717.1688232421875
2024-08-03T14:41:49.900165141Z 
 30%|██▉       | 2836/9500 [9:44:19<22:36:14, 12.21s/it]08/03/2024 07:41:49 - INFO - __main__ -   Step: 2836, LR: 1.4462875224261253e-05, Loss: 486.2933349609375
2024-08-03T14:42:02.931607147Z 
 30%|██▉       | 2837/9500 [9:44:32<23:03:22, 12.46s/it]08/03/2024 07:42:02 - INFO - __main__ -   Step: 2837, LR: 1.4460704680573974e-05, Loss: 790.6400146484375
2024-08-03T14:42:15.112373917Z 
 30%|██▉       | 2838/9500 [9:44:45<22:53:57, 12.37s/it]08/03/2024 07:42:15 - INFO - __main__ -   Step: 2838, LR: 1.4458534136886694e-05, Loss: 531.7821655273438
2024-08-03T14:42:27.144007075Z 
 30%|██▉       | 2839/9500 [9:44:57<22:42:20, 12.27s/it]08/03/2024 07:42:27 - INFO - __main__ -   Step: 2839, LR: 1.4456363593199416e-05, Loss: 824.0371704101562
2024-08-03T14:42:39.708555725Z 
 30%|██▉       | 2840/9500 [9:45:09<22:51:53, 12.36s/it]08/03/2024 07:42:39 - INFO - __main__ -   Step: 2840, LR: 1.4454193049512137e-05, Loss: 734.0150146484375
2024-08-03T14:42:52.219520735Z 
 30%|██▉       | 2841/9500 [9:45:22<22:56:43, 12.40s/it]08/03/2024 07:42:52 - INFO - __main__ -   Step: 2841, LR: 1.4452022505824859e-05, Loss: 677.26806640625
2024-08-03T14:43:04.466212733Z 
 30%|██▉       | 2842/9500 [9:45:34<22:51:14, 12.36s/it]08/03/2024 07:43:04 - INFO - __main__ -   Step: 2842, LR: 1.444985196213758e-05, Loss: 648.5331420898438
2024-08-03T14:43:17.137952212Z 
 30%|██▉       | 2843/9500 [9:45:47<23:01:30, 12.45s/it]08/03/2024 07:43:17 - INFO - __main__ -   Step: 2843, LR: 1.44476814184503e-05, Loss: 575.678466796875
2024-08-03T14:43:29.257906150Z 
 30%|██▉       | 2844/9500 [9:45:59<22:50:15, 12.35s/it]08/03/2024 07:43:29 - INFO - __main__ -   Step: 2844, LR: 1.4445510874763022e-05, Loss: 449.5702819824219
2024-08-03T14:43:41.242367112Z 
 30%|██▉       | 2845/9500 [9:46:11<22:37:49, 12.24s/it]08/03/2024 07:43:41 - INFO - __main__ -   Step: 2845, LR: 1.4443340331075744e-05, Loss: 607.4090576171875
2024-08-03T14:43:53.871984399Z 
 30%|██▉       | 2846/9500 [9:46:23<22:50:31, 12.36s/it]08/03/2024 07:43:53 - INFO - __main__ -   Step: 2846, LR: 1.4441169787388465e-05, Loss: 585.32080078125
2024-08-03T14:44:05.905883533Z 
 30%|██▉       | 2847/9500 [9:46:35<22:39:31, 12.26s/it]08/03/2024 07:44:05 - INFO - __main__ -   Step: 2847, LR: 1.4438999243701187e-05, Loss: 659.392578125
2024-08-03T14:44:18.085194100Z 
 30%|██▉       | 2848/9500 [9:46:48<22:36:36, 12.24s/it]08/03/2024 07:44:18 - INFO - __main__ -   Step: 2848, LR: 1.4436828700013905e-05, Loss: 436.86126708984375
2024-08-03T14:44:30.753111241Z 
 30%|██▉       | 2849/9500 [9:47:00<22:50:45, 12.37s/it]08/03/2024 07:44:30 - INFO - __main__ -   Step: 2849, LR: 1.4434658156326626e-05, Loss: 696.9924926757812
2024-08-03T14:44:42.862000628Z 
 30%|███       | 2850/9500 [9:47:12<22:42:00, 12.29s/it]08/03/2024 07:44:42 - INFO - __main__ -   Step: 2850, LR: 1.4432487612639348e-05, Loss: 555.6827392578125
2024-08-03T14:44:55.107299451Z 
 30%|███       | 2851/9500 [9:47:25<22:40:21, 12.28s/it]08/03/2024 07:44:55 - INFO - __main__ -   Step: 2851, LR: 1.443031706895207e-05, Loss: 539.7492065429688
2024-08-03T14:45:08.138177901Z 
 30%|███       | 2852/9500 [9:47:38<23:05:14, 12.50s/it]08/03/2024 07:45:08 - INFO - __main__ -   Step: 2852, LR: 1.442814652526479e-05, Loss: 659.7904052734375
2024-08-03T14:45:20.544921295Z 
 30%|███       | 2853/9500 [9:47:50<23:01:52, 12.47s/it]08/03/2024 07:45:20 - INFO - __main__ -   Step: 2853, LR: 1.4425975981577511e-05, Loss: 647.6832885742188
2024-08-03T14:45:32.733763284Z 
 30%|███       | 2854/9500 [9:48:02<22:52:11, 12.39s/it]08/03/2024 07:45:32 - INFO - __main__ -   Step: 2854, LR: 1.4423805437890232e-05, Loss: 579.8132934570312
2024-08-03T14:45:45.312857654Z 
 30%|███       | 2855/9500 [9:48:15<22:58:20, 12.45s/it]08/03/2024 07:45:45 - INFO - __main__ -   Step: 2855, LR: 1.4421634894202954e-05, Loss: 588.515380859375
2024-08-03T14:45:57.898392941Z 
 30%|███       | 2856/9500 [9:48:27<23:02:46, 12.49s/it]08/03/2024 07:45:57 - INFO - __main__ -   Step: 2856, LR: 1.4419464350515676e-05, Loss: 604.5407104492188
2024-08-03T14:46:10.057037942Z 
 30%|███       | 2857/9500 [9:48:39<22:51:39, 12.39s/it]08/03/2024 07:46:10 - INFO - __main__ -   Step: 2857, LR: 1.4417293806828395e-05, Loss: 570.1239013671875
2024-08-03T14:46:22.450947675Z 
 30%|███       | 2858/9500 [9:48:52<22:51:36, 12.39s/it]08/03/2024 07:46:22 - INFO - __main__ -   Step: 2858, LR: 1.4415123263141117e-05, Loss: 554.3560791015625
2024-08-03T14:46:34.786558640Z 
 30%|███       | 2859/9500 [9:49:04<22:49:35, 12.37s/it]08/03/2024 07:46:34 - INFO - __main__ -   Step: 2859, LR: 1.4412952719453839e-05, Loss: 516.917724609375
2024-08-03T14:46:46.858100564Z 
 30%|███       | 2860/9500 [9:49:16<22:39:20, 12.28s/it]08/03/2024 07:46:46 - INFO - __main__ -   Step: 2860, LR: 1.441078217576656e-05, Loss: 596.7635498046875
2024-08-03T14:46:59.710791986Z 
 30%|███       | 2861/9500 [9:49:29<22:58:02, 12.45s/it]08/03/2024 07:46:59 - INFO - __main__ -   Step: 2861, LR: 1.4408611632079282e-05, Loss: 688.5577392578125
2024-08-03T14:47:11.701615654Z 
 30%|███       | 2862/9500 [9:49:41<22:42:27, 12.32s/it]08/03/2024 07:47:11 - INFO - __main__ -   Step: 2862, LR: 1.4406441088392e-05, Loss: 622.7810668945312
2024-08-03T14:47:23.859849640Z 
 30%|███       | 2863/9500 [9:49:53<22:37:03, 12.27s/it]08/03/2024 07:47:23 - INFO - __main__ -   Step: 2863, LR: 1.4404270544704721e-05, Loss: 583.2826538085938
2024-08-03T14:47:36.449202647Z 
 30%|███       | 2864/9500 [9:50:06<22:47:30, 12.36s/it]08/03/2024 07:47:36 - INFO - __main__ -   Step: 2864, LR: 1.4402100001017443e-05, Loss: 841.75244140625
2024-08-03T14:47:48.632750102Z 
 30%|███       | 2865/9500 [9:50:18<22:41:18, 12.31s/it]08/03/2024 07:47:48 - INFO - __main__ -   Step: 2865, LR: 1.4399929457330165e-05, Loss: 696.5447998046875
2024-08-03T14:48:00.559750178Z 
 30%|███       | 2866/9500 [9:50:30<22:28:22, 12.20s/it]08/03/2024 07:48:00 - INFO - __main__ -   Step: 2866, LR: 1.4397758913642884e-05, Loss: 614.0057373046875
2024-08-03T14:48:12.866376960Z 
 30%|███       | 2867/9500 [9:50:42<22:31:52, 12.23s/it]08/03/2024 07:48:12 - INFO - __main__ -   Step: 2867, LR: 1.4395588369955606e-05, Loss: 725.0692749023438
2024-08-03T14:48:24.945846356Z 
 30%|███       | 2868/9500 [9:50:54<22:26:43, 12.18s/it]08/03/2024 07:48:24 - INFO - __main__ -   Step: 2868, LR: 1.4393417826268328e-05, Loss: 599.239990234375
2024-08-03T14:48:37.124175332Z 
 30%|███       | 2869/9500 [9:51:07<22:26:20, 12.18s/it]08/03/2024 07:48:37 - INFO - __main__ -   Step: 2869, LR: 1.4391247282581049e-05, Loss: 602.0238037109375
2024-08-03T14:48:49.864095625Z 
 30%|███       | 2870/9500 [9:51:19<22:44:37, 12.35s/it]08/03/2024 07:48:49 - INFO - __main__ -   Step: 2870, LR: 1.438907673889377e-05, Loss: 650.9601440429688
2024-08-03T14:49:02.155115538Z 
 30%|███       | 2871/9500 [9:51:32<22:42:28, 12.33s/it]08/03/2024 07:49:02 - INFO - __main__ -   Step: 2871, LR: 1.4386906195206492e-05, Loss: 705.9156494140625
2024-08-03T14:49:14.309168438Z 
 30%|███       | 2872/9500 [9:51:44<22:36:22, 12.28s/it]08/03/2024 07:49:14 - INFO - __main__ -   Step: 2872, LR: 1.4384735651519212e-05, Loss: 557.923583984375
2024-08-03T14:49:26.711768778Z 
 30%|███       | 2873/9500 [9:51:56<22:40:16, 12.32s/it]08/03/2024 07:49:26 - INFO - __main__ -   Step: 2873, LR: 1.4382565107831934e-05, Loss: 449.1035461425781
2024-08-03T14:49:38.810460226Z 
 30%|███       | 2874/9500 [9:52:08<22:32:52, 12.25s/it]08/03/2024 07:49:38 - INFO - __main__ -   Step: 2874, LR: 1.4380394564144655e-05, Loss: 698.3206176757812
2024-08-03T14:49:51.042843224Z 
 30%|███       | 2875/9500 [9:52:20<22:32:04, 12.25s/it]08/03/2024 07:49:51 - INFO - __main__ -   Step: 2875, LR: 1.4378224020457377e-05, Loss: 520.6871337890625
2024-08-03T14:50:03.330931428Z 
 30%|███       | 2876/9500 [9:52:33<22:33:17, 12.26s/it]08/03/2024 07:50:03 - INFO - __main__ -   Step: 2876, LR: 1.4376053476770095e-05, Loss: 566.0691528320312
2024-08-03T14:50:15.961756177Z 
 30%|███       | 2877/9500 [9:52:45<22:45:25, 12.37s/it]08/03/2024 07:50:15 - INFO - __main__ -   Step: 2877, LR: 1.4373882933082816e-05, Loss: 709.7591552734375
2024-08-03T14:50:28.175680437Z 
 30%|███       | 2878/9500 [9:52:58<22:40:03, 12.32s/it]08/03/2024 07:50:28 - INFO - __main__ -   Step: 2878, LR: 1.4371712389395538e-05, Loss: 615.56396484375
2024-08-03T14:50:40.438914538Z 
 30%|███       | 2879/9500 [9:53:10<22:37:52, 12.31s/it]08/03/2024 07:50:40 - INFO - __main__ -   Step: 2879, LR: 1.436954184570826e-05, Loss: 680.5369873046875
2024-08-03T14:50:53.228928362Z 
 30%|███       | 2880/9500 [9:53:23<22:53:40, 12.45s/it]08/03/2024 07:50:53 - INFO - __main__ -   Step: 2880, LR: 1.4367371302020981e-05, Loss: 515.5443115234375
2024-08-03T14:51:05.685390960Z 
 30%|███       | 2881/9500 [9:53:35<22:53:43, 12.45s/it]08/03/2024 07:51:05 - INFO - __main__ -   Step: 2881, LR: 1.4365200758333701e-05, Loss: 689.1902465820312
2024-08-03T14:51:17.825000411Z 
 30%|███       | 2882/9500 [9:53:47<22:43:09, 12.36s/it]08/03/2024 07:51:17 - INFO - __main__ -   Step: 2882, LR: 1.4363030214646423e-05, Loss: 749.273681640625
2024-08-03T14:51:30.415268639Z 
 30%|███       | 2883/9500 [9:54:00<22:50:37, 12.43s/it]08/03/2024 07:51:30 - INFO - __main__ -   Step: 2883, LR: 1.4360859670959144e-05, Loss: 595.775390625
2024-08-03T14:51:42.269798448Z 
 30%|███       | 2884/9500 [9:54:12<22:31:25, 12.26s/it]08/03/2024 07:51:42 - INFO - __main__ -   Step: 2884, LR: 1.4358689127271866e-05, Loss: 499.6141357421875
2024-08-03T14:51:54.604605313Z 
 30%|███       | 2885/9500 [9:54:24<22:33:50, 12.28s/it]08/03/2024 07:51:54 - INFO - __main__ -   Step: 2885, LR: 1.4356518583584587e-05, Loss: 629.4248046875
2024-08-03T14:52:07.107218951Z 
 30%|███       | 2886/9500 [9:54:37<22:41:00, 12.35s/it]08/03/2024 07:52:07 - INFO - __main__ -   Step: 2886, LR: 1.4354348039897307e-05, Loss: 550.0709838867188
2024-08-03T14:52:19.585703356Z 
 30%|███       | 2887/9500 [9:54:49<22:45:09, 12.39s/it]08/03/2024 07:52:19 - INFO - __main__ -   Step: 2887, LR: 1.4352177496210029e-05, Loss: 634.3325805664062
2024-08-03T14:52:31.640335623Z 
 30%|███       | 2888/9500 [9:55:01<22:33:59, 12.29s/it]08/03/2024 07:52:31 - INFO - __main__ -   Step: 2888, LR: 1.435000695252275e-05, Loss: 576.1549072265625
2024-08-03T14:52:44.118838829Z 
 30%|███       | 2889/9500 [9:55:14<22:40:07, 12.34s/it]08/03/2024 07:52:44 - INFO - __main__ -   Step: 2889, LR: 1.4347836408835472e-05, Loss: 726.7506103515625
2024-08-03T14:52:56.268348754Z 
 30%|███       | 2890/9500 [9:55:26<22:33:28, 12.29s/it]08/03/2024 07:52:56 - INFO - __main__ -   Step: 2890, LR: 1.434566586514819e-05, Loss: 544.4942016601562
2024-08-03T14:53:08.542146823Z 
 30%|███       | 2891/9500 [9:55:38<22:32:53, 12.28s/it]08/03/2024 07:53:08 - INFO - __main__ -   Step: 2891, LR: 1.4343495321460912e-05, Loss: 773.1573486328125
2024-08-03T14:53:20.941279896Z 
 30%|███       | 2892/9500 [9:55:50<22:36:33, 12.32s/it]08/03/2024 07:53:20 - INFO - __main__ -   Step: 2892, LR: 1.4341324777773633e-05, Loss: 549.37890625
2024-08-03T14:53:33.000615926Z 
 30%|███       | 2893/9500 [9:56:02<22:27:48, 12.24s/it]08/03/2024 07:53:33 - INFO - __main__ -   Step: 2893, LR: 1.4339154234086355e-05, Loss: 457.7889099121094
2024-08-03T14:53:45.166685688Z 
 30%|███       | 2894/9500 [9:56:15<22:25:10, 12.22s/it]08/03/2024 07:53:45 - INFO - __main__ -   Step: 2894, LR: 1.4336983690399076e-05, Loss: 672.0626220703125
2024-08-03T14:53:57.557217366Z 
 30%|███       | 2895/9500 [9:56:27<22:30:40, 12.27s/it]08/03/2024 07:53:57 - INFO - __main__ -   Step: 2895, LR: 1.4334813146711796e-05, Loss: 471.5204772949219
2024-08-03T14:54:09.818784323Z 
 30%|███       | 2896/9500 [9:56:39<22:30:12, 12.27s/it]08/03/2024 07:54:09 - INFO - __main__ -   Step: 2896, LR: 1.4332642603024518e-05, Loss: 534.1367797851562
2024-08-03T14:54:21.937382855Z 
 30%|███       | 2897/9500 [9:56:51<22:25:06, 12.22s/it]08/03/2024 07:54:21 - INFO - __main__ -   Step: 2897, LR: 1.433047205933724e-05, Loss: 575.8687744140625
2024-08-03T14:54:34.573982345Z 
 31%|███       | 2898/9500 [9:57:04<22:38:33, 12.35s/it]08/03/2024 07:54:34 - INFO - __main__ -   Step: 2898, LR: 1.432830151564996e-05, Loss: 611.0778198242188
2024-08-03T14:54:46.533471088Z 
 31%|███       | 2899/9500 [9:57:16<22:25:34, 12.23s/it]08/03/2024 07:54:46 - INFO - __main__ -   Step: 2899, LR: 1.4326130971962682e-05, Loss: 501.5700378417969
2024-08-03T14:54:59.173728933Z 
 31%|███       | 2900/9500 [9:57:29<22:38:52, 12.35s/it]08/03/2024 07:54:59 - INFO - __main__ -   Step: 2900, LR: 1.4323960428275402e-05, Loss: 547.9014892578125
2024-08-03T14:55:11.454705466Z 
 31%|███       | 2901/9500 [9:57:41<22:36:17, 12.33s/it]08/03/2024 07:55:11 - INFO - __main__ -   Step: 2901, LR: 1.4321789884588124e-05, Loss: 633.0911865234375
2024-08-03T14:55:24.023049534Z 
 31%|███       | 2902/9500 [9:57:53<22:43:52, 12.40s/it]08/03/2024 07:55:24 - INFO - __main__ -   Step: 2902, LR: 1.4319619340900845e-05, Loss: 658.0744018554688
2024-08-03T14:55:36.341098109Z 
 31%|███       | 2903/9500 [9:58:06<22:40:52, 12.38s/it]08/03/2024 07:55:36 - INFO - __main__ -   Step: 2903, LR: 1.4317448797213567e-05, Loss: 613.9542236328125
2024-08-03T14:55:48.855795407Z 
 31%|███       | 2904/9500 [9:58:18<22:45:12, 12.42s/it]08/03/2024 07:55:48 - INFO - __main__ -   Step: 2904, LR: 1.4315278253526285e-05, Loss: 486.39544677734375
2024-08-03T14:56:01.139446899Z 
 31%|███       | 2905/9500 [9:58:31<22:40:33, 12.38s/it]08/03/2024 07:56:01 - INFO - __main__ -   Step: 2905, LR: 1.4313107709839007e-05, Loss: 579.9006958007812
2024-08-03T14:56:13.442151395Z 
 31%|███       | 2906/9500 [9:58:43<22:37:51, 12.36s/it]08/03/2024 07:56:13 - INFO - __main__ -   Step: 2906, LR: 1.4310937166151728e-05, Loss: 718.5164794921875
2024-08-03T14:56:26.110796251Z 
 31%|███       | 2907/9500 [9:58:56<22:47:59, 12.45s/it]08/03/2024 07:56:26 - INFO - __main__ -   Step: 2907, LR: 1.430876662246445e-05, Loss: 711.0358276367188
2024-08-03T14:56:38.422281298Z 
 31%|███       | 2908/9500 [9:59:08<22:43:13, 12.41s/it]08/03/2024 07:56:38 - INFO - __main__ -   Step: 2908, LR: 1.4306596078777171e-05, Loss: 557.05712890625
2024-08-03T14:56:50.456919120Z 
 31%|███       | 2909/9500 [9:59:20<22:30:43, 12.30s/it]08/03/2024 07:56:50 - INFO - __main__ -   Step: 2909, LR: 1.4304425535089891e-05, Loss: 494.38507080078125
2024-08-03T14:57:03.158217301Z 
 31%|███       | 2910/9500 [9:59:33<22:43:51, 12.42s/it]08/03/2024 07:57:03 - INFO - __main__ -   Step: 2910, LR: 1.4302254991402613e-05, Loss: 800.6156616210938
2024-08-03T14:57:15.178828092Z 
 31%|███       | 2911/9500 [9:59:45<22:30:31, 12.30s/it]08/03/2024 07:57:15 - INFO - __main__ -   Step: 2911, LR: 1.4300084447715334e-05, Loss: 450.59637451171875
2024-08-03T14:57:27.077031501Z 
 31%|███       | 2912/9500 [9:59:57<22:17:12, 12.18s/it]08/03/2024 07:57:27 - INFO - __main__ -   Step: 2912, LR: 1.4297913904028056e-05, Loss: 506.11871337890625
2024-08-03T14:57:39.680919571Z 
 31%|███       | 2913/9500 [10:00:09<22:31:00, 12.31s/it]08/03/2024 07:57:39 - INFO - __main__ -   Step: 2913, LR: 1.4295743360340777e-05, Loss: 678.4244384765625
2024-08-03T14:57:51.715421602Z 
 31%|███       | 2914/9500 [10:00:21<22:21:52, 12.22s/it]08/03/2024 07:57:51 - INFO - __main__ -   Step: 2914, LR: 1.4293572816653499e-05, Loss: 411.23883056640625
2024-08-03T14:58:03.825152701Z 
 31%|███       | 2915/9500 [10:00:33<22:17:52, 12.19s/it]08/03/2024 07:58:03 - INFO - __main__ -   Step: 2915, LR: 1.4291402272966219e-05, Loss: 639.7144775390625
2024-08-03T14:58:16.803325857Z 
 31%|███       | 2916/9500 [10:00:46<22:43:36, 12.43s/it]08/03/2024 07:58:16 - INFO - __main__ -   Step: 2916, LR: 1.428923172927894e-05, Loss: 762.7615966796875
2024-08-03T14:58:28.896860867Z 
 31%|███       | 2917/9500 [10:00:58<22:32:26, 12.33s/it]08/03/2024 07:58:28 - INFO - __main__ -   Step: 2917, LR: 1.4287061185591662e-05, Loss: 688.2051391601562
2024-08-03T14:58:41.006428892Z 
 31%|███       | 2918/9500 [10:01:10<22:25:05, 12.26s/it]08/03/2024 07:58:41 - INFO - __main__ -   Step: 2918, LR: 1.428489064190438e-05, Loss: 538.2293090820312
2024-08-03T14:58:53.163426319Z 
 31%|███       | 2919/9500 [10:01:23<22:21:26, 12.23s/it]08/03/2024 07:58:53 - INFO - __main__ -   Step: 2919, LR: 1.4282720098217102e-05, Loss: 597.1563720703125
2024-08-03T14:59:06.044753777Z 
 31%|███       | 2920/9500 [10:01:35<22:42:39, 12.43s/it]08/03/2024 07:59:06 - INFO - __main__ -   Step: 2920, LR: 1.4280549554529823e-05, Loss: 521.5362548828125
2024-08-03T14:59:18.003314108Z 
 31%|███       | 2921/9500 [10:01:47<22:27:06, 12.29s/it]08/03/2024 07:59:18 - INFO - __main__ -   Step: 2921, LR: 1.4278379010842545e-05, Loss: 524.5721435546875
2024-08-03T14:59:30.287853178Z 
 31%|███       | 2922/9500 [10:02:00<22:26:52, 12.29s/it]08/03/2024 07:59:30 - INFO - __main__ -   Step: 2922, LR: 1.4276208467155266e-05, Loss: 548.4315795898438
2024-08-03T14:59:43.014200400Z 
 31%|███       | 2923/9500 [10:02:12<22:41:10, 12.42s/it]08/03/2024 07:59:43 - INFO - __main__ -   Step: 2923, LR: 1.4274037923467988e-05, Loss: 672.1621704101562
2024-08-03T14:59:55.401589349Z 
 31%|███       | 2924/9500 [10:02:25<22:39:58, 12.41s/it]08/03/2024 07:59:55 - INFO - __main__ -   Step: 2924, LR: 1.4271867379780708e-05, Loss: 624.564697265625
2024-08-03T15:00:07.459651249Z 
 31%|███       | 2925/9500 [10:02:37<22:28:14, 12.30s/it]08/03/2024 08:00:07 - INFO - __main__ -   Step: 2925, LR: 1.426969683609343e-05, Loss: 550.751953125
2024-08-03T15:00:20.374636420Z 
 31%|███       | 2926/9500 [10:02:50<22:48:08, 12.49s/it]08/03/2024 08:00:20 - INFO - __main__ -   Step: 2926, LR: 1.4267526292406151e-05, Loss: 625.7031860351562
2024-08-03T15:00:32.741968521Z 
 31%|███       | 2927/9500 [10:03:02<22:44:00, 12.45s/it]08/03/2024 08:00:32 - INFO - __main__ -   Step: 2927, LR: 1.4265355748718872e-05, Loss: 661.5635986328125
2024-08-03T15:00:44.790079221Z 
 31%|███       | 2928/9500 [10:03:14<22:30:33, 12.33s/it]08/03/2024 08:00:44 - INFO - __main__ -   Step: 2928, LR: 1.4263185205031594e-05, Loss: 625.6474609375
2024-08-03T15:00:57.470346086Z 
 31%|███       | 2929/9500 [10:03:27<22:41:51, 12.44s/it]08/03/2024 08:00:57 - INFO - __main__ -   Step: 2929, LR: 1.4261014661344314e-05, Loss: 747.08642578125
2024-08-03T15:01:09.717216903Z 
 31%|███       | 2930/9500 [10:03:39<22:35:28, 12.38s/it]08/03/2024 08:01:09 - INFO - __main__ -   Step: 2930, LR: 1.4258844117657035e-05, Loss: 553.4083251953125
2024-08-03T15:01:21.720899171Z 
 31%|███       | 2931/9500 [10:03:51<22:22:55, 12.27s/it]08/03/2024 08:01:21 - INFO - __main__ -   Step: 2931, LR: 1.4256673573969757e-05, Loss: 631.723876953125
2024-08-03T15:01:34.269098403Z 
 31%|███       | 2932/9500 [10:04:04<22:31:59, 12.35s/it]08/03/2024 08:01:34 - INFO - __main__ -   Step: 2932, LR: 1.4254503030282477e-05, Loss: 735.899658203125
2024-08-03T15:01:46.372226177Z 
 31%|███       | 2933/9500 [10:04:16<22:23:40, 12.28s/it]08/03/2024 08:01:46 - INFO - __main__ -   Step: 2933, LR: 1.4252332486595197e-05, Loss: 483.495361328125
2024-08-03T15:01:58.504607031Z 
 31%|███       | 2934/9500 [10:04:28<22:18:44, 12.23s/it]08/03/2024 08:01:58 - INFO - __main__ -   Step: 2934, LR: 1.4250161942907918e-05, Loss: 559.2166748046875
2024-08-03T15:02:10.982299946Z 
 31%|███       | 2935/9500 [10:04:40<22:26:32, 12.31s/it]08/03/2024 08:02:10 - INFO - __main__ -   Step: 2935, LR: 1.424799139922064e-05, Loss: 506.7215576171875
2024-08-03T15:02:23.228993319Z 
 31%|███       | 2936/9500 [10:04:53<22:24:22, 12.29s/it]08/03/2024 08:02:23 - INFO - __main__ -   Step: 2936, LR: 1.4245820855533361e-05, Loss: 665.122802734375
2024-08-03T15:02:35.179119987Z 
 31%|███       | 2937/9500 [10:05:05<22:13:03, 12.19s/it]08/03/2024 08:02:35 - INFO - __main__ -   Step: 2937, LR: 1.4243650311846083e-05, Loss: 599.696044921875
2024-08-03T15:02:47.735003229Z 
 31%|███       | 2938/9500 [10:05:17<22:24:57, 12.30s/it]08/03/2024 08:02:47 - INFO - __main__ -   Step: 2938, LR: 1.4241479768158803e-05, Loss: 669.724365234375
2024-08-03T15:03:00.107338154Z 
 31%|███       | 2939/9500 [10:05:30<22:27:12, 12.32s/it]08/03/2024 08:03:00 - INFO - __main__ -   Step: 2939, LR: 1.4239309224471524e-05, Loss: 613.3402099609375
2024-08-03T15:03:12.095963271Z 
 31%|███       | 2940/9500 [10:05:42<22:16:07, 12.22s/it]08/03/2024 08:03:12 - INFO - __main__ -   Step: 2940, LR: 1.4237138680784246e-05, Loss: 568.0454711914062
2024-08-03T15:03:24.808632202Z 
 31%|███       | 2941/9500 [10:05:54<22:32:03, 12.37s/it]08/03/2024 08:03:24 - INFO - __main__ -   Step: 2941, LR: 1.4234968137096967e-05, Loss: 672.3466796875
2024-08-03T15:03:37.196238032Z 
 31%|███       | 2942/9500 [10:06:07<22:32:28, 12.37s/it]08/03/2024 08:03:37 - INFO - __main__ -   Step: 2942, LR: 1.4232797593409689e-05, Loss: 654.2827758789062
2024-08-03T15:03:49.580423006Z 
 31%|███       | 2943/9500 [10:06:19<22:32:36, 12.38s/it]08/03/2024 08:03:49 - INFO - __main__ -   Step: 2943, LR: 1.4230627049722409e-05, Loss: 668.1583251953125
2024-08-03T15:04:02.330794374Z 
 31%|███       | 2944/9500 [10:06:32<22:44:38, 12.49s/it]08/03/2024 08:04:02 - INFO - __main__ -   Step: 2944, LR: 1.422845650603513e-05, Loss: 759.08935546875
2024-08-03T15:04:14.518548234Z 
 31%|███       | 2945/9500 [10:06:44<22:34:33, 12.40s/it]08/03/2024 08:04:14 - INFO - __main__ -   Step: 2945, LR: 1.4226285962347852e-05, Loss: 611.5042724609375
2024-08-03T15:04:27.237372152Z 
 31%|███       | 2946/9500 [10:06:57<22:44:50, 12.49s/it]08/03/2024 08:04:27 - INFO - __main__ -   Step: 2946, LR: 1.4224115418660572e-05, Loss: 806.80615234375
2024-08-03T15:04:39.954878922Z 
 31%|███       | 2947/9500 [10:07:09<22:51:55, 12.56s/it]08/03/2024 08:04:39 - INFO - __main__ -   Step: 2947, LR: 1.4221944874973292e-05, Loss: 456.7557373046875
2024-08-03T15:04:52.459044968Z 
 31%|███       | 2948/9500 [10:07:22<22:49:50, 12.54s/it]08/03/2024 08:04:52 - INFO - __main__ -   Step: 2948, LR: 1.4219774331286013e-05, Loss: 773.0511474609375
2024-08-03T15:05:04.808285787Z 
 31%|███       | 2949/9500 [10:07:34<22:43:14, 12.49s/it]08/03/2024 08:05:04 - INFO - __main__ -   Step: 2949, LR: 1.4217603787598735e-05, Loss: 768.6687622070312
2024-08-03T15:05:17.412211437Z 
 31%|███       | 2950/9500 [10:07:47<22:46:54, 12.52s/it]08/03/2024 08:05:17 - INFO - __main__ -   Step: 2950, LR: 1.4215433243911456e-05, Loss: 678.422119140625
2024-08-03T15:05:30.155245826Z 
 31%|███       | 2951/9500 [10:08:00<22:53:57, 12.59s/it]08/03/2024 08:05:30 - INFO - __main__ -   Step: 2951, LR: 1.4213262700224178e-05, Loss: 574.6589965820312
2024-08-03T15:05:42.389280722Z 
 31%|███       | 2952/9500 [10:08:12<22:42:10, 12.48s/it]08/03/2024 08:05:42 - INFO - __main__ -   Step: 2952, LR: 1.4211092156536898e-05, Loss: 698.4610595703125
2024-08-03T15:05:54.705423090Z 
 31%|███       | 2953/9500 [10:08:24<22:36:32, 12.43s/it]08/03/2024 08:05:54 - INFO - __main__ -   Step: 2953, LR: 1.420892161284962e-05, Loss: 476.50518798828125
2024-08-03T15:06:06.958337289Z 
 31%|███       | 2954/9500 [10:08:36<22:30:28, 12.38s/it]08/03/2024 08:06:06 - INFO - __main__ -   Step: 2954, LR: 1.4206751069162341e-05, Loss: 548.6952514648438
2024-08-03T15:06:19.413441229Z 
 31%|███       | 2955/9500 [10:08:49<22:32:46, 12.40s/it]08/03/2024 08:06:19 - INFO - __main__ -   Step: 2955, LR: 1.4204580525475063e-05, Loss: 707.395263671875
2024-08-03T15:06:32.092693511Z 
 31%|███       | 2956/9500 [10:09:02<22:41:39, 12.48s/it]08/03/2024 08:06:32 - INFO - __main__ -   Step: 2956, LR: 1.4202409981787784e-05, Loss: 632.1720581054688
2024-08-03T15:06:44.074891352Z 
 31%|███       | 2957/9500 [10:09:14<22:25:01, 12.33s/it]08/03/2024 08:06:44 - INFO - __main__ -   Step: 2957, LR: 1.4200239438100506e-05, Loss: 547.7696533203125
2024-08-03T15:06:56.142464140Z 
 31%|███       | 2958/9500 [10:09:26<22:16:05, 12.25s/it]08/03/2024 08:06:56 - INFO - __main__ -   Step: 2958, LR: 1.4198068894413226e-05, Loss: 488.42510986328125
2024-08-03T15:07:08.757994141Z 
 31%|███       | 2959/9500 [10:09:38<22:27:43, 12.36s/it]08/03/2024 08:07:08 - INFO - __main__ -   Step: 2959, LR: 1.4195898350725947e-05, Loss: 558.6104125976562
2024-08-03T15:07:20.988038497Z 
 31%|███       | 2960/9500 [10:09:50<22:23:10, 12.32s/it]08/03/2024 08:07:20 - INFO - __main__ -   Step: 2960, LR: 1.4193727807038667e-05, Loss: 556.0345458984375
2024-08-03T15:07:33.316124606Z 
 31%|███       | 2961/9500 [10:10:03<22:23:06, 12.32s/it]08/03/2024 08:07:33 - INFO - __main__ -   Step: 2961, LR: 1.4191557263351387e-05, Loss: 486.494873046875
2024-08-03T15:07:45.362200927Z 
 31%|███       | 2962/9500 [10:10:15<22:13:51, 12.24s/it]08/03/2024 08:07:45 - INFO - __main__ -   Step: 2962, LR: 1.4189386719664108e-05, Loss: 619.4220581054688
2024-08-03T15:07:58.179787753Z 
 31%|███       | 2963/9500 [10:10:28<22:32:29, 12.41s/it]08/03/2024 08:07:58 - INFO - __main__ -   Step: 2963, LR: 1.418721617597683e-05, Loss: 709.12109375
2024-08-03T15:08:10.251726482Z 
 31%|███       | 2964/9500 [10:10:40<22:21:07, 12.31s/it]08/03/2024 08:08:10 - INFO - __main__ -   Step: 2964, LR: 1.4185045632289552e-05, Loss: 606.1968994140625
2024-08-03T15:08:22.307156440Z 
 31%|███       | 2965/9500 [10:10:52<22:12:32, 12.23s/it]08/03/2024 08:08:22 - INFO - __main__ -   Step: 2965, LR: 1.4182875088602273e-05, Loss: 637.518310546875
2024-08-03T15:08:34.826921291Z 
 31%|███       | 2966/9500 [10:11:04<22:21:39, 12.32s/it]08/03/2024 08:08:34 - INFO - __main__ -   Step: 2966, LR: 1.4180704544914995e-05, Loss: 529.2295532226562
2024-08-03T15:08:47.267499020Z 
 31%|███       | 2967/9500 [10:11:17<22:25:23, 12.36s/it]08/03/2024 08:08:47 - INFO - __main__ -   Step: 2967, LR: 1.4178534001227714e-05, Loss: 600.060546875
2024-08-03T15:08:59.555953960Z 
 31%|███       | 2968/9500 [10:11:29<22:22:58, 12.34s/it]08/03/2024 08:08:59 - INFO - __main__ -   Step: 2968, LR: 1.4176363457540436e-05, Loss: 708.38232421875
2024-08-03T15:09:12.113257537Z 
 31%|███▏      | 2969/9500 [10:11:42<22:29:59, 12.40s/it]08/03/2024 08:09:12 - INFO - __main__ -   Step: 2969, LR: 1.4174192913853158e-05, Loss: 600.0424194335938
2024-08-03T15:09:24.212640145Z 
 31%|███▏      | 2970/9500 [10:11:54<22:19:54, 12.31s/it]08/03/2024 08:09:24 - INFO - __main__ -   Step: 2970, LR: 1.417202237016588e-05, Loss: 601.8948974609375
2024-08-03T15:09:36.440693617Z 
 31%|███▏      | 2971/9500 [10:12:06<22:16:57, 12.29s/it]08/03/2024 08:09:36 - INFO - __main__ -   Step: 2971, LR: 1.41698518264786e-05, Loss: 581.6071166992188
2024-08-03T15:09:48.789793938Z 
 31%|███▏      | 2972/9500 [10:12:18<22:18:48, 12.31s/it]08/03/2024 08:09:48 - INFO - __main__ -   Step: 2972, LR: 1.416768128279132e-05, Loss: 511.01751708984375
2024-08-03T15:10:01.096029249Z 
 31%|███▏      | 2973/9500 [10:12:31<22:18:38, 12.31s/it]08/03/2024 08:10:01 - INFO - __main__ -   Step: 2973, LR: 1.4165510739104042e-05, Loss: 585.0526123046875
2024-08-03T15:10:13.246832026Z 
 31%|███▏      | 2974/9500 [10:12:43<22:13:22, 12.26s/it]08/03/2024 08:10:13 - INFO - __main__ -   Step: 2974, LR: 1.4163340195416762e-05, Loss: 626.7348022460938
2024-08-03T15:10:25.748537302Z 
 31%|███▏      | 2975/9500 [10:12:55<22:21:05, 12.33s/it]08/03/2024 08:10:25 - INFO - __main__ -   Step: 2975, LR: 1.4161169651729484e-05, Loss: 510.0563049316406
2024-08-03T15:10:38.328060970Z 
 31%|███▏      | 2976/9500 [10:13:08<22:28:57, 12.41s/it]08/03/2024 08:10:38 - INFO - __main__ -   Step: 2976, LR: 1.4158999108042203e-05, Loss: 657.236083984375
2024-08-03T15:10:50.243814148Z 
 31%|███▏      | 2977/9500 [10:13:20<22:12:45, 12.26s/it]08/03/2024 08:10:50 - INFO - __main__ -   Step: 2977, LR: 1.4156828564354925e-05, Loss: 651.7435302734375
2024-08-03T15:11:02.928770148Z 
 31%|███▏      | 2978/9500 [10:13:32<22:26:26, 12.39s/it]08/03/2024 08:11:02 - INFO - __main__ -   Step: 2978, LR: 1.4154658020667647e-05, Loss: 665.9051513671875
2024-08-03T15:11:15.398379725Z 
 31%|███▏      | 2979/9500 [10:13:45<22:28:56, 12.41s/it]08/03/2024 08:11:15 - INFO - __main__ -   Step: 2979, LR: 1.4152487476980368e-05, Loss: 688.3878784179688
2024-08-03T15:11:27.487309752Z 
 31%|███▏      | 2980/9500 [10:13:57<22:18:12, 12.31s/it]08/03/2024 08:11:27 - INFO - __main__ -   Step: 2980, LR: 1.415031693329309e-05, Loss: 578.330810546875
2024-08-03T15:11:40.008781876Z 
 31%|███▏      | 2981/9500 [10:14:09<22:24:44, 12.38s/it]08/03/2024 08:11:40 - INFO - __main__ -   Step: 2981, LR: 1.414814638960581e-05, Loss: 571.1279296875
2024-08-03T15:11:52.246206653Z 
 31%|███▏      | 2982/9500 [10:14:22<22:20:00, 12.34s/it]08/03/2024 08:11:52 - INFO - __main__ -   Step: 2982, LR: 1.4145975845918531e-05, Loss: 607.2042236328125
2024-08-03T15:12:04.281699586Z 
 31%|███▏      | 2983/9500 [10:14:34<22:10:01, 12.25s/it]08/03/2024 08:12:04 - INFO - __main__ -   Step: 2983, LR: 1.4143805302231253e-05, Loss: 570.945556640625
2024-08-03T15:12:16.952987020Z 
 31%|███▏      | 2984/9500 [10:14:46<22:23:42, 12.37s/it]08/03/2024 08:12:16 - INFO - __main__ -   Step: 2984, LR: 1.4141634758543974e-05, Loss: 602.3297119140625
2024-08-03T15:12:29.603125452Z 
 31%|███▏      | 2985/9500 [10:14:59<22:32:31, 12.46s/it]08/03/2024 08:12:29 - INFO - __main__ -   Step: 2985, LR: 1.4139464214856696e-05, Loss: 760.4349975585938
2024-08-03T15:12:41.920397142Z 
 31%|███▏      | 2986/9500 [10:15:11<22:27:47, 12.41s/it]08/03/2024 08:12:41 - INFO - __main__ -   Step: 2986, LR: 1.4137293671169416e-05, Loss: 582.555908203125
2024-08-03T15:12:54.434932766Z 
 31%|███▏      | 2987/9500 [10:15:24<22:30:50, 12.44s/it]08/03/2024 08:12:54 - INFO - __main__ -   Step: 2987, LR: 1.4135123127482137e-05, Loss: 487.06109619140625
2024-08-03T15:13:06.612551210Z 
 31%|███▏      | 2988/9500 [10:15:36<22:21:56, 12.36s/it]08/03/2024 08:13:06 - INFO - __main__ -   Step: 2988, LR: 1.4132952583794857e-05, Loss: 657.4246826171875
2024-08-03T15:13:18.844389140Z 
 31%|███▏      | 2989/9500 [10:15:48<22:17:25, 12.32s/it]08/03/2024 08:13:18 - INFO - __main__ -   Step: 2989, LR: 1.4130782040107579e-05, Loss: 650.063720703125
2024-08-03T15:13:31.324867627Z 
 31%|███▏      | 2990/9500 [10:16:01<22:22:17, 12.37s/it]08/03/2024 08:13:31 - INFO - __main__ -   Step: 2990, LR: 1.4128611496420299e-05, Loss: 816.5098876953125
2024-08-03T15:13:43.568074687Z 
 31%|███▏      | 2991/9500 [10:16:13<22:17:55, 12.33s/it]08/03/2024 08:13:43 - INFO - __main__ -   Step: 2991, LR: 1.412644095273302e-05, Loss: 641.7362060546875
2024-08-03T15:13:55.853214754Z 
 31%|███▏      | 2992/9500 [10:16:25<22:16:09, 12.32s/it]08/03/2024 08:13:55 - INFO - __main__ -   Step: 2992, LR: 1.4124270409045742e-05, Loss: 677.4505615234375
2024-08-03T15:14:08.618217585Z 
 32%|███▏      | 2993/9500 [10:16:38<22:30:28, 12.45s/it]08/03/2024 08:14:08 - INFO - __main__ -   Step: 2993, LR: 1.4122099865358463e-05, Loss: 564.5206298828125
2024-08-03T15:14:21.195999128Z 
 32%|███▏      | 2994/9500 [10:16:51<22:34:16, 12.49s/it]08/03/2024 08:14:21 - INFO - __main__ -   Step: 2994, LR: 1.4119929321671185e-05, Loss: 658.761474609375
2024-08-03T15:14:33.688667485Z 
 32%|███▏      | 2995/9500 [10:17:03<22:34:14, 12.49s/it]08/03/2024 08:14:33 - INFO - __main__ -   Step: 2995, LR: 1.4117758777983905e-05, Loss: 781.4073486328125
2024-08-03T15:14:46.167960753Z 
 32%|███▏      | 2996/9500 [10:17:16<22:33:38, 12.49s/it]08/03/2024 08:14:46 - INFO - __main__ -   Step: 2996, LR: 1.4115588234296626e-05, Loss: 573.2796630859375
2024-08-03T15:14:58.962056213Z 
 32%|███▏      | 2997/9500 [10:17:28<22:43:24, 12.58s/it]08/03/2024 08:14:58 - INFO - __main__ -   Step: 2997, LR: 1.4113417690609348e-05, Loss: 699.8695068359375
2024-08-03T15:15:11.161937944Z 
 32%|███▏      | 2998/9500 [10:17:41<22:30:51, 12.47s/it]08/03/2024 08:15:11 - INFO - __main__ -   Step: 2998, LR: 1.411124714692207e-05, Loss: 674.2916259765625
2024-08-03T15:15:23.663741730Z 
 32%|███▏      | 2999/9500 [10:17:53<22:31:46, 12.48s/it]08/03/2024 08:15:23 - INFO - __main__ -   Step: 2999, LR: 1.410907660323479e-05, Loss: 665.2984619140625
2024-08-03T15:15:35.666481597Z 
 32%|███▏      | 3000/9500 [10:18:05<22:16:14, 12.33s/it]08/03/2024 08:15:35 - INFO - __main__ -   Step: 3000, LR: 1.4106906059547512e-05, Loss: 577.6766357421875
2024-08-03T15:15:48.438334603Z 
 32%|███▏      | 3001/9500 [10:18:18<22:30:14, 12.47s/it]08/03/2024 08:15:48 - INFO - __main__ -   Step: 3001, LR: 1.410473551586023e-05, Loss: 600.9522705078125
2024-08-03T15:16:00.905084526Z 
 32%|███▏      | 3002/9500 [10:18:30<22:30:04, 12.47s/it]08/03/2024 08:16:00 - INFO - __main__ -   Step: 3002, LR: 1.4102564972172952e-05, Loss: 562.931884765625
2024-08-03T15:16:13.067845888Z 
 32%|███▏      | 3003/9500 [10:18:43<22:20:00, 12.38s/it]08/03/2024 08:16:13 - INFO - __main__ -   Step: 3003, LR: 1.4100394428485674e-05, Loss: 548.0021362304688
2024-08-03T15:16:25.231435278Z 
 32%|███▏      | 3004/9500 [10:18:55<22:12:55, 12.31s/it]08/03/2024 08:16:25 - INFO - __main__ -   Step: 3004, LR: 1.4098223884798394e-05, Loss: 555.6640014648438
2024-08-03T15:16:37.210107357Z 
 32%|███▏      | 3005/9500 [10:19:07<22:01:55, 12.21s/it]08/03/2024 08:16:37 - INFO - __main__ -   Step: 3005, LR: 1.4096053341111115e-05, Loss: 583.3612060546875
2024-08-03T15:16:49.961377161Z 
 32%|███▏      | 3006/9500 [10:19:19<22:19:14, 12.37s/it]08/03/2024 08:16:49 - INFO - __main__ -   Step: 3006, LR: 1.4093882797423837e-05, Loss: 735.66943359375
2024-08-03T15:17:02.317657117Z 
 32%|███▏      | 3007/9500 [10:19:32<22:18:27, 12.37s/it]08/03/2024 08:17:02 - INFO - __main__ -   Step: 3007, LR: 1.4091712253736558e-05, Loss: 642.4356689453125
2024-08-03T15:17:14.751746643Z 
 32%|███▏      | 3008/9500 [10:19:44<22:20:23, 12.39s/it]08/03/2024 08:17:14 - INFO - __main__ -   Step: 3008, LR: 1.408954171004928e-05, Loss: 622.30712890625
2024-08-03T15:17:27.125490642Z 
 32%|███▏      | 3009/9500 [10:19:57<22:19:42, 12.38s/it]08/03/2024 08:17:27 - INFO - __main__ -   Step: 3009, LR: 1.4087371166362001e-05, Loss: 499.54229736328125
2024-08-03T15:17:39.405490433Z 
 32%|███▏      | 3010/9500 [10:20:09<22:16:09, 12.35s/it]08/03/2024 08:17:39 - INFO - __main__ -   Step: 3010, LR: 1.4085200622674721e-05, Loss: 582.9705200195312
2024-08-03T15:17:51.523546273Z 
 32%|███▏      | 3011/9500 [10:20:21<22:08:19, 12.28s/it]08/03/2024 08:17:51 - INFO - __main__ -   Step: 3011, LR: 1.4083030078987443e-05, Loss: 571.11865234375
2024-08-03T15:18:04.027829920Z 
 32%|███▏      | 3012/9500 [10:20:33<22:15:19, 12.35s/it]08/03/2024 08:18:04 - INFO - __main__ -   Step: 3012, LR: 1.4080859535300164e-05, Loss: 562.468994140625
2024-08-03T15:18:16.052890921Z 
 32%|███▏      | 3013/9500 [10:20:45<22:04:37, 12.25s/it]08/03/2024 08:18:16 - INFO - __main__ -   Step: 3013, LR: 1.4078688991612886e-05, Loss: 470.00250244140625
2024-08-03T15:18:28.484807098Z 
 32%|███▏      | 3014/9500 [10:20:58<22:10:15, 12.31s/it]08/03/2024 08:18:28 - INFO - __main__ -   Step: 3014, LR: 1.4076518447925607e-05, Loss: 459.6075744628906
2024-08-03T15:18:41.034669639Z 
 32%|███▏      | 3015/9500 [10:21:10<22:17:58, 12.38s/it]08/03/2024 08:18:41 - INFO - __main__ -   Step: 3015, LR: 1.4074347904238326e-05, Loss: 625.6910400390625
2024-08-03T15:18:53.160963471Z 
 32%|███▏      | 3016/9500 [10:21:23<22:09:33, 12.30s/it]08/03/2024 08:18:53 - INFO - __main__ -   Step: 3016, LR: 1.4072177360551047e-05, Loss: 504.56414794921875
2024-08-03T15:19:05.243898621Z 
 32%|███▏      | 3017/9500 [10:21:35<22:02:12, 12.24s/it]08/03/2024 08:19:05 - INFO - __main__ -   Step: 3017, LR: 1.4070006816863769e-05, Loss: 594.0069580078125
2024-08-03T15:19:18.010003135Z 
 32%|███▏      | 3018/9500 [10:21:47<22:19:10, 12.40s/it]08/03/2024 08:19:18 - INFO - __main__ -   Step: 3018, LR: 1.406783627317649e-05, Loss: 641.775634765625
2024-08-03T15:19:30.552636531Z 
 32%|███▏      | 3019/9500 [10:22:00<22:23:42, 12.44s/it]08/03/2024 08:19:30 - INFO - __main__ -   Step: 3019, LR: 1.406566572948921e-05, Loss: 591.1534423828125
2024-08-03T15:19:42.777013573Z 
 32%|███▏      | 3020/9500 [10:22:12<22:16:30, 12.38s/it]08/03/2024 08:19:42 - INFO - __main__ -   Step: 3020, LR: 1.4063495185801932e-05, Loss: 673.8488159179688
2024-08-03T15:19:55.243198369Z 
 32%|███▏      | 3021/9500 [10:22:25<22:19:15, 12.40s/it]08/03/2024 08:19:55 - INFO - __main__ -   Step: 3021, LR: 1.4061324642114653e-05, Loss: 516.6508178710938
2024-08-03T15:20:07.593676492Z 
 32%|███▏      | 3022/9500 [10:22:37<22:17:22, 12.39s/it]08/03/2024 08:20:07 - INFO - __main__ -   Step: 3022, LR: 1.4059154098427375e-05, Loss: 517.9241943359375
2024-08-03T15:20:19.759316220Z 
 32%|███▏      | 3023/9500 [10:22:49<22:10:00, 12.32s/it]08/03/2024 08:20:19 - INFO - __main__ -   Step: 3023, LR: 1.4056983554740096e-05, Loss: 633.5494384765625
2024-08-03T15:20:32.331515760Z 
 32%|███▏      | 3024/9500 [10:23:02<22:17:56, 12.40s/it]08/03/2024 08:20:32 - INFO - __main__ -   Step: 3024, LR: 1.4054813011052816e-05, Loss: 625.689453125
2024-08-03T15:20:44.754748102Z 
 32%|███▏      | 3025/9500 [10:23:14<22:18:37, 12.40s/it]08/03/2024 08:20:44 - INFO - __main__ -   Step: 3025, LR: 1.4052642467365538e-05, Loss: 632.1121826171875
2024-08-03T15:20:56.841961464Z 
 32%|███▏      | 3026/9500 [10:23:26<22:08:09, 12.31s/it]08/03/2024 08:20:56 - INFO - __main__ -   Step: 3026, LR: 1.405047192367826e-05, Loss: 568.5702514648438
2024-08-03T15:21:09.063422095Z 
 32%|███▏      | 3027/9500 [10:23:39<22:05:06, 12.28s/it]08/03/2024 08:21:09 - INFO - __main__ -   Step: 3027, LR: 1.4048301379990981e-05, Loss: 495.67242431640625
2024-08-03T15:21:21.235129774Z 
 32%|███▏      | 3028/9500 [10:23:51<22:01:18, 12.25s/it]08/03/2024 08:21:21 - INFO - __main__ -   Step: 3028, LR: 1.4046130836303703e-05, Loss: 523.6804809570312
2024-08-03T15:21:33.406299679Z 
 32%|███▏      | 3029/9500 [10:24:03<21:58:34, 12.23s/it]08/03/2024 08:21:33 - INFO - __main__ -   Step: 3029, LR: 1.404396029261642e-05, Loss: 676.2122802734375
2024-08-03T15:21:45.910464938Z 
 32%|███▏      | 3030/9500 [10:24:15<22:07:22, 12.31s/it]08/03/2024 08:21:45 - INFO - __main__ -   Step: 3030, LR: 1.4041789748929142e-05, Loss: 547.5689697265625
2024-08-03T15:21:58.122078974Z 
 32%|███▏      | 3031/9500 [10:24:28<22:03:59, 12.28s/it]08/03/2024 08:21:58 - INFO - __main__ -   Step: 3031, LR: 1.4039619205241864e-05, Loss: 585.64453125
2024-08-03T15:22:10.332512464Z 
 32%|███▏      | 3032/9500 [10:24:40<22:01:33, 12.26s/it]08/03/2024 08:22:10 - INFO - __main__ -   Step: 3032, LR: 1.4037448661554585e-05, Loss: 593.6005859375
2024-08-03T15:22:23.168960326Z 
 32%|███▏      | 3033/9500 [10:24:53<22:20:00, 12.43s/it]08/03/2024 08:22:23 - INFO - __main__ -   Step: 3033, LR: 1.4035278117867305e-05, Loss: 758.8402099609375
2024-08-03T15:22:35.007109473Z 
 32%|███▏      | 3034/9500 [10:25:04<22:00:35, 12.25s/it]08/03/2024 08:22:35 - INFO - __main__ -   Step: 3034, LR: 1.4033107574180027e-05, Loss: 438.6059265136719
2024-08-03T15:22:47.249201489Z 
 32%|███▏      | 3035/9500 [10:25:17<21:59:59, 12.25s/it]08/03/2024 08:22:47 - INFO - __main__ -   Step: 3035, LR: 1.4030937030492748e-05, Loss: 825.7548217773438
2024-08-03T15:22:59.622542350Z 
 32%|███▏      | 3036/9500 [10:25:29<22:03:45, 12.29s/it]08/03/2024 08:22:59 - INFO - __main__ -   Step: 3036, LR: 1.402876648680547e-05, Loss: 517.7960205078125
2024-08-03T15:23:12.208541469Z 
 32%|███▏      | 3037/9500 [10:25:42<22:13:12, 12.38s/it]08/03/2024 08:23:12 - INFO - __main__ -   Step: 3037, LR: 1.4026595943118191e-05, Loss: 655.4149169921875
2024-08-03T15:23:24.713837958Z 
 32%|███▏      | 3038/9500 [10:25:54<22:17:08, 12.42s/it]08/03/2024 08:23:24 - INFO - __main__ -   Step: 3038, LR: 1.4024425399430911e-05, Loss: 625.2517700195312
2024-08-03T15:23:36.995319523Z 
 32%|███▏      | 3039/9500 [10:26:06<22:12:36, 12.38s/it]08/03/2024 08:23:36 - INFO - __main__ -   Step: 3039, LR: 1.4022254855743633e-05, Loss: 606.088623046875
2024-08-03T15:23:49.433103978Z 
 32%|███▏      | 3040/9500 [10:26:19<22:14:25, 12.39s/it]08/03/2024 08:23:49 - INFO - __main__ -   Step: 3040, LR: 1.4020084312056354e-05, Loss: 406.5033874511719
2024-08-03T15:24:02.104123101Z 
 32%|███▏      | 3041/9500 [10:26:32<22:23:09, 12.48s/it]08/03/2024 08:24:02 - INFO - __main__ -   Step: 3041, LR: 1.4017913768369076e-05, Loss: 632.7862548828125
2024-08-03T15:24:14.643279538Z 
 32%|███▏      | 3042/9500 [10:26:44<22:24:57, 12.50s/it]08/03/2024 08:24:14 - INFO - __main__ -   Step: 3042, LR: 1.4015743224681798e-05, Loss: 548.41552734375
2024-08-03T15:24:26.723624516Z 
 32%|███▏      | 3043/9500 [10:26:56<22:11:20, 12.37s/it]08/03/2024 08:24:26 - INFO - __main__ -   Step: 3043, LR: 1.4013572680994516e-05, Loss: 699.1600952148438
2024-08-03T15:24:39.228455438Z 
 32%|███▏      | 3044/9500 [10:27:09<22:15:26, 12.41s/it]08/03/2024 08:24:39 - INFO - __main__ -   Step: 3044, LR: 1.4011402137307237e-05, Loss: 706.303955078125
2024-08-03T15:24:51.607750864Z 
 32%|███▏      | 3045/9500 [10:27:21<22:14:12, 12.40s/it]08/03/2024 08:24:51 - INFO - __main__ -   Step: 3045, LR: 1.4009231593619959e-05, Loss: 497.6116027832031
2024-08-03T15:25:03.560075484Z 
 32%|███▏      | 3046/9500 [10:27:33<21:59:30, 12.27s/it]08/03/2024 08:25:03 - INFO - __main__ -   Step: 3046, LR: 1.400706104993268e-05, Loss: 507.6341552734375
2024-08-03T15:25:15.723074332Z 
 32%|███▏      | 3047/9500 [10:27:45<21:55:56, 12.24s/it]08/03/2024 08:25:15 - INFO - __main__ -   Step: 3047, LR: 1.40048905062454e-05, Loss: 632.4247436523438
2024-08-03T15:25:27.908373593Z 
 32%|███▏      | 3048/9500 [10:27:57<21:54:07, 12.22s/it]08/03/2024 08:25:27 - INFO - __main__ -   Step: 3048, LR: 1.4002719962558122e-05, Loss: 622.677734375
2024-08-03T15:25:40.762874898Z 
 32%|███▏      | 3049/9500 [10:28:10<22:14:21, 12.41s/it]08/03/2024 08:25:40 - INFO - __main__ -   Step: 3049, LR: 1.4000549418870843e-05, Loss: 688.990234375
2024-08-03T15:25:52.833246059Z 
 32%|███▏      | 3050/9500 [10:28:22<22:03:10, 12.31s/it]08/03/2024 08:25:52 - INFO - __main__ -   Step: 3050, LR: 1.3998378875183565e-05, Loss: 565.9197998046875
2024-08-03T15:26:05.022927215Z 
 32%|███▏      | 3051/9500 [10:28:34<21:59:08, 12.27s/it]08/03/2024 08:26:05 - INFO - __main__ -   Step: 3051, LR: 1.3996208331496287e-05, Loss: 728.47314453125
2024-08-03T15:26:17.551658466Z 
 32%|███▏      | 3052/9500 [10:28:47<22:07:06, 12.35s/it]08/03/2024 08:26:17 - INFO - __main__ -   Step: 3052, LR: 1.3994037787809006e-05, Loss: 528.7249755859375
2024-08-03T15:26:29.700233669Z 
 32%|███▏      | 3053/9500 [10:28:59<22:00:30, 12.29s/it]08/03/2024 08:26:29 - INFO - __main__ -   Step: 3053, LR: 1.3991867244121728e-05, Loss: 705.0628662109375
2024-08-03T15:26:41.848804582Z 
 32%|███▏      | 3054/9500 [10:29:11<21:55:45, 12.25s/it]08/03/2024 08:26:41 - INFO - __main__ -   Step: 3054, LR: 1.398969670043445e-05, Loss: 558.448486328125
2024-08-03T15:26:54.101670883Z 
 32%|███▏      | 3055/9500 [10:29:24<21:55:44, 12.25s/it]08/03/2024 08:26:54 - INFO - __main__ -   Step: 3055, LR: 1.3987526156747171e-05, Loss: 527.5031127929688
2024-08-03T15:27:06.075047427Z 
 32%|███▏      | 3056/9500 [10:29:36<21:46:39, 12.17s/it]08/03/2024 08:27:06 - INFO - __main__ -   Step: 3056, LR: 1.3985355613059893e-05, Loss: 511.0456237792969
2024-08-03T15:27:18.291359682Z 
 32%|███▏      | 3057/9500 [10:29:48<21:48:03, 12.18s/it]08/03/2024 08:27:18 - INFO - __main__ -   Step: 3057, LR: 1.398318506937261e-05, Loss: 660.6229248046875
2024-08-03T15:27:30.694758733Z 
 32%|███▏      | 3058/9500 [10:30:00<21:55:01, 12.25s/it]08/03/2024 08:27:30 - INFO - __main__ -   Step: 3058, LR: 1.3981014525685332e-05, Loss: 552.4496459960938
2024-08-03T15:27:42.863618559Z 
 32%|███▏      | 3059/9500 [10:30:12<21:52:16, 12.22s/it]08/03/2024 08:27:42 - INFO - __main__ -   Step: 3059, LR: 1.3978843981998054e-05, Loss: 699.171875
2024-08-03T15:27:54.814640535Z 
 32%|███▏      | 3060/9500 [10:30:24<21:43:16, 12.14s/it]08/03/2024 08:27:54 - INFO - __main__ -   Step: 3060, LR: 1.3976673438310775e-05, Loss: 548.9920654296875
2024-08-03T15:28:06.989643883Z 
 32%|███▏      | 3061/9500 [10:30:36<21:44:07, 12.15s/it]08/03/2024 08:28:06 - INFO - __main__ -   Step: 3061, LR: 1.3974502894623495e-05, Loss: 543.44873046875
2024-08-03T15:28:19.154019253Z 
 32%|███▏      | 3062/9500 [10:30:49<21:44:18, 12.16s/it]08/03/2024 08:28:19 - INFO - __main__ -   Step: 3062, LR: 1.3972332350936217e-05, Loss: 602.5333251953125
2024-08-03T15:28:31.505328128Z 
 32%|███▏      | 3063/9500 [10:31:01<21:50:24, 12.21s/it]08/03/2024 08:28:31 - INFO - __main__ -   Step: 3063, LR: 1.3970161807248938e-05, Loss: 661.63232421875
2024-08-03T15:28:43.758464620Z 
 32%|███▏      | 3064/9500 [10:31:13<21:51:26, 12.23s/it]08/03/2024 08:28:43 - INFO - __main__ -   Step: 3064, LR: 1.396799126356166e-05, Loss: 493.3560485839844
2024-08-03T15:28:55.705886535Z 
 32%|███▏      | 3065/9500 [10:31:25<21:42:16, 12.14s/it]08/03/2024 08:28:55 - INFO - __main__ -   Step: 3065, LR: 1.3965820719874382e-05, Loss: 467.26690673828125
2024-08-03T15:29:07.877995794Z 
 32%|███▏      | 3066/9500 [10:31:37<21:43:01, 12.15s/it]08/03/2024 08:29:07 - INFO - __main__ -   Step: 3066, LR: 1.3963650176187103e-05, Loss: 518.05712890625
2024-08-03T15:29:20.773176970Z 
 32%|███▏      | 3067/9500 [10:31:50<22:06:45, 12.37s/it]08/03/2024 08:29:20 - INFO - __main__ -   Step: 3067, LR: 1.3961479632499823e-05, Loss: 606.0061645507812
2024-08-03T15:29:33.159176812Z 
 32%|███▏      | 3068/9500 [10:32:03<22:06:54, 12.38s/it]08/03/2024 08:29:33 - INFO - __main__ -   Step: 3068, LR: 1.3959309088812545e-05, Loss: 688.5508422851562
2024-08-03T15:29:45.092998052Z 
 32%|███▏      | 3069/9500 [10:32:15<21:52:25, 12.24s/it]08/03/2024 08:29:45 - INFO - __main__ -   Step: 3069, LR: 1.3957138545125266e-05, Loss: 603.592041015625
2024-08-03T15:29:57.622124794Z 
 32%|███▏      | 3070/9500 [10:32:27<22:01:22, 12.33s/it]08/03/2024 08:29:57 - INFO - __main__ -   Step: 3070, LR: 1.3954968001437988e-05, Loss: 665.0078735351562
2024-08-03T15:30:09.918348241Z 
 32%|███▏      | 3071/9500 [10:32:39<22:00:04, 12.32s/it]08/03/2024 08:30:09 - INFO - __main__ -   Step: 3071, LR: 1.3952797457750706e-05, Loss: 696.7176513671875
2024-08-03T15:30:22.241251524Z 
 32%|███▏      | 3072/9500 [10:32:52<21:59:58, 12.32s/it]08/03/2024 08:30:22 - INFO - __main__ -   Step: 3072, LR: 1.3950626914063427e-05, Loss: 744.18212890625
2024-08-03T15:30:35.025592036Z 
 32%|███▏      | 3073/9500 [10:33:04<22:14:39, 12.46s/it]08/03/2024 08:30:35 - INFO - __main__ -   Step: 3073, LR: 1.3948456370376149e-05, Loss: 648.5667724609375
2024-08-03T15:30:47.563417283Z 
 32%|███▏      | 3074/9500 [10:33:17<22:16:57, 12.48s/it]08/03/2024 08:30:47 - INFO - __main__ -   Step: 3074, LR: 1.394628582668887e-05, Loss: 594.604736328125
2024-08-03T15:30:59.810869880Z 
 32%|███▏      | 3075/9500 [10:33:29<22:09:10, 12.41s/it]08/03/2024 08:30:59 - INFO - __main__ -   Step: 3075, LR: 1.3944115283001592e-05, Loss: 608.66259765625
2024-08-03T15:31:12.389633990Z 
 32%|███▏      | 3076/9500 [10:33:42<22:14:18, 12.46s/it]08/03/2024 08:31:12 - INFO - __main__ -   Step: 3076, LR: 1.3941944739314312e-05, Loss: 568.9717407226562
2024-08-03T15:31:24.689059974Z 
 32%|███▏      | 3077/9500 [10:33:54<22:08:52, 12.41s/it]08/03/2024 08:31:24 - INFO - __main__ -   Step: 3077, LR: 1.3939774195627034e-05, Loss: 552.2996826171875
2024-08-03T15:31:36.861257199Z 
 32%|███▏      | 3078/9500 [10:34:06<22:00:54, 12.34s/it]08/03/2024 08:31:36 - INFO - __main__ -   Step: 3078, LR: 1.3937603651939755e-05, Loss: 605.0330810546875
2024-08-03T15:31:49.260882487Z 
 32%|███▏      | 3079/9500 [10:34:19<22:02:34, 12.36s/it]08/03/2024 08:31:49 - INFO - __main__ -   Step: 3079, LR: 1.3935433108252477e-05, Loss: 604.913330078125
2024-08-03T15:32:01.347391245Z 
 32%|███▏      | 3080/9500 [10:34:31<21:53:38, 12.28s/it]08/03/2024 08:32:01 - INFO - __main__ -   Step: 3080, LR: 1.3933262564565198e-05, Loss: 700.90869140625
2024-08-03T15:32:13.668355883Z 
 32%|███▏      | 3081/9500 [10:34:43<21:54:51, 12.29s/it]08/03/2024 08:32:13 - INFO - __main__ -   Step: 3081, LR: 1.3931092020877918e-05, Loss: 656.7163696289062
2024-08-03T15:32:26.253680145Z 
 32%|███▏      | 3082/9500 [10:34:56<22:04:07, 12.38s/it]08/03/2024 08:32:26 - INFO - __main__ -   Step: 3082, LR: 1.392892147719064e-05, Loss: 637.2591552734375
2024-08-03T15:32:38.693355784Z 
 32%|███▏      | 3083/9500 [10:35:08<22:05:51, 12.40s/it]08/03/2024 08:32:38 - INFO - __main__ -   Step: 3083, LR: 1.3926750933503361e-05, Loss: 669.578857421875
2024-08-03T15:32:50.811829996Z 
 32%|███▏      | 3084/9500 [10:35:20<21:56:43, 12.31s/it]08/03/2024 08:32:50 - INFO - __main__ -   Step: 3084, LR: 1.3924580389816083e-05, Loss: 612.70458984375
2024-08-03T15:33:03.216703201Z 
 32%|███▏      | 3085/9500 [10:35:33<21:59:27, 12.34s/it]08/03/2024 08:33:03 - INFO - __main__ -   Step: 3085, LR: 1.3922409846128801e-05, Loss: 588.541015625
2024-08-03T15:33:15.452374164Z 
 32%|███▏      | 3086/9500 [10:35:45<21:55:51, 12.31s/it]08/03/2024 08:33:15 - INFO - __main__ -   Step: 3086, LR: 1.3920239302441522e-05, Loss: 660.0247802734375
2024-08-03T15:33:27.719028923Z 
 32%|███▏      | 3087/9500 [10:35:57<21:54:17, 12.30s/it]08/03/2024 08:33:27 - INFO - __main__ -   Step: 3087, LR: 1.3918068758754244e-05, Loss: 635.6881103515625
2024-08-03T15:33:40.260124890Z 
 33%|███▎      | 3088/9500 [10:36:10<22:01:55, 12.37s/it]08/03/2024 08:33:40 - INFO - __main__ -   Step: 3088, LR: 1.3915898215066966e-05, Loss: 882.5106201171875
2024-08-03T15:33:52.555402860Z 
 33%|███▎      | 3089/9500 [10:36:22<21:59:19, 12.35s/it]08/03/2024 08:33:52 - INFO - __main__ -   Step: 3089, LR: 1.3913727671379687e-05, Loss: 571.7744140625
2024-08-03T15:34:05.009922122Z 
 33%|███▎      | 3090/9500 [10:36:34<22:02:33, 12.38s/it]08/03/2024 08:34:05 - INFO - __main__ -   Step: 3090, LR: 1.3911557127692407e-05, Loss: 625.410400390625
2024-08-03T15:34:17.302253609Z 
 33%|███▎      | 3091/9500 [10:36:47<21:59:33, 12.35s/it]08/03/2024 08:34:17 - INFO - __main__ -   Step: 3091, LR: 1.3909386584005129e-05, Loss: 731.2041015625
2024-08-03T15:34:29.916373979Z 
 33%|███▎      | 3092/9500 [10:36:59<22:07:41, 12.43s/it]08/03/2024 08:34:29 - INFO - __main__ -   Step: 3092, LR: 1.390721604031785e-05, Loss: 654.7001342773438
2024-08-03T15:34:42.390991307Z 
 33%|███▎      | 3093/9500 [10:37:12<22:08:52, 12.44s/it]08/03/2024 08:34:42 - INFO - __main__ -   Step: 3093, LR: 1.3905045496630572e-05, Loss: 692.2987670898438
2024-08-03T15:34:54.669878095Z 
 33%|███▎      | 3094/9500 [10:37:24<22:03:20, 12.39s/it]08/03/2024 08:34:54 - INFO - __main__ -   Step: 3094, LR: 1.3902874952943293e-05, Loss: 492.10760498046875
2024-08-03T15:35:07.184904070Z 
 33%|███▎      | 3095/9500 [10:37:37<22:07:00, 12.43s/it]08/03/2024 08:35:07 - INFO - __main__ -   Step: 3095, LR: 1.3900704409256013e-05, Loss: 661.8607177734375
2024-08-03T15:35:19.208051467Z 
 33%|███▎      | 3096/9500 [10:37:49<21:53:44, 12.31s/it]08/03/2024 08:35:19 - INFO - __main__ -   Step: 3096, LR: 1.3898533865568735e-05, Loss: 547.8973388671875
2024-08-03T15:35:31.069261139Z 
 33%|███▎      | 3097/9500 [10:38:01<21:39:12, 12.17s/it]08/03/2024 08:35:31 - INFO - __main__ -   Step: 3097, LR: 1.3896363321881456e-05, Loss: 573.0618896484375
2024-08-03T15:35:43.639466291Z 
 33%|███▎      | 3098/9500 [10:38:13<21:51:40, 12.29s/it]08/03/2024 08:35:43 - INFO - __main__ -   Step: 3098, LR: 1.3894192778194178e-05, Loss: 734.3795166015625
2024-08-03T15:35:55.696544751Z 
 33%|███▎      | 3099/9500 [10:38:25<21:43:54, 12.22s/it]08/03/2024 08:35:55 - INFO - __main__ -   Step: 3099, LR: 1.3892022234506896e-05, Loss: 679.45068359375
2024-08-03T15:36:08.042156426Z 
 33%|███▎      | 3100/9500 [10:38:37<21:47:39, 12.26s/it]08/03/2024 08:36:08 - INFO - __main__ -   Step: 3100, LR: 1.3889851690819618e-05, Loss: 685.6497802734375
2024-08-03T15:36:20.722009774Z 
 33%|███▎      | 3101/9500 [10:38:50<22:00:54, 12.39s/it]08/03/2024 08:36:20 - INFO - __main__ -   Step: 3101, LR: 1.3887681147132339e-05, Loss: 718.705810546875
2024-08-03T15:36:32.639342202Z 
 33%|███▎      | 3102/9500 [10:39:02<21:45:43, 12.25s/it]08/03/2024 08:36:32 - INFO - __main__ -   Step: 3102, LR: 1.388551060344506e-05, Loss: 462.11102294921875
2024-08-03T15:36:44.745434167Z 
 33%|███▎      | 3103/9500 [10:39:14<21:41:04, 12.20s/it]08/03/2024 08:36:44 - INFO - __main__ -   Step: 3103, LR: 1.3883340059757782e-05, Loss: 524.7903442382812
2024-08-03T15:36:57.604480660Z 
 33%|███▎      | 3104/9500 [10:39:27<22:01:50, 12.40s/it]08/03/2024 08:36:57 - INFO - __main__ -   Step: 3104, LR: 1.3881169516070502e-05, Loss: 514.9767456054688
2024-08-03T15:37:09.477143553Z 
 33%|███▎      | 3105/9500 [10:39:39<21:44:47, 12.24s/it]08/03/2024 08:37:09 - INFO - __main__ -   Step: 3105, LR: 1.3878998972383224e-05, Loss: 446.6987609863281
2024-08-03T15:37:22.127040683Z 
 33%|███▎      | 3106/9500 [10:39:52<21:57:37, 12.36s/it]08/03/2024 08:37:22 - INFO - __main__ -   Step: 3106, LR: 1.3876828428695945e-05, Loss: 651.7783203125
2024-08-03T15:37:34.753561743Z 
 33%|███▎      | 3107/9500 [10:40:04<22:05:47, 12.44s/it]08/03/2024 08:37:34 - INFO - __main__ -   Step: 3107, LR: 1.3874657885008667e-05, Loss: 524.4080200195312
2024-08-03T15:37:47.142119686Z 
 33%|███▎      | 3108/9500 [10:40:17<22:03:48, 12.43s/it]08/03/2024 08:37:47 - INFO - __main__ -   Step: 3108, LR: 1.3872487341321388e-05, Loss: 578.6594848632812
2024-08-03T15:37:59.128825558Z 
 33%|███▎      | 3109/9500 [10:40:29<21:49:36, 12.29s/it]08/03/2024 08:37:59 - INFO - __main__ -   Step: 3109, LR: 1.387031679763411e-05, Loss: 638.578125
2024-08-03T15:38:11.554548888Z 
 33%|███▎      | 3110/9500 [10:40:41<21:53:34, 12.33s/it]08/03/2024 08:38:11 - INFO - __main__ -   Step: 3110, LR: 1.386814625394683e-05, Loss: 808.571533203125
2024-08-03T15:38:23.663675905Z 
 33%|███▎      | 3111/9500 [10:40:53<21:46:11, 12.27s/it]08/03/2024 08:38:23 - INFO - __main__ -   Step: 3111, LR: 1.3865975710259551e-05, Loss: 668.3353881835938
2024-08-03T15:38:35.717082340Z 
 33%|███▎      | 3112/9500 [10:41:05<21:39:09, 12.20s/it]08/03/2024 08:38:35 - INFO - __main__ -   Step: 3112, LR: 1.3863805166572273e-05, Loss: 522.3394775390625
2024-08-03T15:38:48.172804121Z 
 33%|███▎      | 3113/9500 [10:41:18<21:47:02, 12.28s/it]08/03/2024 08:38:48 - INFO - __main__ -   Step: 3113, LR: 1.3861634622884991e-05, Loss: 593.71728515625
2024-08-03T15:39:00.359672565Z 
 33%|███▎      | 3114/9500 [10:41:30<21:43:55, 12.25s/it]08/03/2024 08:39:00 - INFO - __main__ -   Step: 3114, LR: 1.3859464079197713e-05, Loss: 682.6715698242188
2024-08-03T15:39:12.446843180Z 
 33%|███▎      | 3115/9500 [10:41:42<21:38:29, 12.20s/it]08/03/2024 08:39:12 - INFO - __main__ -   Step: 3115, LR: 1.3857293535510434e-05, Loss: 533.37646484375
2024-08-03T15:39:25.255999150Z 
 33%|███▎      | 3116/9500 [10:41:55<21:57:39, 12.38s/it]08/03/2024 08:39:25 - INFO - __main__ -   Step: 3116, LR: 1.3855122991823156e-05, Loss: 415.49365234375
2024-08-03T15:39:37.343289621Z 
 33%|███▎      | 3117/9500 [10:42:07<21:47:59, 12.30s/it]08/03/2024 08:39:37 - INFO - __main__ -   Step: 3117, LR: 1.3852952448135877e-05, Loss: 675.5980834960938
2024-08-03T15:39:49.354803378Z 
 33%|███▎      | 3118/9500 [10:42:19<21:38:44, 12.21s/it]08/03/2024 08:39:49 - INFO - __main__ -   Step: 3118, LR: 1.3850781904448599e-05, Loss: 532.4700317382812
2024-08-03T15:40:01.978661317Z 
 33%|███▎      | 3119/9500 [10:42:31<21:51:43, 12.33s/it]08/03/2024 08:40:01 - INFO - __main__ -   Step: 3119, LR: 1.3848611360761319e-05, Loss: 428.35174560546875
2024-08-03T15:40:13.964004548Z 
 33%|███▎      | 3120/9500 [10:42:43<21:40:24, 12.23s/it]08/03/2024 08:40:13 - INFO - __main__ -   Step: 3120, LR: 1.384644081707404e-05, Loss: 589.3855590820312
2024-08-03T15:40:25.951798859Z 
 33%|███▎      | 3121/9500 [10:42:55<21:32:29, 12.16s/it]08/03/2024 08:40:25 - INFO - __main__ -   Step: 3121, LR: 1.3844270273386762e-05, Loss: 559.7965087890625
2024-08-03T15:40:38.614757577Z 
 33%|███▎      | 3122/9500 [10:43:08<21:48:25, 12.31s/it]08/03/2024 08:40:38 - INFO - __main__ -   Step: 3122, LR: 1.3842099729699483e-05, Loss: 617.3107299804688
2024-08-03T15:40:50.608706065Z 
 33%|███▎      | 3123/9500 [10:43:20<21:38:10, 12.21s/it]08/03/2024 08:40:50 - INFO - __main__ -   Step: 3123, LR: 1.3839929186012205e-05, Loss: 490.8079833984375
2024-08-03T15:41:02.808938007Z 
 33%|███▎      | 3124/9500 [10:43:32<21:37:31, 12.21s/it]08/03/2024 08:41:02 - INFO - __main__ -   Step: 3124, LR: 1.3837758642324925e-05, Loss: 575.0840454101562
2024-08-03T15:41:15.267935274Z 
 33%|███▎      | 3125/9500 [10:43:45<21:45:15, 12.28s/it]08/03/2024 08:41:15 - INFO - __main__ -   Step: 3125, LR: 1.3835588098637646e-05, Loss: 676.077880859375
2024-08-03T15:41:27.428726220Z 
 33%|███▎      | 3126/9500 [10:43:57<21:41:06, 12.25s/it]08/03/2024 08:41:27 - INFO - __main__ -   Step: 3126, LR: 1.3833417554950368e-05, Loss: 668.58740234375
2024-08-03T15:41:39.531907073Z 
 33%|███▎      | 3127/9500 [10:44:09<21:36:17, 12.20s/it]08/03/2024 08:41:39 - INFO - __main__ -   Step: 3127, LR: 1.3831247011263088e-05, Loss: 502.0077819824219
2024-08-03T15:41:52.083090005Z 
 33%|███▎      | 3128/9500 [10:44:22<21:47:08, 12.31s/it]08/03/2024 08:41:52 - INFO - __main__ -   Step: 3128, LR: 1.3829076467575808e-05, Loss: 570.6817626953125
2024-08-03T15:42:04.493723023Z 
 33%|███▎      | 3129/9500 [10:44:34<21:50:12, 12.34s/it]08/03/2024 08:42:04 - INFO - __main__ -   Step: 3129, LR: 1.382690592388853e-05, Loss: 864.1363525390625
2024-08-03T15:42:17.080059358Z 
 33%|███▎      | 3130/9500 [10:44:47<21:57:51, 12.41s/it]08/03/2024 08:42:17 - INFO - __main__ -   Step: 3130, LR: 1.382473538020125e-05, Loss: 514.4599609375
2024-08-03T15:42:29.521083779Z 
 33%|███▎      | 3131/9500 [10:44:59<21:58:32, 12.42s/it]08/03/2024 08:42:29 - INFO - __main__ -   Step: 3131, LR: 1.3822564836513972e-05, Loss: 495.79534912109375
2024-08-03T15:42:41.934928499Z 
 33%|███▎      | 3132/9500 [10:45:11<21:58:06, 12.42s/it]08/03/2024 08:42:41 - INFO - __main__ -   Step: 3132, LR: 1.3820394292826694e-05, Loss: 755.5669555664062
2024-08-03T15:42:54.345579171Z 
 33%|███▎      | 3133/9500 [10:45:24<21:57:36, 12.42s/it]08/03/2024 08:42:54 - INFO - __main__ -   Step: 3133, LR: 1.3818223749139414e-05, Loss: 641.6438598632812
2024-08-03T15:43:06.441005719Z 
 33%|███▎      | 3134/9500 [10:45:36<21:47:11, 12.32s/it]08/03/2024 08:43:06 - INFO - __main__ -   Step: 3134, LR: 1.3816053205452135e-05, Loss: 677.7785034179688
2024-08-03T15:43:18.850946957Z 
 33%|███▎      | 3135/9500 [10:45:48<21:49:50, 12.35s/it]08/03/2024 08:43:18 - INFO - __main__ -   Step: 3135, LR: 1.3813882661764857e-05, Loss: 558.5177612304688
2024-08-03T15:43:31.002787757Z 
 33%|███▎      | 3136/9500 [10:46:00<21:43:24, 12.29s/it]08/03/2024 08:43:31 - INFO - __main__ -   Step: 3136, LR: 1.3811712118077578e-05, Loss: 657.1793212890625
2024-08-03T15:43:43.156352974Z 
 33%|███▎      | 3137/9500 [10:46:13<21:38:54, 12.25s/it]08/03/2024 08:43:43 - INFO - __main__ -   Step: 3137, LR: 1.38095415743903e-05, Loss: 703.5340576171875
2024-08-03T15:43:55.742827271Z 
 33%|███▎      | 3138/9500 [10:46:25<21:49:27, 12.35s/it]08/03/2024 08:43:55 - INFO - __main__ -   Step: 3138, LR: 1.380737103070302e-05, Loss: 769.2356567382812
2024-08-03T15:44:08.110325498Z 
 33%|███▎      | 3139/9500 [10:46:38<21:49:50, 12.36s/it]08/03/2024 08:44:08 - INFO - __main__ -   Step: 3139, LR: 1.3805200487015741e-05, Loss: 643.1700439453125
2024-08-03T15:44:20.439081466Z 
 33%|███▎      | 3140/9500 [10:46:50<21:48:47, 12.35s/it]08/03/2024 08:44:20 - INFO - __main__ -   Step: 3140, LR: 1.3803029943328463e-05, Loss: 692.977294921875
2024-08-03T15:44:33.025392667Z 
 33%|███▎      | 3141/9500 [10:47:02<21:56:11, 12.42s/it]08/03/2024 08:44:33 - INFO - __main__ -   Step: 3141, LR: 1.3800859399641183e-05, Loss: 801.6572265625
2024-08-03T15:44:45.221144288Z 
 33%|███▎      | 3142/9500 [10:47:15<21:48:53, 12.35s/it]08/03/2024 08:44:45 - INFO - __main__ -   Step: 3142, LR: 1.3798688855953903e-05, Loss: 651.3468017578125
2024-08-03T15:44:57.066003520Z 
 33%|███▎      | 3143/9500 [10:47:27<21:32:34, 12.20s/it]08/03/2024 08:44:57 - INFO - __main__ -   Step: 3143, LR: 1.3796518312266624e-05, Loss: 498.698974609375
2024-08-03T15:45:09.926626098Z 
 33%|███▎      | 3144/9500 [10:47:39<21:53:22, 12.40s/it]08/03/2024 08:45:09 - INFO - __main__ -   Step: 3144, LR: 1.3794347768579346e-05, Loss: 542.562744140625
2024-08-03T15:45:22.312390656Z 
 33%|███▎      | 3145/9500 [10:47:52<21:52:45, 12.39s/it]08/03/2024 08:45:22 - INFO - __main__ -   Step: 3145, LR: 1.3792177224892067e-05, Loss: 694.1804809570312
2024-08-03T15:45:34.368661389Z 
 33%|███▎      | 3146/9500 [10:48:04<21:41:49, 12.29s/it]08/03/2024 08:45:34 - INFO - __main__ -   Step: 3146, LR: 1.3790006681204789e-05, Loss: 591.597900390625
2024-08-03T15:45:47.081881953Z 
 33%|███▎      | 3147/9500 [10:48:17<21:54:58, 12.42s/it]08/03/2024 08:45:47 - INFO - __main__ -   Step: 3147, LR: 1.3787836137517509e-05, Loss: 847.8233032226562
2024-08-03T15:45:59.295429526Z 
 33%|███▎      | 3148/9500 [10:48:29<21:48:14, 12.36s/it]08/03/2024 08:45:59 - INFO - __main__ -   Step: 3148, LR: 1.378566559383023e-05, Loss: 652.2911987304688
2024-08-03T15:46:11.321643372Z 
 33%|███▎      | 3149/9500 [10:48:41<21:37:31, 12.26s/it]08/03/2024 08:46:11 - INFO - __main__ -   Step: 3149, LR: 1.3783495050142952e-05, Loss: 753.7153930664062
2024-08-03T15:46:23.851242066Z 
 33%|███▎      | 3150/9500 [10:48:53<21:45:55, 12.34s/it]08/03/2024 08:46:23 - INFO - __main__ -   Step: 3150, LR: 1.3781324506455673e-05, Loss: 584.40869140625
2024-08-03T15:46:35.768734435Z 
 33%|███▎      | 3151/9500 [10:49:05<21:32:19, 12.21s/it]08/03/2024 08:46:35 - INFO - __main__ -   Step: 3151, LR: 1.3779153962768395e-05, Loss: 576.6199951171875
2024-08-03T15:46:47.896170972Z 
 33%|███▎      | 3152/9500 [10:49:17<21:29:25, 12.19s/it]08/03/2024 08:46:47 - INFO - __main__ -   Step: 3152, LR: 1.3776983419081117e-05, Loss: 594.0687866210938
2024-08-03T15:47:00.378928371Z 
 33%|███▎      | 3153/9500 [10:49:30<21:38:35, 12.28s/it]08/03/2024 08:47:00 - INFO - __main__ -   Step: 3153, LR: 1.3774812875393836e-05, Loss: 650.4851684570312
2024-08-03T15:47:12.717030890Z 
 33%|███▎      | 3154/9500 [10:49:42<21:40:21, 12.29s/it]08/03/2024 08:47:12 - INFO - __main__ -   Step: 3154, LR: 1.3772642331706558e-05, Loss: 501.01898193359375
2024-08-03T15:47:24.896279738Z 
 33%|███▎      | 3155/9500 [10:49:54<21:36:29, 12.26s/it]08/03/2024 08:47:24 - INFO - __main__ -   Step: 3155, LR: 1.3770471788019278e-05, Loss: 599.717041015625
2024-08-03T15:47:37.570117760Z 
 33%|███▎      | 3156/9500 [10:50:07<21:49:24, 12.38s/it]08/03/2024 08:47:37 - INFO - __main__ -   Step: 3156, LR: 1.3768301244331998e-05, Loss: 431.0151672363281
2024-08-03T15:47:49.657484915Z 
 33%|███▎      | 3157/9500 [10:50:19<21:39:48, 12.30s/it]08/03/2024 08:47:49 - INFO - __main__ -   Step: 3157, LR: 1.376613070064472e-05, Loss: 611.1248779296875
2024-08-03T15:48:01.804932365Z 
 33%|███▎      | 3158/9500 [10:50:31<21:34:54, 12.25s/it]08/03/2024 08:48:01 - INFO - __main__ -   Step: 3158, LR: 1.3763960156957441e-05, Loss: 640.6128540039062
2024-08-03T15:48:14.433381478Z 
 33%|███▎      | 3159/9500 [10:50:44<21:46:40, 12.36s/it]08/03/2024 08:48:14 - INFO - __main__ -   Step: 3159, LR: 1.3761789613270162e-05, Loss: 666.0052490234375
2024-08-03T15:48:26.525512827Z 
 33%|███▎      | 3160/9500 [10:50:56<21:37:51, 12.28s/it]08/03/2024 08:48:26 - INFO - __main__ -   Step: 3160, LR: 1.3759619069582884e-05, Loss: 602.832763671875
2024-08-03T15:48:38.696535266Z 
 33%|███▎      | 3161/9500 [10:51:08<21:34:07, 12.25s/it]08/03/2024 08:48:38 - INFO - __main__ -   Step: 3161, LR: 1.3757448525895606e-05, Loss: 649.2708129882812
2024-08-03T15:48:51.434553500Z 
 33%|███▎      | 3162/9500 [10:51:21<21:49:21, 12.40s/it]08/03/2024 08:48:51 - INFO - __main__ -   Step: 3162, LR: 1.3755277982208325e-05, Loss: 704.4490966796875
2024-08-03T15:49:03.422014987Z 
 33%|███▎      | 3163/9500 [10:51:33<21:36:16, 12.27s/it]08/03/2024 08:49:03 - INFO - __main__ -   Step: 3163, LR: 1.3753107438521047e-05, Loss: 454.59722900390625
2024-08-03T15:49:15.811424870Z 
 33%|███▎      | 3164/9500 [10:51:45<21:39:44, 12.31s/it]08/03/2024 08:49:15 - INFO - __main__ -   Step: 3164, LR: 1.3750936894833769e-05, Loss: 670.9586791992188
2024-08-03T15:49:28.199362775Z 
 33%|███▎      | 3165/9500 [10:51:58<21:42:04, 12.33s/it]08/03/2024 08:49:28 - INFO - __main__ -   Step: 3165, LR: 1.374876635114649e-05, Loss: 493.732177734375
2024-08-03T15:49:40.322936344Z 
 33%|███▎      | 3166/9500 [10:52:10<21:35:14, 12.27s/it]08/03/2024 08:49:40 - INFO - __main__ -   Step: 3166, LR: 1.3746595807459212e-05, Loss: 600.8390502929688
2024-08-03T15:49:52.795803106Z 
 33%|███▎      | 3167/9500 [10:52:22<21:41:29, 12.33s/it]08/03/2024 08:49:52 - INFO - __main__ -   Step: 3167, LR: 1.3744425263771932e-05, Loss: 547.8928833007812
2024-08-03T15:50:05.137483919Z 
 33%|███▎      | 3168/9500 [10:52:35<21:41:37, 12.33s/it]08/03/2024 08:50:05 - INFO - __main__ -   Step: 3168, LR: 1.3742254720084651e-05, Loss: 531.302490234375
2024-08-03T15:50:17.278687548Z 
 33%|███▎      | 3169/9500 [10:52:47<21:35:20, 12.28s/it]08/03/2024 08:50:17 - INFO - __main__ -   Step: 3169, LR: 1.3740084176397373e-05, Loss: 656.6751708984375
2024-08-03T15:50:29.297449808Z 
 33%|███▎      | 3170/9500 [10:52:59<21:26:59, 12.20s/it]08/03/2024 08:50:29 - INFO - __main__ -   Step: 3170, LR: 1.3737913632710095e-05, Loss: 495.34039306640625
2024-08-03T15:50:41.526580853Z 
 33%|███▎      | 3171/9500 [10:53:11<21:27:44, 12.21s/it]08/03/2024 08:50:41 - INFO - __main__ -   Step: 3171, LR: 1.3735743089022814e-05, Loss: 570.3980712890625
2024-08-03T15:50:53.972327881Z 
 33%|███▎      | 3172/9500 [10:53:23<21:35:03, 12.28s/it]08/03/2024 08:50:53 - INFO - __main__ -   Step: 3172, LR: 1.3733572545335536e-05, Loss: 694.0765991210938
2024-08-03T15:51:06.118153233Z 
 33%|███▎      | 3173/9500 [10:53:36<21:30:38, 12.24s/it]08/03/2024 08:51:06 - INFO - __main__ -   Step: 3173, LR: 1.3731402001648258e-05, Loss: 617.201416015625
2024-08-03T15:51:18.459395905Z 
 33%|███▎      | 3174/9500 [10:53:48<21:33:39, 12.27s/it]08/03/2024 08:51:18 - INFO - __main__ -   Step: 3174, LR: 1.3729231457960979e-05, Loss: 473.71514892578125
2024-08-03T15:51:30.483132316Z 
 33%|███▎      | 3175/9500 [10:54:00<21:25:39, 12.20s/it]08/03/2024 08:51:30 - INFO - __main__ -   Step: 3175, LR: 1.37270609142737e-05, Loss: 676.8186645507812
2024-08-03T15:51:42.907169777Z 
 33%|███▎      | 3176/9500 [10:54:12<21:32:40, 12.26s/it]08/03/2024 08:51:42 - INFO - __main__ -   Step: 3176, LR: 1.372489037058642e-05, Loss: 672.3824462890625
2024-08-03T15:51:55.108368497Z 
 33%|███▎      | 3177/9500 [10:54:25<21:30:27, 12.25s/it]08/03/2024 08:51:55 - INFO - __main__ -   Step: 3177, LR: 1.3722719826899142e-05, Loss: 638.349365234375
2024-08-03T15:52:07.883033675Z 
 33%|███▎      | 3178/9500 [10:54:37<21:46:59, 12.40s/it]08/03/2024 08:52:07 - INFO - __main__ -   Step: 3178, LR: 1.3720549283211864e-05, Loss: 622.4253540039062
2024-08-03T15:52:19.920742234Z 
 33%|███▎      | 3179/9500 [10:54:49<21:35:12, 12.29s/it]08/03/2024 08:52:19 - INFO - __main__ -   Step: 3179, LR: 1.3718378739524585e-05, Loss: 432.4075927734375
2024-08-03T15:52:31.737372118Z 
 33%|███▎      | 3180/9500 [10:55:01<21:19:54, 12.15s/it]08/03/2024 08:52:31 - INFO - __main__ -   Step: 3180, LR: 1.3716208195837307e-05, Loss: 474.8403015136719
2024-08-03T15:52:44.157950038Z 
 33%|███▎      | 3181/9500 [10:55:14<21:28:12, 12.23s/it]08/03/2024 08:52:44 - INFO - __main__ -   Step: 3181, LR: 1.3714037652150027e-05, Loss: 724.5255126953125
2024-08-03T15:52:56.008213861Z 
 33%|███▎      | 3182/9500 [10:55:25<21:15:57, 12.12s/it]08/03/2024 08:52:56 - INFO - __main__ -   Step: 3182, LR: 1.3711867108462746e-05, Loss: 533.354248046875
2024-08-03T15:53:08.223845841Z 
 34%|███▎      | 3183/9500 [10:55:38<21:18:51, 12.15s/it]08/03/2024 08:53:08 - INFO - __main__ -   Step: 3183, LR: 1.3709696564775468e-05, Loss: 750.47216796875
2024-08-03T15:53:20.795122713Z 
 34%|███▎      | 3184/9500 [10:55:50<21:32:03, 12.27s/it]08/03/2024 08:53:20 - INFO - __main__ -   Step: 3184, LR: 1.370752602108819e-05, Loss: 810.0335083007812
2024-08-03T15:53:33.166131933Z 
 34%|███▎      | 3185/9500 [10:56:03<21:34:54, 12.30s/it]08/03/2024 08:53:33 - INFO - __main__ -   Step: 3185, LR: 1.370535547740091e-05, Loss: 435.978515625
2024-08-03T15:53:45.352510271Z 
 34%|███▎      | 3186/9500 [10:56:15<21:31:00, 12.27s/it]08/03/2024 08:53:45 - INFO - __main__ -   Step: 3186, LR: 1.3703184933713631e-05, Loss: 685.4624633789062
2024-08-03T15:53:57.861123824Z 
 34%|███▎      | 3187/9500 [10:56:27<21:38:24, 12.34s/it]08/03/2024 08:53:57 - INFO - __main__ -   Step: 3187, LR: 1.3701014390026353e-05, Loss: 543.5746459960938
2024-08-03T15:54:10.120943722Z 
 34%|███▎      | 3188/9500 [10:56:40<21:35:39, 12.32s/it]08/03/2024 08:54:10 - INFO - __main__ -   Step: 3188, LR: 1.3698843846339074e-05, Loss: 770.4926147460938
2024-08-03T15:54:22.106660478Z 
 34%|███▎      | 3189/9500 [10:56:52<21:25:01, 12.22s/it]08/03/2024 08:54:22 - INFO - __main__ -   Step: 3189, LR: 1.3696673302651796e-05, Loss: 627.7584228515625
2024-08-03T15:54:34.504510472Z 
 34%|███▎      | 3190/9500 [10:57:04<21:30:31, 12.27s/it]08/03/2024 08:54:34 - INFO - __main__ -   Step: 3190, LR: 1.3694502758964516e-05, Loss: 516.4581298828125
2024-08-03T15:54:46.705385967Z 
 34%|███▎      | 3191/9500 [10:57:16<21:28:06, 12.25s/it]08/03/2024 08:54:46 - INFO - __main__ -   Step: 3191, LR: 1.3692332215277237e-05, Loss: 616.4983520507812
2024-08-03T15:54:58.907572573Z 
 34%|███▎      | 3192/9500 [10:57:28<21:26:22, 12.24s/it]08/03/2024 08:54:58 - INFO - __main__ -   Step: 3192, LR: 1.3690161671589959e-05, Loss: 739.4127807617188
2024-08-03T15:55:11.482342299Z 
 34%|███▎      | 3193/9500 [10:57:41<21:36:52, 12.34s/it]08/03/2024 08:55:11 - INFO - __main__ -   Step: 3193, LR: 1.368799112790268e-05, Loss: 672.6964721679688
2024-08-03T15:55:24.013532671Z 
 34%|███▎      | 3194/9500 [10:57:53<21:42:46, 12.40s/it]08/03/2024 08:55:24 - INFO - __main__ -   Step: 3194, LR: 1.3685820584215402e-05, Loss: 526.39794921875
2024-08-03T15:55:36.586454912Z 
 34%|███▎      | 3195/9500 [10:58:06<21:48:09, 12.45s/it]08/03/2024 08:55:36 - INFO - __main__ -   Step: 3195, LR: 1.3683650040528123e-05, Loss: 637.0789184570312
2024-08-03T15:55:49.229851807Z 
 34%|███▎      | 3196/9500 [10:58:19<21:54:05, 12.51s/it]08/03/2024 08:55:49 - INFO - __main__ -   Step: 3196, LR: 1.3681479496840842e-05, Loss: 650.478271484375
2024-08-03T15:56:01.603628451Z 
 34%|███▎      | 3197/9500 [10:58:31<21:49:40, 12.47s/it]08/03/2024 08:56:01 - INFO - __main__ -   Step: 3197, LR: 1.3679308953153563e-05, Loss: 823.4449462890625
2024-08-03T15:56:13.837429035Z 
 34%|███▎      | 3198/9500 [10:58:43<21:42:06, 12.40s/it]08/03/2024 08:56:13 - INFO - __main__ -   Step: 3198, LR: 1.3677138409466285e-05, Loss: 644.2479858398438
2024-08-03T15:56:26.418340802Z 
 34%|███▎      | 3199/9500 [10:58:56<21:47:41, 12.45s/it]08/03/2024 08:56:26 - INFO - __main__ -   Step: 3199, LR: 1.3674967865779004e-05, Loss: 631.6779174804688
2024-08-03T15:56:38.365346693Z 
 34%|███▎      | 3200/9500 [10:59:08<21:31:34, 12.30s/it]08/03/2024 08:56:38 - INFO - __main__ -   Step: 3200, LR: 1.3672797322091726e-05, Loss: 485.2007751464844
2024-08-03T15:56:50.727143060Z 
 34%|███▎      | 3201/9500 [10:59:20<21:33:16, 12.32s/it]08/03/2024 08:56:50 - INFO - __main__ -   Step: 3201, LR: 1.3670626778404448e-05, Loss: 713.0665893554688
2024-08-03T15:57:03.445061105Z 
 34%|███▎      | 3202/9500 [10:59:33<21:45:39, 12.44s/it]08/03/2024 08:57:03 - INFO - __main__ -   Step: 3202, LR: 1.366845623471717e-05, Loss: 668.6055908203125
2024-08-03T15:57:15.699897962Z 
 34%|███▎      | 3203/9500 [10:59:45<21:39:39, 12.38s/it]08/03/2024 08:57:15 - INFO - __main__ -   Step: 3203, LR: 1.366628569102989e-05, Loss: 630.6578369140625
2024-08-03T15:57:28.216783107Z 
 34%|███▎      | 3204/9500 [10:59:58<21:43:38, 12.42s/it]08/03/2024 08:57:28 - INFO - __main__ -   Step: 3204, LR: 1.3664115147342612e-05, Loss: 788.5787963867188
2024-08-03T15:57:40.784922142Z 
 34%|███▎      | 3205/9500 [11:00:10<21:47:58, 12.47s/it]08/03/2024 08:57:40 - INFO - __main__ -   Step: 3205, LR: 1.3661944603655332e-05, Loss: 747.9804077148438
2024-08-03T15:57:52.938983969Z 
 34%|███▎      | 3206/9500 [11:00:22<21:37:56, 12.37s/it]08/03/2024 08:57:52 - INFO - __main__ -   Step: 3206, LR: 1.3659774059968054e-05, Loss: 647.5379638671875
2024-08-03T15:58:05.441059155Z 
 34%|███▍      | 3207/9500 [11:00:35<21:41:47, 12.41s/it]08/03/2024 08:58:05 - INFO - __main__ -   Step: 3207, LR: 1.3657603516280775e-05, Loss: 575.477294921875
2024-08-03T15:58:17.905554009Z 
 34%|███▍      | 3208/9500 [11:00:47<21:43:14, 12.43s/it]08/03/2024 08:58:17 - INFO - __main__ -   Step: 3208, LR: 1.3655432972593497e-05, Loss: 547.9956665039062
2024-08-03T15:58:30.053510584Z 
 34%|███▍      | 3209/9500 [11:00:59<21:34:14, 12.34s/it]08/03/2024 08:58:30 - INFO - __main__ -   Step: 3209, LR: 1.3653262428906218e-05, Loss: 582.5501708984375
2024-08-03T15:58:42.233757357Z 
 34%|███▍      | 3210/9500 [11:01:12<21:28:52, 12.29s/it]08/03/2024 08:58:42 - INFO - __main__ -   Step: 3210, LR: 1.3651091885218937e-05, Loss: 740.4022216796875
2024-08-03T15:58:54.648450297Z 
 34%|███▍      | 3211/9500 [11:01:24<21:32:27, 12.33s/it]08/03/2024 08:58:54 - INFO - __main__ -   Step: 3211, LR: 1.3648921341531658e-05, Loss: 805.3836669921875
2024-08-03T15:59:06.831873121Z 
 34%|███▍      | 3212/9500 [11:01:36<21:27:37, 12.29s/it]08/03/2024 08:59:06 - INFO - __main__ -   Step: 3212, LR: 1.364675079784438e-05, Loss: 596.7607421875
2024-08-03T15:59:19.225087661Z 
 34%|███▍      | 3213/9500 [11:01:49<21:30:46, 12.32s/it]08/03/2024 08:59:19 - INFO - __main__ -   Step: 3213, LR: 1.3644580254157101e-05, Loss: 777.7227783203125
2024-08-03T15:59:31.738619421Z 
 34%|███▍      | 3214/9500 [11:02:01<21:36:41, 12.38s/it]08/03/2024 08:59:31 - INFO - __main__ -   Step: 3214, LR: 1.3642409710469821e-05, Loss: 552.5513916015625
2024-08-03T15:59:44.046140658Z 
 34%|███▍      | 3215/9500 [11:02:13<21:34:18, 12.36s/it]08/03/2024 08:59:44 - INFO - __main__ -   Step: 3215, LR: 1.3640239166782543e-05, Loss: 536.1051025390625
2024-08-03T15:59:56.228062086Z 
 34%|███▍      | 3216/9500 [11:02:26<21:28:37, 12.30s/it]08/03/2024 08:59:56 - INFO - __main__ -   Step: 3216, LR: 1.3638068623095264e-05, Loss: 530.3621215820312
2024-08-03T16:00:08.878866854Z 
 34%|███▍      | 3217/9500 [11:02:38<21:39:19, 12.41s/it]08/03/2024 09:00:08 - INFO - __main__ -   Step: 3217, LR: 1.3635898079407986e-05, Loss: 642.2227783203125
2024-08-03T16:00:20.948288379Z 
 34%|███▍      | 3218/9500 [11:02:50<21:28:28, 12.31s/it]08/03/2024 09:00:20 - INFO - __main__ -   Step: 3218, LR: 1.3633727535720707e-05, Loss: 543.359619140625
2024-08-03T16:00:32.890578900Z 
 34%|███▍      | 3219/9500 [11:03:02<21:16:50, 12.20s/it]08/03/2024 09:00:32 - INFO - __main__ -   Step: 3219, LR: 1.3631556992033427e-05, Loss: 469.7815246582031
2024-08-03T16:00:45.167454429Z 
 34%|███▍      | 3220/9500 [11:03:15<21:19:07, 12.22s/it]08/03/2024 09:00:45 - INFO - __main__ -   Step: 3220, LR: 1.3629386448346149e-05, Loss: 637.835205078125
2024-08-03T16:00:57.745807329Z 
 34%|███▍      | 3221/9500 [11:03:27<21:30:09, 12.33s/it]08/03/2024 09:00:57 - INFO - __main__ -   Step: 3221, LR: 1.362721590465887e-05, Loss: 629.5361328125
2024-08-03T16:01:10.042317368Z 
 34%|███▍      | 3222/9500 [11:03:39<21:28:54, 12.32s/it]08/03/2024 09:01:10 - INFO - __main__ -   Step: 3222, LR: 1.3625045360971592e-05, Loss: 541.3695068359375
2024-08-03T16:01:22.261643110Z 
 34%|███▍      | 3223/9500 [11:03:52<21:25:38, 12.29s/it]08/03/2024 09:01:22 - INFO - __main__ -   Step: 3223, LR: 1.3622874817284313e-05, Loss: 683.8201293945312
2024-08-03T16:01:34.592435508Z 
 34%|███▍      | 3224/9500 [11:04:04<21:26:43, 12.30s/it]08/03/2024 09:01:34 - INFO - __main__ -   Step: 3224, LR: 1.3620704273597032e-05, Loss: 612.616455078125
2024-08-03T16:01:46.659035766Z 
 34%|███▍      | 3225/9500 [11:04:16<21:19:10, 12.23s/it]08/03/2024 09:01:46 - INFO - __main__ -   Step: 3225, LR: 1.3618533729909753e-05, Loss: 491.5042419433594
2024-08-03T16:01:58.757791338Z 
 34%|███▍      | 3226/9500 [11:04:28<21:14:48, 12.19s/it]08/03/2024 09:01:58 - INFO - __main__ -   Step: 3226, LR: 1.3616363186222475e-05, Loss: 558.8967895507812
2024-08-03T16:02:11.577394284Z 
 34%|███▍      | 3227/9500 [11:04:41<21:34:17, 12.38s/it]08/03/2024 09:02:11 - INFO - __main__ -   Step: 3227, LR: 1.3614192642535196e-05, Loss: 684.6946411132812
2024-08-03T16:02:23.828601283Z 
 34%|███▍      | 3228/9500 [11:04:53<21:30:04, 12.34s/it]08/03/2024 09:02:23 - INFO - __main__ -   Step: 3228, LR: 1.3612022098847916e-05, Loss: 519.381103515625
2024-08-03T16:02:36.029051011Z 
 34%|███▍      | 3229/9500 [11:05:05<21:25:27, 12.30s/it]08/03/2024 09:02:36 - INFO - __main__ -   Step: 3229, LR: 1.3609851555160638e-05, Loss: 559.8364868164062
2024-08-03T16:02:48.435998264Z 
 34%|███▍      | 3230/9500 [11:05:18<21:28:38, 12.33s/it]08/03/2024 09:02:48 - INFO - __main__ -   Step: 3230, LR: 1.360768101147336e-05, Loss: 630.283447265625
2024-08-03T16:03:00.886169135Z 
 34%|███▍      | 3231/9500 [11:05:30<21:32:08, 12.37s/it]08/03/2024 09:03:00 - INFO - __main__ -   Step: 3231, LR: 1.3605510467786081e-05, Loss: 617.853515625
2024-08-03T16:03:13.367665408Z 
 34%|███▍      | 3232/9500 [11:05:43<21:35:31, 12.40s/it]08/03/2024 09:03:13 - INFO - __main__ -   Step: 3232, LR: 1.3603339924098802e-05, Loss: 675.1028442382812
2024-08-03T16:03:26.180232687Z 
 34%|███▍      | 3233/9500 [11:05:56<21:48:11, 12.52s/it]08/03/2024 09:03:26 - INFO - __main__ -   Step: 3233, LR: 1.3601169380411522e-05, Loss: 589.486328125
2024-08-03T16:03:38.314818365Z 
 34%|███▍      | 3234/9500 [11:06:08<21:35:46, 12.41s/it]08/03/2024 09:03:38 - INFO - __main__ -   Step: 3234, LR: 1.3598998836724244e-05, Loss: 553.158203125
2024-08-03T16:03:50.511438752Z 
 34%|███▍      | 3235/9500 [11:06:20<21:28:57, 12.34s/it]08/03/2024 09:03:50 - INFO - __main__ -   Step: 3235, LR: 1.3596828293036965e-05, Loss: 531.02978515625
2024-08-03T16:04:03.057157370Z 
 34%|███▍      | 3236/9500 [11:06:32<21:35:03, 12.40s/it]08/03/2024 09:04:03 - INFO - __main__ -   Step: 3236, LR: 1.3594657749349687e-05, Loss: 543.9038696289062
2024-08-03T16:04:15.205377121Z 
 34%|███▍      | 3237/9500 [11:06:45<21:26:48, 12.33s/it]08/03/2024 09:04:15 - INFO - __main__ -   Step: 3237, LR: 1.3592487205662409e-05, Loss: 693.638671875
2024-08-03T16:04:27.524291911Z 
 34%|███▍      | 3238/9500 [11:06:57<21:26:19, 12.33s/it]08/03/2024 09:04:27 - INFO - __main__ -   Step: 3238, LR: 1.3590316661975127e-05, Loss: 589.6802368164062
2024-08-03T16:04:39.944403310Z 
 34%|███▍      | 3239/9500 [11:07:09<21:29:06, 12.35s/it]08/03/2024 09:04:39 - INFO - __main__ -   Step: 3239, LR: 1.3588146118287848e-05, Loss: 492.7981872558594
2024-08-03T16:04:52.060216123Z 
 34%|███▍      | 3240/9500 [11:07:21<21:21:27, 12.28s/it]08/03/2024 09:04:52 - INFO - __main__ -   Step: 3240, LR: 1.358597557460057e-05, Loss: 496.0231628417969
2024-08-03T16:05:04.057491311Z 
 34%|███▍      | 3241/9500 [11:07:33<21:12:20, 12.20s/it]08/03/2024 09:05:04 - INFO - __main__ -   Step: 3241, LR: 1.3583805030913291e-05, Loss: 598.44580078125
2024-08-03T16:05:16.641074930Z 
 34%|███▍      | 3242/9500 [11:07:46<21:24:13, 12.31s/it]08/03/2024 09:05:16 - INFO - __main__ -   Step: 3242, LR: 1.3581634487226011e-05, Loss: 713.5611572265625
2024-08-03T16:05:28.928080747Z 
 34%|███▍      | 3243/9500 [11:07:58<21:23:13, 12.31s/it]08/03/2024 09:05:28 - INFO - __main__ -   Step: 3243, LR: 1.3579463943538733e-05, Loss: 537.1875
2024-08-03T16:05:41.314457649Z 
 34%|███▍      | 3244/9500 [11:08:11<21:25:33, 12.33s/it]08/03/2024 09:05:41 - INFO - __main__ -   Step: 3244, LR: 1.3577293399851454e-05, Loss: 481.70672607421875
2024-08-03T16:05:53.701579764Z 
 34%|███▍      | 3245/9500 [11:08:23<21:27:08, 12.35s/it]08/03/2024 09:05:53 - INFO - __main__ -   Step: 3245, LR: 1.3575122856164176e-05, Loss: 669.01318359375
2024-08-03T16:06:06.045394714Z 
 34%|███▍      | 3246/9500 [11:08:35<21:26:51, 12.35s/it]08/03/2024 09:06:06 - INFO - __main__ -   Step: 3246, LR: 1.3572952312476897e-05, Loss: 613.0028686523438
2024-08-03T16:06:18.209472893Z 
 34%|███▍      | 3247/9500 [11:08:48<21:20:57, 12.29s/it]08/03/2024 09:06:18 - INFO - __main__ -   Step: 3247, LR: 1.3570781768789617e-05, Loss: 729.4217529296875
2024-08-03T16:06:31.229239918Z 
 34%|███▍      | 3248/9500 [11:09:01<21:43:31, 12.51s/it]08/03/2024 09:06:31 - INFO - __main__ -   Step: 3248, LR: 1.3568611225102339e-05, Loss: 806.031005859375
2024-08-03T16:06:43.428101722Z 
 34%|███▍      | 3249/9500 [11:09:13<21:33:35, 12.42s/it]08/03/2024 09:06:43 - INFO - __main__ -   Step: 3249, LR: 1.356644068141506e-05, Loss: 534.1134033203125
2024-08-03T16:06:55.366221606Z 
 34%|███▍      | 3250/9500 [11:09:25<21:18:26, 12.27s/it]08/03/2024 09:06:55 - INFO - __main__ -   Step: 3250, LR: 1.3564270137727782e-05, Loss: 489.71337890625
2024-08-03T16:07:07.931509989Z 
 34%|███▍      | 3251/9500 [11:09:37<21:27:22, 12.36s/it]08/03/2024 09:07:07 - INFO - __main__ -   Step: 3251, LR: 1.3562099594040504e-05, Loss: 513.623291015625
2024-08-03T16:07:19.944095550Z 
 34%|███▍      | 3252/9500 [11:09:49<21:16:15, 12.26s/it]08/03/2024 09:07:19 - INFO - __main__ -   Step: 3252, LR: 1.3559929050353222e-05, Loss: 494.37664794921875
2024-08-03T16:07:32.097536170Z 
 34%|███▍      | 3253/9500 [11:10:02<21:12:52, 12.23s/it]08/03/2024 09:07:32 - INFO - __main__ -   Step: 3253, LR: 1.3557758506665943e-05, Loss: 479.09820556640625
2024-08-03T16:07:44.559578436Z 
 34%|███▍      | 3254/9500 [11:10:14<21:20:03, 12.30s/it]08/03/2024 09:07:44 - INFO - __main__ -   Step: 3254, LR: 1.3555587962978665e-05, Loss: 550.9089965820312
2024-08-03T16:07:56.636423258Z 
 34%|███▍      | 3255/9500 [11:10:26<21:13:00, 12.23s/it]08/03/2024 09:07:56 - INFO - __main__ -   Step: 3255, LR: 1.3553417419291386e-05, Loss: 724.9603271484375
2024-08-03T16:08:09.127151085Z 
 34%|███▍      | 3256/9500 [11:10:39<21:20:55, 12.31s/it]08/03/2024 09:08:09 - INFO - __main__ -   Step: 3256, LR: 1.3551246875604106e-05, Loss: 634.2608642578125
2024-08-03T16:08:21.720261342Z 
 34%|███▍      | 3257/9500 [11:10:51<21:29:35, 12.39s/it]08/03/2024 09:08:21 - INFO - __main__ -   Step: 3257, LR: 1.3549076331916828e-05, Loss: 723.4827880859375
2024-08-03T16:08:33.910978607Z 
 34%|███▍      | 3258/9500 [11:11:03<21:23:02, 12.33s/it]08/03/2024 09:08:33 - INFO - __main__ -   Step: 3258, LR: 1.354690578822955e-05, Loss: 666.3857421875
2024-08-03T16:08:46.171944268Z 
 34%|███▍      | 3259/9500 [11:11:16<21:20:35, 12.31s/it]08/03/2024 09:08:46 - INFO - __main__ -   Step: 3259, LR: 1.3544735244542271e-05, Loss: 700.9581298828125
2024-08-03T16:08:59.060069389Z 
 34%|███▍      | 3260/9500 [11:11:28<21:38:22, 12.48s/it]08/03/2024 09:08:59 - INFO - __main__ -   Step: 3260, LR: 1.3542564700854993e-05, Loss: 506.540771484375
2024-08-03T16:09:11.295795279Z 
 34%|███▍      | 3261/9500 [11:11:41<21:30:24, 12.41s/it]08/03/2024 09:09:11 - INFO - __main__ -   Step: 3261, LR: 1.3540394157167714e-05, Loss: 506.47283935546875
2024-08-03T16:09:23.601632045Z 
 34%|███▍      | 3262/9500 [11:11:53<21:26:57, 12.38s/it]08/03/2024 09:09:23 - INFO - __main__ -   Step: 3262, LR: 1.3538223613480434e-05, Loss: 668.4935302734375
2024-08-03T16:09:35.821043450Z 
 34%|███▍      | 3263/9500 [11:12:05<21:21:47, 12.33s/it]08/03/2024 09:09:35 - INFO - __main__ -   Step: 3263, LR: 1.3536053069793156e-05, Loss: 724.63330078125
2024-08-03T16:09:48.512054427Z 
 34%|███▍      | 3264/9500 [11:12:18<21:32:48, 12.44s/it]08/03/2024 09:09:48 - INFO - __main__ -   Step: 3264, LR: 1.3533882526105877e-05, Loss: 768.0220947265625
2024-08-03T16:10:00.435528641Z 
 34%|███▍      | 3265/9500 [11:12:30<21:16:32, 12.28s/it]08/03/2024 09:10:00 - INFO - __main__ -   Step: 3265, LR: 1.3531711982418599e-05, Loss: 559.4732666015625
2024-08-03T16:10:12.616212214Z 
 34%|███▍      | 3266/9500 [11:12:42<21:13:05, 12.25s/it]08/03/2024 09:10:12 - INFO - __main__ -   Step: 3266, LR: 1.3529541438731317e-05, Loss: 647.52685546875
2024-08-03T16:10:25.174095513Z 
 34%|███▍      | 3267/9500 [11:12:55<21:22:23, 12.34s/it]08/03/2024 09:10:25 - INFO - __main__ -   Step: 3267, LR: 1.3527370895044038e-05, Loss: 550.996826171875
2024-08-03T16:10:37.464770044Z 
 34%|███▍      | 3268/9500 [11:13:07<21:20:31, 12.33s/it]08/03/2024 09:10:37 - INFO - __main__ -   Step: 3268, LR: 1.352520035135676e-05, Loss: 516.1375732421875
2024-08-03T16:10:49.397151343Z 
 34%|███▍      | 3269/9500 [11:13:19<21:07:58, 12.21s/it]08/03/2024 09:10:49 - INFO - __main__ -   Step: 3269, LR: 1.3523029807669481e-05, Loss: 672.186279296875
2024-08-03T16:11:01.828874285Z 
 34%|███▍      | 3270/9500 [11:13:31<21:14:41, 12.28s/it]08/03/2024 09:11:01 - INFO - __main__ -   Step: 3270, LR: 1.3520859263982203e-05, Loss: 537.1012573242188
2024-08-03T16:11:13.977257829Z 
 34%|███▍      | 3271/9500 [11:13:43<21:10:30, 12.24s/it]08/03/2024 09:11:13 - INFO - __main__ -   Step: 3271, LR: 1.3518688720294923e-05, Loss: 651.8900146484375
2024-08-03T16:11:26.286888853Z 
 34%|███▍      | 3272/9500 [11:13:56<21:12:30, 12.26s/it]08/03/2024 09:11:26 - INFO - __main__ -   Step: 3272, LR: 1.3516518176607644e-05, Loss: 556.4161376953125
2024-08-03T16:11:39.048341336Z 
 34%|███▍      | 3273/9500 [11:14:08<21:27:56, 12.41s/it]08/03/2024 09:11:39 - INFO - __main__ -   Step: 3273, LR: 1.3514347632920366e-05, Loss: 618.2757568359375
2024-08-03T16:11:51.412727212Z 
 34%|███▍      | 3274/9500 [11:14:21<21:26:19, 12.40s/it]08/03/2024 09:11:51 - INFO - __main__ -   Step: 3274, LR: 1.3512177089233088e-05, Loss: 713.8699951171875
2024-08-03T16:12:03.784243187Z 
 34%|███▍      | 3275/9500 [11:14:33<21:25:21, 12.39s/it]08/03/2024 09:12:03 - INFO - __main__ -   Step: 3275, LR: 1.3510006545545809e-05, Loss: 663.35205078125
2024-08-03T16:12:16.039924474Z 
 34%|███▍      | 3276/9500 [11:14:45<21:20:59, 12.35s/it]08/03/2024 09:12:16 - INFO - __main__ -   Step: 3276, LR: 1.3507836001858529e-05, Loss: 544.9253540039062
2024-08-03T16:12:28.338093596Z 
 34%|███▍      | 3277/9500 [11:14:58<21:19:11, 12.33s/it]08/03/2024 09:12:28 - INFO - __main__ -   Step: 3277, LR: 1.350566545817125e-05, Loss: 611.1641845703125
2024-08-03T16:12:40.525631647Z 
 35%|███▍      | 3278/9500 [11:15:10<21:14:28, 12.29s/it]08/03/2024 09:12:40 - INFO - __main__ -   Step: 3278, LR: 1.3503494914483972e-05, Loss: 535.7161865234375
2024-08-03T16:12:53.023241982Z 
 35%|███▍      | 3279/9500 [11:15:22<21:20:43, 12.35s/it]08/03/2024 09:12:53 - INFO - __main__ -   Step: 3279, LR: 1.3501324370796694e-05, Loss: 591.03466796875
2024-08-03T16:13:05.025418905Z 
 35%|███▍      | 3280/9500 [11:15:34<21:09:37, 12.25s/it]08/03/2024 09:13:05 - INFO - __main__ -   Step: 3280, LR: 1.3499153827109412e-05, Loss: 576.0272216796875
2024-08-03T16:13:17.445220089Z 
 35%|███▍      | 3281/9500 [11:15:47<21:14:47, 12.30s/it]08/03/2024 09:13:17 - INFO - __main__ -   Step: 3281, LR: 1.3496983283422133e-05, Loss: 542.7906494140625
2024-08-03T16:13:29.766928834Z 
 35%|███▍      | 3282/9500 [11:15:59<21:15:17, 12.31s/it]08/03/2024 09:13:29 - INFO - __main__ -   Step: 3282, LR: 1.3494812739734855e-05, Loss: 495.4105529785156
2024-08-03T16:13:42.047202304Z 
 35%|███▍      | 3283/9500 [11:16:11<21:14:17, 12.30s/it]08/03/2024 09:13:42 - INFO - __main__ -   Step: 3283, LR: 1.3492642196047577e-05, Loss: 643.67578125
2024-08-03T16:13:54.547551777Z 
 35%|███▍      | 3284/9500 [11:16:24<21:20:22, 12.36s/it]08/03/2024 09:13:54 - INFO - __main__ -   Step: 3284, LR: 1.3490471652360298e-05, Loss: 811.4847412109375
2024-08-03T16:14:06.973661778Z 
 35%|███▍      | 3285/9500 [11:16:36<21:22:15, 12.38s/it]08/03/2024 09:14:06 - INFO - __main__ -   Step: 3285, LR: 1.3488301108673018e-05, Loss: 489.6197509765625
2024-08-03T16:14:19.207399503Z 
 35%|███▍      | 3286/9500 [11:16:49<21:17:32, 12.34s/it]08/03/2024 09:14:19 - INFO - __main__ -   Step: 3286, LR: 1.348613056498574e-05, Loss: 687.0477905273438
2024-08-03T16:14:31.308138547Z 
 35%|███▍      | 3287/9500 [11:17:01<21:10:01, 12.26s/it]08/03/2024 09:14:31 - INFO - __main__ -   Step: 3287, LR: 1.3483960021298461e-05, Loss: 571.1743774414062
2024-08-03T16:14:43.928212612Z 
 35%|███▍      | 3288/9500 [11:17:13<21:20:52, 12.37s/it]08/03/2024 09:14:43 - INFO - __main__ -   Step: 3288, LR: 1.3481789477611183e-05, Loss: 501.69305419921875
2024-08-03T16:14:55.966649281Z 
 35%|███▍      | 3289/9500 [11:17:25<21:10:19, 12.27s/it]08/03/2024 09:14:55 - INFO - __main__ -   Step: 3289, LR: 1.3479618933923904e-05, Loss: 527.0340576171875
2024-08-03T16:15:08.285167930Z 
 35%|███▍      | 3290/9500 [11:17:38<21:11:34, 12.29s/it]08/03/2024 09:15:08 - INFO - __main__ -   Step: 3290, LR: 1.3477448390236624e-05, Loss: 524.2672729492188
2024-08-03T16:15:20.899204264Z 
 35%|███▍      | 3291/9500 [11:17:50<21:21:33, 12.38s/it]08/03/2024 09:15:20 - INFO - __main__ -   Step: 3291, LR: 1.3475277846549346e-05, Loss: 700.404052734375
2024-08-03T16:15:33.030873839Z 
 35%|███▍      | 3292/9500 [11:18:02<21:13:30, 12.31s/it]08/03/2024 09:15:33 - INFO - __main__ -   Step: 3292, LR: 1.3473107302862067e-05, Loss: 564.6087646484375
2024-08-03T16:15:45.071434295Z 
 35%|███▍      | 3293/9500 [11:18:15<21:04:59, 12.23s/it]08/03/2024 09:15:45 - INFO - __main__ -   Step: 3293, LR: 1.3470936759174789e-05, Loss: 628.4088134765625
2024-08-03T16:15:57.915800415Z 
 35%|███▍      | 3294/9500 [11:18:27<21:23:54, 12.41s/it]08/03/2024 09:15:57 - INFO - __main__ -   Step: 3294, LR: 1.3468766215487507e-05, Loss: 585.350830078125
2024-08-03T16:16:10.049061579Z 
 35%|███▍      | 3295/9500 [11:18:39<21:15:01, 12.33s/it]08/03/2024 09:16:10 - INFO - __main__ -   Step: 3295, LR: 1.3466595671800228e-05, Loss: 550.18408203125
2024-08-03T16:16:23.171512521Z 
 35%|███▍      | 3296/9500 [11:18:53<21:39:26, 12.57s/it]08/03/2024 09:16:23 - INFO - __main__ -   Step: 3296, LR: 1.346442512811295e-05, Loss: 654.3175659179688
2024-08-03T16:16:35.978669914Z 
 35%|███▍      | 3297/9500 [11:19:05<21:46:40, 12.64s/it]08/03/2024 09:16:35 - INFO - __main__ -   Step: 3297, LR: 1.3462254584425672e-05, Loss: 756.3986206054688
2024-08-03T16:16:48.240262110Z 
 35%|███▍      | 3298/9500 [11:19:18<21:34:45, 12.53s/it]08/03/2024 09:16:48 - INFO - __main__ -   Step: 3298, LR: 1.3460084040738393e-05, Loss: 682.2261962890625
2024-08-03T16:17:00.557790938Z 
 35%|███▍      | 3299/9500 [11:19:30<21:28:04, 12.46s/it]08/03/2024 09:17:00 - INFO - __main__ -   Step: 3299, LR: 1.3457913497051113e-05, Loss: 720.228759765625
2024-08-03T16:17:13.273392388Z 
 35%|███▍      | 3300/9500 [11:19:43<21:35:41, 12.54s/it]08/03/2024 09:17:13 - INFO - __main__ -   Step: 3300, LR: 1.3455742953363835e-05, Loss: 671.0584106445312
2024-08-03T16:17:25.434567100Z 
 35%|███▍      | 3301/9500 [11:19:55<21:23:46, 12.43s/it]08/03/2024 09:17:25 - INFO - __main__ -   Step: 3301, LR: 1.3453572409676556e-05, Loss: 622.4952392578125
2024-08-03T16:17:37.544644970Z 
 35%|███▍      | 3302/9500 [11:20:07<21:13:47, 12.33s/it]08/03/2024 09:17:37 - INFO - __main__ -   Step: 3302, LR: 1.3451401865989278e-05, Loss: 615.3453369140625
2024-08-03T16:17:50.088535098Z 
 35%|███▍      | 3303/9500 [11:20:20<21:20:10, 12.39s/it]08/03/2024 09:17:50 - INFO - __main__ -   Step: 3303, LR: 1.3449231322302e-05, Loss: 768.501220703125
2024-08-03T16:18:02.151707769Z 
 35%|███▍      | 3304/9500 [11:20:32<21:09:41, 12.30s/it]08/03/2024 09:18:02 - INFO - __main__ -   Step: 3304, LR: 1.344706077861472e-05, Loss: 503.3655090332031
2024-08-03T16:18:14.157791694Z 
 35%|███▍      | 3305/9500 [11:20:44<21:00:32, 12.21s/it]08/03/2024 09:18:14 - INFO - __main__ -   Step: 3305, LR: 1.344489023492744e-05, Loss: 581.0720825195312
2024-08-03T16:18:26.306366413Z 
 35%|███▍      | 3306/9500 [11:20:56<20:58:28, 12.19s/it]08/03/2024 09:18:26 - INFO - __main__ -   Step: 3306, LR: 1.3442719691240162e-05, Loss: 572.9718017578125
2024-08-03T16:18:39.161562499Z 
 35%|███▍      | 3307/9500 [11:21:09<21:18:51, 12.39s/it]08/03/2024 09:18:39 - INFO - __main__ -   Step: 3307, LR: 1.3440549147552884e-05, Loss: 556.00244140625
2024-08-03T16:18:51.605072203Z 
 35%|███▍      | 3308/9500 [11:21:21<21:20:18, 12.41s/it]08/03/2024 09:18:51 - INFO - __main__ -   Step: 3308, LR: 1.3438378603865602e-05, Loss: 741.419677734375
2024-08-03T16:19:04.086481501Z 
 35%|███▍      | 3309/9500 [11:21:34<21:22:25, 12.43s/it]08/03/2024 09:19:04 - INFO - __main__ -   Step: 3309, LR: 1.3436208060178324e-05, Loss: 685.5953369140625
2024-08-03T16:19:16.565739134Z 
 35%|███▍      | 3310/9500 [11:21:46<21:23:47, 12.44s/it]08/03/2024 09:19:16 - INFO - __main__ -   Step: 3310, LR: 1.3434037516491045e-05, Loss: 529.02734375
2024-08-03T16:19:28.857588745Z 
 35%|███▍      | 3311/9500 [11:21:58<21:18:52, 12.40s/it]08/03/2024 09:19:28 - INFO - __main__ -   Step: 3311, LR: 1.3431866972803767e-05, Loss: 743.0900268554688
2024-08-03T16:19:41.087602978Z 
 35%|███▍      | 3312/9500 [11:22:11<21:13:28, 12.35s/it]08/03/2024 09:19:41 - INFO - __main__ -   Step: 3312, LR: 1.3429696429116488e-05, Loss: 642.9962158203125
2024-08-03T16:19:53.548938876Z 
 35%|███▍      | 3313/9500 [11:22:23<21:16:46, 12.38s/it]08/03/2024 09:19:53 - INFO - __main__ -   Step: 3313, LR: 1.342752588542921e-05, Loss: 685.677001953125
2024-08-03T16:20:05.727113245Z 
 35%|███▍      | 3314/9500 [11:22:35<21:10:16, 12.32s/it]08/03/2024 09:20:05 - INFO - __main__ -   Step: 3314, LR: 1.342535534174193e-05, Loss: 665.4891357421875
2024-08-03T16:20:17.919986885Z 
 35%|███▍      | 3315/9500 [11:22:47<21:06:07, 12.28s/it]08/03/2024 09:20:17 - INFO - __main__ -   Step: 3315, LR: 1.3423184798054651e-05, Loss: 552.1026611328125
2024-08-03T16:20:30.486469858Z 
 35%|███▍      | 3316/9500 [11:23:00<21:14:40, 12.37s/it]08/03/2024 09:20:30 - INFO - __main__ -   Step: 3316, LR: 1.3421014254367373e-05, Loss: 574.4884033203125
2024-08-03T16:20:42.786957248Z 
 35%|███▍      | 3317/9500 [11:23:12<21:12:24, 12.35s/it]08/03/2024 09:20:42 - INFO - __main__ -   Step: 3317, LR: 1.3418843710680094e-05, Loss: 741.366455078125
2024-08-03T16:20:54.945056758Z 
 35%|███▍      | 3318/9500 [11:23:24<21:06:20, 12.29s/it]08/03/2024 09:20:54 - INFO - __main__ -   Step: 3318, LR: 1.3416673166992816e-05, Loss: 642.8533935546875
2024-08-03T16:21:07.516872506Z 
 35%|███▍      | 3319/9500 [11:23:37<21:14:49, 12.37s/it]08/03/2024 09:21:07 - INFO - __main__ -   Step: 3319, LR: 1.3414502623305536e-05, Loss: 670.927001953125
2024-08-03T16:21:19.877947458Z 
 35%|███▍      | 3320/9500 [11:23:49<21:14:11, 12.37s/it]08/03/2024 09:21:19 - INFO - __main__ -   Step: 3320, LR: 1.3412332079618257e-05, Loss: 613.2376708984375
2024-08-03T16:21:32.069656378Z 
 35%|███▍      | 3321/9500 [11:24:02<21:08:27, 12.32s/it]08/03/2024 09:21:32 - INFO - __main__ -   Step: 3321, LR: 1.3410161535930979e-05, Loss: 531.0047607421875
2024-08-03T16:21:44.896481503Z 
 35%|███▍      | 3322/9500 [11:24:14<21:23:59, 12.47s/it]08/03/2024 09:21:44 - INFO - __main__ -   Step: 3322, LR: 1.3407990992243699e-05, Loss: 623.3855590820312
2024-08-03T16:21:57.060001659Z 
 35%|███▍      | 3323/9500 [11:24:26<21:14:19, 12.38s/it]08/03/2024 09:21:57 - INFO - __main__ -   Step: 3323, LR: 1.3405820448556419e-05, Loss: 527.986083984375
2024-08-03T16:22:09.357635047Z 
 35%|███▍      | 3324/9500 [11:24:39<21:11:37, 12.35s/it]08/03/2024 09:22:09 - INFO - __main__ -   Step: 3324, LR: 1.340364990486914e-05, Loss: 497.59222412109375
2024-08-03T16:22:21.740312994Z 
 35%|███▌      | 3325/9500 [11:24:51<21:12:19, 12.36s/it]08/03/2024 09:22:21 - INFO - __main__ -   Step: 3325, LR: 1.3401479361181862e-05, Loss: 461.639892578125
2024-08-03T16:22:34.122957320Z 
 35%|███▌      | 3326/9500 [11:25:04<21:12:43, 12.37s/it]08/03/2024 09:22:34 - INFO - __main__ -   Step: 3326, LR: 1.3399308817494583e-05, Loss: 478.0435485839844
2024-08-03T16:22:46.553100457Z 
 35%|███▌      | 3327/9500 [11:25:16<21:14:25, 12.39s/it]08/03/2024 09:22:46 - INFO - __main__ -   Step: 3327, LR: 1.3397138273807305e-05, Loss: 519.161865234375
2024-08-03T16:22:59.114758566Z 
 35%|███▌      | 3328/9500 [11:25:29<21:19:35, 12.44s/it]08/03/2024 09:22:59 - INFO - __main__ -   Step: 3328, LR: 1.3394967730120025e-05, Loss: 610.375244140625
2024-08-03T16:23:11.110057932Z 
 35%|███▌      | 3329/9500 [11:25:41<21:05:41, 12.31s/it]08/03/2024 09:23:11 - INFO - __main__ -   Step: 3329, LR: 1.3392797186432746e-05, Loss: 537.044189453125
2024-08-03T16:23:23.299107577Z 
 35%|███▌      | 3330/9500 [11:25:53<21:01:52, 12.27s/it]08/03/2024 09:23:23 - INFO - __main__ -   Step: 3330, LR: 1.3390626642745468e-05, Loss: 458.43951416015625
2024-08-03T16:23:35.750482618Z 
 35%|███▌      | 3331/9500 [11:26:05<21:07:13, 12.33s/it]08/03/2024 09:23:35 - INFO - __main__ -   Step: 3331, LR: 1.338845609905819e-05, Loss: 416.786376953125
2024-08-03T16:23:47.765303344Z 
 35%|███▌      | 3332/9500 [11:26:17<20:57:27, 12.23s/it]08/03/2024 09:23:47 - INFO - __main__ -   Step: 3332, LR: 1.3386285555370911e-05, Loss: 581.4454956054688
2024-08-03T16:24:00.284600473Z 
 35%|███▌      | 3333/9500 [11:26:30<21:06:06, 12.32s/it]08/03/2024 09:24:00 - INFO - __main__ -   Step: 3333, LR: 1.338411501168363e-05, Loss: 641.1995239257812
2024-08-03T16:24:12.843171767Z 
 35%|███▌      | 3334/9500 [11:26:42<21:13:18, 12.39s/it]08/03/2024 09:24:12 - INFO - __main__ -   Step: 3334, LR: 1.3381944467996352e-05, Loss: 765.6275024414062
2024-08-03T16:24:25.083429325Z 
 35%|███▌      | 3335/9500 [11:26:55<21:08:28, 12.35s/it]08/03/2024 09:24:25 - INFO - __main__ -   Step: 3335, LR: 1.3379773924309072e-05, Loss: 702.71240234375
2024-08-03T16:24:37.723181079Z 
 35%|███▌      | 3336/9500 [11:27:07<21:17:20, 12.43s/it]08/03/2024 09:24:37 - INFO - __main__ -   Step: 3336, LR: 1.3377603380621794e-05, Loss: 487.69403076171875
2024-08-03T16:24:50.353999829Z 
 35%|███▌      | 3337/9500 [11:27:20<21:23:12, 12.49s/it]08/03/2024 09:24:50 - INFO - __main__ -   Step: 3337, LR: 1.3375432836934514e-05, Loss: 699.4873046875
2024-08-03T16:25:02.582663469Z 
 35%|███▌      | 3338/9500 [11:27:32<21:14:52, 12.41s/it]08/03/2024 09:25:02 - INFO - __main__ -   Step: 3338, LR: 1.3373262293247235e-05, Loss: 653.3164672851562
2024-08-03T16:25:15.100472546Z 
 35%|███▌      | 3339/9500 [11:27:45<21:17:53, 12.44s/it]08/03/2024 09:25:15 - INFO - __main__ -   Step: 3339, LR: 1.3371091749559957e-05, Loss: 546.6097412109375
2024-08-03T16:25:27.576927426Z 
 35%|███▌      | 3340/9500 [11:27:57<21:18:38, 12.45s/it]08/03/2024 09:25:27 - INFO - __main__ -   Step: 3340, LR: 1.3368921205872678e-05, Loss: 595.451171875
2024-08-03T16:25:39.616305802Z 
 35%|███▌      | 3341/9500 [11:28:09<21:05:39, 12.33s/it]08/03/2024 09:25:39 - INFO - __main__ -   Step: 3341, LR: 1.33667506621854e-05, Loss: 771.80078125
2024-08-03T16:25:52.054042378Z 
 35%|███▌      | 3342/9500 [11:28:21<21:08:46, 12.36s/it]08/03/2024 09:25:52 - INFO - __main__ -   Step: 3342, LR: 1.336458011849812e-05, Loss: 673.35693359375
2024-08-03T16:26:05.122145934Z 
 35%|███▌      | 3343/9500 [11:28:35<21:30:18, 12.57s/it]08/03/2024 09:26:05 - INFO - __main__ -   Step: 3343, LR: 1.3362409574810841e-05, Loss: 683.759521484375
2024-08-03T16:26:17.341436712Z 
 35%|███▌      | 3344/9500 [11:28:47<21:19:10, 12.47s/it]08/03/2024 09:26:17 - INFO - __main__ -   Step: 3344, LR: 1.3360239031123563e-05, Loss: 693.4080810546875
2024-08-03T16:26:29.561695447Z 
 35%|███▌      | 3345/9500 [11:28:59<21:11:20, 12.39s/it]08/03/2024 09:26:29 - INFO - __main__ -   Step: 3345, LR: 1.3358068487436284e-05, Loss: 547.2620849609375
2024-08-03T16:26:42.232576591Z 
 35%|███▌      | 3346/9500 [11:29:12<21:19:41, 12.48s/it]08/03/2024 09:26:42 - INFO - __main__ -   Step: 3346, LR: 1.3355897943749006e-05, Loss: 695.1602783203125
2024-08-03T16:26:54.375241135Z 
 35%|███▌      | 3347/9500 [11:29:24<21:09:12, 12.38s/it]08/03/2024 09:26:54 - INFO - __main__ -   Step: 3347, LR: 1.3353727400061728e-05, Loss: 670.6411743164062
2024-08-03T16:27:06.666671945Z 
 35%|███▌      | 3348/9500 [11:29:36<21:06:22, 12.35s/it]08/03/2024 09:27:06 - INFO - __main__ -   Step: 3348, LR: 1.3351556856374447e-05, Loss: 638.6522216796875
2024-08-03T16:27:18.854508071Z 
 35%|███▌      | 3349/9500 [11:29:48<21:01:09, 12.30s/it]08/03/2024 09:27:18 - INFO - __main__ -   Step: 3349, LR: 1.3349386312687167e-05, Loss: 481.7460021972656
2024-08-03T16:27:31.517461403Z 
 35%|███▌      | 3350/9500 [11:30:01<21:12:03, 12.41s/it]08/03/2024 09:27:31 - INFO - __main__ -   Step: 3350, LR: 1.3347215768999889e-05, Loss: 570.5753784179688
2024-08-03T16:27:43.704892181Z 
 35%|███▌      | 3351/9500 [11:30:13<21:04:59, 12.34s/it]08/03/2024 09:27:43 - INFO - __main__ -   Step: 3351, LR: 1.3345045225312609e-05, Loss: 689.6837158203125
2024-08-03T16:27:56.160378747Z 
 35%|███▌      | 3352/9500 [11:30:26<21:08:14, 12.38s/it]08/03/2024 09:27:56 - INFO - __main__ -   Step: 3352, LR: 1.334287468162533e-05, Loss: 651.95068359375
2024-08-03T16:28:08.725443155Z 
 35%|███▌      | 3353/9500 [11:30:38<21:13:48, 12.43s/it]08/03/2024 09:28:08 - INFO - __main__ -   Step: 3353, LR: 1.3340704137938052e-05, Loss: 632.5157470703125
2024-08-03T16:28:20.775209388Z 
 35%|███▌      | 3354/9500 [11:30:50<21:01:48, 12.32s/it]08/03/2024 09:28:20 - INFO - __main__ -   Step: 3354, LR: 1.3338533594250773e-05, Loss: 654.140625
2024-08-03T16:28:33.473809255Z 
 35%|███▌      | 3355/9500 [11:31:03<21:13:17, 12.43s/it]08/03/2024 09:28:33 - INFO - __main__ -   Step: 3355, LR: 1.3336363050563495e-05, Loss: 596.510009765625
2024-08-03T16:28:46.046110296Z 
 35%|███▌      | 3356/9500 [11:31:15<21:17:21, 12.47s/it]08/03/2024 09:28:46 - INFO - __main__ -   Step: 3356, LR: 1.3334192506876216e-05, Loss: 687.6110229492188
2024-08-03T16:28:58.457734250Z 
 35%|███▌      | 3357/9500 [11:31:28<21:15:14, 12.46s/it]08/03/2024 09:28:58 - INFO - __main__ -   Step: 3357, LR: 1.3332021963188936e-05, Loss: 595.7822875976562
2024-08-03T16:29:10.966588002Z 
 35%|███▌      | 3358/9500 [11:31:40<21:16:40, 12.47s/it]08/03/2024 09:29:10 - INFO - __main__ -   Step: 3358, LR: 1.3329851419501658e-05, Loss: 639.9382934570312
2024-08-03T16:29:23.719999245Z 
 35%|███▌      | 3359/9500 [11:31:53<21:25:07, 12.56s/it]08/03/2024 09:29:23 - INFO - __main__ -   Step: 3359, LR: 1.332768087581438e-05, Loss: 802.2513427734375
2024-08-03T16:29:35.872598084Z 
 35%|███▌      | 3360/9500 [11:32:05<21:12:31, 12.44s/it]08/03/2024 09:29:35 - INFO - __main__ -   Step: 3360, LR: 1.3325510332127101e-05, Loss: 561.8665771484375
2024-08-03T16:29:48.213160603Z 
 35%|███▌      | 3361/9500 [11:32:18<21:09:24, 12.41s/it]08/03/2024 09:29:48 - INFO - __main__ -   Step: 3361, LR: 1.3323339788439823e-05, Loss: 666.366943359375
2024-08-03T16:30:00.774573511Z 
 35%|███▌      | 3362/9500 [11:32:30<21:13:56, 12.45s/it]08/03/2024 09:30:00 - INFO - __main__ -   Step: 3362, LR: 1.3321169244752542e-05, Loss: 554.6534423828125
2024-08-03T16:30:12.776760360Z 
 35%|███▌      | 3363/9500 [11:32:42<20:59:54, 12.32s/it]08/03/2024 09:30:12 - INFO - __main__ -   Step: 3363, LR: 1.3318998701065262e-05, Loss: 581.7293701171875
2024-08-03T16:30:25.029089893Z 
 35%|███▌      | 3364/9500 [11:32:54<20:57:41, 12.30s/it]08/03/2024 09:30:25 - INFO - __main__ -   Step: 3364, LR: 1.3316828157377984e-05, Loss: 749.8101806640625
2024-08-03T16:30:37.893538135Z 
 35%|███▌      | 3365/9500 [11:33:07<21:14:51, 12.47s/it]08/03/2024 09:30:37 - INFO - __main__ -   Step: 3365, LR: 1.3314657613690705e-05, Loss: 668.5877685546875
2024-08-03T16:30:50.028575776Z 
 35%|███▌      | 3366/9500 [11:33:19<21:04:26, 12.37s/it]08/03/2024 09:30:50 - INFO - __main__ -   Step: 3366, LR: 1.3312487070003425e-05, Loss: 666.3563842773438
2024-08-03T16:31:02.226051324Z 
 35%|███▌      | 3367/9500 [11:33:32<20:59:00, 12.32s/it]08/03/2024 09:31:02 - INFO - __main__ -   Step: 3367, LR: 1.3310316526316147e-05, Loss: 726.3538818359375
2024-08-03T16:31:14.711828118Z 
 35%|███▌      | 3368/9500 [11:33:44<21:03:58, 12.37s/it]08/03/2024 09:31:14 - INFO - __main__ -   Step: 3368, LR: 1.3308145982628868e-05, Loss: 575.77392578125
2024-08-03T16:31:26.599255189Z 
 35%|███▌      | 3369/9500 [11:33:56<20:49:02, 12.22s/it]08/03/2024 09:31:26 - INFO - __main__ -   Step: 3369, LR: 1.330597543894159e-05, Loss: 530.0301513671875
2024-08-03T16:31:38.802542239Z 
 35%|███▌      | 3370/9500 [11:34:08<20:48:13, 12.22s/it]08/03/2024 09:31:38 - INFO - __main__ -   Step: 3370, LR: 1.3303804895254312e-05, Loss: 550.4445190429688
2024-08-03T16:31:51.123225947Z 
 35%|███▌      | 3371/9500 [11:34:21<20:51:10, 12.25s/it]08/03/2024 09:31:51 - INFO - __main__ -   Step: 3371, LR: 1.3301634351567031e-05, Loss: 575.8975219726562
2024-08-03T16:32:03.212729972Z 
 35%|███▌      | 3372/9500 [11:34:33<20:46:05, 12.20s/it]08/03/2024 09:32:03 - INFO - __main__ -   Step: 3372, LR: 1.3299463807879753e-05, Loss: 475.71685791015625
2024-08-03T16:32:15.503123707Z 
 36%|███▌      | 3373/9500 [11:34:45<20:48:38, 12.23s/it]08/03/2024 09:32:15 - INFO - __main__ -   Step: 3373, LR: 1.3297293264192475e-05, Loss: 681.9757080078125
2024-08-03T16:32:28.147920493Z 
 36%|███▌      | 3374/9500 [11:34:58<21:01:12, 12.35s/it]08/03/2024 09:32:28 - INFO - __main__ -   Step: 3374, LR: 1.3295122720505196e-05, Loss: 615.1629028320312
2024-08-03T16:32:40.635738978Z 
 36%|███▌      | 3375/9500 [11:35:10<21:05:09, 12.39s/it]08/03/2024 09:32:40 - INFO - __main__ -   Step: 3375, LR: 1.3292952176817918e-05, Loss: 710.0262451171875
2024-08-03T16:32:52.734453705Z 
 36%|███▌      | 3376/9500 [11:35:22<20:55:55, 12.30s/it]08/03/2024 09:32:52 - INFO - __main__ -   Step: 3376, LR: 1.3290781633130638e-05, Loss: 569.4896240234375
2024-08-03T16:33:05.426969536Z 
 36%|███▌      | 3377/9500 [11:35:35<21:07:34, 12.42s/it]08/03/2024 09:33:05 - INFO - __main__ -   Step: 3377, LR: 1.3288611089443357e-05, Loss: 645.7171630859375
2024-08-03T16:33:18.018665884Z 
 36%|███▌      | 3378/9500 [11:35:47<21:12:35, 12.47s/it]08/03/2024 09:33:18 - INFO - __main__ -   Step: 3378, LR: 1.3286440545756079e-05, Loss: 709.3138427734375
2024-08-03T16:33:29.836441991Z 
 36%|███▌      | 3379/9500 [11:35:59<20:52:21, 12.28s/it]08/03/2024 09:33:29 - INFO - __main__ -   Step: 3379, LR: 1.32842700020688e-05, Loss: 346.89959716796875
2024-08-03T16:33:42.601780728Z 
 36%|███▌      | 3380/9500 [11:36:12<21:07:07, 12.42s/it]08/03/2024 09:33:42 - INFO - __main__ -   Step: 3380, LR: 1.328209945838152e-05, Loss: 705.2039184570312
2024-08-03T16:33:55.072586617Z 
 36%|███▌      | 3381/9500 [11:36:25<21:08:22, 12.44s/it]08/03/2024 09:33:55 - INFO - __main__ -   Step: 3381, LR: 1.3279928914694242e-05, Loss: 582.4063720703125
2024-08-03T16:34:07.326990007Z 
 36%|███▌      | 3382/9500 [11:36:37<21:02:35, 12.38s/it]08/03/2024 09:34:07 - INFO - __main__ -   Step: 3382, LR: 1.3277758371006963e-05, Loss: 601.032470703125
2024-08-03T16:34:19.886425430Z 
 36%|███▌      | 3383/9500 [11:36:49<21:07:48, 12.44s/it]08/03/2024 09:34:19 - INFO - __main__ -   Step: 3383, LR: 1.3275587827319685e-05, Loss: 719.716796875
2024-08-03T16:34:31.882131965Z 
 36%|███▌      | 3384/9500 [11:37:01<20:54:09, 12.30s/it]08/03/2024 09:34:31 - INFO - __main__ -   Step: 3384, LR: 1.3273417283632407e-05, Loss: 584.820068359375
2024-08-03T16:34:43.903569568Z 
 36%|███▌      | 3385/9500 [11:37:13<20:45:18, 12.22s/it]08/03/2024 09:34:43 - INFO - __main__ -   Step: 3385, LR: 1.3271246739945126e-05, Loss: 539.8886108398438
2024-08-03T16:34:56.247213421Z 
 36%|███▌      | 3386/9500 [11:37:26<20:48:55, 12.26s/it]08/03/2024 09:34:56 - INFO - __main__ -   Step: 3386, LR: 1.3269076196257848e-05, Loss: 545.6513671875
2024-08-03T16:35:08.533424501Z 
 36%|███▌      | 3387/9500 [11:37:38<20:49:37, 12.27s/it]08/03/2024 09:35:08 - INFO - __main__ -   Step: 3387, LR: 1.326690565257057e-05, Loss: 792.7964477539062
2024-08-03T16:35:20.905303424Z 
 36%|███▌      | 3388/9500 [11:37:50<20:52:40, 12.30s/it]08/03/2024 09:35:20 - INFO - __main__ -   Step: 3388, LR: 1.3264735108883291e-05, Loss: 668.3883056640625
2024-08-03T16:35:33.639935818Z 
 36%|███▌      | 3389/9500 [11:38:03<21:05:50, 12.43s/it]08/03/2024 09:35:33 - INFO - __main__ -   Step: 3389, LR: 1.3262564565196013e-05, Loss: 605.3853759765625
2024-08-03T16:35:46.038912306Z 
 36%|███▌      | 3390/9500 [11:38:15<21:04:44, 12.42s/it]08/03/2024 09:35:46 - INFO - __main__ -   Step: 3390, LR: 1.3260394021508734e-05, Loss: 517.1732177734375
2024-08-03T16:35:58.008856505Z 
 36%|███▌      | 3391/9500 [11:38:27<20:50:47, 12.28s/it]08/03/2024 09:35:58 - INFO - __main__ -   Step: 3391, LR: 1.3258223477821452e-05, Loss: 506.47705078125
2024-08-03T16:36:10.480727771Z 
 36%|███▌      | 3392/9500 [11:38:40<20:56:18, 12.34s/it]08/03/2024 09:36:10 - INFO - __main__ -   Step: 3392, LR: 1.3256052934134174e-05, Loss: 633.7568359375
2024-08-03T16:36:23.431442801Z 
 36%|███▌      | 3393/9500 [11:38:53<21:14:43, 12.52s/it]08/03/2024 09:36:23 - INFO - __main__ -   Step: 3393, LR: 1.3253882390446896e-05, Loss: 674.8452758789062
2024-08-03T16:36:35.585319988Z 
 36%|███▌      | 3394/9500 [11:39:05<21:03:12, 12.41s/it]08/03/2024 09:36:35 - INFO - __main__ -   Step: 3394, LR: 1.3251711846759615e-05, Loss: 612.052001953125
2024-08-03T16:36:47.889996273Z 
 36%|███▌      | 3395/9500 [11:39:17<20:59:42, 12.38s/it]08/03/2024 09:36:47 - INFO - __main__ -   Step: 3395, LR: 1.3249541303072337e-05, Loss: 704.9684448242188
2024-08-03T16:37:00.529564788Z 
 36%|███▌      | 3396/9500 [11:39:30<21:07:24, 12.46s/it]08/03/2024 09:37:00 - INFO - __main__ -   Step: 3396, LR: 1.3247370759385059e-05, Loss: 736.17431640625
2024-08-03T16:37:12.711666721Z 
 36%|███▌      | 3397/9500 [11:39:42<20:58:46, 12.38s/it]08/03/2024 09:37:12 - INFO - __main__ -   Step: 3397, LR: 1.324520021569778e-05, Loss: 726.4776611328125
2024-08-03T16:37:25.007527249Z 
 36%|███▌      | 3398/9500 [11:39:54<20:56:08, 12.35s/it]08/03/2024 09:37:25 - INFO - __main__ -   Step: 3398, LR: 1.3243029672010502e-05, Loss: 657.203857421875
2024-08-03T16:37:37.582976519Z 
 36%|███▌      | 3399/9500 [11:40:07<21:02:45, 12.42s/it]08/03/2024 09:37:37 - INFO - __main__ -   Step: 3399, LR: 1.3240859128323223e-05, Loss: 638.408203125
2024-08-03T16:37:49.980442006Z 
 36%|███▌      | 3400/9500 [11:40:19<21:01:55, 12.41s/it]08/03/2024 09:37:49 - INFO - __main__ -   Step: 3400, LR: 1.3238688584635943e-05, Loss: 642.6009521484375
2024-08-03T16:38:01.999100961Z 
 36%|███▌      | 3401/9500 [11:40:31<20:49:42, 12.29s/it]08/03/2024 09:38:01 - INFO - __main__ -   Step: 3401, LR: 1.3236518040948665e-05, Loss: 517.0863037109375
2024-08-03T16:38:14.718520749Z 
 36%|███▌      | 3402/9500 [11:40:44<21:02:28, 12.42s/it]08/03/2024 09:38:14 - INFO - __main__ -   Step: 3402, LR: 1.3234347497261386e-05, Loss: 634.595947265625
2024-08-03T16:38:27.094212532Z 
 36%|███▌      | 3403/9500 [11:40:57<21:00:51, 12.41s/it]08/03/2024 09:38:27 - INFO - __main__ -   Step: 3403, LR: 1.3232176953574108e-05, Loss: 632.4705810546875
2024-08-03T16:38:39.304015732Z 
 36%|███▌      | 3404/9500 [11:41:09<20:54:36, 12.35s/it]08/03/2024 09:38:39 - INFO - __main__ -   Step: 3404, LR: 1.323000640988683e-05, Loss: 498.73480224609375
2024-08-03T16:38:51.567799867Z 
 36%|███▌      | 3405/9500 [11:41:21<20:51:49, 12.32s/it]08/03/2024 09:38:51 - INFO - __main__ -   Step: 3405, LR: 1.3227835866199548e-05, Loss: 557.11279296875
2024-08-03T16:39:03.557753870Z 
 36%|███▌      | 3406/9500 [11:41:33<20:41:27, 12.22s/it]08/03/2024 09:39:03 - INFO - __main__ -   Step: 3406, LR: 1.3225665322512269e-05, Loss: 552.1357421875
2024-08-03T16:39:15.991151717Z 
 36%|███▌      | 3407/9500 [11:41:45<20:47:37, 12.29s/it]08/03/2024 09:39:15 - INFO - __main__ -   Step: 3407, LR: 1.322349477882499e-05, Loss: 562.7657470703125
2024-08-03T16:39:28.463876510Z 
 36%|███▌      | 3408/9500 [11:41:58<20:53:09, 12.34s/it]08/03/2024 09:39:28 - INFO - __main__ -   Step: 3408, LR: 1.3221324235137712e-05, Loss: 652.6112060546875
2024-08-03T16:39:40.482641039Z 
 36%|███▌      | 3409/9500 [11:42:10<20:43:05, 12.25s/it]08/03/2024 09:39:40 - INFO - __main__ -   Step: 3409, LR: 1.3219153691450432e-05, Loss: 669.1842041015625
2024-08-03T16:39:52.508909851Z 
 36%|███▌      | 3410/9500 [11:42:22<20:36:13, 12.18s/it]08/03/2024 09:39:52 - INFO - __main__ -   Step: 3410, LR: 1.3216983147763154e-05, Loss: 578.4890747070312
2024-08-03T16:40:05.012923398Z 
 36%|███▌      | 3411/9500 [11:42:34<20:45:54, 12.28s/it]08/03/2024 09:40:05 - INFO - __main__ -   Step: 3411, LR: 1.3214812604075875e-05, Loss: 670.77294921875
2024-08-03T16:40:17.016934305Z 
 36%|███▌      | 3412/9500 [11:42:46<20:37:23, 12.20s/it]08/03/2024 09:40:17 - INFO - __main__ -   Step: 3412, LR: 1.3212642060388597e-05, Loss: 493.7540588378906
2024-08-03T16:40:29.216961993Z 
 36%|███▌      | 3413/9500 [11:42:59<20:37:20, 12.20s/it]08/03/2024 09:40:29 - INFO - __main__ -   Step: 3413, LR: 1.3210471516701318e-05, Loss: 504.1318359375
2024-08-03T16:40:41.532619996Z 
 36%|███▌      | 3414/9500 [11:43:11<20:40:45, 12.23s/it]08/03/2024 09:40:41 - INFO - __main__ -   Step: 3414, LR: 1.3208300973014038e-05, Loss: 524.9892578125
2024-08-03T16:40:53.719706271Z 
 36%|███▌      | 3415/9500 [11:43:23<20:39:11, 12.22s/it]08/03/2024 09:40:53 - INFO - __main__ -   Step: 3415, LR: 1.320613042932676e-05, Loss: 670.7486572265625
2024-08-03T16:41:05.895989746Z 
 36%|███▌      | 3416/9500 [11:43:35<20:37:41, 12.21s/it]08/03/2024 09:41:05 - INFO - __main__ -   Step: 3416, LR: 1.3203959885639481e-05, Loss: 610.8726806640625
2024-08-03T16:41:18.388739864Z 
 36%|███▌      | 3417/9500 [11:43:48<20:46:12, 12.29s/it]08/03/2024 09:41:18 - INFO - __main__ -   Step: 3417, LR: 1.3201789341952203e-05, Loss: 568.4306030273438
2024-08-03T16:41:30.790456654Z 
 36%|███▌      | 3418/9500 [11:44:00<20:49:19, 12.32s/it]08/03/2024 09:41:30 - INFO - __main__ -   Step: 3418, LR: 1.3199618798264924e-05, Loss: 679.3624267578125
2024-08-03T16:41:42.836135340Z 
 36%|███▌      | 3419/9500 [11:44:12<20:40:38, 12.24s/it]08/03/2024 09:41:42 - INFO - __main__ -   Step: 3419, LR: 1.3197448254577643e-05, Loss: 468.1561279296875
2024-08-03T16:41:55.601569499Z 
 36%|███▌      | 3420/9500 [11:44:25<20:56:22, 12.40s/it]08/03/2024 09:41:55 - INFO - __main__ -   Step: 3420, LR: 1.3195277710890364e-05, Loss: 538.4956665039062
2024-08-03T16:42:07.689450432Z 
 36%|███▌      | 3421/9500 [11:44:37<20:46:43, 12.31s/it]08/03/2024 09:42:07 - INFO - __main__ -   Step: 3421, LR: 1.3193107167203086e-05, Loss: 702.901123046875
2024-08-03T16:42:19.670625146Z 
 36%|███▌      | 3422/9500 [11:44:49<20:36:39, 12.21s/it]08/03/2024 09:42:19 - INFO - __main__ -   Step: 3422, LR: 1.3190936623515807e-05, Loss: 538.5119018554688
2024-08-03T16:42:32.318301743Z 
 36%|███▌      | 3423/9500 [11:45:02<20:49:49, 12.34s/it]08/03/2024 09:42:32 - INFO - __main__ -   Step: 3423, LR: 1.3188766079828527e-05, Loss: 583.3956298828125
2024-08-03T16:42:44.457179016Z 
 36%|███▌      | 3424/9500 [11:45:14<20:43:31, 12.28s/it]08/03/2024 09:42:44 - INFO - __main__ -   Step: 3424, LR: 1.3186595536141249e-05, Loss: 565.7166137695312
2024-08-03T16:42:56.494279303Z 
 36%|███▌      | 3425/9500 [11:45:26<20:35:56, 12.21s/it]08/03/2024 09:42:56 - INFO - __main__ -   Step: 3425, LR: 1.318442499245397e-05, Loss: 649.722900390625
2024-08-03T16:43:09.234001562Z 
 36%|███▌      | 3426/9500 [11:45:39<20:51:55, 12.37s/it]08/03/2024 09:43:09 - INFO - __main__ -   Step: 3426, LR: 1.3182254448766692e-05, Loss: 617.2797241210938
2024-08-03T16:43:21.400589474Z 
 36%|███▌      | 3427/9500 [11:45:51<20:45:37, 12.31s/it]08/03/2024 09:43:21 - INFO - __main__ -   Step: 3427, LR: 1.3180083905079413e-05, Loss: 508.117431640625
2024-08-03T16:43:33.877260212Z 
 36%|███▌      | 3428/9500 [11:46:03<20:50:36, 12.36s/it]08/03/2024 09:43:33 - INFO - __main__ -   Step: 3428, LR: 1.3177913361392133e-05, Loss: 663.592529296875
2024-08-03T16:43:46.569291266Z 
 36%|███▌      | 3429/9500 [11:46:16<21:00:32, 12.46s/it]08/03/2024 09:43:46 - INFO - __main__ -   Step: 3429, LR: 1.3175742817704855e-05, Loss: 620.8985595703125
2024-08-03T16:43:58.852509108Z 
 36%|███▌      | 3430/9500 [11:46:28<20:55:01, 12.41s/it]08/03/2024 09:43:58 - INFO - __main__ -   Step: 3430, LR: 1.3173572274017576e-05, Loss: 539.3875122070312
2024-08-03T16:44:11.070678711Z 
 36%|███▌      | 3431/9500 [11:46:41<20:49:08, 12.35s/it]08/03/2024 09:44:11 - INFO - __main__ -   Step: 3431, LR: 1.3171401730330298e-05, Loss: 702.6771850585938
2024-08-03T16:44:23.814106814Z 
 36%|███▌      | 3432/9500 [11:46:53<21:00:51, 12.47s/it]08/03/2024 09:44:23 - INFO - __main__ -   Step: 3432, LR: 1.316923118664302e-05, Loss: 676.3447875976562
2024-08-03T16:44:35.857384932Z 
 36%|███▌      | 3433/9500 [11:47:05<20:47:48, 12.34s/it]08/03/2024 09:44:35 - INFO - __main__ -   Step: 3433, LR: 1.3167060642955738e-05, Loss: 700.8531494140625
2024-08-03T16:44:47.891615977Z 
 36%|███▌      | 3434/9500 [11:47:17<20:38:19, 12.25s/it]08/03/2024 09:44:47 - INFO - __main__ -   Step: 3434, LR: 1.316489009926846e-05, Loss: 677.1923828125
2024-08-03T16:45:00.022615464Z 
 36%|███▌      | 3435/9500 [11:47:29<20:34:33, 12.21s/it]08/03/2024 09:45:00 - INFO - __main__ -   Step: 3435, LR: 1.316271955558118e-05, Loss: 478.51214599609375
2024-08-03T16:45:12.632605118Z 
 36%|███▌      | 3436/9500 [11:47:42<20:46:23, 12.33s/it]08/03/2024 09:45:12 - INFO - __main__ -   Step: 3436, LR: 1.3160549011893902e-05, Loss: 630.21484375
2024-08-03T16:45:24.410582817Z 
 36%|███▌      | 3437/9500 [11:47:54<20:29:22, 12.17s/it]08/03/2024 09:45:24 - INFO - __main__ -   Step: 3437, LR: 1.3158378468206622e-05, Loss: 435.99615478515625
2024-08-03T16:45:36.445354972Z 
 36%|███▌      | 3438/9500 [11:48:06<20:25:11, 12.13s/it]08/03/2024 09:45:36 - INFO - __main__ -   Step: 3438, LR: 1.3156207924519344e-05, Loss: 513.927490234375
2024-08-03T16:45:48.982331513Z 
 36%|███▌      | 3439/9500 [11:48:18<20:37:25, 12.25s/it]08/03/2024 09:45:48 - INFO - __main__ -   Step: 3439, LR: 1.3154037380832065e-05, Loss: 608.134521484375
2024-08-03T16:46:00.945149047Z 
 36%|███▌      | 3440/9500 [11:48:30<20:28:31, 12.16s/it]08/03/2024 09:46:00 - INFO - __main__ -   Step: 3440, LR: 1.3151866837144787e-05, Loss: 518.5072021484375
2024-08-03T16:46:13.462712996Z 
 36%|███▌      | 3441/9500 [11:48:43<20:39:02, 12.27s/it]08/03/2024 09:46:13 - INFO - __main__ -   Step: 3441, LR: 1.3149696293457508e-05, Loss: 557.347900390625
2024-08-03T16:46:26.014096490Z 
 36%|███▌      | 3442/9500 [11:48:55<20:47:22, 12.35s/it]08/03/2024 09:46:26 - INFO - __main__ -   Step: 3442, LR: 1.3147525749770228e-05, Loss: 652.7640380859375
2024-08-03T16:46:38.085769164Z 
 36%|███▌      | 3443/9500 [11:49:08<20:38:36, 12.27s/it]08/03/2024 09:46:38 - INFO - __main__ -   Step: 3443, LR: 1.314535520608295e-05, Loss: 549.2093505859375
2024-08-03T16:46:50.720460818Z 
 36%|███▋      | 3444/9500 [11:49:20<20:49:27, 12.38s/it]08/03/2024 09:46:50 - INFO - __main__ -   Step: 3444, LR: 1.3143184662395671e-05, Loss: 583.1976318359375
2024-08-03T16:47:03.213736120Z 
 36%|███▋      | 3445/9500 [11:49:33<20:52:39, 12.41s/it]08/03/2024 09:47:03 - INFO - __main__ -   Step: 3445, LR: 1.3141014118708393e-05, Loss: 527.5442504882812
2024-08-03T16:47:15.151837990Z 
 36%|███▋      | 3446/9500 [11:49:45<20:38:08, 12.27s/it]08/03/2024 09:47:15 - INFO - __main__ -   Step: 3446, LR: 1.3138843575021115e-05, Loss: 455.9152526855469
2024-08-03T16:47:27.187853180Z 
 36%|███▋      | 3447/9500 [11:49:57<20:30:49, 12.20s/it]08/03/2024 09:47:27 - INFO - __main__ -   Step: 3447, LR: 1.3136673031333833e-05, Loss: 525.4639892578125
2024-08-03T16:47:39.594377126Z 
 36%|███▋      | 3448/9500 [11:50:09<20:36:51, 12.26s/it]08/03/2024 09:47:39 - INFO - __main__ -   Step: 3448, LR: 1.3134502487646554e-05, Loss: 565.9244384765625
2024-08-03T16:47:51.851392047Z 
 36%|███▋      | 3449/9500 [11:50:21<20:36:28, 12.26s/it]08/03/2024 09:47:51 - INFO - __main__ -   Step: 3449, LR: 1.3132331943959276e-05, Loss: 640.9600830078125
2024-08-03T16:48:04.514020235Z 
 36%|███▋      | 3450/9500 [11:50:34<20:48:25, 12.38s/it]08/03/2024 09:48:04 - INFO - __main__ -   Step: 3450, LR: 1.3130161400271997e-05, Loss: 475.684814453125
2024-08-03T16:48:16.907144648Z 
 36%|███▋      | 3451/9500 [11:50:46<20:48:36, 12.38s/it]08/03/2024 09:48:16 - INFO - __main__ -   Step: 3451, LR: 1.3127990856584717e-05, Loss: 538.6730346679688
2024-08-03T16:48:28.830114656Z 
 36%|███▋      | 3452/9500 [11:50:58<20:34:25, 12.25s/it]08/03/2024 09:48:28 - INFO - __main__ -   Step: 3452, LR: 1.3125820312897439e-05, Loss: 545.5045776367188
2024-08-03T16:48:41.144124625Z 
 36%|███▋      | 3453/9500 [11:51:11<20:36:16, 12.27s/it]08/03/2024 09:48:41 - INFO - __main__ -   Step: 3453, LR: 1.312364976921016e-05, Loss: 647.0784912109375
2024-08-03T16:48:53.685775800Z 
 36%|███▋      | 3454/9500 [11:51:23<20:44:23, 12.35s/it]08/03/2024 09:48:53 - INFO - __main__ -   Step: 3454, LR: 1.3121479225522882e-05, Loss: 728.12109375
2024-08-03T16:49:05.813471668Z 
 36%|███▋      | 3455/9500 [11:51:35<20:37:29, 12.28s/it]08/03/2024 09:49:05 - INFO - __main__ -   Step: 3455, LR: 1.3119308681835603e-05, Loss: 625.3511962890625
2024-08-03T16:49:17.989492153Z 
 36%|███▋      | 3456/9500 [11:51:47<20:34:03, 12.25s/it]08/03/2024 09:49:17 - INFO - __main__ -   Step: 3456, LR: 1.3117138138148325e-05, Loss: 657.4010009765625
2024-08-03T16:49:30.536497396Z 
 36%|███▋      | 3457/9500 [11:52:00<20:42:48, 12.34s/it]08/03/2024 09:49:30 - INFO - __main__ -   Step: 3457, LR: 1.3114967594461045e-05, Loss: 649.7335205078125
2024-08-03T16:49:42.708100997Z 
 36%|███▋      | 3458/9500 [11:52:12<20:37:31, 12.29s/it]08/03/2024 09:49:42 - INFO - __main__ -   Step: 3458, LR: 1.3112797050773766e-05, Loss: 715.1905517578125
2024-08-03T16:49:54.685759959Z 
 36%|███▋      | 3459/9500 [11:52:24<20:27:54, 12.20s/it]08/03/2024 09:49:54 - INFO - __main__ -   Step: 3459, LR: 1.3110626507086488e-05, Loss: 507.482666015625
2024-08-03T16:50:07.289070910Z 
 36%|███▋      | 3460/9500 [11:52:37<20:40:00, 12.32s/it]08/03/2024 09:50:07 - INFO - __main__ -   Step: 3460, LR: 1.310845596339921e-05, Loss: 596.5464477539062
2024-08-03T16:50:19.464012476Z 
 36%|███▋      | 3461/9500 [11:52:49<20:35:28, 12.28s/it]08/03/2024 09:50:19 - INFO - __main__ -   Step: 3461, LR: 1.3106285419711928e-05, Loss: 550.88720703125
2024-08-03T16:50:31.584232357Z 
 36%|███▋      | 3462/9500 [11:53:01<20:30:36, 12.23s/it]08/03/2024 09:50:31 - INFO - __main__ -   Step: 3462, LR: 1.310411487602465e-05, Loss: 649.4738159179688
2024-08-03T16:50:44.255338847Z 
 36%|███▋      | 3463/9500 [11:53:14<20:43:44, 12.36s/it]08/03/2024 09:50:44 - INFO - __main__ -   Step: 3463, LR: 1.3101944332337371e-05, Loss: 604.4137573242188
2024-08-03T16:50:56.241701144Z 
 36%|███▋      | 3464/9500 [11:53:26<20:32:14, 12.25s/it]08/03/2024 09:50:56 - INFO - __main__ -   Step: 3464, LR: 1.3099773788650092e-05, Loss: 553.82470703125
2024-08-03T16:51:08.542202965Z 
 36%|███▋      | 3465/9500 [11:53:38<20:33:35, 12.26s/it]08/03/2024 09:51:08 - INFO - __main__ -   Step: 3465, LR: 1.3097603244962814e-05, Loss: 533.7535400390625
2024-08-03T16:51:21.442146810Z 
 36%|███▋      | 3466/9500 [11:53:51<20:52:33, 12.46s/it]08/03/2024 09:51:21 - INFO - __main__ -   Step: 3466, LR: 1.3095432701275534e-05, Loss: 498.1361999511719
2024-08-03T16:51:33.204314149Z 
 36%|███▋      | 3467/9500 [11:54:03<20:31:27, 12.25s/it]08/03/2024 09:51:33 - INFO - __main__ -   Step: 3467, LR: 1.3093262157588255e-05, Loss: 486.48284912109375
2024-08-03T16:51:45.400910570Z 
 37%|███▋      | 3468/9500 [11:54:15<20:29:43, 12.23s/it]08/03/2024 09:51:45 - INFO - __main__ -   Step: 3468, LR: 1.3091091613900977e-05, Loss: 611.4210205078125
2024-08-03T16:51:57.680752521Z 
 37%|███▋      | 3469/9500 [11:54:27<20:30:57, 12.25s/it]08/03/2024 09:51:57 - INFO - __main__ -   Step: 3469, LR: 1.3088921070213699e-05, Loss: 512.064697265625
2024-08-03T16:52:09.787332251Z 
 37%|███▋      | 3470/9500 [11:54:39<20:26:32, 12.20s/it]08/03/2024 09:52:09 - INFO - __main__ -   Step: 3470, LR: 1.308675052652642e-05, Loss: 642.458251953125
2024-08-03T16:52:22.374886536Z 
 37%|███▋      | 3471/9500 [11:54:52<20:37:53, 12.32s/it]08/03/2024 09:52:22 - INFO - __main__ -   Step: 3471, LR: 1.308457998283914e-05, Loss: 787.0433959960938
2024-08-03T16:52:35.209808183Z 
 37%|███▋      | 3472/9500 [11:55:05<20:53:13, 12.47s/it]08/03/2024 09:52:35 - INFO - __main__ -   Step: 3472, LR: 1.3082409439151861e-05, Loss: 565.8226928710938
2024-08-03T16:52:47.720667256Z 
 37%|███▋      | 3473/9500 [11:55:17<20:54:07, 12.49s/it]08/03/2024 09:52:47 - INFO - __main__ -   Step: 3473, LR: 1.3080238895464583e-05, Loss: 746.9168701171875
2024-08-03T16:52:59.857882788Z 
 37%|███▋      | 3474/9500 [11:55:29<20:43:25, 12.38s/it]08/03/2024 09:52:59 - INFO - __main__ -   Step: 3474, LR: 1.3078068351777305e-05, Loss: 674.7113647460938
2024-08-03T16:53:12.842834816Z 
 37%|███▋      | 3475/9500 [11:55:42<21:01:26, 12.56s/it]08/03/2024 09:53:12 - INFO - __main__ -   Step: 3475, LR: 1.3075897808090023e-05, Loss: 528.5203857421875
2024-08-03T16:53:25.330708527Z 
 37%|███▋      | 3476/9500 [11:55:55<20:58:59, 12.54s/it]08/03/2024 09:53:25 - INFO - __main__ -   Step: 3476, LR: 1.3073727264402744e-05, Loss: 773.34326171875
2024-08-03T16:53:37.823167379Z 
 37%|███▋      | 3477/9500 [11:56:07<20:57:21, 12.53s/it]08/03/2024 09:53:37 - INFO - __main__ -   Step: 3477, LR: 1.3071556720715466e-05, Loss: 662.73583984375
2024-08-03T16:53:49.821012862Z 
 37%|███▋      | 3478/9500 [11:56:19<20:41:15, 12.37s/it]08/03/2024 09:53:49 - INFO - __main__ -   Step: 3478, LR: 1.3069386177028187e-05, Loss: 473.42608642578125
2024-08-03T16:54:02.462514342Z 
 37%|███▋      | 3479/9500 [11:56:32<20:49:17, 12.45s/it]08/03/2024 09:54:02 - INFO - __main__ -   Step: 3479, LR: 1.3067215633340909e-05, Loss: 672.726318359375
2024-08-03T16:54:14.447106531Z 
 37%|███▋      | 3480/9500 [11:56:44<20:35:07, 12.31s/it]08/03/2024 09:54:14 - INFO - __main__ -   Step: 3480, LR: 1.3065045089653629e-05, Loss: 563.37744140625
2024-08-03T16:54:26.545590811Z 
 37%|███▋      | 3481/9500 [11:56:56<20:28:32, 12.25s/it]08/03/2024 09:54:26 - INFO - __main__ -   Step: 3481, LR: 1.306287454596635e-05, Loss: 603.7611083984375
2024-08-03T16:54:39.268213940Z 
 37%|███▋      | 3482/9500 [11:57:09<20:42:39, 12.39s/it]08/03/2024 09:54:39 - INFO - __main__ -   Step: 3482, LR: 1.3060704002279072e-05, Loss: 516.1721801757812
2024-08-03T16:54:51.089466648Z 
 37%|███▋      | 3483/9500 [11:57:21<20:25:21, 12.22s/it]08/03/2024 09:54:51 - INFO - __main__ -   Step: 3483, LR: 1.3058533458591794e-05, Loss: 557.8187866210938
2024-08-03T16:55:03.446554654Z 
 37%|███▋      | 3484/9500 [11:57:33<20:29:18, 12.26s/it]08/03/2024 09:55:03 - INFO - __main__ -   Step: 3484, LR: 1.3056362914904515e-05, Loss: 581.1591796875
2024-08-03T16:55:15.991167341Z 
 37%|███▋      | 3485/9500 [11:57:45<20:37:38, 12.35s/it]08/03/2024 09:55:15 - INFO - __main__ -   Step: 3485, LR: 1.3054192371217235e-05, Loss: 600.3990478515625
2024-08-03T16:55:28.156718257Z 
 37%|███▋      | 3486/9500 [11:57:58<20:32:01, 12.29s/it]08/03/2024 09:55:28 - INFO - __main__ -   Step: 3486, LR: 1.3052021827529957e-05, Loss: 553.97705078125
2024-08-03T16:55:40.063970162Z 
 37%|███▋      | 3487/9500 [11:58:10<20:20:16, 12.18s/it]08/03/2024 09:55:40 - INFO - __main__ -   Step: 3487, LR: 1.3049851283842678e-05, Loss: 557.79296875
2024-08-03T16:55:52.574971083Z 
 37%|███▋      | 3488/9500 [11:58:22<20:30:07, 12.28s/it]08/03/2024 09:55:52 - INFO - __main__ -   Step: 3488, LR: 1.30476807401554e-05, Loss: 491.38922119140625
2024-08-03T16:56:05.144264999Z 
 37%|███▋      | 3489/9500 [11:58:35<20:38:42, 12.36s/it]08/03/2024 09:56:05 - INFO - __main__ -   Step: 3489, LR: 1.3045510196468118e-05, Loss: 684.0484619140625
2024-08-03T16:56:17.468822065Z 
 37%|███▋      | 3490/9500 [11:58:47<20:37:18, 12.35s/it]08/03/2024 09:56:17 - INFO - __main__ -   Step: 3490, LR: 1.304333965278084e-05, Loss: 607.3965454101562
2024-08-03T16:56:30.028417237Z 
 37%|███▋      | 3491/9500 [11:58:59<20:43:19, 12.41s/it]08/03/2024 09:56:30 - INFO - __main__ -   Step: 3491, LR: 1.3041169109093561e-05, Loss: 679.9874267578125
2024-08-03T16:56:42.309026031Z 
 37%|███▋      | 3492/9500 [11:59:12<20:39:05, 12.37s/it]08/03/2024 09:56:42 - INFO - __main__ -   Step: 3492, LR: 1.3038998565406283e-05, Loss: 530.5142822265625
2024-08-03T16:56:54.392872837Z 
 37%|███▋      | 3493/9500 [11:59:24<20:30:09, 12.29s/it]08/03/2024 09:56:54 - INFO - __main__ -   Step: 3493, LR: 1.3036828021719004e-05, Loss: 537.0371704101562
2024-08-03T16:57:06.701905264Z 
 37%|███▋      | 3494/9500 [11:59:36<20:30:36, 12.29s/it]08/03/2024 09:57:06 - INFO - __main__ -   Step: 3494, LR: 1.3034657478031724e-05, Loss: 620.1334228515625
2024-08-03T16:57:18.789498710Z 
 37%|███▋      | 3495/9500 [11:59:48<20:24:13, 12.23s/it]08/03/2024 09:57:18 - INFO - __main__ -   Step: 3495, LR: 1.3032486934344446e-05, Loss: 627.1853637695312
2024-08-03T16:57:31.412634274Z 
 37%|███▋      | 3496/9500 [12:00:01<20:35:45, 12.35s/it]08/03/2024 09:57:31 - INFO - __main__ -   Step: 3496, LR: 1.3030316390657167e-05, Loss: 686.27978515625
2024-08-03T16:57:44.435725121Z 
 37%|███▋      | 3497/9500 [12:00:14<20:55:45, 12.55s/it]08/03/2024 09:57:44 - INFO - __main__ -   Step: 3497, LR: 1.3028145846969889e-05, Loss: 705.80322265625
2024-08-03T16:57:56.433184397Z 
 37%|███▋      | 3498/9500 [12:00:26<20:38:55, 12.39s/it]08/03/2024 09:57:56 - INFO - __main__ -   Step: 3498, LR: 1.302597530328261e-05, Loss: 690.0035400390625
2024-08-03T16:58:08.970206662Z 
 37%|███▋      | 3499/9500 [12:00:38<20:43:17, 12.43s/it]08/03/2024 09:58:08 - INFO - __main__ -   Step: 3499, LR: 1.3023804759595332e-05, Loss: 620.04345703125
2024-08-03T16:58:21.575462522Z 
 37%|███▋      | 3500/9500 [12:00:51<20:48:18, 12.48s/it]08/03/2024 09:58:21 - INFO - __main__ -   Step: 3500, LR: 1.3021634215908052e-05, Loss: 618.1730346679688
2024-08-03T16:58:33.592554957Z 
 37%|███▋      | 3501/9500 [12:01:03<20:34:07, 12.34s/it]08/03/2024 09:58:33 - INFO - __main__ -   Step: 3501, LR: 1.3019463672220773e-05, Loss: 634.6498413085938
2024-08-03T16:58:45.744269683Z 
 37%|███▋      | 3502/9500 [12:01:15<20:28:10, 12.29s/it]08/03/2024 09:58:45 - INFO - __main__ -   Step: 3502, LR: 1.3017293128533493e-05, Loss: 488.0393981933594
2024-08-03T16:58:58.416189347Z 
 37%|███▋      | 3503/9500 [12:01:28<20:39:32, 12.40s/it]08/03/2024 09:58:58 - INFO - __main__ -   Step: 3503, LR: 1.3015122584846213e-05, Loss: 540.734130859375
2024-08-03T16:59:10.547243021Z 
 37%|███▋      | 3504/9500 [12:01:40<20:31:13, 12.32s/it]08/03/2024 09:59:10 - INFO - __main__ -   Step: 3504, LR: 1.3012952041158934e-05, Loss: 566.1946411132812
2024-08-03T16:59:22.608828279Z 
 37%|███▋      | 3505/9500 [12:01:52<20:23:15, 12.24s/it]08/03/2024 09:59:22 - INFO - __main__ -   Step: 3505, LR: 1.3010781497471656e-05, Loss: 595.3314208984375
2024-08-03T16:59:35.430947282Z 
 37%|███▋      | 3506/9500 [12:02:05<20:40:25, 12.42s/it]08/03/2024 09:59:35 - INFO - __main__ -   Step: 3506, LR: 1.3008610953784378e-05, Loss: 677.7882080078125
2024-08-03T16:59:47.502250241Z 
 37%|███▋      | 3507/9500 [12:02:17<20:29:51, 12.31s/it]08/03/2024 09:59:47 - INFO - __main__ -   Step: 3507, LR: 1.3006440410097099e-05, Loss: 594.89599609375
2024-08-03T16:59:59.490081543Z 
 37%|███▋      | 3508/9500 [12:02:29<20:19:55, 12.22s/it]08/03/2024 09:59:59 - INFO - __main__ -   Step: 3508, LR: 1.300426986640982e-05, Loss: 553.5751342773438
2024-08-03T17:00:12.266865853Z 
 37%|███▋      | 3509/9500 [12:02:42<20:36:31, 12.38s/it]08/03/2024 10:00:12 - INFO - __main__ -   Step: 3509, LR: 1.300209932272254e-05, Loss: 757.335693359375
2024-08-03T17:00:24.273084503Z 
 37%|███▋      | 3510/9500 [12:02:54<20:25:00, 12.27s/it]08/03/2024 10:00:24 - INFO - __main__ -   Step: 3510, LR: 1.2999928779035262e-05, Loss: 747.7557373046875
2024-08-03T17:00:36.418579209Z 
 37%|███▋      | 3511/9500 [12:03:06<20:21:03, 12.23s/it]08/03/2024 10:00:36 - INFO - __main__ -   Step: 3511, LR: 1.2997758235347984e-05, Loss: 612.7723999023438
2024-08-03T17:00:49.170772446Z 
 37%|███▋      | 3512/9500 [12:03:19<20:36:23, 12.39s/it]08/03/2024 10:00:49 - INFO - __main__ -   Step: 3512, LR: 1.2995587691660705e-05, Loss: 612.76904296875
2024-08-03T17:01:01.070233679Z 
 37%|███▋      | 3513/9500 [12:03:31<20:21:32, 12.24s/it]08/03/2024 10:01:01 - INFO - __main__ -   Step: 3513, LR: 1.2993417147973427e-05, Loss: 594.7369384765625
2024-08-03T17:01:13.166870455Z 
 37%|███▋      | 3514/9500 [12:03:43<20:16:59, 12.20s/it]08/03/2024 10:01:13 - INFO - __main__ -   Step: 3514, LR: 1.2991246604286147e-05, Loss: 484.4371337890625
2024-08-03T17:01:25.635305149Z 
 37%|███▋      | 3515/9500 [12:03:55<20:24:52, 12.28s/it]08/03/2024 10:01:25 - INFO - __main__ -   Step: 3515, LR: 1.2989076060598868e-05, Loss: 566.647216796875
2024-08-03T17:01:37.570058316Z 
 37%|███▋      | 3516/9500 [12:04:07<20:14:21, 12.18s/it]08/03/2024 10:01:37 - INFO - __main__ -   Step: 3516, LR: 1.2986905516911588e-05, Loss: 565.2178344726562
2024-08-03T17:01:49.484299115Z 
 37%|███▋      | 3517/9500 [12:04:19<20:06:19, 12.10s/it]08/03/2024 10:01:49 - INFO - __main__ -   Step: 3517, LR: 1.298473497322431e-05, Loss: 390.4244384765625
2024-08-03T17:02:02.026460526Z 
 37%|███▋      | 3518/9500 [12:04:31<20:19:25, 12.23s/it]08/03/2024 10:02:02 - INFO - __main__ -   Step: 3518, LR: 1.298256442953703e-05, Loss: 588.38671875
2024-08-03T17:02:14.407788581Z 
 37%|███▋      | 3519/9500 [12:04:44<20:23:42, 12.28s/it]08/03/2024 10:02:14 - INFO - __main__ -   Step: 3519, LR: 1.2980393885849751e-05, Loss: 807.38330078125
2024-08-03T17:02:26.596721316Z 
 37%|███▋      | 3520/9500 [12:04:56<20:20:54, 12.25s/it]08/03/2024 10:02:26 - INFO - __main__ -   Step: 3520, LR: 1.2978223342162473e-05, Loss: 521.2374877929688
2024-08-03T17:02:38.740216901Z 
 37%|███▋      | 3521/9500 [12:05:08<20:17:31, 12.22s/it]08/03/2024 10:02:38 - INFO - __main__ -   Step: 3521, LR: 1.2976052798475194e-05, Loss: 605.4317016601562
2024-08-03T17:02:51.305709155Z 
 37%|███▋      | 3522/9500 [12:05:21<20:27:42, 12.32s/it]08/03/2024 10:02:51 - INFO - __main__ -   Step: 3522, LR: 1.2973882254787916e-05, Loss: 488.8667907714844
2024-08-03T17:03:03.228383456Z 
 37%|███▋      | 3523/9500 [12:05:33<20:15:31, 12.20s/it]08/03/2024 10:03:03 - INFO - __main__ -   Step: 3523, LR: 1.2971711711100636e-05, Loss: 466.7712707519531
2024-08-03T17:03:15.563588539Z 
 37%|███▋      | 3524/9500 [12:05:45<20:19:19, 12.24s/it]08/03/2024 10:03:15 - INFO - __main__ -   Step: 3524, LR: 1.2969541167413357e-05, Loss: 735.4998779296875
2024-08-03T17:03:28.033713706Z 
 37%|███▋      | 3525/9500 [12:05:57<20:25:56, 12.31s/it]08/03/2024 10:03:28 - INFO - __main__ -   Step: 3525, LR: 1.2967370623726079e-05, Loss: 692.979248046875
2024-08-03T17:03:40.010214972Z 
 37%|███▋      | 3526/9500 [12:06:09<20:15:44, 12.21s/it]08/03/2024 10:03:40 - INFO - __main__ -   Step: 3526, LR: 1.29652000800388e-05, Loss: 521.8762817382812
2024-08-03T17:03:52.313729545Z 
 37%|███▋      | 3527/9500 [12:06:22<20:18:19, 12.24s/it]08/03/2024 10:03:52 - INFO - __main__ -   Step: 3527, LR: 1.2963029536351522e-05, Loss: 572.4082641601562
2024-08-03T17:04:04.459663293Z 
 37%|███▋      | 3528/9500 [12:06:34<20:15:21, 12.21s/it]08/03/2024 10:04:04 - INFO - __main__ -   Step: 3528, LR: 1.2960858992664242e-05, Loss: 591.1648559570312
2024-08-03T17:04:16.673848543Z 
 37%|███▋      | 3529/9500 [12:06:46<20:15:16, 12.21s/it]08/03/2024 10:04:16 - INFO - __main__ -   Step: 3529, LR: 1.2958688448976963e-05, Loss: 541.402587890625
2024-08-03T17:04:28.740649860Z 
 37%|███▋      | 3530/9500 [12:06:58<20:10:44, 12.17s/it]08/03/2024 10:04:28 - INFO - __main__ -   Step: 3530, LR: 1.2956517905289683e-05, Loss: 498.84332275390625
2024-08-03T17:04:41.260299671Z 
 37%|███▋      | 3531/9500 [12:07:11<20:21:00, 12.27s/it]08/03/2024 10:04:41 - INFO - __main__ -   Step: 3531, LR: 1.2954347361602405e-05, Loss: 576.1388549804688
2024-08-03T17:04:53.437891047Z 
 37%|███▋      | 3532/9500 [12:07:23<20:17:57, 12.24s/it]08/03/2024 10:04:53 - INFO - __main__ -   Step: 3532, LR: 1.2952176817915125e-05, Loss: 673.3250122070312
2024-08-03T17:05:05.555192609Z 
 37%|███▋      | 3533/9500 [12:07:35<20:13:56, 12.21s/it]08/03/2024 10:05:05 - INFO - __main__ -   Step: 3533, LR: 1.2950006274227846e-05, Loss: 528.7235107421875
2024-08-03T17:05:17.966703632Z 
 37%|███▋      | 3534/9500 [12:07:47<20:19:51, 12.27s/it]08/03/2024 10:05:17 - INFO - __main__ -   Step: 3534, LR: 1.2947835730540568e-05, Loss: 583.242431640625
2024-08-03T17:05:29.930393944Z 
 37%|███▋      | 3535/9500 [12:07:59<20:10:33, 12.18s/it]08/03/2024 10:05:29 - INFO - __main__ -   Step: 3535, LR: 1.294566518685329e-05, Loss: 601.098388671875
2024-08-03T17:05:42.122040077Z 
 37%|███▋      | 3536/9500 [12:08:12<20:10:48, 12.18s/it]08/03/2024 10:05:42 - INFO - __main__ -   Step: 3536, LR: 1.294349464316601e-05, Loss: 451.8821716308594
2024-08-03T17:05:54.771764275Z 
 37%|███▋      | 3537/9500 [12:08:24<20:24:34, 12.32s/it]08/03/2024 10:05:54 - INFO - __main__ -   Step: 3537, LR: 1.294132409947873e-05, Loss: 702.3832397460938
2024-08-03T17:06:06.965460236Z 
 37%|███▋      | 3538/9500 [12:08:36<20:20:32, 12.28s/it]08/03/2024 10:06:06 - INFO - __main__ -   Step: 3538, LR: 1.2939153555791452e-05, Loss: 647.0728759765625
2024-08-03T17:06:18.930068842Z 
 37%|███▋      | 3539/9500 [12:08:48<20:10:51, 12.19s/it]08/03/2024 10:06:18 - INFO - __main__ -   Step: 3539, LR: 1.2936983012104174e-05, Loss: 477.1290283203125
2024-08-03T17:06:31.712553458Z 
 37%|███▋      | 3540/9500 [12:09:01<20:28:22, 12.37s/it]08/03/2024 10:06:31 - INFO - __main__ -   Step: 3540, LR: 1.2934812468416895e-05, Loss: 628.2393798828125
2024-08-03T17:06:44.122175991Z 
 37%|███▋      | 3541/9500 [12:09:14<20:29:27, 12.38s/it]08/03/2024 10:06:44 - INFO - __main__ -   Step: 3541, LR: 1.2932641924729617e-05, Loss: 655.7166137695312
2024-08-03T17:06:56.372826870Z 
 37%|███▋      | 3542/9500 [12:09:26<20:25:25, 12.34s/it]08/03/2024 10:06:56 - INFO - __main__ -   Step: 3542, LR: 1.2930471381042338e-05, Loss: 728.0943603515625
2024-08-03T17:07:09.288756756Z 
 37%|███▋      | 3543/9500 [12:09:39<20:42:21, 12.51s/it]08/03/2024 10:07:09 - INFO - __main__ -   Step: 3543, LR: 1.2928300837355058e-05, Loss: 780.8092651367188
2024-08-03T17:07:21.461299962Z 
 37%|███▋      | 3544/9500 [12:09:51<20:32:00, 12.41s/it]08/03/2024 10:07:21 - INFO - __main__ -   Step: 3544, LR: 1.2926130293667778e-05, Loss: 678.9395751953125
2024-08-03T17:07:33.899746724Z 
 37%|███▋      | 3545/9500 [12:10:03<20:32:36, 12.42s/it]08/03/2024 10:07:33 - INFO - __main__ -   Step: 3545, LR: 1.29239597499805e-05, Loss: 801.5540161132812
2024-08-03T17:07:46.488661144Z 
 37%|███▋      | 3546/9500 [12:10:16<20:37:26, 12.47s/it]08/03/2024 10:07:46 - INFO - __main__ -   Step: 3546, LR: 1.292178920629322e-05, Loss: 589.2211303710938
2024-08-03T17:07:58.454877423Z 
 37%|███▋      | 3547/9500 [12:10:28<20:22:15, 12.32s/it]08/03/2024 10:07:58 - INFO - __main__ -   Step: 3547, LR: 1.2919618662605941e-05, Loss: 527.3916015625
2024-08-03T17:08:10.641876290Z 
 37%|███▋      | 3548/9500 [12:10:40<20:18:06, 12.28s/it]08/03/2024 10:08:10 - INFO - __main__ -   Step: 3548, LR: 1.2917448118918663e-05, Loss: 546.0465087890625
2024-08-03T17:08:22.986223186Z 
 37%|███▋      | 3549/9500 [12:10:52<20:19:50, 12.30s/it]08/03/2024 10:08:22 - INFO - __main__ -   Step: 3549, LR: 1.2915277575231384e-05, Loss: 541.5391845703125
2024-08-03T17:08:35.030052316Z 
 37%|███▋      | 3550/9500 [12:11:04<20:12:03, 12.22s/it]08/03/2024 10:08:35 - INFO - __main__ -   Step: 3550, LR: 1.2913107031544106e-05, Loss: 508.5106201171875
2024-08-03T17:08:47.575694410Z 
 37%|███▋      | 3551/9500 [12:11:17<20:21:26, 12.32s/it]08/03/2024 10:08:47 - INFO - __main__ -   Step: 3551, LR: 1.2910936487856827e-05, Loss: 801.1189575195312
2024-08-03T17:09:00.221364178Z 
 37%|███▋      | 3552/9500 [12:11:30<20:30:58, 12.42s/it]08/03/2024 10:09:00 - INFO - __main__ -   Step: 3552, LR: 1.2908765944169547e-05, Loss: 782.0882568359375
2024-08-03T17:09:12.430833506Z 
 37%|███▋      | 3553/9500 [12:11:42<20:24:34, 12.35s/it]08/03/2024 10:09:12 - INFO - __main__ -   Step: 3553, LR: 1.2906595400482269e-05, Loss: 642.4051513671875
2024-08-03T17:09:24.704878443Z 
 37%|███▋      | 3554/9500 [12:11:54<20:21:57, 12.33s/it]08/03/2024 10:09:24 - INFO - __main__ -   Step: 3554, LR: 1.290442485679499e-05, Loss: 535.8842163085938
2024-08-03T17:09:37.367209151Z 
 37%|███▋      | 3555/9500 [12:12:07<20:31:37, 12.43s/it]08/03/2024 10:09:37 - INFO - __main__ -   Step: 3555, LR: 1.2902254313107712e-05, Loss: 535.74609375
2024-08-03T17:09:49.488996390Z 
 37%|███▋      | 3556/9500 [12:12:19<20:22:15, 12.34s/it]08/03/2024 10:09:49 - INFO - __main__ -   Step: 3556, LR: 1.2900083769420434e-05, Loss: 731.8507080078125
2024-08-03T17:10:01.552631159Z 
 37%|███▋      | 3557/9500 [12:12:31<20:13:52, 12.26s/it]08/03/2024 10:10:01 - INFO - __main__ -   Step: 3557, LR: 1.2897913225733153e-05, Loss: 648.107421875
2024-08-03T17:10:13.890445972Z 
 37%|███▋      | 3558/9500 [12:12:43<20:16:09, 12.28s/it]08/03/2024 10:10:13 - INFO - __main__ -   Step: 3558, LR: 1.2895742682045873e-05, Loss: 546.6482543945312
2024-08-03T17:10:25.815479406Z 
 37%|███▋      | 3559/9500 [12:12:55<20:05:24, 12.17s/it]08/03/2024 10:10:25 - INFO - __main__ -   Step: 3559, LR: 1.2893572138358595e-05, Loss: 610.7991943359375
2024-08-03T17:10:38.067572197Z 
 37%|███▋      | 3560/9500 [12:13:08<20:07:30, 12.20s/it]08/03/2024 10:10:38 - INFO - __main__ -   Step: 3560, LR: 1.2891401594671316e-05, Loss: 495.7877197265625
2024-08-03T17:10:50.361750566Z 
 37%|███▋      | 3561/9500 [12:13:20<20:10:12, 12.23s/it]08/03/2024 10:10:50 - INFO - __main__ -   Step: 3561, LR: 1.2889231050984036e-05, Loss: 503.5067138671875
2024-08-03T17:11:02.784058515Z 
 37%|███▋      | 3562/9500 [12:13:32<20:15:48, 12.29s/it]08/03/2024 10:11:02 - INFO - __main__ -   Step: 3562, LR: 1.2887060507296758e-05, Loss: 647.6451416015625
2024-08-03T17:11:15.131191607Z 
 38%|███▊      | 3563/9500 [12:13:45<20:17:26, 12.30s/it]08/03/2024 10:11:15 - INFO - __main__ -   Step: 3563, LR: 1.288488996360948e-05, Loss: 594.459716796875
2024-08-03T17:11:27.589227133Z 
 38%|███▊      | 3564/9500 [12:13:57<20:21:47, 12.35s/it]08/03/2024 10:11:27 - INFO - __main__ -   Step: 3564, LR: 1.2882719419922201e-05, Loss: 686.3095703125
2024-08-03T17:11:40.279812591Z 
 38%|███▊      | 3565/9500 [12:14:10<20:31:44, 12.45s/it]08/03/2024 10:11:40 - INFO - __main__ -   Step: 3565, LR: 1.2880548876234922e-05, Loss: 678.0179443359375
2024-08-03T17:11:52.467865099Z 
 38%|███▊      | 3566/9500 [12:14:22<20:23:40, 12.37s/it]08/03/2024 10:11:52 - INFO - __main__ -   Step: 3566, LR: 1.2878378332547642e-05, Loss: 601.5592041015625
2024-08-03T17:12:04.620285466Z 
 38%|███▊      | 3567/9500 [12:14:34<20:16:56, 12.31s/it]08/03/2024 10:12:04 - INFO - __main__ -   Step: 3567, LR: 1.2876207788860364e-05, Loss: 582.9093017578125
2024-08-03T17:12:17.098718454Z 
 38%|███▊      | 3568/9500 [12:14:47<20:21:49, 12.36s/it]08/03/2024 10:12:17 - INFO - __main__ -   Step: 3568, LR: 1.2874037245173085e-05, Loss: 617.4830322265625
2024-08-03T17:12:29.407220189Z 
 38%|███▊      | 3569/9500 [12:14:59<20:20:08, 12.34s/it]08/03/2024 10:12:29 - INFO - __main__ -   Step: 3569, LR: 1.2871866701485807e-05, Loss: 529.6593627929688
2024-08-03T17:12:41.552772186Z 
 38%|███▊      | 3570/9500 [12:15:11<20:14:04, 12.28s/it]08/03/2024 10:12:41 - INFO - __main__ -   Step: 3570, LR: 1.2869696157798529e-05, Loss: 612.3740844726562
2024-08-03T17:12:54.347242195Z 
 38%|███▊      | 3571/9500 [12:15:24<20:28:59, 12.44s/it]08/03/2024 10:12:54 - INFO - __main__ -   Step: 3571, LR: 1.2867525614111248e-05, Loss: 649.93310546875
2024-08-03T17:13:06.462711904Z 
 38%|███▊      | 3572/9500 [12:15:36<20:19:15, 12.34s/it]08/03/2024 10:13:06 - INFO - __main__ -   Step: 3572, LR: 1.2865355070423968e-05, Loss: 687.03857421875
2024-08-03T17:13:18.698973844Z 
 38%|███▊      | 3573/9500 [12:15:48<20:15:57, 12.31s/it]08/03/2024 10:13:18 - INFO - __main__ -   Step: 3573, LR: 1.286318452673669e-05, Loss: 557.891845703125
2024-08-03T17:13:31.619227984Z 
 38%|███▊      | 3574/9500 [12:16:01<20:33:51, 12.49s/it]08/03/2024 10:13:31 - INFO - __main__ -   Step: 3574, LR: 1.2861013983049411e-05, Loss: 488.5528564453125
2024-08-03T17:13:43.786815940Z 
 38%|███▊      | 3575/9500 [12:16:13<20:24:00, 12.40s/it]08/03/2024 10:13:43 - INFO - __main__ -   Step: 3575, LR: 1.2858843439362131e-05, Loss: 560.4459228515625
2024-08-03T17:13:55.795100374Z 
 38%|███▊      | 3576/9500 [12:16:25<20:12:21, 12.28s/it]08/03/2024 10:13:55 - INFO - __main__ -   Step: 3576, LR: 1.2856672895674853e-05, Loss: 598.1126708984375
2024-08-03T17:14:08.469443289Z 
 38%|███▊      | 3577/9500 [12:16:38<20:23:51, 12.40s/it]08/03/2024 10:14:08 - INFO - __main__ -   Step: 3577, LR: 1.2854502351987574e-05, Loss: 673.0301513671875
2024-08-03T17:14:20.452907847Z 
 38%|███▊      | 3578/9500 [12:16:50<20:11:23, 12.27s/it]08/03/2024 10:14:20 - INFO - __main__ -   Step: 3578, LR: 1.2852331808300296e-05, Loss: 582.5709228515625
2024-08-03T17:14:32.699948027Z 
 38%|███▊      | 3579/9500 [12:17:02<20:10:22, 12.27s/it]08/03/2024 10:14:32 - INFO - __main__ -   Step: 3579, LR: 1.2850161264613018e-05, Loss: 610.54833984375
2024-08-03T17:14:45.648160038Z 
 38%|███▊      | 3580/9500 [12:17:15<20:30:24, 12.47s/it]08/03/2024 10:14:45 - INFO - __main__ -   Step: 3580, LR: 1.2847990720925737e-05, Loss: 502.95916748046875
2024-08-03T17:14:57.687171127Z 
 38%|███▊      | 3581/9500 [12:17:27<20:17:26, 12.34s/it]08/03/2024 10:14:57 - INFO - __main__ -   Step: 3581, LR: 1.2845820177238459e-05, Loss: 412.25738525390625
2024-08-03T17:15:09.768529265Z 
 38%|███▊      | 3582/9500 [12:17:39<20:09:33, 12.26s/it]08/03/2024 10:15:09 - INFO - __main__ -   Step: 3582, LR: 1.284364963355118e-05, Loss: 638.4179077148438
2024-08-03T17:15:22.285139235Z 
 38%|███▊      | 3583/9500 [12:17:52<20:16:50, 12.34s/it]08/03/2024 10:15:22 - INFO - __main__ -   Step: 3583, LR: 1.2841479089863902e-05, Loss: 599.3802490234375
2024-08-03T17:15:34.855533781Z 
 38%|███▊      | 3584/9500 [12:18:04<20:23:28, 12.41s/it]08/03/2024 10:15:34 - INFO - __main__ -   Step: 3584, LR: 1.2839308546176624e-05, Loss: 577.662841796875
2024-08-03T17:15:47.058302965Z 
 38%|███▊      | 3585/9500 [12:18:16<20:17:11, 12.35s/it]08/03/2024 10:15:47 - INFO - __main__ -   Step: 3585, LR: 1.2837138002489345e-05, Loss: 768.255615234375
2024-08-03T17:16:00.061536541Z 
 38%|███▊      | 3586/9500 [12:18:29<20:36:22, 12.54s/it]08/03/2024 10:16:00 - INFO - __main__ -   Step: 3586, LR: 1.2834967458802063e-05, Loss: 866.0221557617188
2024-08-03T17:16:12.299657691Z 
 38%|███▊      | 3587/9500 [12:18:42<20:27:09, 12.45s/it]08/03/2024 10:16:12 - INFO - __main__ -   Step: 3587, LR: 1.2832796915114785e-05, Loss: 447.02276611328125
2024-08-03T17:16:24.380445567Z 
 38%|███▊      | 3588/9500 [12:18:54<20:15:57, 12.34s/it]08/03/2024 10:16:24 - INFO - __main__ -   Step: 3588, LR: 1.2830626371427506e-05, Loss: 517.9921875
2024-08-03T17:16:36.720525908Z 
 38%|███▊      | 3589/9500 [12:19:06<20:15:44, 12.34s/it]08/03/2024 10:16:36 - INFO - __main__ -   Step: 3589, LR: 1.2828455827740226e-05, Loss: 375.75689697265625
2024-08-03T17:16:48.679747237Z 
 38%|███▊      | 3590/9500 [12:19:18<20:04:16, 12.23s/it]08/03/2024 10:16:48 - INFO - __main__ -   Step: 3590, LR: 1.2826285284052948e-05, Loss: 525.392333984375
2024-08-03T17:17:00.657013419Z 
 38%|███▊      | 3591/9500 [12:19:30<19:56:42, 12.15s/it]08/03/2024 10:17:00 - INFO - __main__ -   Step: 3591, LR: 1.282411474036567e-05, Loss: 657.0718994140625
2024-08-03T17:17:13.124998724Z 
 38%|███▊      | 3592/9500 [12:19:43<20:05:51, 12.25s/it]08/03/2024 10:17:13 - INFO - __main__ -   Step: 3592, LR: 1.2821944196678391e-05, Loss: 734.68603515625
2024-08-03T17:17:25.673599493Z 
 38%|███▊      | 3593/9500 [12:19:55<20:14:34, 12.34s/it]08/03/2024 10:17:25 - INFO - __main__ -   Step: 3593, LR: 1.2819773652991113e-05, Loss: 721.012451171875
2024-08-03T17:17:37.740973938Z 
 38%|███▊      | 3594/9500 [12:20:07<20:06:24, 12.26s/it]08/03/2024 10:17:37 - INFO - __main__ -   Step: 3594, LR: 1.2817603109303834e-05, Loss: 623.2413330078125
2024-08-03T17:17:50.218185596Z 
 38%|███▊      | 3595/9500 [12:20:20<20:12:43, 12.32s/it]08/03/2024 10:17:50 - INFO - __main__ -   Step: 3595, LR: 1.2815432565616554e-05, Loss: 633.274658203125
2024-08-03T17:18:02.504780442Z 
 38%|███▊      | 3596/9500 [12:20:32<20:11:28, 12.31s/it]08/03/2024 10:18:02 - INFO - __main__ -   Step: 3596, LR: 1.2813262021929276e-05, Loss: 665.4998779296875
2024-08-03T17:18:14.537717353Z 
 38%|███▊      | 3597/9500 [12:20:44<20:03:01, 12.23s/it]08/03/2024 10:18:14 - INFO - __main__ -   Step: 3597, LR: 1.2811091478241997e-05, Loss: 612.4696044921875
2024-08-03T17:18:27.041750824Z 
 38%|███▊      | 3598/9500 [12:20:56<20:10:59, 12.31s/it]08/03/2024 10:18:27 - INFO - __main__ -   Step: 3598, LR: 1.2808920934554719e-05, Loss: 526.3611450195312
2024-08-03T17:18:39.098206269Z 
 38%|███▊      | 3599/9500 [12:21:09<20:03:16, 12.23s/it]08/03/2024 10:18:39 - INFO - __main__ -   Step: 3599, LR: 1.280675039086744e-05, Loss: 616.0726318359375
2024-08-03T17:18:51.351050654Z 
 38%|███▊      | 3600/9500 [12:21:21<20:03:36, 12.24s/it]08/03/2024 10:18:51 - INFO - __main__ -   Step: 3600, LR: 1.2804579847180158e-05, Loss: 575.91796875
2024-08-03T17:19:03.838566354Z 
 38%|███▊      | 3601/9500 [12:21:33<20:10:42, 12.31s/it]08/03/2024 10:19:03 - INFO - __main__ -   Step: 3601, LR: 1.280240930349288e-05, Loss: 446.81256103515625
2024-08-03T17:19:16.093816104Z 
 38%|███▊      | 3602/9500 [12:21:46<20:08:45, 12.30s/it]08/03/2024 10:19:16 - INFO - __main__ -   Step: 3602, LR: 1.2800238759805602e-05, Loss: 714.7562255859375
2024-08-03T17:19:28.320558360Z 
 38%|███▊      | 3603/9500 [12:21:58<20:06:28, 12.28s/it]08/03/2024 10:19:28 - INFO - __main__ -   Step: 3603, LR: 1.2798068216118323e-05, Loss: 575.9959716796875
2024-08-03T17:19:40.921961880Z 
 38%|███▊      | 3604/9500 [12:22:10<20:15:53, 12.37s/it]08/03/2024 10:19:40 - INFO - __main__ -   Step: 3604, LR: 1.2795897672431043e-05, Loss: 578.260498046875
2024-08-03T17:19:52.950259515Z 
 38%|███▊      | 3605/9500 [12:22:22<20:05:30, 12.27s/it]08/03/2024 10:19:52 - INFO - __main__ -   Step: 3605, LR: 1.2793727128743765e-05, Loss: 623.5790405273438
2024-08-03T17:20:04.909326213Z 
 38%|███▊      | 3606/9500 [12:22:34<19:56:08, 12.18s/it]08/03/2024 10:20:04 - INFO - __main__ -   Step: 3606, LR: 1.2791556585056486e-05, Loss: 545.0736083984375
2024-08-03T17:20:17.005348667Z 
 38%|███▊      | 3607/9500 [12:22:46<19:53:34, 12.15s/it]08/03/2024 10:20:17 - INFO - __main__ -   Step: 3607, LR: 1.2789386041369208e-05, Loss: 594.9642944335938
2024-08-03T17:20:29.440270661Z 
 38%|███▊      | 3608/9500 [12:22:59<20:01:40, 12.24s/it]08/03/2024 10:20:29 - INFO - __main__ -   Step: 3608, LR: 1.278721549768193e-05, Loss: 516.8825073242188
2024-08-03T17:20:41.740660349Z 
 38%|███▊      | 3609/9500 [12:23:11<20:03:21, 12.26s/it]08/03/2024 10:20:41 - INFO - __main__ -   Step: 3609, LR: 1.2785044953994649e-05, Loss: 664.912353515625
2024-08-03T17:20:54.183596282Z 
 38%|███▊      | 3610/9500 [12:23:24<20:08:39, 12.31s/it]08/03/2024 10:20:54 - INFO - __main__ -   Step: 3610, LR: 1.278287441030737e-05, Loss: 565.62353515625
2024-08-03T17:21:06.678638965Z 
 38%|███▊      | 3611/9500 [12:23:36<20:13:49, 12.37s/it]08/03/2024 10:21:06 - INFO - __main__ -   Step: 3611, LR: 1.2780703866620092e-05, Loss: 509.5829162597656
2024-08-03T17:21:18.954576215Z 
 38%|███▊      | 3612/9500 [12:23:48<20:10:55, 12.34s/it]08/03/2024 10:21:18 - INFO - __main__ -   Step: 3612, LR: 1.2778533322932814e-05, Loss: 451.1290283203125
2024-08-03T17:21:30.940167228Z 
 38%|███▊      | 3613/9500 [12:24:00<20:00:17, 12.23s/it]08/03/2024 10:21:30 - INFO - __main__ -   Step: 3613, LR: 1.2776362779245535e-05, Loss: 592.83251953125
2024-08-03T17:21:43.417665214Z 
 38%|███▊      | 3614/9500 [12:24:13<20:07:16, 12.31s/it]08/03/2024 10:21:43 - INFO - __main__ -   Step: 3614, LR: 1.2774192235558253e-05, Loss: 551.2861938476562
2024-08-03T17:21:55.491550218Z 
 38%|███▊      | 3615/9500 [12:24:25<20:00:13, 12.24s/it]08/03/2024 10:21:55 - INFO - __main__ -   Step: 3615, LR: 1.2772021691870975e-05, Loss: 486.6706237792969
2024-08-03T17:22:07.472656784Z 
 38%|███▊      | 3616/9500 [12:24:37<19:52:29, 12.16s/it]08/03/2024 10:22:07 - INFO - __main__ -   Step: 3616, LR: 1.2769851148183697e-05, Loss: 541.6723022460938
2024-08-03T17:22:20.063300989Z 
 38%|███▊      | 3617/9500 [12:24:49<20:04:58, 12.29s/it]08/03/2024 10:22:20 - INFO - __main__ -   Step: 3617, LR: 1.2767680604496418e-05, Loss: 597.6448974609375
2024-08-03T17:22:32.015007183Z 
 38%|███▊      | 3618/9500 [12:25:01<19:54:50, 12.19s/it]08/03/2024 10:22:32 - INFO - __main__ -   Step: 3618, LR: 1.2765510060809138e-05, Loss: 537.0316162109375
2024-08-03T17:22:44.321698581Z 
 38%|███▊      | 3619/9500 [12:25:14<19:58:07, 12.22s/it]08/03/2024 10:22:44 - INFO - __main__ -   Step: 3619, LR: 1.276333951712186e-05, Loss: 619.2806396484375
2024-08-03T17:22:56.733639721Z 
 38%|███▊      | 3620/9500 [12:25:26<20:03:27, 12.28s/it]08/03/2024 10:22:56 - INFO - __main__ -   Step: 3620, LR: 1.2761168973434581e-05, Loss: 472.33282470703125
2024-08-03T17:23:09.010187890Z 
 38%|███▊      | 3621/9500 [12:25:38<20:03:08, 12.28s/it]08/03/2024 10:23:09 - INFO - __main__ -   Step: 3621, LR: 1.2758998429747303e-05, Loss: 645.7720336914062
2024-08-03T17:23:21.219535456Z 
 38%|███▊      | 3622/9500 [12:25:51<20:00:52, 12.26s/it]08/03/2024 10:23:21 - INFO - __main__ -   Step: 3622, LR: 1.2756827886060024e-05, Loss: 583.1348876953125
2024-08-03T17:23:33.735708742Z 
 38%|███▊      | 3623/9500 [12:26:03<20:08:16, 12.34s/it]08/03/2024 10:23:33 - INFO - __main__ -   Step: 3623, LR: 1.2754657342372744e-05, Loss: 603.976806640625
2024-08-03T17:23:45.943360422Z 
 38%|███▊      | 3624/9500 [12:26:15<20:04:18, 12.30s/it]08/03/2024 10:23:45 - INFO - __main__ -   Step: 3624, LR: 1.2752486798685466e-05, Loss: 740.2974853515625
2024-08-03T17:23:57.939325535Z 
 38%|███▊      | 3625/9500 [12:26:27<19:55:15, 12.21s/it]08/03/2024 10:23:57 - INFO - __main__ -   Step: 3625, LR: 1.2750316254998187e-05, Loss: 553.4501953125
2024-08-03T17:24:10.307074116Z 
 38%|███▊      | 3626/9500 [12:26:40<19:59:46, 12.26s/it]08/03/2024 10:24:10 - INFO - __main__ -   Step: 3626, LR: 1.2748145711310909e-05, Loss: 657.9083862304688
2024-08-03T17:24:22.847880443Z 
 38%|███▊      | 3627/9500 [12:26:52<20:07:57, 12.34s/it]08/03/2024 10:24:22 - INFO - __main__ -   Step: 3627, LR: 1.274597516762363e-05, Loss: 618.262939453125
2024-08-03T17:24:34.941843588Z 
 38%|███▊      | 3628/9500 [12:27:04<20:00:30, 12.27s/it]08/03/2024 10:24:34 - INFO - __main__ -   Step: 3628, LR: 1.2743804623936349e-05, Loss: 594.4005126953125
2024-08-03T17:24:47.494828352Z 
 38%|███▊      | 3629/9500 [12:27:17<20:08:42, 12.35s/it]08/03/2024 10:24:47 - INFO - __main__ -   Step: 3629, LR: 1.274163408024907e-05, Loss: 480.9336242675781
2024-08-03T17:24:59.548029801Z 
 38%|███▊      | 3630/9500 [12:27:29<19:59:42, 12.26s/it]08/03/2024 10:24:59 - INFO - __main__ -   Step: 3630, LR: 1.2739463536561792e-05, Loss: 567.3439331054688
2024-08-03T17:25:11.878400259Z 
 38%|███▊      | 3631/9500 [12:27:41<20:01:29, 12.28s/it]08/03/2024 10:25:11 - INFO - __main__ -   Step: 3631, LR: 1.2737292992874513e-05, Loss: 637.2154541015625
2024-08-03T17:25:24.319199111Z 
 38%|███▊      | 3632/9500 [12:27:54<20:05:54, 12.33s/it]08/03/2024 10:25:24 - INFO - __main__ -   Step: 3632, LR: 1.2735122449187233e-05, Loss: 473.3944091796875
2024-08-03T17:25:36.561601781Z 
 38%|███▊      | 3633/9500 [12:28:06<20:03:07, 12.30s/it]08/03/2024 10:25:36 - INFO - __main__ -   Step: 3633, LR: 1.2732951905499955e-05, Loss: 613.0616455078125
2024-08-03T17:25:48.572779393Z 
 38%|███▊      | 3634/9500 [12:28:18<19:54:20, 12.22s/it]08/03/2024 10:25:48 - INFO - __main__ -   Step: 3634, LR: 1.2730781361812676e-05, Loss: 569.8956909179688
2024-08-03T17:26:00.948002292Z 
 38%|███▊      | 3635/9500 [12:28:30<19:58:47, 12.26s/it]08/03/2024 10:26:00 - INFO - __main__ -   Step: 3635, LR: 1.2728610818125398e-05, Loss: 600.5899658203125
2024-08-03T17:26:13.043142250Z 
 38%|███▊      | 3636/9500 [12:28:42<19:53:38, 12.21s/it]08/03/2024 10:26:13 - INFO - __main__ -   Step: 3636, LR: 1.272644027443812e-05, Loss: 762.3745727539062
2024-08-03T17:26:25.282106716Z 
 38%|███▊      | 3637/9500 [12:28:55<19:54:10, 12.22s/it]08/03/2024 10:26:25 - INFO - __main__ -   Step: 3637, LR: 1.272426973075084e-05, Loss: 572.1668701171875
2024-08-03T17:26:37.949798981Z 
 38%|███▊      | 3638/9500 [12:29:07<20:07:04, 12.35s/it]08/03/2024 10:26:37 - INFO - __main__ -   Step: 3638, LR: 1.272209918706356e-05, Loss: 612.5236206054688
2024-08-03T17:26:50.128603103Z 
 38%|███▊      | 3639/9500 [12:29:20<20:01:43, 12.30s/it]08/03/2024 10:26:50 - INFO - __main__ -   Step: 3639, LR: 1.2719928643376282e-05, Loss: 530.421142578125
2024-08-03T17:27:02.277613489Z 
 38%|███▊      | 3640/9500 [12:29:32<19:57:01, 12.26s/it]08/03/2024 10:27:02 - INFO - __main__ -   Step: 3640, LR: 1.2717758099689004e-05, Loss: 519.365478515625
2024-08-03T17:27:14.856004292Z 
 38%|███▊      | 3641/9500 [12:29:44<20:06:15, 12.35s/it]08/03/2024 10:27:14 - INFO - __main__ -   Step: 3641, LR: 1.2715587556001725e-05, Loss: 575.0542602539062
2024-08-03T17:27:27.193440947Z 
 38%|███▊      | 3642/9500 [12:29:57<20:05:35, 12.35s/it]08/03/2024 10:27:27 - INFO - __main__ -   Step: 3642, LR: 1.2713417012314444e-05, Loss: 603.5750122070312
2024-08-03T17:27:39.269109718Z 
 38%|███▊      | 3643/9500 [12:30:09<19:57:21, 12.27s/it]08/03/2024 10:27:39 - INFO - __main__ -   Step: 3643, LR: 1.2711246468627165e-05, Loss: 652.951416015625
2024-08-03T17:27:51.789720999Z 
 38%|███▊      | 3644/9500 [12:30:21<20:04:39, 12.34s/it]08/03/2024 10:27:51 - INFO - __main__ -   Step: 3644, LR: 1.2709075924939887e-05, Loss: 550.7882080078125
2024-08-03T17:28:03.944384594Z 
 38%|███▊      | 3645/9500 [12:30:33<19:58:56, 12.29s/it]08/03/2024 10:28:03 - INFO - __main__ -   Step: 3645, LR: 1.2706905381252608e-05, Loss: 740.6056518554688
2024-08-03T17:28:16.033537698Z 
 38%|███▊      | 3646/9500 [12:30:45<19:52:58, 12.23s/it]08/03/2024 10:28:16 - INFO - __main__ -   Step: 3646, LR: 1.2704734837565328e-05, Loss: 665.7487182617188
2024-08-03T17:28:28.563551142Z 
 38%|███▊      | 3647/9500 [12:30:58<20:01:38, 12.32s/it]08/03/2024 10:28:28 - INFO - __main__ -   Step: 3647, LR: 1.270256429387805e-05, Loss: 618.9427490234375
2024-08-03T17:28:40.875159744Z 
 38%|███▊      | 3648/9500 [12:31:10<20:01:13, 12.32s/it]08/03/2024 10:28:40 - INFO - __main__ -   Step: 3648, LR: 1.2700393750190771e-05, Loss: 570.2872924804688
2024-08-03T17:28:53.114483650Z 
 38%|███▊      | 3649/9500 [12:31:23<19:58:46, 12.29s/it]08/03/2024 10:28:53 - INFO - __main__ -   Step: 3649, LR: 1.2698223206503493e-05, Loss: 576.2919921875
2024-08-03T17:29:05.155954167Z 
 38%|███▊      | 3650/9500 [12:31:35<19:51:12, 12.22s/it]08/03/2024 10:29:05 - INFO - __main__ -   Step: 3650, LR: 1.2696052662816214e-05, Loss: 612.541015625
2024-08-03T17:29:17.519976464Z 
 38%|███▊      | 3651/9500 [12:31:47<19:55:16, 12.26s/it]08/03/2024 10:29:17 - INFO - __main__ -   Step: 3651, LR: 1.2693882119128936e-05, Loss: 436.21759033203125
2024-08-03T17:29:29.704076811Z 
 38%|███▊      | 3652/9500 [12:31:59<19:52:50, 12.24s/it]08/03/2024 10:29:29 - INFO - __main__ -   Step: 3652, LR: 1.2691711575441656e-05, Loss: 583.4417114257812
2024-08-03T17:29:41.664434414Z 
 38%|███▊      | 3653/9500 [12:32:11<19:44:30, 12.15s/it]08/03/2024 10:29:41 - INFO - __main__ -   Step: 3653, LR: 1.2689541031754377e-05, Loss: 690.4185180664062
2024-08-03T17:29:54.450601243Z 
 38%|███▊      | 3654/9500 [12:32:24<20:02:44, 12.34s/it]08/03/2024 10:29:54 - INFO - __main__ -   Step: 3654, LR: 1.2687370488067099e-05, Loss: 662.6796875
2024-08-03T17:30:06.643609792Z 
 38%|███▊      | 3655/9500 [12:32:36<19:58:07, 12.30s/it]08/03/2024 10:30:06 - INFO - __main__ -   Step: 3655, LR: 1.2685199944379817e-05, Loss: 561.8333740234375
2024-08-03T17:30:18.815249668Z 
 38%|███▊      | 3656/9500 [12:32:48<19:54:11, 12.26s/it]08/03/2024 10:30:18 - INFO - __main__ -   Step: 3656, LR: 1.2683029400692539e-05, Loss: 521.0123291015625
2024-08-03T17:30:31.404652937Z 
 38%|███▊      | 3657/9500 [12:33:01<20:03:35, 12.36s/it]08/03/2024 10:30:31 - INFO - __main__ -   Step: 3657, LR: 1.268085885700526e-05, Loss: 681.0833740234375
2024-08-03T17:30:43.697065275Z 
 39%|███▊      | 3658/9500 [12:33:13<20:01:26, 12.34s/it]08/03/2024 10:30:43 - INFO - __main__ -   Step: 3658, LR: 1.2678688313317982e-05, Loss: 472.92333984375
2024-08-03T17:30:55.744866009Z 
 39%|███▊      | 3659/9500 [12:33:25<19:52:43, 12.25s/it]08/03/2024 10:30:55 - INFO - __main__ -   Step: 3659, LR: 1.2676517769630703e-05, Loss: 578.7941284179688
2024-08-03T17:31:08.637349560Z 
 39%|███▊      | 3660/9500 [12:33:38<20:11:12, 12.44s/it]08/03/2024 10:31:08 - INFO - __main__ -   Step: 3660, LR: 1.2674347225943425e-05, Loss: 500.7033386230469
2024-08-03T17:31:20.516642685Z 
 39%|███▊      | 3661/9500 [12:33:50<19:54:31, 12.27s/it]08/03/2024 10:31:20 - INFO - __main__ -   Step: 3661, LR: 1.2672176682256145e-05, Loss: 442.39544677734375
2024-08-03T17:31:32.548518006Z 
 39%|███▊      | 3662/9500 [12:34:02<19:47:13, 12.20s/it]08/03/2024 10:31:32 - INFO - __main__ -   Step: 3662, LR: 1.2670006138568866e-05, Loss: 497.93524169921875
2024-08-03T17:31:45.031835673Z 
 39%|███▊      | 3663/9500 [12:34:14<19:55:14, 12.29s/it]08/03/2024 10:31:45 - INFO - __main__ -   Step: 3663, LR: 1.2667835594881588e-05, Loss: 625.848876953125
2024-08-03T17:31:56.889850663Z 
 39%|███▊      | 3664/9500 [12:34:26<19:42:33, 12.16s/it]08/03/2024 10:31:56 - INFO - __main__ -   Step: 3664, LR: 1.266566505119431e-05, Loss: 520.7742309570312
2024-08-03T17:32:09.302054703Z 
 39%|███▊      | 3665/9500 [12:34:39<19:49:46, 12.23s/it]08/03/2024 10:32:09 - INFO - __main__ -   Step: 3665, LR: 1.2663494507507031e-05, Loss: 583.1103515625
2024-08-03T17:32:21.817023780Z 
 39%|███▊      | 3666/9500 [12:34:51<19:57:45, 12.32s/it]08/03/2024 10:32:21 - INFO - __main__ -   Step: 3666, LR: 1.2661323963819751e-05, Loss: 648.0347900390625
2024-08-03T17:32:33.855244865Z 
 39%|███▊      | 3667/9500 [12:35:03<19:49:22, 12.23s/it]08/03/2024 10:32:33 - INFO - __main__ -   Step: 3667, LR: 1.2659153420132472e-05, Loss: 697.25048828125
2024-08-03T17:32:46.289743213Z 
 39%|███▊      | 3668/9500 [12:35:16<19:55:00, 12.29s/it]08/03/2024 10:32:46 - INFO - __main__ -   Step: 3668, LR: 1.2656982876445194e-05, Loss: 889.8084716796875
2024-08-03T17:32:59.193677252Z 
 39%|███▊      | 3669/9500 [12:35:29<20:12:34, 12.48s/it]08/03/2024 10:32:59 - INFO - __main__ -   Step: 3669, LR: 1.2654812332757914e-05, Loss: 510.9317932128906
2024-08-03T17:33:11.130698908Z 
 39%|███▊      | 3670/9500 [12:35:41<19:56:37, 12.32s/it]08/03/2024 10:33:11 - INFO - __main__ -   Step: 3670, LR: 1.2652641789070634e-05, Loss: 605.7158203125
2024-08-03T17:33:23.291446133Z 
 39%|███▊      | 3671/9500 [12:35:53<19:51:54, 12.27s/it]08/03/2024 10:33:23 - INFO - __main__ -   Step: 3671, LR: 1.2650471245383355e-05, Loss: 663.9661254882812
2024-08-03T17:33:35.932603210Z 
 39%|███▊      | 3672/9500 [12:36:05<20:02:34, 12.38s/it]08/03/2024 10:33:35 - INFO - __main__ -   Step: 3672, LR: 1.2648300701696077e-05, Loss: 719.3824462890625
2024-08-03T17:33:48.007070016Z 
 39%|███▊      | 3673/9500 [12:36:17<19:53:26, 12.29s/it]08/03/2024 10:33:48 - INFO - __main__ -   Step: 3673, LR: 1.2646130158008798e-05, Loss: 525.3424072265625
2024-08-03T17:34:00.178524660Z 
 39%|███▊      | 3674/9500 [12:36:30<19:49:49, 12.25s/it]08/03/2024 10:34:00 - INFO - __main__ -   Step: 3674, LR: 1.264395961432152e-05, Loss: 745.72265625
2024-08-03T17:34:12.619901753Z 
 39%|███▊      | 3675/9500 [12:36:42<19:55:05, 12.31s/it]08/03/2024 10:34:12 - INFO - __main__ -   Step: 3675, LR: 1.264178907063424e-05, Loss: 693.3150634765625
2024-08-03T17:34:25.038180031Z 
 39%|███▊      | 3676/9500 [12:36:54<19:58:02, 12.34s/it]08/03/2024 10:34:25 - INFO - __main__ -   Step: 3676, LR: 1.2639618526946961e-05, Loss: 636.5166015625
2024-08-03T17:34:37.453766106Z 
 39%|███▊      | 3677/9500 [12:37:07<19:59:57, 12.36s/it]08/03/2024 10:34:37 - INFO - __main__ -   Step: 3677, LR: 1.2637447983259683e-05, Loss: 644.7509765625
2024-08-03T17:34:50.200083182Z 
 39%|███▊      | 3678/9500 [12:37:20<20:10:49, 12.48s/it]08/03/2024 10:34:50 - INFO - __main__ -   Step: 3678, LR: 1.2635277439572405e-05, Loss: 558.8602905273438
2024-08-03T17:35:02.641869399Z 
 39%|███▊      | 3679/9500 [12:37:32<20:09:36, 12.47s/it]08/03/2024 10:35:02 - INFO - __main__ -   Step: 3679, LR: 1.2633106895885126e-05, Loss: 558.2080078125
2024-08-03T17:35:14.691793398Z 
 39%|███▊      | 3680/9500 [12:37:44<19:57:13, 12.34s/it]08/03/2024 10:35:14 - INFO - __main__ -   Step: 3680, LR: 1.2630936352197846e-05, Loss: 507.20660400390625
2024-08-03T17:35:27.305107658Z 
 39%|███▊      | 3681/9500 [12:37:57<20:04:53, 12.42s/it]08/03/2024 10:35:27 - INFO - __main__ -   Step: 3681, LR: 1.2628765808510567e-05, Loss: 662.0621948242188
2024-08-03T17:35:39.717020680Z 
 39%|███▉      | 3682/9500 [12:38:09<20:04:20, 12.42s/it]08/03/2024 10:35:39 - INFO - __main__ -   Step: 3682, LR: 1.2626595264823289e-05, Loss: 563.7501220703125
2024-08-03T17:35:51.962845025Z 
 39%|███▉      | 3683/9500 [12:38:21<19:59:04, 12.37s/it]08/03/2024 10:35:51 - INFO - __main__ -   Step: 3683, LR: 1.2624424721136009e-05, Loss: 578.993896484375
2024-08-03T17:36:04.523514805Z 
 39%|███▉      | 3684/9500 [12:38:34<20:04:28, 12.43s/it]08/03/2024 10:36:04 - INFO - __main__ -   Step: 3684, LR: 1.2622254177448729e-05, Loss: 667.5540771484375
2024-08-03T17:36:16.633949171Z 
 39%|███▉      | 3685/9500 [12:38:46<19:55:05, 12.33s/it]08/03/2024 10:36:16 - INFO - __main__ -   Step: 3685, LR: 1.262008363376145e-05, Loss: 549.314697265625
2024-08-03T17:36:28.591811590Z 
 39%|███▉      | 3686/9500 [12:38:58<19:44:00, 12.22s/it]08/03/2024 10:36:28 - INFO - __main__ -   Step: 3686, LR: 1.2617913090074172e-05, Loss: 637.1248779296875
2024-08-03T17:36:41.307850688Z 
 39%|███▉      | 3687/9500 [12:39:11<19:58:16, 12.37s/it]08/03/2024 10:36:41 - INFO - __main__ -   Step: 3687, LR: 1.2615742546386893e-05, Loss: 816.096923828125
2024-08-03T17:36:53.399246477Z 
 39%|███▉      | 3688/9500 [12:39:23<19:50:01, 12.29s/it]08/03/2024 10:36:53 - INFO - __main__ -   Step: 3688, LR: 1.2613572002699615e-05, Loss: 529.2089233398438
2024-08-03T17:37:05.298028498Z 
 39%|███▉      | 3689/9500 [12:39:35<19:38:35, 12.17s/it]08/03/2024 10:37:05 - INFO - __main__ -   Step: 3689, LR: 1.2611401459012335e-05, Loss: 561.9642333984375
2024-08-03T17:37:17.997835207Z 
 39%|███▉      | 3690/9500 [12:39:47<19:53:48, 12.33s/it]08/03/2024 10:37:17 - INFO - __main__ -   Step: 3690, LR: 1.2609230915325056e-05, Loss: 732.39208984375
2024-08-03T17:37:30.212515304Z 
 39%|███▉      | 3691/9500 [12:40:00<19:50:18, 12.29s/it]08/03/2024 10:37:30 - INFO - __main__ -   Step: 3691, LR: 1.2607060371637778e-05, Loss: 541.3623657226562
2024-08-03T17:37:42.264715370Z 
 39%|███▉      | 3692/9500 [12:40:12<19:43:03, 12.22s/it]08/03/2024 10:37:42 - INFO - __main__ -   Step: 3692, LR: 1.26048898279505e-05, Loss: 632.6019287109375
2024-08-03T17:37:54.680726617Z 
 39%|███▉      | 3693/9500 [12:40:24<19:48:29, 12.28s/it]08/03/2024 10:37:54 - INFO - __main__ -   Step: 3693, LR: 1.2602719284263221e-05, Loss: 609.8329467773438
2024-08-03T17:38:07.138403233Z 
 39%|███▉      | 3694/9500 [12:40:37<19:53:27, 12.33s/it]08/03/2024 10:38:07 - INFO - __main__ -   Step: 3694, LR: 1.2600548740575943e-05, Loss: 616.0433349609375
2024-08-03T17:38:19.543663317Z 
 39%|███▉      | 3695/9500 [12:40:49<19:55:19, 12.35s/it]08/03/2024 10:38:19 - INFO - __main__ -   Step: 3695, LR: 1.2598378196888663e-05, Loss: 565.7120971679688
2024-08-03T17:38:31.658842955Z 
 39%|███▉      | 3696/9500 [12:41:01<19:48:10, 12.28s/it]08/03/2024 10:38:31 - INFO - __main__ -   Step: 3696, LR: 1.2596207653201384e-05, Loss: 629.44921875
2024-08-03T17:38:44.268558887Z 
 39%|███▉      | 3697/9500 [12:41:14<19:57:27, 12.38s/it]08/03/2024 10:38:44 - INFO - __main__ -   Step: 3697, LR: 1.2594037109514104e-05, Loss: 550.273193359375
2024-08-03T17:38:56.199025895Z 
 39%|███▉      | 3698/9500 [12:41:26<19:44:10, 12.25s/it]08/03/2024 10:38:56 - INFO - __main__ -   Step: 3698, LR: 1.2591866565826824e-05, Loss: 516.5693359375
2024-08-03T17:39:08.463270430Z 
 39%|███▉      | 3699/9500 [12:41:38<19:44:29, 12.25s/it]08/03/2024 10:39:08 - INFO - __main__ -   Step: 3699, LR: 1.2589696022139545e-05, Loss: 577.9551391601562
2024-08-03T17:39:21.662399864Z 
 39%|███▉      | 3700/9500 [12:41:51<20:11:46, 12.54s/it]08/03/2024 10:39:21 - INFO - __main__ -   Step: 3700, LR: 1.2587525478452267e-05, Loss: 615.9766845703125
2024-08-03T17:39:33.845603243Z 
 39%|███▉      | 3701/9500 [12:42:03<20:01:21, 12.43s/it]08/03/2024 10:39:33 - INFO - __main__ -   Step: 3701, LR: 1.2585354934764989e-05, Loss: 655.919189453125
2024-08-03T17:39:46.143186846Z 
 39%|███▉      | 3702/9500 [12:42:16<19:57:18, 12.39s/it]08/03/2024 10:39:46 - INFO - __main__ -   Step: 3702, LR: 1.258318439107771e-05, Loss: 616.1420288085938
2024-08-03T17:39:58.836502405Z 
 39%|███▉      | 3703/9500 [12:42:28<20:05:53, 12.48s/it]08/03/2024 10:39:58 - INFO - __main__ -   Step: 3703, LR: 1.2581013847390432e-05, Loss: 623.2200317382812
2024-08-03T17:40:10.940696226Z 
 39%|███▉      | 3704/9500 [12:42:40<19:54:45, 12.37s/it]08/03/2024 10:40:10 - INFO - __main__ -   Step: 3704, LR: 1.2578843303703152e-05, Loss: 675.6350708007812
2024-08-03T17:40:23.010029229Z 
 39%|███▉      | 3705/9500 [12:42:52<19:45:53, 12.28s/it]08/03/2024 10:40:23 - INFO - __main__ -   Step: 3705, LR: 1.2576672760015873e-05, Loss: 514.1478271484375
2024-08-03T17:40:35.459997380Z 
 39%|███▉      | 3706/9500 [12:43:05<19:50:39, 12.33s/it]08/03/2024 10:40:35 - INFO - __main__ -   Step: 3706, LR: 1.2574502216328595e-05, Loss: 573.83837890625
2024-08-03T17:40:47.448109149Z 
 39%|███▉      | 3707/9500 [12:43:17<19:40:33, 12.23s/it]08/03/2024 10:40:47 - INFO - __main__ -   Step: 3707, LR: 1.2572331672641316e-05, Loss: 595.4161376953125
2024-08-03T17:40:59.509234585Z 
 39%|███▉      | 3708/9500 [12:43:29<19:35:31, 12.18s/it]08/03/2024 10:40:59 - INFO - __main__ -   Step: 3708, LR: 1.2570161128954038e-05, Loss: 617.4301147460938
2024-08-03T17:41:11.929823821Z 
 39%|███▉      | 3709/9500 [12:43:41<19:42:22, 12.25s/it]08/03/2024 10:41:11 - INFO - __main__ -   Step: 3709, LR: 1.2567990585266758e-05, Loss: 606.678955078125
2024-08-03T17:41:23.990335553Z 
 39%|███▉      | 3710/9500 [12:43:53<19:36:40, 12.19s/it]08/03/2024 10:41:23 - INFO - __main__ -   Step: 3710, LR: 1.256582004157948e-05, Loss: 495.64581298828125
2024-08-03T17:41:36.170144021Z 
 39%|███▉      | 3711/9500 [12:44:06<19:36:04, 12.19s/it]08/03/2024 10:41:36 - INFO - __main__ -   Step: 3711, LR: 1.2563649497892199e-05, Loss: 584.3422241210938
2024-08-03T17:41:49.069236453Z 
 39%|███▉      | 3712/9500 [12:44:19<19:56:24, 12.40s/it]08/03/2024 10:41:49 - INFO - __main__ -   Step: 3712, LR: 1.256147895420492e-05, Loss: 607.2318115234375
2024-08-03T17:42:01.001151709Z 
 39%|███▉      | 3713/9500 [12:44:30<19:42:35, 12.26s/it]08/03/2024 10:42:01 - INFO - __main__ -   Step: 3713, LR: 1.255930841051764e-05, Loss: 543.474853515625
2024-08-03T17:42:13.495918090Z 
 39%|███▉      | 3714/9500 [12:44:43<19:49:08, 12.33s/it]08/03/2024 10:42:13 - INFO - __main__ -   Step: 3714, LR: 1.2557137866830362e-05, Loss: 524.8065795898438
2024-08-03T17:42:26.652433186Z 
 39%|███▉      | 3715/9500 [12:44:56<20:12:48, 12.58s/it]08/03/2024 10:42:26 - INFO - __main__ -   Step: 3715, LR: 1.2554967323143084e-05, Loss: 631.8814697265625
2024-08-03T17:42:38.655770966Z 
 39%|███▉      | 3716/9500 [12:45:08<19:55:57, 12.41s/it]08/03/2024 10:42:38 - INFO - __main__ -   Step: 3716, LR: 1.2552796779455805e-05, Loss: 445.41070556640625
2024-08-03T17:42:51.002067616Z 
 39%|███▉      | 3717/9500 [12:45:20<19:54:00, 12.39s/it]08/03/2024 10:42:51 - INFO - __main__ -   Step: 3717, LR: 1.2550626235768527e-05, Loss: 633.9564208984375
2024-08-03T17:43:03.455781484Z 
 39%|███▉      | 3718/9500 [12:45:33<19:55:42, 12.41s/it]08/03/2024 10:43:03 - INFO - __main__ -   Step: 3718, LR: 1.2548455692081247e-05, Loss: 476.4033203125
2024-08-03T17:43:15.482823986Z 
 39%|███▉      | 3719/9500 [12:45:45<19:44:29, 12.29s/it]08/03/2024 10:43:15 - INFO - __main__ -   Step: 3719, LR: 1.2546285148393968e-05, Loss: 518.6958618164062
2024-08-03T17:43:28.064587468Z 
 39%|███▉      | 3720/9500 [12:45:58<19:52:36, 12.38s/it]08/03/2024 10:43:28 - INFO - __main__ -   Step: 3720, LR: 1.254411460470669e-05, Loss: 764.2392578125
2024-08-03T17:43:40.477172976Z 
 39%|███▉      | 3721/9500 [12:46:10<19:53:21, 12.39s/it]08/03/2024 10:43:40 - INFO - __main__ -   Step: 3721, LR: 1.2541944061019411e-05, Loss: 667.856689453125
2024-08-03T17:43:52.788199344Z 
 39%|███▉      | 3722/9500 [12:46:22<19:50:51, 12.37s/it]08/03/2024 10:43:52 - INFO - __main__ -   Step: 3722, LR: 1.2539773517332133e-05, Loss: 642.971435546875
2024-08-03T17:44:04.869454988Z 
 39%|███▉      | 3723/9500 [12:46:34<19:42:25, 12.28s/it]08/03/2024 10:44:04 - INFO - __main__ -   Step: 3723, LR: 1.2537602973644853e-05, Loss: 576.0327758789062
2024-08-03T17:44:17.833539590Z 
 39%|███▉      | 3724/9500 [12:46:47<20:01:57, 12.49s/it]08/03/2024 10:44:17 - INFO - __main__ -   Step: 3724, LR: 1.2535432429957574e-05, Loss: 724.9229736328125
2024-08-03T17:44:30.293389949Z 
 39%|███▉      | 3725/9500 [12:47:00<20:01:00, 12.48s/it]08/03/2024 10:44:30 - INFO - __main__ -   Step: 3725, LR: 1.2533261886270294e-05, Loss: 637.80029296875
2024-08-03T17:44:42.287726625Z 
 39%|███▉      | 3726/9500 [12:47:12<19:46:49, 12.33s/it]08/03/2024 10:44:42 - INFO - __main__ -   Step: 3726, LR: 1.2531091342583016e-05, Loss: 629.5614013671875
2024-08-03T17:44:54.672227859Z 
 39%|███▉      | 3727/9500 [12:47:24<19:48:07, 12.35s/it]08/03/2024 10:44:54 - INFO - __main__ -   Step: 3727, LR: 1.2528920798895736e-05, Loss: 528.7822265625
2024-08-03T17:45:06.667042672Z 
 39%|███▉      | 3728/9500 [12:47:36<19:37:42, 12.24s/it]08/03/2024 10:45:06 - INFO - __main__ -   Step: 3728, LR: 1.2526750255208457e-05, Loss: 570.6588134765625
2024-08-03T17:45:18.642140996Z 
 39%|███▉      | 3729/9500 [12:47:48<19:29:47, 12.16s/it]08/03/2024 10:45:18 - INFO - __main__ -   Step: 3729, LR: 1.2524579711521179e-05, Loss: 657.2667236328125
2024-08-03T17:45:31.209873172Z 
 39%|███▉      | 3730/9500 [12:48:01<19:41:17, 12.28s/it]08/03/2024 10:45:31 - INFO - __main__ -   Step: 3730, LR: 1.25224091678339e-05, Loss: 654.09521484375
2024-08-03T17:45:43.372429055Z 
 39%|███▉      | 3731/9500 [12:48:13<19:37:34, 12.25s/it]08/03/2024 10:45:43 - INFO - __main__ -   Step: 3731, LR: 1.2520238624146622e-05, Loss: 504.809814453125
2024-08-03T17:45:55.319614850Z 
 39%|███▉      | 3732/9500 [12:48:25<19:28:43, 12.16s/it]08/03/2024 10:45:55 - INFO - __main__ -   Step: 3732, LR: 1.2518068080459342e-05, Loss: 526.6122436523438
2024-08-03T17:46:08.184446277Z 
 39%|███▉      | 3733/9500 [12:48:38<19:48:55, 12.37s/it]08/03/2024 10:46:08 - INFO - __main__ -   Step: 3733, LR: 1.2515897536772063e-05, Loss: 750.156005859375
2024-08-03T17:46:20.134381936Z 
 39%|███▉      | 3734/9500 [12:48:50<19:36:37, 12.24s/it]08/03/2024 10:46:20 - INFO - __main__ -   Step: 3734, LR: 1.2513726993084785e-05, Loss: 595.8221435546875
2024-08-03T17:46:31.846998469Z 
 39%|███▉      | 3735/9500 [12:49:01<19:21:06, 12.08s/it]08/03/2024 10:46:31 - INFO - __main__ -   Step: 3735, LR: 1.2511556449397506e-05, Loss: 504.156982421875
2024-08-03T17:46:43.978597834Z 
 39%|███▉      | 3736/9500 [12:49:13<19:22:15, 12.10s/it]08/03/2024 10:46:43 - INFO - __main__ -   Step: 3736, LR: 1.2509385905710228e-05, Loss: 632.4091186523438
2024-08-03T17:46:56.531498813Z 
 39%|███▉      | 3737/9500 [12:49:26<19:35:09, 12.23s/it]08/03/2024 10:46:56 - INFO - __main__ -   Step: 3737, LR: 1.250721536202295e-05, Loss: 471.9462890625
2024-08-03T17:47:08.615346161Z 
 39%|███▉      | 3738/9500 [12:49:38<19:30:35, 12.19s/it]08/03/2024 10:47:08 - INFO - __main__ -   Step: 3738, LR: 1.250504481833567e-05, Loss: 501.8641357421875
2024-08-03T17:47:20.730680416Z 
 39%|███▉      | 3739/9500 [12:49:50<19:28:16, 12.17s/it]08/03/2024 10:47:20 - INFO - __main__ -   Step: 3739, LR: 1.2502874274648389e-05, Loss: 513.9923095703125
2024-08-03T17:47:33.417346542Z 
 39%|███▉      | 3740/9500 [12:50:03<19:43:01, 12.32s/it]08/03/2024 10:47:33 - INFO - __main__ -   Step: 3740, LR: 1.250070373096111e-05, Loss: 738.8407592773438
2024-08-03T17:47:45.390050865Z 
 39%|███▉      | 3741/9500 [12:50:15<19:32:43, 12.22s/it]08/03/2024 10:47:45 - INFO - __main__ -   Step: 3741, LR: 1.249853318727383e-05, Loss: 515.4804077148438
2024-08-03T17:47:57.355822479Z 
 39%|███▉      | 3742/9500 [12:50:27<19:25:15, 12.14s/it]08/03/2024 10:47:57 - INFO - __main__ -   Step: 3742, LR: 1.2496362643586552e-05, Loss: 687.912109375
2024-08-03T17:48:09.779561342Z 
 39%|███▉      | 3743/9500 [12:50:39<19:33:09, 12.23s/it]08/03/2024 10:48:09 - INFO - __main__ -   Step: 3743, LR: 1.2494192099899274e-05, Loss: 607.7275390625
2024-08-03T17:48:21.969670107Z 
 39%|███▉      | 3744/9500 [12:50:51<19:31:53, 12.22s/it]08/03/2024 10:48:21 - INFO - __main__ -   Step: 3744, LR: 1.2492021556211995e-05, Loss: 537.09814453125
2024-08-03T17:48:34.153955002Z 
 39%|███▉      | 3745/9500 [12:51:04<19:30:46, 12.21s/it]08/03/2024 10:48:34 - INFO - __main__ -   Step: 3745, LR: 1.2489851012524717e-05, Loss: 682.929931640625
2024-08-03T17:48:47.170157150Z 
 39%|███▉      | 3746/9500 [12:51:17<19:53:52, 12.45s/it]08/03/2024 10:48:47 - INFO - __main__ -   Step: 3746, LR: 1.2487680468837438e-05, Loss: 796.8397216796875
2024-08-03T17:48:59.228367858Z 
 39%|███▉      | 3747/9500 [12:51:29<19:42:25, 12.33s/it]08/03/2024 10:48:59 - INFO - __main__ -   Step: 3747, LR: 1.2485509925150158e-05, Loss: 682.2607421875
2024-08-03T17:49:11.776821455Z 
 39%|███▉      | 3748/9500 [12:51:41<19:48:26, 12.40s/it]08/03/2024 10:49:11 - INFO - __main__ -   Step: 3748, LR: 1.248333938146288e-05, Loss: 668.5848388671875
2024-08-03T17:49:24.358093015Z 
 39%|███▉      | 3749/9500 [12:51:54<19:53:33, 12.45s/it]08/03/2024 10:49:24 - INFO - __main__ -   Step: 3749, LR: 1.2481168837775601e-05, Loss: 683.3969116210938
2024-08-03T17:49:36.727148574Z 
 39%|███▉      | 3750/9500 [12:52:06<19:50:57, 12.43s/it]08/03/2024 10:49:36 - INFO - __main__ -   Step: 3750, LR: 1.2478998294088323e-05, Loss: 545.996337890625
2024-08-03T17:49:48.892567946Z 
 39%|███▉      | 3751/9500 [12:52:18<19:43:12, 12.35s/it]08/03/2024 10:49:48 - INFO - __main__ -   Step: 3751, LR: 1.2476827750401044e-05, Loss: 592.717041015625
2024-08-03T17:50:01.684324894Z 
 39%|███▉      | 3752/9500 [12:52:31<19:55:43, 12.48s/it]08/03/2024 10:50:01 - INFO - __main__ -   Step: 3752, LR: 1.2474657206713764e-05, Loss: 560.0865478515625
2024-08-03T17:50:13.847823665Z 
 40%|███▉      | 3753/9500 [12:52:43<19:46:23, 12.39s/it]08/03/2024 10:50:13 - INFO - __main__ -   Step: 3753, LR: 1.2472486663026484e-05, Loss: 547.7297973632812
2024-08-03T17:50:25.899057501Z 
 40%|███▉      | 3754/9500 [12:52:55<19:36:33, 12.29s/it]08/03/2024 10:50:25 - INFO - __main__ -   Step: 3754, LR: 1.2470316119339206e-05, Loss: 550.1381225585938
2024-08-03T17:50:38.881258469Z 
 40%|███▉      | 3755/9500 [12:53:08<19:56:21, 12.49s/it]08/03/2024 10:50:38 - INFO - __main__ -   Step: 3755, LR: 1.2468145575651927e-05, Loss: 519.6144409179688
2024-08-03T17:50:51.134061403Z 
 40%|███▉      | 3756/9500 [12:53:21<19:49:12, 12.42s/it]08/03/2024 10:50:51 - INFO - __main__ -   Step: 3756, LR: 1.2465975031964647e-05, Loss: 696.6373901367188
2024-08-03T17:51:03.507275110Z 
 40%|███▉      | 3757/9500 [12:53:33<19:47:34, 12.41s/it]08/03/2024 10:51:03 - INFO - __main__ -   Step: 3757, LR: 1.2463804488277369e-05, Loss: 729.4484252929688
2024-08-03T17:51:16.330538922Z 
 40%|███▉      | 3758/9500 [12:53:46<19:59:20, 12.53s/it]08/03/2024 10:51:16 - INFO - __main__ -   Step: 3758, LR: 1.246163394459009e-05, Loss: 810.8389282226562
2024-08-03T17:51:28.498939774Z 
 40%|███▉      | 3759/9500 [12:53:58<19:48:40, 12.42s/it]08/03/2024 10:51:28 - INFO - __main__ -   Step: 3759, LR: 1.2459463400902812e-05, Loss: 713.9500732421875
2024-08-03T17:51:40.708397036Z 
 40%|███▉      | 3760/9500 [12:54:10<19:42:20, 12.36s/it]08/03/2024 10:51:40 - INFO - __main__ -   Step: 3760, LR: 1.2457292857215533e-05, Loss: 589.9532470703125
2024-08-03T17:51:53.184955686Z 
 40%|███▉      | 3761/9500 [12:54:23<19:45:30, 12.39s/it]08/03/2024 10:51:53 - INFO - __main__ -   Step: 3761, LR: 1.2455122313528253e-05, Loss: 731.1690673828125
2024-08-03T17:52:05.215206717Z 
 40%|███▉      | 3762/9500 [12:54:35<19:34:50, 12.28s/it]08/03/2024 10:52:05 - INFO - __main__ -   Step: 3762, LR: 1.2452951769840975e-05, Loss: 669.7077026367188
2024-08-03T17:52:17.208088414Z 
 40%|███▉      | 3763/9500 [12:54:47<19:26:16, 12.20s/it]08/03/2024 10:52:17 - INFO - __main__ -   Step: 3763, LR: 1.2450781226153696e-05, Loss: 545.8458862304688
2024-08-03T17:52:29.816637613Z 
 40%|███▉      | 3764/9500 [12:54:59<19:37:51, 12.32s/it]08/03/2024 10:52:29 - INFO - __main__ -   Step: 3764, LR: 1.2448610682466418e-05, Loss: 517.6134033203125
2024-08-03T17:52:42.275185810Z 
 40%|███▉      | 3765/9500 [12:55:12<19:41:37, 12.36s/it]08/03/2024 10:52:42 - INFO - __main__ -   Step: 3765, LR: 1.244644013877914e-05, Loss: 568.7376708984375
2024-08-03T17:52:54.408447506Z 
 40%|███▉      | 3766/9500 [12:55:24<19:34:50, 12.29s/it]08/03/2024 10:52:54 - INFO - __main__ -   Step: 3766, LR: 1.244426959509186e-05, Loss: 566.7296142578125
2024-08-03T17:53:06.784286692Z 
 40%|███▉      | 3767/9500 [12:55:36<19:37:00, 12.32s/it]08/03/2024 10:53:06 - INFO - __main__ -   Step: 3767, LR: 1.244209905140458e-05, Loss: 688.8580322265625
2024-08-03T17:53:18.890044460Z 
 40%|███▉      | 3768/9500 [12:55:48<19:30:42, 12.25s/it]08/03/2024 10:53:18 - INFO - __main__ -   Step: 3768, LR: 1.24399285077173e-05, Loss: 538.4425659179688
2024-08-03T17:53:31.395395853Z 
 40%|███▉      | 3769/9500 [12:56:01<19:37:41, 12.33s/it]08/03/2024 10:53:31 - INFO - __main__ -   Step: 3769, LR: 1.2437757964030022e-05, Loss: 761.5573120117188
2024-08-03T17:53:43.993308187Z 
 40%|███▉      | 3770/9500 [12:56:13<19:45:10, 12.41s/it]08/03/2024 10:53:43 - INFO - __main__ -   Step: 3770, LR: 1.2435587420342742e-05, Loss: 623.846923828125
2024-08-03T17:53:56.129405129Z 
 40%|███▉      | 3771/9500 [12:56:26<19:37:07, 12.33s/it]08/03/2024 10:53:56 - INFO - __main__ -   Step: 3771, LR: 1.2433416876655464e-05, Loss: 748.3465576171875
2024-08-03T17:54:08.129864873Z 
 40%|███▉      | 3772/9500 [12:56:38<19:27:32, 12.23s/it]08/03/2024 10:54:08 - INFO - __main__ -   Step: 3772, LR: 1.2431246332968185e-05, Loss: 504.0519104003906
2024-08-03T17:54:20.710549006Z 
 40%|███▉      | 3773/9500 [12:56:50<19:37:22, 12.34s/it]08/03/2024 10:54:20 - INFO - __main__ -   Step: 3773, LR: 1.2429075789280907e-05, Loss: 479.06451416015625
2024-08-03T17:54:32.898115369Z 
 40%|███▉      | 3774/9500 [12:57:02<19:32:56, 12.29s/it]08/03/2024 10:54:32 - INFO - __main__ -   Step: 3774, LR: 1.2426905245593628e-05, Loss: 634.830078125
2024-08-03T17:54:44.935735487Z 
 40%|███▉      | 3775/9500 [12:57:14<19:25:29, 12.21s/it]08/03/2024 10:54:44 - INFO - __main__ -   Step: 3775, LR: 1.2424734701906348e-05, Loss: 565.67822265625
2024-08-03T17:54:57.540037947Z 
 40%|███▉      | 3776/9500 [12:57:27<19:36:26, 12.33s/it]08/03/2024 10:54:57 - INFO - __main__ -   Step: 3776, LR: 1.242256415821907e-05, Loss: 488.99407958984375
2024-08-03T17:55:09.760884207Z 
 40%|███▉      | 3777/9500 [12:57:39<19:33:04, 12.30s/it]08/03/2024 10:55:09 - INFO - __main__ -   Step: 3777, LR: 1.2420393614531791e-05, Loss: 686.4388427734375
2024-08-03T17:55:21.930022436Z 
 40%|███▉      | 3778/9500 [12:57:51<19:29:09, 12.26s/it]08/03/2024 10:55:21 - INFO - __main__ -   Step: 3778, LR: 1.2418223070844513e-05, Loss: 609.7085571289062
2024-08-03T17:55:34.214772486Z 
 40%|███▉      | 3779/9500 [12:58:04<19:29:40, 12.27s/it]08/03/2024 10:55:34 - INFO - __main__ -   Step: 3779, LR: 1.2416052527157235e-05, Loss: 600.7109375
2024-08-03T17:55:46.994140068Z 
 40%|███▉      | 3780/9500 [12:58:16<19:44:06, 12.42s/it]08/03/2024 10:55:46 - INFO - __main__ -   Step: 3780, LR: 1.2413881983469956e-05, Loss: 501.22845458984375
2024-08-03T17:55:59.198742154Z 
 40%|███▉      | 3781/9500 [12:58:29<19:37:44, 12.36s/it]08/03/2024 10:55:59 - INFO - __main__ -   Step: 3781, LR: 1.2411711439782674e-05, Loss: 632.8602294921875
2024-08-03T17:56:11.428896426Z 
 40%|███▉      | 3782/9500 [12:58:41<19:33:55, 12.32s/it]08/03/2024 10:56:11 - INFO - __main__ -   Step: 3782, LR: 1.2409540896095396e-05, Loss: 871.8226928710938
2024-08-03T17:56:23.935140025Z 
 40%|███▉      | 3783/9500 [12:58:53<19:39:05, 12.37s/it]08/03/2024 10:56:23 - INFO - __main__ -   Step: 3783, LR: 1.2407370352408117e-05, Loss: 415.09173583984375
2024-08-03T17:56:36.099456223Z 
 40%|███▉      | 3784/9500 [12:59:06<19:32:52, 12.31s/it]08/03/2024 10:56:36 - INFO - __main__ -   Step: 3784, LR: 1.2405199808720837e-05, Loss: 624.3680419921875
2024-08-03T17:56:48.321959624Z 
 40%|███▉      | 3785/9500 [12:59:18<19:30:08, 12.28s/it]08/03/2024 10:56:48 - INFO - __main__ -   Step: 3785, LR: 1.2403029265033559e-05, Loss: 707.1482543945312
2024-08-03T17:57:00.998079163Z 
 40%|███▉      | 3786/9500 [12:59:30<19:41:06, 12.40s/it]08/03/2024 10:57:00 - INFO - __main__ -   Step: 3786, LR: 1.240085872134628e-05, Loss: 565.1824951171875
2024-08-03T17:57:13.016320426Z 
 40%|███▉      | 3787/9500 [12:59:42<19:29:55, 12.29s/it]08/03/2024 10:57:13 - INFO - __main__ -   Step: 3787, LR: 1.2398688177659002e-05, Loss: 579.01708984375
2024-08-03T17:57:25.081304144Z 
 40%|███▉      | 3788/9500 [12:59:55<19:23:22, 12.22s/it]08/03/2024 10:57:25 - INFO - __main__ -   Step: 3788, LR: 1.2396517633971724e-05, Loss: 465.80908203125
2024-08-03T17:57:37.951103555Z 
 40%|███▉      | 3789/9500 [13:00:07<19:41:43, 12.42s/it]08/03/2024 10:57:37 - INFO - __main__ -   Step: 3789, LR: 1.2394347090284445e-05, Loss: 651.324951171875
2024-08-03T17:57:50.301551108Z 
 40%|███▉      | 3790/9500 [13:00:20<19:39:40, 12.40s/it]08/03/2024 10:57:50 - INFO - __main__ -   Step: 3790, LR: 1.2392176546597165e-05, Loss: 507.1953125
2024-08-03T17:58:02.542217643Z 
 40%|███▉      | 3791/9500 [13:00:32<19:35:01, 12.35s/it]08/03/2024 10:58:02 - INFO - __main__ -   Step: 3791, LR: 1.2390006002909887e-05, Loss: 724.9232177734375
2024-08-03T17:58:15.408573897Z 
 40%|███▉      | 3792/9500 [13:00:45<19:49:33, 12.50s/it]08/03/2024 10:58:15 - INFO - __main__ -   Step: 3792, LR: 1.2387835459222608e-05, Loss: 658.6522216796875
2024-08-03T17:58:27.624972485Z 
 40%|███▉      | 3793/9500 [13:00:57<19:41:09, 12.42s/it]08/03/2024 10:58:27 - INFO - __main__ -   Step: 3793, LR: 1.238566491553533e-05, Loss: 638.9912109375
2024-08-03T17:58:39.651384872Z 
 40%|███▉      | 3794/9500 [13:01:09<19:29:47, 12.30s/it]08/03/2024 10:58:39 - INFO - __main__ -   Step: 3794, LR: 1.2383494371848051e-05, Loss: 482.8897705078125
2024-08-03T17:58:52.063760263Z 
 40%|███▉      | 3795/9500 [13:01:22<19:32:46, 12.33s/it]08/03/2024 10:58:52 - INFO - __main__ -   Step: 3795, LR: 1.238132382816077e-05, Loss: 450.468505859375
2024-08-03T17:59:04.276949304Z 
 40%|███▉      | 3796/9500 [13:01:34<19:29:06, 12.30s/it]08/03/2024 10:59:04 - INFO - __main__ -   Step: 3796, LR: 1.2379153284473491e-05, Loss: 568.316650390625
2024-08-03T17:59:16.322738827Z 
 40%|███▉      | 3797/9500 [13:01:46<19:21:43, 12.22s/it]08/03/2024 10:59:16 - INFO - __main__ -   Step: 3797, LR: 1.2376982740786212e-05, Loss: 571.293701171875
2024-08-03T17:59:28.858537819Z 
 40%|███▉      | 3798/9500 [13:01:58<19:30:27, 12.32s/it]08/03/2024 10:59:28 - INFO - __main__ -   Step: 3798, LR: 1.2374812197098934e-05, Loss: 597.369873046875
2024-08-03T17:59:41.000206679Z 
 40%|███▉      | 3799/9500 [13:02:10<19:25:15, 12.26s/it]08/03/2024 10:59:41 - INFO - __main__ -   Step: 3799, LR: 1.2372641653411654e-05, Loss: 519.2689208984375
2024-08-03T17:59:52.960608823Z 
 40%|████      | 3800/9500 [13:02:22<19:16:25, 12.17s/it]08/03/2024 10:59:52 - INFO - __main__ -   Step: 3800, LR: 1.2370471109724375e-05, Loss: 703.3995361328125
2024-08-03T18:00:05.305150450Z 
 40%|████      | 3801/9500 [13:02:35<19:21:06, 12.22s/it]08/03/2024 11:00:05 - INFO - __main__ -   Step: 3801, LR: 1.2368300566037097e-05, Loss: 445.98040771484375
2024-08-03T18:00:17.396744380Z 
 40%|████      | 3802/9500 [13:02:47<19:17:07, 12.18s/it]08/03/2024 11:00:17 - INFO - __main__ -   Step: 3802, LR: 1.2366130022349819e-05, Loss: 738.52099609375
2024-08-03T18:00:29.486548196Z 
 40%|████      | 3803/9500 [13:02:59<19:14:13, 12.16s/it]08/03/2024 11:00:29 - INFO - __main__ -   Step: 3803, LR: 1.236395947866254e-05, Loss: 622.6644287109375
2024-08-03T18:00:41.946535383Z 
 40%|████      | 3804/9500 [13:03:11<19:22:41, 12.25s/it]08/03/2024 11:00:41 - INFO - __main__ -   Step: 3804, LR: 1.236178893497526e-05, Loss: 662.8787231445312
2024-08-03T18:00:54.369658901Z 
 40%|████      | 3805/9500 [13:03:24<19:27:28, 12.30s/it]08/03/2024 11:00:54 - INFO - __main__ -   Step: 3805, LR: 1.2359618391287982e-05, Loss: 588.857177734375
2024-08-03T18:01:06.992081338Z 
 40%|████      | 3806/9500 [13:03:36<19:36:27, 12.40s/it]08/03/2024 11:01:06 - INFO - __main__ -   Step: 3806, LR: 1.2357447847600703e-05, Loss: 678.2181396484375
2024-08-03T18:01:19.580782732Z 
 40%|████      | 3807/9500 [13:03:49<19:41:42, 12.45s/it]08/03/2024 11:01:19 - INFO - __main__ -   Step: 3807, LR: 1.2355277303913425e-05, Loss: 627.1090698242188
2024-08-03T18:01:32.207419129Z 
 40%|████      | 3808/9500 [13:04:02<19:46:24, 12.51s/it]08/03/2024 11:01:32 - INFO - __main__ -   Step: 3808, LR: 1.2353106760226146e-05, Loss: 777.6029052734375
2024-08-03T18:01:44.421942805Z 
 40%|████      | 3809/9500 [13:04:14<19:37:53, 12.42s/it]08/03/2024 11:01:44 - INFO - __main__ -   Step: 3809, LR: 1.2350936216538864e-05, Loss: 661.0286254882812
2024-08-03T18:01:56.825328967Z 
 40%|████      | 3810/9500 [13:04:26<19:37:16, 12.41s/it]08/03/2024 11:01:56 - INFO - __main__ -   Step: 3810, LR: 1.2348765672851586e-05, Loss: 558.8403930664062
2024-08-03T18:02:09.015767606Z 
 40%|████      | 3811/9500 [13:04:38<19:30:41, 12.35s/it]08/03/2024 11:02:09 - INFO - __main__ -   Step: 3811, LR: 1.2346595129164308e-05, Loss: 475.2502746582031
2024-08-03T18:02:20.899011780Z 
 40%|████      | 3812/9500 [13:04:50<19:17:17, 12.21s/it]08/03/2024 11:02:20 - INFO - __main__ -   Step: 3812, LR: 1.2344424585477029e-05, Loss: 605.628662109375
2024-08-03T18:02:33.200473306Z 
 40%|████      | 3813/9500 [13:05:03<19:19:45, 12.24s/it]08/03/2024 11:02:33 - INFO - __main__ -   Step: 3813, LR: 1.2342254041789749e-05, Loss: 440.82525634765625
2024-08-03T18:02:45.921241767Z 
 40%|████      | 3814/9500 [13:05:15<19:33:20, 12.38s/it]08/03/2024 11:02:45 - INFO - __main__ -   Step: 3814, LR: 1.234008349810247e-05, Loss: 664.7099609375
2024-08-03T18:02:57.994488637Z 
 40%|████      | 3815/9500 [13:05:27<19:24:22, 12.29s/it]08/03/2024 11:02:57 - INFO - __main__ -   Step: 3815, LR: 1.2337912954415192e-05, Loss: 599.939697265625
2024-08-03T18:03:10.692731521Z 
 40%|████      | 3816/9500 [13:05:40<19:35:48, 12.41s/it]08/03/2024 11:03:10 - INFO - __main__ -   Step: 3816, LR: 1.2335742410727914e-05, Loss: 723.837158203125
2024-08-03T18:03:22.787431651Z 
 40%|████      | 3817/9500 [13:05:52<19:26:35, 12.32s/it]08/03/2024 11:03:22 - INFO - __main__ -   Step: 3817, LR: 1.2333571867040635e-05, Loss: 754.919921875
2024-08-03T18:03:34.951291711Z 
 40%|████      | 3818/9500 [13:06:04<19:22:02, 12.27s/it]08/03/2024 11:03:34 - INFO - __main__ -   Step: 3818, LR: 1.2331401323353355e-05, Loss: 443.3081970214844
2024-08-03T18:03:47.287015478Z 
 40%|████      | 3819/9500 [13:06:17<19:23:41, 12.29s/it]08/03/2024 11:03:47 - INFO - __main__ -   Step: 3819, LR: 1.2329230779666077e-05, Loss: 471.45599365234375
2024-08-03T18:04:00.170486176Z 
 40%|████      | 3820/9500 [13:06:30<19:40:18, 12.47s/it]08/03/2024 11:04:00 - INFO - __main__ -   Step: 3820, LR: 1.2327060235978798e-05, Loss: 611.2776489257812
2024-08-03T18:04:12.071913747Z 
 40%|████      | 3821/9500 [13:06:42<19:24:01, 12.30s/it]08/03/2024 11:04:12 - INFO - __main__ -   Step: 3821, LR: 1.232488969229152e-05, Loss: 619.8196411132812
2024-08-03T18:04:24.039848830Z 
 40%|████      | 3822/9500 [13:06:53<19:14:26, 12.20s/it]08/03/2024 11:04:24 - INFO - __main__ -   Step: 3822, LR: 1.2322719148604238e-05, Loss: 538.2020263671875
2024-08-03T18:04:36.845026782Z 
 40%|████      | 3823/9500 [13:07:06<19:31:27, 12.38s/it]08/03/2024 11:04:36 - INFO - __main__ -   Step: 3823, LR: 1.232054860491696e-05, Loss: 524.139404296875
2024-08-03T18:04:49.089334601Z 
 40%|████      | 3824/9500 [13:07:19<19:27:21, 12.34s/it]08/03/2024 11:04:49 - INFO - __main__ -   Step: 3824, LR: 1.2318378061229681e-05, Loss: 560.19091796875
2024-08-03T18:05:01.202164065Z 
 40%|████      | 3825/9500 [13:07:31<19:20:42, 12.27s/it]08/03/2024 11:05:01 - INFO - __main__ -   Step: 3825, LR: 1.2316207517542403e-05, Loss: 539.5877075195312
2024-08-03T18:05:13.876418600Z 
 40%|████      | 3826/9500 [13:07:43<19:31:55, 12.39s/it]08/03/2024 11:05:13 - INFO - __main__ -   Step: 3826, LR: 1.2314036973855124e-05, Loss: 516.1611938476562
2024-08-03T18:05:25.852952733Z 
 40%|████      | 3827/9500 [13:07:55<19:19:55, 12.27s/it]08/03/2024 11:05:25 - INFO - __main__ -   Step: 3827, LR: 1.2311866430167844e-05, Loss: 731.1943359375
2024-08-03T18:05:38.469900845Z 
 40%|████      | 3828/9500 [13:08:08<19:29:37, 12.37s/it]08/03/2024 11:05:38 - INFO - __main__ -   Step: 3828, LR: 1.2309695886480566e-05, Loss: 548.0343017578125
2024-08-03T18:05:50.945429218Z 
 40%|████      | 3829/9500 [13:08:20<19:32:18, 12.40s/it]08/03/2024 11:05:50 - INFO - __main__ -   Step: 3829, LR: 1.2307525342793287e-05, Loss: 591.979736328125
2024-08-03T18:06:02.987186096Z 
 40%|████      | 3830/9500 [13:08:32<19:21:52, 12.29s/it]08/03/2024 11:06:02 - INFO - __main__ -   Step: 3830, LR: 1.2305354799106009e-05, Loss: 492.2176513671875
2024-08-03T18:06:15.388994456Z 
 40%|████      | 3831/9500 [13:08:45<19:24:41, 12.33s/it]08/03/2024 11:06:15 - INFO - __main__ -   Step: 3831, LR: 1.230318425541873e-05, Loss: 553.4981689453125
2024-08-03T18:06:28.133482755Z 
 40%|████      | 3832/9500 [13:08:58<19:36:18, 12.45s/it]08/03/2024 11:06:28 - INFO - __main__ -   Step: 3832, LR: 1.230101371173145e-05, Loss: 528.73291015625
2024-08-03T18:06:40.330055705Z 
 40%|████      | 3833/9500 [13:09:10<19:28:52, 12.38s/it]08/03/2024 11:06:40 - INFO - __main__ -   Step: 3833, LR: 1.2298843168044172e-05, Loss: 532.1444091796875
2024-08-03T18:06:52.466456471Z 
 40%|████      | 3834/9500 [13:09:22<19:21:53, 12.30s/it]08/03/2024 11:06:52 - INFO - __main__ -   Step: 3834, LR: 1.2296672624356893e-05, Loss: 700.725830078125
2024-08-03T18:07:04.887158113Z 
 40%|████      | 3835/9500 [13:09:34<19:25:00, 12.34s/it]08/03/2024 11:07:04 - INFO - __main__ -   Step: 3835, LR: 1.2294502080669615e-05, Loss: 610.1485595703125
2024-08-03T18:07:16.921735891Z 
 40%|████      | 3836/9500 [13:09:46<19:16:09, 12.25s/it]08/03/2024 11:07:16 - INFO - __main__ -   Step: 3836, LR: 1.2292331536982333e-05, Loss: 568.802978515625
2024-08-03T18:07:28.905932879Z 
 40%|████      | 3837/9500 [13:09:58<19:08:31, 12.17s/it]08/03/2024 11:07:28 - INFO - __main__ -   Step: 3837, LR: 1.2290160993295055e-05, Loss: 538.8359375
2024-08-03T18:07:41.813251963Z 
 40%|████      | 3838/9500 [13:10:11<19:29:13, 12.39s/it]08/03/2024 11:07:41 - INFO - __main__ -   Step: 3838, LR: 1.2287990449607776e-05, Loss: 543.9033813476562
2024-08-03T18:07:53.787921863Z 
 40%|████      | 3839/9500 [13:10:23<19:17:14, 12.27s/it]08/03/2024 11:07:53 - INFO - __main__ -   Step: 3839, LR: 1.2285819905920498e-05, Loss: 537.8209838867188
2024-08-03T18:08:05.753144913Z 
 40%|████      | 3840/9500 [13:10:35<19:08:32, 12.18s/it]08/03/2024 11:08:05 - INFO - __main__ -   Step: 3840, LR: 1.228364936223322e-05, Loss: 601.2574462890625
2024-08-03T18:08:18.038305381Z 
 40%|████      | 3841/9500 [13:10:47<19:11:27, 12.21s/it]08/03/2024 11:08:18 - INFO - __main__ -   Step: 3841, LR: 1.2281478818545939e-05, Loss: 477.8993835449219
2024-08-03T18:08:30.421626540Z 
 40%|████      | 3842/9500 [13:11:00<19:16:11, 12.26s/it]08/03/2024 11:08:30 - INFO - __main__ -   Step: 3842, LR: 1.227930827485866e-05, Loss: 702.303466796875
2024-08-03T18:08:42.503090160Z 
 40%|████      | 3843/9500 [13:11:12<19:10:55, 12.21s/it]08/03/2024 11:08:42 - INFO - __main__ -   Step: 3843, LR: 1.2277137731171382e-05, Loss: 668.072509765625
2024-08-03T18:08:54.868322872Z 
 40%|████      | 3844/9500 [13:11:24<19:15:11, 12.25s/it]08/03/2024 11:08:54 - INFO - __main__ -   Step: 3844, LR: 1.2274967187484104e-05, Loss: 724.540283203125
2024-08-03T18:09:07.167486098Z 
 40%|████      | 3845/9500 [13:11:37<19:16:14, 12.27s/it]08/03/2024 11:09:07 - INFO - __main__ -   Step: 3845, LR: 1.2272796643796825e-05, Loss: 724.0263671875
2024-08-03T18:09:19.100178035Z 
 40%|████      | 3846/9500 [13:11:49<19:06:34, 12.17s/it]08/03/2024 11:09:19 - INFO - __main__ -   Step: 3846, LR: 1.2270626100109547e-05, Loss: 605.4962158203125
2024-08-03T18:09:31.372749208Z 
 40%|████      | 3847/9500 [13:12:01<19:09:20, 12.20s/it]08/03/2024 11:09:31 - INFO - __main__ -   Step: 3847, LR: 1.2268455556422267e-05, Loss: 435.13714599609375
2024-08-03T18:09:43.761674103Z 
 41%|████      | 3848/9500 [13:12:13<19:14:30, 12.26s/it]08/03/2024 11:09:43 - INFO - __main__ -   Step: 3848, LR: 1.2266285012734988e-05, Loss: 644.0667724609375
2024-08-03T18:09:55.827951590Z 
 41%|████      | 3849/9500 [13:12:25<19:08:57, 12.20s/it]08/03/2024 11:09:55 - INFO - __main__ -   Step: 3849, LR: 1.226411446904771e-05, Loss: 658.6033935546875
2024-08-03T18:10:08.342940938Z 
 41%|████      | 3850/9500 [13:12:38<19:17:39, 12.29s/it]08/03/2024 11:10:08 - INFO - __main__ -   Step: 3850, LR: 1.2261943925360428e-05, Loss: 681.005615234375
2024-08-03T18:10:20.644253852Z 
 41%|████      | 3851/9500 [13:12:50<19:17:40, 12.30s/it]08/03/2024 11:10:20 - INFO - __main__ -   Step: 3851, LR: 1.225977338167315e-05, Loss: 734.64697265625
2024-08-03T18:10:32.528353271Z 
 41%|████      | 3852/9500 [13:13:02<19:05:49, 12.17s/it]08/03/2024 11:10:32 - INFO - __main__ -   Step: 3852, LR: 1.2257602837985871e-05, Loss: 419.20660400390625
2024-08-03T18:10:45.214222660Z 
 41%|████      | 3853/9500 [13:13:15<19:20:07, 12.33s/it]08/03/2024 11:10:45 - INFO - __main__ -   Step: 3853, LR: 1.2255432294298593e-05, Loss: 521.9542846679688
2024-08-03T18:10:57.445744555Z 
 41%|████      | 3854/9500 [13:13:27<19:17:14, 12.30s/it]08/03/2024 11:10:57 - INFO - __main__ -   Step: 3854, LR: 1.2253261750611314e-05, Loss: 477.4194641113281
2024-08-03T18:11:09.771327885Z 
 41%|████      | 3855/9500 [13:13:39<19:17:49, 12.31s/it]08/03/2024 11:11:09 - INFO - __main__ -   Step: 3855, LR: 1.2251091206924036e-05, Loss: 652.74560546875
2024-08-03T18:11:22.498749416Z 
 41%|████      | 3856/9500 [13:13:52<19:29:29, 12.43s/it]08/03/2024 11:11:22 - INFO - __main__ -   Step: 3856, LR: 1.2248920663236756e-05, Loss: 704.812255859375
2024-08-03T18:11:34.922592902Z 
 41%|████      | 3857/9500 [13:14:04<19:29:03, 12.43s/it]08/03/2024 11:11:34 - INFO - __main__ -   Step: 3857, LR: 1.2246750119549477e-05, Loss: 607.2427978515625
2024-08-03T18:11:46.748224665Z 
 41%|████      | 3858/9500 [13:14:16<19:11:46, 12.25s/it]08/03/2024 11:11:46 - INFO - __main__ -   Step: 3858, LR: 1.2244579575862199e-05, Loss: 491.14031982421875
2024-08-03T18:11:59.008920803Z 
 41%|████      | 3859/9500 [13:14:28<19:11:55, 12.25s/it]08/03/2024 11:11:59 - INFO - __main__ -   Step: 3859, LR: 1.224240903217492e-05, Loss: 570.8829345703125
2024-08-03T18:12:11.738985847Z 
 41%|████      | 3860/9500 [13:14:41<19:25:11, 12.40s/it]08/03/2024 11:12:11 - INFO - __main__ -   Step: 3860, LR: 1.2240238488487642e-05, Loss: 481.2624816894531
2024-08-03T18:12:23.936084561Z 
 41%|████      | 3861/9500 [13:14:53<19:19:23, 12.34s/it]08/03/2024 11:12:23 - INFO - __main__ -   Step: 3861, LR: 1.2238067944800362e-05, Loss: 699.4410400390625
2024-08-03T18:12:36.207819842Z 
 41%|████      | 3862/9500 [13:15:06<19:17:22, 12.32s/it]08/03/2024 11:12:36 - INFO - __main__ -   Step: 3862, LR: 1.2235897401113083e-05, Loss: 439.7151794433594
2024-08-03T18:12:48.597015079Z 
 41%|████      | 3863/9500 [13:15:18<19:19:12, 12.34s/it]08/03/2024 11:12:48 - INFO - __main__ -   Step: 3863, LR: 1.2233726857425805e-05, Loss: 735.0089721679688
2024-08-03T18:13:00.716106931Z 
 41%|████      | 3864/9500 [13:15:30<19:12:45, 12.27s/it]08/03/2024 11:13:00 - INFO - __main__ -   Step: 3864, LR: 1.2231556313738525e-05, Loss: 600.7869873046875
2024-08-03T18:13:12.751010554Z 
 41%|████      | 3865/9500 [13:15:42<19:05:54, 12.20s/it]08/03/2024 11:13:12 - INFO - __main__ -   Step: 3865, LR: 1.2229385770051245e-05, Loss: 637.681396484375
2024-08-03T18:13:25.239019855Z 
 41%|████      | 3866/9500 [13:15:55<19:13:47, 12.29s/it]08/03/2024 11:13:25 - INFO - __main__ -   Step: 3866, LR: 1.2227215226363966e-05, Loss: 610.5250244140625
2024-08-03T18:13:37.489973121Z 
 41%|████      | 3867/9500 [13:16:07<19:12:33, 12.28s/it]08/03/2024 11:13:37 - INFO - __main__ -   Step: 3867, LR: 1.2225044682676688e-05, Loss: 666.1888427734375
2024-08-03T18:13:49.802708071Z 
 41%|████      | 3868/9500 [13:16:19<19:13:22, 12.29s/it]08/03/2024 11:13:49 - INFO - __main__ -   Step: 3868, LR: 1.222287413898941e-05, Loss: 732.521240234375
2024-08-03T18:14:02.722134206Z 
 41%|████      | 3869/9500 [13:16:32<19:30:57, 12.48s/it]08/03/2024 11:14:02 - INFO - __main__ -   Step: 3869, LR: 1.2220703595302131e-05, Loss: 750.869873046875
2024-08-03T18:14:14.810362855Z 
 41%|████      | 3870/9500 [13:16:44<19:19:49, 12.36s/it]08/03/2024 11:14:14 - INFO - __main__ -   Step: 3870, LR: 1.221853305161485e-05, Loss: 608.7678833007812
2024-08-03T18:14:26.972866935Z 
 41%|████      | 3871/9500 [13:16:56<19:14:02, 12.30s/it]08/03/2024 11:14:26 - INFO - __main__ -   Step: 3871, LR: 1.2216362507927572e-05, Loss: 692.2933349609375
2024-08-03T18:14:39.559392139Z 
 41%|████      | 3872/9500 [13:17:09<19:21:52, 12.39s/it]08/03/2024 11:14:39 - INFO - __main__ -   Step: 3872, LR: 1.2214191964240294e-05, Loss: 764.2010498046875
2024-08-03T18:14:52.092631558Z 
 41%|████      | 3873/9500 [13:17:22<19:25:46, 12.43s/it]08/03/2024 11:14:52 - INFO - __main__ -   Step: 3873, LR: 1.2212021420553015e-05, Loss: 602.6944580078125
2024-08-03T18:15:04.318451375Z 
 41%|████      | 3874/9500 [13:17:34<19:19:49, 12.37s/it]08/03/2024 11:15:04 - INFO - __main__ -   Step: 3874, LR: 1.2209850876865737e-05, Loss: 662.1381225585938
2024-08-03T18:15:17.318462842Z 
 41%|████      | 3875/9500 [13:17:47<19:37:21, 12.56s/it]08/03/2024 11:15:17 - INFO - __main__ -   Step: 3875, LR: 1.2207680333178457e-05, Loss: 743.2664794921875
2024-08-03T18:15:29.166352393Z 
 41%|████      | 3876/9500 [13:17:59<19:17:09, 12.35s/it]08/03/2024 11:15:29 - INFO - __main__ -   Step: 3876, LR: 1.2205509789491178e-05, Loss: 531.2723999023438
2024-08-03T18:15:41.204117716Z 
 41%|████      | 3877/9500 [13:18:11<19:08:18, 12.25s/it]08/03/2024 11:15:41 - INFO - __main__ -   Step: 3877, LR: 1.22033392458039e-05, Loss: 588.0611572265625
2024-08-03T18:15:53.380685311Z 
 41%|████      | 3878/9500 [13:18:23<19:05:57, 12.23s/it]08/03/2024 11:15:53 - INFO - __main__ -   Step: 3878, LR: 1.220116870211662e-05, Loss: 544.9518432617188
2024-08-03T18:16:05.824066628Z 
 41%|████      | 3879/9500 [13:18:35<19:11:45, 12.29s/it]08/03/2024 11:16:05 - INFO - __main__ -   Step: 3879, LR: 1.219899815842934e-05, Loss: 701.4150390625
2024-08-03T18:16:17.792845536Z 
 41%|████      | 3880/9500 [13:18:47<19:02:23, 12.20s/it]08/03/2024 11:16:17 - INFO - __main__ -   Step: 3880, LR: 1.2196827614742061e-05, Loss: 695.695556640625
2024-08-03T18:16:30.296636941Z 
 41%|████      | 3881/9500 [13:19:00<19:10:49, 12.29s/it]08/03/2024 11:16:30 - INFO - __main__ -   Step: 3881, LR: 1.2194657071054783e-05, Loss: 616.8613891601562
2024-08-03T18:16:42.262455217Z 
 41%|████      | 3882/9500 [13:19:12<19:01:33, 12.19s/it]08/03/2024 11:16:42 - INFO - __main__ -   Step: 3882, LR: 1.2192486527367504e-05, Loss: 576.3184814453125
2024-08-03T18:16:54.674737141Z 
 41%|████      | 3883/9500 [13:19:24<19:07:33, 12.26s/it]08/03/2024 11:16:54 - INFO - __main__ -   Step: 3883, LR: 1.2190315983680226e-05, Loss: 599.204345703125
2024-08-03T18:17:07.308568231Z 
 41%|████      | 3884/9500 [13:19:37<19:17:53, 12.37s/it]08/03/2024 11:17:07 - INFO - __main__ -   Step: 3884, LR: 1.2188145439992946e-05, Loss: 781.740966796875
2024-08-03T18:17:19.662566718Z 
 41%|████      | 3885/9500 [13:19:49<19:17:13, 12.37s/it]08/03/2024 11:17:19 - INFO - __main__ -   Step: 3885, LR: 1.2185974896305667e-05, Loss: 627.1021728515625
2024-08-03T18:17:31.736180293Z 
 41%|████      | 3886/9500 [13:20:01<19:08:49, 12.28s/it]08/03/2024 11:17:31 - INFO - __main__ -   Step: 3886, LR: 1.2183804352618389e-05, Loss: 589.1792602539062
2024-08-03T18:17:44.369849072Z 
 41%|████      | 3887/9500 [13:20:14<19:18:35, 12.38s/it]08/03/2024 11:17:44 - INFO - __main__ -   Step: 3887, LR: 1.218163380893111e-05, Loss: 687.8673095703125
2024-08-03T18:17:56.456673213Z 
 41%|████      | 3888/9500 [13:20:26<19:10:01, 12.30s/it]08/03/2024 11:17:56 - INFO - __main__ -   Step: 3888, LR: 1.2179463265243832e-05, Loss: 531.1161499023438
2024-08-03T18:18:08.523945996Z 
 41%|████      | 3889/9500 [13:20:38<19:03:25, 12.23s/it]08/03/2024 11:18:08 - INFO - __main__ -   Step: 3889, LR: 1.2177292721556554e-05, Loss: 512.771240234375
2024-08-03T18:18:20.884685783Z 
 41%|████      | 3890/9500 [13:20:50<19:06:58, 12.27s/it]08/03/2024 11:18:20 - INFO - __main__ -   Step: 3890, LR: 1.2175122177869273e-05, Loss: 641.310791015625
2024-08-03T18:18:32.751423804Z 
 41%|████      | 3891/9500 [13:21:02<18:55:32, 12.15s/it]08/03/2024 11:18:32 - INFO - __main__ -   Step: 3891, LR: 1.2172951634181995e-05, Loss: 518.927490234375
2024-08-03T18:18:44.588993945Z 
 41%|████      | 3892/9500 [13:21:14<18:46:39, 12.05s/it]08/03/2024 11:18:44 - INFO - __main__ -   Step: 3892, LR: 1.2170781090494715e-05, Loss: 500.8942565917969
2024-08-03T18:18:57.075239548Z 
 41%|████      | 3893/9500 [13:21:27<18:58:34, 12.18s/it]08/03/2024 11:18:57 - INFO - __main__ -   Step: 3893, LR: 1.2168610546807435e-05, Loss: 626.9154052734375
2024-08-03T18:19:09.095629302Z 
 41%|████      | 3894/9500 [13:21:39<18:53:47, 12.13s/it]08/03/2024 11:19:09 - INFO - __main__ -   Step: 3894, LR: 1.2166440003120156e-05, Loss: 530.5383911132812
2024-08-03T18:19:21.600598645Z 
 41%|████      | 3895/9500 [13:21:51<19:03:58, 12.25s/it]08/03/2024 11:19:21 - INFO - __main__ -   Step: 3895, LR: 1.2164269459432878e-05, Loss: 532.1553344726562
2024-08-03T18:19:33.886287501Z 
 41%|████      | 3896/9500 [13:22:03<19:04:52, 12.26s/it]08/03/2024 11:19:33 - INFO - __main__ -   Step: 3896, LR: 1.21620989157456e-05, Loss: 442.80059814453125
2024-08-03T18:19:46.645370590Z 
 41%|████      | 3897/9500 [13:22:16<19:18:42, 12.41s/it]08/03/2024 11:19:46 - INFO - __main__ -   Step: 3897, LR: 1.2159928372058321e-05, Loss: 586.9302368164062
2024-08-03T18:19:58.586478521Z 
 41%|████      | 3898/9500 [13:22:28<19:05:25, 12.27s/it]08/03/2024 11:19:58 - INFO - __main__ -   Step: 3898, LR: 1.2157757828371043e-05, Loss: 580.9922485351562
2024-08-03T18:20:11.045558270Z 
 41%|████      | 3899/9500 [13:22:40<19:10:34, 12.33s/it]08/03/2024 11:20:11 - INFO - __main__ -   Step: 3899, LR: 1.2155587284683762e-05, Loss: 649.0
2024-08-03T18:20:23.267686314Z 
 41%|████      | 3900/9500 [13:22:53<19:07:29, 12.29s/it]08/03/2024 11:20:23 - INFO - __main__ -   Step: 3900, LR: 1.2153416740996484e-05, Loss: 602.6131591796875
2024-08-03T18:20:35.725018941Z 
 41%|████      | 3901/9500 [13:23:05<19:11:50, 12.34s/it]08/03/2024 11:20:35 - INFO - __main__ -   Step: 3901, LR: 1.2151246197309206e-05, Loss: 654.6527099609375
2024-08-03T18:20:48.218133129Z 
 41%|████      | 3902/9500 [13:23:18<19:15:49, 12.39s/it]08/03/2024 11:20:48 - INFO - __main__ -   Step: 3902, LR: 1.2149075653621927e-05, Loss: 516.5276489257812
2024-08-03T18:21:00.788494500Z 
 41%|████      | 3903/9500 [13:23:30<19:20:42, 12.44s/it]08/03/2024 11:21:00 - INFO - __main__ -   Step: 3903, LR: 1.2146905109934649e-05, Loss: 540.41796875
2024-08-03T18:21:13.289140213Z 
 41%|████      | 3904/9500 [13:23:43<19:22:07, 12.46s/it]08/03/2024 11:21:13 - INFO - __main__ -   Step: 3904, LR: 1.2144734566247369e-05, Loss: 711.6240234375
2024-08-03T18:21:25.817221025Z 
 41%|████      | 3905/9500 [13:23:55<19:23:48, 12.48s/it]08/03/2024 11:21:25 - INFO - __main__ -   Step: 3905, LR: 1.214256402256009e-05, Loss: 590.21630859375
2024-08-03T18:21:38.281366666Z 
 41%|████      | 3906/9500 [13:24:08<19:23:06, 12.48s/it]08/03/2024 11:21:38 - INFO - __main__ -   Step: 3906, LR: 1.214039347887281e-05, Loss: 546.0103759765625
2024-08-03T18:21:50.176270252Z 
 41%|████      | 3907/9500 [13:24:20<19:06:42, 12.30s/it]08/03/2024 11:21:50 - INFO - __main__ -   Step: 3907, LR: 1.2138222935185532e-05, Loss: 424.9460754394531
2024-08-03T18:22:02.133701510Z 
 41%|████      | 3908/9500 [13:24:32<18:56:52, 12.20s/it]08/03/2024 11:22:02 - INFO - __main__ -   Step: 3908, LR: 1.2136052391498251e-05, Loss: 489.71551513671875
2024-08-03T18:22:14.653151528Z 
 41%|████      | 3909/9500 [13:24:44<19:05:38, 12.29s/it]08/03/2024 11:22:14 - INFO - __main__ -   Step: 3909, LR: 1.2133881847810973e-05, Loss: 686.7621459960938
2024-08-03T18:22:27.197216608Z 
 41%|████      | 3910/9500 [13:24:57<19:12:25, 12.37s/it]08/03/2024 11:22:27 - INFO - __main__ -   Step: 3910, LR: 1.2131711304123695e-05, Loss: 541.5634765625
2024-08-03T18:22:39.184338634Z 
 41%|████      | 3911/9500 [13:25:09<19:01:32, 12.25s/it]08/03/2024 11:22:39 - INFO - __main__ -   Step: 3911, LR: 1.2129540760436416e-05, Loss: 637.4713134765625
2024-08-03T18:22:51.690367896Z 
 41%|████      | 3912/9500 [13:25:21<19:08:21, 12.33s/it]08/03/2024 11:22:51 - INFO - __main__ -   Step: 3912, LR: 1.2127370216749138e-05, Loss: 560.198974609375
2024-08-03T18:23:03.961602430Z 
 41%|████      | 3913/9500 [13:25:33<19:06:29, 12.31s/it]08/03/2024 11:23:03 - INFO - __main__ -   Step: 3913, LR: 1.2125199673061857e-05, Loss: 477.7156066894531
2024-08-03T18:23:16.205747754Z 
 41%|████      | 3914/9500 [13:25:46<19:04:23, 12.29s/it]08/03/2024 11:23:16 - INFO - __main__ -   Step: 3914, LR: 1.2123029129374579e-05, Loss: 727.1282958984375
2024-08-03T18:23:28.557324657Z 
 41%|████      | 3915/9500 [13:25:58<19:05:50, 12.31s/it]08/03/2024 11:23:28 - INFO - __main__ -   Step: 3915, LR: 1.21208585856873e-05, Loss: 613.96923828125
2024-08-03T18:23:40.418322176Z 
 41%|████      | 3916/9500 [13:26:10<18:53:05, 12.18s/it]08/03/2024 11:23:40 - INFO - __main__ -   Step: 3916, LR: 1.2118688042000022e-05, Loss: 461.04833984375
2024-08-03T18:23:52.563096379Z 
 41%|████      | 3917/9500 [13:26:22<18:52:03, 12.17s/it]08/03/2024 11:23:52 - INFO - __main__ -   Step: 3917, LR: 1.2116517498312744e-05, Loss: 565.9874877929688
2024-08-03T18:24:05.020914630Z 
 41%|████      | 3918/9500 [13:26:34<18:59:59, 12.25s/it]08/03/2024 11:24:05 - INFO - __main__ -   Step: 3918, LR: 1.2114346954625464e-05, Loss: 648.1248779296875
2024-08-03T18:24:17.059827802Z 
 41%|████▏     | 3919/9500 [13:26:46<18:53:48, 12.19s/it]08/03/2024 11:24:17 - INFO - __main__ -   Step: 3919, LR: 1.2112176410938185e-05, Loss: 444.22698974609375
2024-08-03T18:24:29.513549499Z 
 41%|████▏     | 3920/9500 [13:26:59<19:00:58, 12.27s/it]08/03/2024 11:24:29 - INFO - __main__ -   Step: 3920, LR: 1.2110005867250905e-05, Loss: 729.557861328125
2024-08-03T18:24:42.071146522Z 
 41%|████▏     | 3921/9500 [13:27:12<19:08:49, 12.36s/it]08/03/2024 11:24:42 - INFO - __main__ -   Step: 3921, LR: 1.2107835323563627e-05, Loss: 618.77978515625
2024-08-03T18:24:54.192084940Z 
 41%|████▏     | 3922/9500 [13:27:24<19:02:05, 12.28s/it]08/03/2024 11:24:54 - INFO - __main__ -   Step: 3922, LR: 1.2105664779876346e-05, Loss: 521.1195068359375
2024-08-03T18:25:06.241270336Z 
 41%|████▏     | 3923/9500 [13:27:36<18:55:18, 12.21s/it]08/03/2024 11:25:06 - INFO - __main__ -   Step: 3923, LR: 1.2103494236189068e-05, Loss: 527.64697265625
2024-08-03T18:25:18.800446645Z 
 41%|████▏     | 3924/9500 [13:27:48<19:04:43, 12.32s/it]08/03/2024 11:25:18 - INFO - __main__ -   Step: 3924, LR: 1.210132369250179e-05, Loss: 642.3927001953125
2024-08-03T18:25:30.997743105Z 
 41%|████▏     | 3925/9500 [13:28:00<19:01:10, 12.28s/it]08/03/2024 11:25:30 - INFO - __main__ -   Step: 3925, LR: 1.2099153148814511e-05, Loss: 541.4871826171875
2024-08-03T18:25:43.047405990Z 
 41%|████▏     | 3926/9500 [13:28:12<18:54:29, 12.21s/it]08/03/2024 11:25:43 - INFO - __main__ -   Step: 3926, LR: 1.2096982605127233e-05, Loss: 514.1695556640625
2024-08-03T18:25:55.469818477Z 
 41%|████▏     | 3927/9500 [13:28:25<19:00:09, 12.28s/it]08/03/2024 11:25:55 - INFO - __main__ -   Step: 3927, LR: 1.2094812061439953e-05, Loss: 586.0362548828125
2024-08-03T18:26:07.812051670Z 
 41%|████▏     | 3928/9500 [13:28:37<19:01:49, 12.30s/it]08/03/2024 11:26:07 - INFO - __main__ -   Step: 3928, LR: 1.2092641517752674e-05, Loss: 751.90234375
2024-08-03T18:26:19.822475033Z 
 41%|████▏     | 3929/9500 [13:28:49<18:53:40, 12.21s/it]08/03/2024 11:26:19 - INFO - __main__ -   Step: 3929, LR: 1.2090470974065396e-05, Loss: 605.0363159179688
2024-08-03T18:26:32.310875201Z 
 41%|████▏     | 3930/9500 [13:29:02<19:01:14, 12.29s/it]08/03/2024 11:26:32 - INFO - __main__ -   Step: 3930, LR: 1.2088300430378117e-05, Loss: 589.7301025390625
2024-08-03T18:26:44.600816981Z 
 41%|████▏     | 3931/9500 [13:29:14<19:00:56, 12.29s/it]08/03/2024 11:26:44 - INFO - __main__ -   Step: 3931, LR: 1.2086129886690839e-05, Loss: 603.666259765625
2024-08-03T18:26:56.686111470Z 
 41%|████▏     | 3932/9500 [13:29:26<18:54:57, 12.23s/it]08/03/2024 11:26:56 - INFO - __main__ -   Step: 3932, LR: 1.208395934300356e-05, Loss: 740.5390625
2024-08-03T18:27:09.082321001Z 
 41%|████▏     | 3933/9500 [13:29:39<18:59:22, 12.28s/it]08/03/2024 11:27:09 - INFO - __main__ -   Step: 3933, LR: 1.208178879931628e-05, Loss: 665.2939453125
2024-08-03T18:27:21.116041694Z 
 41%|████▏     | 3934/9500 [13:29:51<18:52:18, 12.21s/it]08/03/2024 11:27:21 - INFO - __main__ -   Step: 3934, LR: 1.2079618255629e-05, Loss: 469.12567138671875
2024-08-03T18:27:32.942227632Z 
 41%|████▏     | 3935/9500 [13:30:02<18:41:32, 12.09s/it]08/03/2024 11:27:32 - INFO - __main__ -   Step: 3935, LR: 1.2077447711941722e-05, Loss: 527.49462890625
2024-08-03T18:27:45.606605398Z 
 41%|████▏     | 3936/9500 [13:30:15<18:57:16, 12.26s/it]08/03/2024 11:27:45 - INFO - __main__ -   Step: 3936, LR: 1.2075277168254442e-05, Loss: 339.4130554199219
2024-08-03T18:27:57.694769059Z 
 41%|████▏     | 3937/9500 [13:30:27<18:52:10, 12.21s/it]08/03/2024 11:27:57 - INFO - __main__ -   Step: 3937, LR: 1.2073106624567163e-05, Loss: 518.286376953125
2024-08-03T18:28:09.796525298Z 
 41%|████▏     | 3938/9500 [13:30:39<18:48:56, 12.18s/it]08/03/2024 11:28:09 - INFO - __main__ -   Step: 3938, LR: 1.2070936080879885e-05, Loss: 556.1596069335938
2024-08-03T18:28:22.354212594Z 
 41%|████▏     | 3939/9500 [13:30:52<18:59:16, 12.29s/it]08/03/2024 11:28:22 - INFO - __main__ -   Step: 3939, LR: 1.2068765537192606e-05, Loss: 574.630859375
2024-08-03T18:28:34.493620550Z 
 41%|████▏     | 3940/9500 [13:31:04<18:54:49, 12.25s/it]08/03/2024 11:28:34 - INFO - __main__ -   Step: 3940, LR: 1.2066594993505328e-05, Loss: 514.0439453125
2024-08-03T18:28:46.544510902Z 
 41%|████▏     | 3941/9500 [13:31:16<18:49:11, 12.19s/it]08/03/2024 11:28:46 - INFO - __main__ -   Step: 3941, LR: 1.206442444981805e-05, Loss: 622.115234375
2024-08-03T18:28:59.049185786Z 
 41%|████▏     | 3942/9500 [13:31:28<18:57:46, 12.28s/it]08/03/2024 11:28:59 - INFO - __main__ -   Step: 3942, LR: 1.206225390613077e-05, Loss: 597.313232421875
2024-08-03T18:29:11.018295474Z 
 42%|████▏     | 3943/9500 [13:31:40<18:48:52, 12.19s/it]08/03/2024 11:29:11 - INFO - __main__ -   Step: 3943, LR: 1.206008336244349e-05, Loss: 691.7297973632812
2024-08-03T18:29:23.550232950Z 
 42%|████▏     | 3944/9500 [13:31:53<18:58:12, 12.29s/it]08/03/2024 11:29:23 - INFO - __main__ -   Step: 3944, LR: 1.2057912818756212e-05, Loss: 680.8764038085938
2024-08-03T18:29:36.024729682Z 
 42%|████▏     | 3945/9500 [13:32:05<19:03:05, 12.35s/it]08/03/2024 11:29:36 - INFO - __main__ -   Step: 3945, LR: 1.2055742275068934e-05, Loss: 553.8750610351562
2024-08-03T18:29:48.289662543Z 
 42%|████▏     | 3946/9500 [13:32:18<19:00:36, 12.32s/it]08/03/2024 11:29:48 - INFO - __main__ -   Step: 3946, LR: 1.2053571731381655e-05, Loss: 451.328857421875
2024-08-03T18:30:00.368159803Z 
 42%|████▏     | 3947/9500 [13:32:30<18:53:38, 12.25s/it]08/03/2024 11:30:00 - INFO - __main__ -   Step: 3947, LR: 1.2051401187694375e-05, Loss: 593.3148193359375
2024-08-03T18:30:12.896425477Z 
 42%|████▏     | 3948/9500 [13:32:42<19:01:11, 12.33s/it]08/03/2024 11:30:12 - INFO - __main__ -   Step: 3948, LR: 1.2049230644007095e-05, Loss: 601.4417724609375
2024-08-03T18:30:25.147912715Z 
 42%|████▏     | 3949/9500 [13:32:55<18:58:43, 12.31s/it]08/03/2024 11:30:25 - INFO - __main__ -   Step: 3949, LR: 1.2047060100319817e-05, Loss: 641.77734375
2024-08-03T18:30:37.305494964Z 
 42%|████▏     | 3950/9500 [13:33:07<18:54:20, 12.26s/it]08/03/2024 11:30:37 - INFO - __main__ -   Step: 3950, LR: 1.2044889556632538e-05, Loss: 473.6003112792969
2024-08-03T18:30:49.200975914Z 
 42%|████▏     | 3951/9500 [13:33:19<18:43:55, 12.15s/it]08/03/2024 11:30:49 - INFO - __main__ -   Step: 3951, LR: 1.2042719012945258e-05, Loss: 589.3865966796875
2024-08-03T18:31:02.339791128Z 
 42%|████▏     | 3952/9500 [13:33:32<19:11:05, 12.45s/it]08/03/2024 11:31:02 - INFO - __main__ -   Step: 3952, LR: 1.204054846925798e-05, Loss: 695.0576782226562
2024-08-03T18:31:14.678585861Z 
 42%|████▏     | 3953/9500 [13:33:44<19:07:49, 12.42s/it]08/03/2024 11:31:14 - INFO - __main__ -   Step: 3953, LR: 1.2038377925570701e-05, Loss: 613.1275634765625
2024-08-03T18:31:26.843621120Z 
 42%|████▏     | 3954/9500 [13:33:56<19:00:40, 12.34s/it]08/03/2024 11:31:26 - INFO - __main__ -   Step: 3954, LR: 1.2036207381883423e-05, Loss: 542.9547119140625
2024-08-03T18:31:39.586451475Z 
 42%|████▏     | 3955/9500 [13:34:09<19:11:37, 12.46s/it]08/03/2024 11:31:39 - INFO - __main__ -   Step: 3955, LR: 1.2034036838196144e-05, Loss: 591.8286743164062
2024-08-03T18:31:51.692700852Z 
 42%|████▏     | 3956/9500 [13:34:21<19:01:34, 12.35s/it]08/03/2024 11:31:51 - INFO - __main__ -   Step: 3956, LR: 1.2031866294508864e-05, Loss: 408.5957336425781
2024-08-03T18:32:03.286807863Z 
 42%|████▏     | 3957/9500 [13:34:33<18:40:17, 12.13s/it]08/03/2024 11:32:03 - INFO - __main__ -   Step: 3957, LR: 1.2029695750821586e-05, Loss: 439.29742431640625
2024-08-03T18:32:15.573274719Z 
 42%|████▏     | 3958/9500 [13:34:45<18:44:31, 12.17s/it]08/03/2024 11:32:15 - INFO - __main__ -   Step: 3958, LR: 1.2027525207134307e-05, Loss: 537.77587890625
2024-08-03T18:32:27.817561570Z 
 42%|████▏     | 3959/9500 [13:34:57<18:46:14, 12.20s/it]08/03/2024 11:32:27 - INFO - __main__ -   Step: 3959, LR: 1.2025354663447029e-05, Loss: 593.3568115234375
2024-08-03T18:32:40.038525788Z 
 42%|████▏     | 3960/9500 [13:35:09<18:46:45, 12.20s/it]08/03/2024 11:32:40 - INFO - __main__ -   Step: 3960, LR: 1.202318411975975e-05, Loss: 574.5534057617188
2024-08-03T18:32:52.749055186Z 
 42%|████▏     | 3961/9500 [13:35:22<19:00:36, 12.36s/it]08/03/2024 11:32:52 - INFO - __main__ -   Step: 3961, LR: 1.202101357607247e-05, Loss: 585.0435791015625
2024-08-03T18:33:05.064485331Z 
 42%|████▏     | 3962/9500 [13:35:35<18:59:17, 12.34s/it]08/03/2024 11:33:05 - INFO - __main__ -   Step: 3962, LR: 1.201884303238519e-05, Loss: 729.8890991210938
2024-08-03T18:33:17.381222052Z 
 42%|████▏     | 3963/9500 [13:35:47<18:58:21, 12.34s/it]08/03/2024 11:33:17 - INFO - __main__ -   Step: 3963, LR: 1.2016672488697912e-05, Loss: 554.603759765625
2024-08-03T18:33:30.108781215Z 
 42%|████▏     | 3964/9500 [13:36:00<19:09:00, 12.45s/it]08/03/2024 11:33:30 - INFO - __main__ -   Step: 3964, LR: 1.2014501945010633e-05, Loss: 790.7055053710938
2024-08-03T18:33:42.204041100Z 
 42%|████▏     | 3965/9500 [13:36:12<18:58:53, 12.35s/it]08/03/2024 11:33:42 - INFO - __main__ -   Step: 3965, LR: 1.2012331401323353e-05, Loss: 522.4818725585938
2024-08-03T18:33:54.588242847Z 
 42%|████▏     | 3966/9500 [13:36:24<18:59:44, 12.36s/it]08/03/2024 11:33:54 - INFO - __main__ -   Step: 3966, LR: 1.2010160857636075e-05, Loss: 504.7795715332031
2024-08-03T18:34:06.988167506Z 
 42%|████▏     | 3967/9500 [13:36:36<19:00:43, 12.37s/it]08/03/2024 11:34:06 - INFO - __main__ -   Step: 3967, LR: 1.2007990313948796e-05, Loss: 484.04193115234375
2024-08-03T18:34:19.213318019Z 
 42%|████▏     | 3968/9500 [13:36:49<18:56:30, 12.33s/it]08/03/2024 11:34:19 - INFO - __main__ -   Step: 3968, LR: 1.2005819770261518e-05, Loss: 662.4371337890625
2024-08-03T18:34:31.353607815Z 
 42%|████▏     | 3969/9500 [13:37:01<18:51:09, 12.27s/it]08/03/2024 11:34:31 - INFO - __main__ -   Step: 3969, LR: 1.200364922657424e-05, Loss: 502.4678649902344
2024-08-03T18:34:43.986280044Z 
 42%|████▏     | 3970/9500 [13:37:13<19:00:57, 12.38s/it]08/03/2024 11:34:43 - INFO - __main__ -   Step: 3970, LR: 1.200147868288696e-05, Loss: 606.4876708984375
2024-08-03T18:34:56.224558415Z 
 42%|████▏     | 3971/9500 [13:37:26<18:56:51, 12.34s/it]08/03/2024 11:34:56 - INFO - __main__ -   Step: 3971, LR: 1.199930813919968e-05, Loss: 665.8087768554688
2024-08-03T18:35:08.286392566Z 
 42%|████▏     | 3972/9500 [13:37:38<18:49:02, 12.25s/it]08/03/2024 11:35:08 - INFO - __main__ -   Step: 3972, LR: 1.1997137595512402e-05, Loss: 668.830078125
2024-08-03T18:35:21.003495522Z 
 42%|████▏     | 3973/9500 [13:37:50<19:01:37, 12.39s/it]08/03/2024 11:35:21 - INFO - __main__ -   Step: 3973, LR: 1.1994967051825124e-05, Loss: 514.6937866210938
2024-08-03T18:35:33.055584848Z 
 42%|████▏     | 3974/9500 [13:38:02<18:51:59, 12.29s/it]08/03/2024 11:35:33 - INFO - __main__ -   Step: 3974, LR: 1.1992796508137846e-05, Loss: 536.5103149414062
2024-08-03T18:35:45.305815565Z 
 42%|████▏     | 3975/9500 [13:38:15<18:50:39, 12.28s/it]08/03/2024 11:35:45 - INFO - __main__ -   Step: 3975, LR: 1.1990625964450567e-05, Loss: 698.5647583007812
2024-08-03T18:35:57.978818661Z 
 42%|████▏     | 3976/9500 [13:38:27<19:01:20, 12.40s/it]08/03/2024 11:35:57 - INFO - __main__ -   Step: 3976, LR: 1.1988455420763285e-05, Loss: 553.8171997070312
2024-08-03T18:36:10.123607631Z 
 42%|████▏     | 3977/9500 [13:38:40<18:54:10, 12.32s/it]08/03/2024 11:36:10 - INFO - __main__ -   Step: 3977, LR: 1.1986284877076007e-05, Loss: 487.3940734863281
2024-08-03T18:36:22.491698575Z 
 42%|████▏     | 3978/9500 [13:38:52<18:55:15, 12.34s/it]08/03/2024 11:36:22 - INFO - __main__ -   Step: 3978, LR: 1.1984114333388728e-05, Loss: 721.6119384765625
2024-08-03T18:36:35.104765108Z 
 42%|████▏     | 3979/9500 [13:39:05<19:02:43, 12.42s/it]08/03/2024 11:36:35 - INFO - __main__ -   Step: 3979, LR: 1.1981943789701448e-05, Loss: 704.5257568359375
2024-08-03T18:36:47.572441168Z 
 42%|████▏     | 3980/9500 [13:39:17<19:03:52, 12.43s/it]08/03/2024 11:36:47 - INFO - __main__ -   Step: 3980, LR: 1.197977324601417e-05, Loss: 684.843017578125
2024-08-03T18:36:59.876984987Z 
 42%|████▏     | 3981/9500 [13:39:29<19:00:06, 12.39s/it]08/03/2024 11:36:59 - INFO - __main__ -   Step: 3981, LR: 1.1977602702326891e-05, Loss: 548.2772216796875
2024-08-03T18:37:12.232295838Z 
 42%|████▏     | 3982/9500 [13:39:42<18:58:48, 12.38s/it]08/03/2024 11:37:12 - INFO - __main__ -   Step: 3982, LR: 1.1975432158639613e-05, Loss: 586.8887939453125
2024-08-03T18:37:24.619910251Z 
 42%|████▏     | 3983/9500 [13:39:54<18:58:43, 12.38s/it]08/03/2024 11:37:24 - INFO - __main__ -   Step: 3983, LR: 1.1973261614952334e-05, Loss: 763.946533203125
2024-08-03T18:37:36.535662533Z 
 42%|████▏     | 3984/9500 [13:40:06<18:45:36, 12.24s/it]08/03/2024 11:37:36 - INFO - __main__ -   Step: 3984, LR: 1.1971091071265056e-05, Loss: 489.3587646484375
2024-08-03T18:37:48.990892497Z 
 42%|████▏     | 3985/9500 [13:40:18<18:51:14, 12.31s/it]08/03/2024 11:37:48 - INFO - __main__ -   Step: 3985, LR: 1.1968920527577776e-05, Loss: 586.7296752929688
2024-08-03T18:38:01.182329423Z 
 42%|████▏     | 3986/9500 [13:40:31<18:47:50, 12.27s/it]08/03/2024 11:38:01 - INFO - __main__ -   Step: 3986, LR: 1.1966749983890497e-05, Loss: 524.5828857421875
2024-08-03T18:38:12.966146590Z 
 42%|████▏     | 3987/9500 [13:40:42<18:34:10, 12.13s/it]08/03/2024 11:38:12 - INFO - __main__ -   Step: 3987, LR: 1.1964579440203219e-05, Loss: 529.4133911132812
2024-08-03T18:38:25.401187221Z 
 42%|████▏     | 3988/9500 [13:40:55<18:42:29, 12.22s/it]08/03/2024 11:38:25 - INFO - __main__ -   Step: 3988, LR: 1.196240889651594e-05, Loss: 476.4404296875
2024-08-03T18:38:37.661788303Z 
 42%|████▏     | 3989/9500 [13:41:07<18:43:26, 12.23s/it]08/03/2024 11:38:37 - INFO - __main__ -   Step: 3989, LR: 1.1960238352828659e-05, Loss: 686.236083984375
2024-08-03T18:38:50.050438634Z 
 42%|████▏     | 3990/9500 [13:41:19<18:47:34, 12.28s/it]08/03/2024 11:38:50 - INFO - __main__ -   Step: 3990, LR: 1.195806780914138e-05, Loss: 620.7542114257812
2024-08-03T18:39:02.649878135Z 
 42%|████▏     | 3991/9500 [13:41:32<18:56:12, 12.37s/it]08/03/2024 11:39:02 - INFO - __main__ -   Step: 3991, LR: 1.1955897265454102e-05, Loss: 546.6099243164062
2024-08-03T18:39:14.411215262Z 
 42%|████▏     | 3992/9500 [13:41:44<18:39:06, 12.19s/it]08/03/2024 11:39:14 - INFO - __main__ -   Step: 3992, LR: 1.1953726721766823e-05, Loss: 462.5855407714844
2024-08-03T18:39:26.530335234Z 
 42%|████▏     | 3993/9500 [13:41:56<18:36:56, 12.17s/it]08/03/2024 11:39:26 - INFO - __main__ -   Step: 3993, LR: 1.1951556178079545e-05, Loss: 554.35546875
2024-08-03T18:39:38.605539414Z 
 42%|████▏     | 3994/9500 [13:42:08<18:34:08, 12.14s/it]08/03/2024 11:39:38 - INFO - __main__ -   Step: 3994, LR: 1.1949385634392265e-05, Loss: 656.2916259765625
2024-08-03T18:39:50.890506718Z 
 42%|████▏     | 3995/9500 [13:42:20<18:37:54, 12.18s/it]08/03/2024 11:39:50 - INFO - __main__ -   Step: 3995, LR: 1.1947215090704986e-05, Loss: 538.0840454101562
2024-08-03T18:40:03.567841680Z 
 42%|████▏     | 3996/9500 [13:42:33<18:51:15, 12.33s/it]08/03/2024 11:40:03 - INFO - __main__ -   Step: 3996, LR: 1.1945044547017708e-05, Loss: 706.8521728515625
2024-08-03T18:40:15.889065239Z 
 42%|████▏     | 3997/9500 [13:42:45<18:50:45, 12.33s/it]08/03/2024 11:40:15 - INFO - __main__ -   Step: 3997, LR: 1.194287400333043e-05, Loss: 475.40362548828125
2024-08-03T18:40:28.398395483Z 
 42%|████▏     | 3998/9500 [13:42:58<18:55:31, 12.38s/it]08/03/2024 11:40:28 - INFO - __main__ -   Step: 3998, LR: 1.1940703459643151e-05, Loss: 532.7003173828125
2024-08-03T18:40:40.505144884Z 
 42%|████▏     | 3999/9500 [13:43:10<18:47:43, 12.30s/it]08/03/2024 11:40:40 - INFO - __main__ -   Step: 3999, LR: 1.1938532915955871e-05, Loss: 586.26025390625
2024-08-03T18:40:52.569715035Z 
 42%|████▏     | 4000/9500 [13:43:22<18:41:02, 12.23s/it]08/03/2024 11:40:52 - INFO - __main__ -   Step: 4000, LR: 1.1936362372268593e-05, Loss: 762.5756225585938
2024-08-03T18:41:04.939645702Z 
 42%|████▏     | 4001/9500 [13:43:34<18:44:41, 12.27s/it]08/03/2024 11:41:04 - INFO - __main__ -   Step: 4001, LR: 1.1934191828581314e-05, Loss: 482.21075439453125
2024-08-03T18:41:17.154492295Z 
 42%|████▏     | 4002/9500 [13:43:47<18:42:55, 12.25s/it]08/03/2024 11:41:17 - INFO - __main__ -   Step: 4002, LR: 1.1932021284894036e-05, Loss: 663.3162841796875
2024-08-03T18:41:29.205271255Z 
 42%|████▏     | 4003/9500 [13:43:59<18:37:07, 12.19s/it]08/03/2024 11:41:29 - INFO - __main__ -   Step: 4003, LR: 1.1929850741206754e-05, Loss: 584.8450317382812
2024-08-03T18:41:41.713615447Z 
 42%|████▏     | 4004/9500 [13:44:11<18:45:34, 12.29s/it]08/03/2024 11:41:41 - INFO - __main__ -   Step: 4004, LR: 1.1927680197519475e-05, Loss: 667.5945434570312
2024-08-03T18:41:53.626482961Z 
 42%|████▏     | 4005/9500 [13:44:23<18:35:04, 12.18s/it]08/03/2024 11:41:53 - INFO - __main__ -   Step: 4005, LR: 1.1925509653832197e-05, Loss: 710.69482421875
2024-08-03T18:42:05.768853977Z 
 42%|████▏     | 4006/9500 [13:44:35<18:33:57, 12.17s/it]08/03/2024 11:42:05 - INFO - __main__ -   Step: 4006, LR: 1.1923339110144918e-05, Loss: 574.3717041015625
2024-08-03T18:42:18.175875938Z 
 42%|████▏     | 4007/9500 [13:44:48<18:40:22, 12.24s/it]08/03/2024 11:42:18 - INFO - __main__ -   Step: 4007, LR: 1.192116856645764e-05, Loss: 570.6718139648438
2024-08-03T18:42:30.441296519Z 
 42%|████▏     | 4008/9500 [13:45:00<18:40:55, 12.25s/it]08/03/2024 11:42:30 - INFO - __main__ -   Step: 4008, LR: 1.191899802277036e-05, Loss: 602.3161010742188
2024-08-03T18:42:42.634889589Z 
 42%|████▏     | 4009/9500 [13:45:12<18:39:17, 12.23s/it]08/03/2024 11:42:42 - INFO - __main__ -   Step: 4009, LR: 1.1916827479083081e-05, Loss: 639.4091796875
2024-08-03T18:42:55.215641291Z 
 42%|████▏     | 4010/9500 [13:45:25<18:48:42, 12.34s/it]08/03/2024 11:42:55 - INFO - __main__ -   Step: 4010, LR: 1.1914656935395803e-05, Loss: 654.5567626953125
2024-08-03T18:43:07.457874935Z 
 42%|████▏     | 4011/9500 [13:45:37<18:45:55, 12.31s/it]08/03/2024 11:43:07 - INFO - __main__ -   Step: 4011, LR: 1.1912486391708525e-05, Loss: 497.2110900878906
2024-08-03T18:43:19.744634888Z 
 42%|████▏     | 4012/9500 [13:45:49<18:45:09, 12.30s/it]08/03/2024 11:43:19 - INFO - __main__ -   Step: 4012, LR: 1.1910315848021246e-05, Loss: 669.3612060546875
2024-08-03T18:43:32.393616104Z 
 42%|████▏     | 4013/9500 [13:46:02<18:54:29, 12.41s/it]08/03/2024 11:43:32 - INFO - __main__ -   Step: 4013, LR: 1.1908145304333966e-05, Loss: 853.1321411132812
2024-08-03T18:43:44.479755431Z 
 42%|████▏     | 4014/9500 [13:46:14<18:45:31, 12.31s/it]08/03/2024 11:43:44 - INFO - __main__ -   Step: 4014, LR: 1.1905974760646688e-05, Loss: 528.4634399414062
2024-08-03T18:43:56.493484063Z 
 42%|████▏     | 4015/9500 [13:46:26<18:37:11, 12.22s/it]08/03/2024 11:43:56 - INFO - __main__ -   Step: 4015, LR: 1.1903804216959409e-05, Loss: 439.120361328125
2024-08-03T18:44:08.821646172Z 
 42%|████▏     | 4016/9500 [13:46:38<18:39:56, 12.25s/it]08/03/2024 11:44:08 - INFO - __main__ -   Step: 4016, LR: 1.190163367327213e-05, Loss: 485.9546813964844
2024-08-03T18:44:20.955771937Z 
 42%|████▏     | 4017/9500 [13:46:50<18:36:28, 12.22s/it]08/03/2024 11:44:20 - INFO - __main__ -   Step: 4017, LR: 1.1899463129584849e-05, Loss: 768.8563232421875
2024-08-03T18:44:33.154834661Z 
 42%|████▏     | 4018/9500 [13:47:03<18:35:45, 12.21s/it]08/03/2024 11:44:33 - INFO - __main__ -   Step: 4018, LR: 1.189729258589757e-05, Loss: 565.7193603515625
2024-08-03T18:44:45.805320325Z 
 42%|████▏     | 4019/9500 [13:47:15<18:47:34, 12.34s/it]08/03/2024 11:44:45 - INFO - __main__ -   Step: 4019, LR: 1.1895122042210292e-05, Loss: 595.642822265625
2024-08-03T18:44:58.251376106Z 
 42%|████▏     | 4020/9500 [13:47:28<18:50:09, 12.37s/it]08/03/2024 11:44:58 - INFO - __main__ -   Step: 4020, LR: 1.1892951498523014e-05, Loss: 586.6544189453125
2024-08-03T18:45:10.339492799Z 
 42%|████▏     | 4021/9500 [13:47:40<18:42:08, 12.29s/it]08/03/2024 11:45:10 - INFO - __main__ -   Step: 4021, LR: 1.1890780954835735e-05, Loss: 590.71044921875
2024-08-03T18:45:23.026188823Z 
 42%|████▏     | 4022/9500 [13:47:52<18:52:51, 12.41s/it]08/03/2024 11:45:23 - INFO - __main__ -   Step: 4022, LR: 1.1888610411148455e-05, Loss: 590.7487182617188
2024-08-03T18:45:35.115058547Z 
 42%|████▏     | 4023/9500 [13:48:05<18:43:54, 12.31s/it]08/03/2024 11:45:35 - INFO - __main__ -   Step: 4023, LR: 1.1886439867461177e-05, Loss: 709.3457641601562
2024-08-03T18:45:47.292643120Z 
 42%|████▏     | 4024/9500 [13:48:17<18:40:01, 12.27s/it]08/03/2024 11:45:47 - INFO - __main__ -   Step: 4024, LR: 1.1884269323773898e-05, Loss: 628.177978515625
2024-08-03T18:46:00.005536735Z 
 42%|████▏     | 4025/9500 [13:48:29<18:51:52, 12.40s/it]08/03/2024 11:46:00 - INFO - __main__ -   Step: 4025, LR: 1.188209878008662e-05, Loss: 642.207275390625
2024-08-03T18:46:12.213148430Z 
 42%|████▏     | 4026/9500 [13:48:42<18:46:17, 12.35s/it]08/03/2024 11:46:12 - INFO - __main__ -   Step: 4026, LR: 1.1879928236399341e-05, Loss: 705.0055541992188
2024-08-03T18:46:24.311905959Z 
 42%|████▏     | 4027/9500 [13:48:54<18:39:20, 12.27s/it]08/03/2024 11:46:24 - INFO - __main__ -   Step: 4027, LR: 1.1877757692712061e-05, Loss: 605.857666015625
2024-08-03T18:46:36.803378248Z 
 42%|████▏     | 4028/9500 [13:49:06<18:45:09, 12.34s/it]08/03/2024 11:46:36 - INFO - __main__ -   Step: 4028, LR: 1.1875587149024783e-05, Loss: 493.4255676269531
2024-08-03T18:46:48.774640261Z 
 42%|████▏     | 4029/9500 [13:49:18<18:34:56, 12.23s/it]08/03/2024 11:46:48 - INFO - __main__ -   Step: 4029, LR: 1.1873416605337504e-05, Loss: 522.1118774414062
2024-08-03T18:47:00.916103610Z 
 42%|████▏     | 4030/9500 [13:49:30<18:32:22, 12.20s/it]08/03/2024 11:47:00 - INFO - __main__ -   Step: 4030, LR: 1.1871246061650226e-05, Loss: 592.6862182617188
2024-08-03T18:47:14.067583807Z 
 42%|████▏     | 4031/9500 [13:49:44<18:58:09, 12.49s/it]08/03/2024 11:47:14 - INFO - __main__ -   Step: 4031, LR: 1.1869075517962944e-05, Loss: 509.21234130859375
2024-08-03T18:47:26.258515429Z 
 42%|████▏     | 4032/9500 [13:49:56<18:49:52, 12.40s/it]08/03/2024 11:47:26 - INFO - __main__ -   Step: 4032, LR: 1.1866904974275665e-05, Loss: 847.3096923828125
2024-08-03T18:47:38.596990292Z 
 42%|████▏     | 4033/9500 [13:50:08<18:48:01, 12.38s/it]08/03/2024 11:47:38 - INFO - __main__ -   Step: 4033, LR: 1.1864734430588387e-05, Loss: 544.24169921875
2024-08-03T18:47:51.276655070Z 
 42%|████▏     | 4034/9500 [13:50:21<18:56:01, 12.47s/it]08/03/2024 11:47:51 - INFO - __main__ -   Step: 4034, LR: 1.1862563886901109e-05, Loss: 699.5662231445312
2024-08-03T18:48:03.486144028Z 
 42%|████▏     | 4035/9500 [13:50:33<18:48:40, 12.39s/it]08/03/2024 11:48:03 - INFO - __main__ -   Step: 4035, LR: 1.186039334321383e-05, Loss: 631.712158203125
2024-08-03T18:48:15.649297072Z 
 42%|████▏     | 4036/9500 [13:50:45<18:42:14, 12.32s/it]08/03/2024 11:48:15 - INFO - __main__ -   Step: 4036, LR: 1.185822279952655e-05, Loss: 808.8463745117188
2024-08-03T18:48:27.773707138Z 
 42%|████▏     | 4037/9500 [13:50:57<18:36:35, 12.26s/it]08/03/2024 11:48:27 - INFO - __main__ -   Step: 4037, LR: 1.1856052255839272e-05, Loss: 604.9451904296875
2024-08-03T18:48:40.034549904Z 
 43%|████▎     | 4038/9500 [13:51:09<18:36:19, 12.26s/it]08/03/2024 11:48:40 - INFO - __main__ -   Step: 4038, LR: 1.1853881712151993e-05, Loss: 390.1885070800781
2024-08-03T18:48:52.364765728Z 
 43%|████▎     | 4039/9500 [13:51:22<18:37:57, 12.28s/it]08/03/2024 11:48:52 - INFO - __main__ -   Step: 4039, LR: 1.1851711168464715e-05, Loss: 710.8484497070312
2024-08-03T18:49:04.584681856Z 
 43%|████▎     | 4040/9500 [13:51:34<18:36:01, 12.26s/it]08/03/2024 11:49:04 - INFO - __main__ -   Step: 4040, LR: 1.1849540624777436e-05, Loss: 620.21240234375
2024-08-03T18:49:17.235821312Z 
 43%|████▎     | 4041/9500 [13:51:47<18:46:23, 12.38s/it]08/03/2024 11:49:17 - INFO - __main__ -   Step: 4041, LR: 1.1847370081090158e-05, Loss: 590.5960083007812
2024-08-03T18:49:29.768756113Z 
 43%|████▎     | 4042/9500 [13:51:59<18:50:19, 12.43s/it]08/03/2024 11:49:29 - INFO - __main__ -   Step: 4042, LR: 1.1845199537402878e-05, Loss: 607.0171508789062
2024-08-03T18:49:42.011718315Z 
 43%|████▎     | 4043/9500 [13:52:11<18:45:09, 12.37s/it]08/03/2024 11:49:42 - INFO - __main__ -   Step: 4043, LR: 1.18430289937156e-05, Loss: 552.8399658203125
2024-08-03T18:49:54.366689741Z 
 43%|████▎     | 4044/9500 [13:52:24<18:44:30, 12.37s/it]08/03/2024 11:49:54 - INFO - __main__ -   Step: 4044, LR: 1.184085845002832e-05, Loss: 490.5936584472656
2024-08-03T18:50:06.806763287Z 
 43%|████▎     | 4045/9500 [13:52:36<18:46:18, 12.39s/it]08/03/2024 11:50:06 - INFO - __main__ -   Step: 4045, LR: 1.1838687906341039e-05, Loss: 688.993408203125
2024-08-03T18:50:18.729562591Z 
 43%|████▎     | 4046/9500 [13:52:48<18:33:24, 12.25s/it]08/03/2024 11:50:18 - INFO - __main__ -   Step: 4046, LR: 1.183651736265376e-05, Loss: 573.753173828125
2024-08-03T18:50:31.249757140Z 
 43%|████▎     | 4047/9500 [13:53:01<18:40:36, 12.33s/it]08/03/2024 11:50:31 - INFO - __main__ -   Step: 4047, LR: 1.1834346818966482e-05, Loss: 509.4790344238281
2024-08-03T18:50:43.619881668Z 
 43%|████▎     | 4048/9500 [13:53:13<18:41:29, 12.34s/it]08/03/2024 11:50:43 - INFO - __main__ -   Step: 4048, LR: 1.1832176275279204e-05, Loss: 590.420654296875
2024-08-03T18:50:55.434913538Z 
 43%|████▎     | 4049/9500 [13:53:25<18:26:55, 12.18s/it]08/03/2024 11:50:55 - INFO - __main__ -   Step: 4049, LR: 1.1830005731591925e-05, Loss: 420.59039306640625
2024-08-03T18:51:08.075783296Z 
 43%|████▎     | 4050/9500 [13:53:38<18:39:10, 12.32s/it]08/03/2024 11:51:08 - INFO - __main__ -   Step: 4050, LR: 1.1827835187904647e-05, Loss: 577.5427856445312
2024-08-03T18:51:20.440858315Z 
 43%|████▎     | 4051/9500 [13:53:50<18:40:09, 12.33s/it]08/03/2024 11:51:20 - INFO - __main__ -   Step: 4051, LR: 1.1825664644217367e-05, Loss: 799.3941650390625
2024-08-03T18:51:32.836855596Z 
 43%|████▎     | 4052/9500 [13:54:02<18:41:37, 12.35s/it]08/03/2024 11:51:32 - INFO - __main__ -   Step: 4052, LR: 1.1823494100530088e-05, Loss: 727.647705078125
2024-08-03T18:51:45.575931832Z 
 43%|████▎     | 4053/9500 [13:54:15<18:51:56, 12.47s/it]08/03/2024 11:51:45 - INFO - __main__ -   Step: 4053, LR: 1.182132355684281e-05, Loss: 626.5098876953125
2024-08-03T18:51:57.947100205Z 
 43%|████▎     | 4054/9500 [13:54:27<18:49:05, 12.44s/it]08/03/2024 11:51:57 - INFO - __main__ -   Step: 4054, LR: 1.1819153013155531e-05, Loss: 595.3195190429688
2024-08-03T18:52:10.108592867Z 
 43%|████▎     | 4055/9500 [13:54:40<18:41:18, 12.36s/it]08/03/2024 11:52:10 - INFO - __main__ -   Step: 4055, LR: 1.1816982469468253e-05, Loss: 548.4647827148438
2024-08-03T18:52:22.388358791Z 
 43%|████▎     | 4056/9500 [13:54:52<18:39:01, 12.33s/it]08/03/2024 11:52:22 - INFO - __main__ -   Step: 4056, LR: 1.1814811925780973e-05, Loss: 567.5589599609375
2024-08-03T18:52:34.690900490Z 
 43%|████▎     | 4057/9500 [13:55:04<18:37:59, 12.32s/it]08/03/2024 11:52:34 - INFO - __main__ -   Step: 4057, LR: 1.1812641382093694e-05, Loss: 610.7801513671875
2024-08-03T18:52:46.905019308Z 
 43%|████▎     | 4058/9500 [13:55:16<18:34:47, 12.29s/it]08/03/2024 11:52:46 - INFO - __main__ -   Step: 4058, LR: 1.1810470838406416e-05, Loss: 613.1824340820312
2024-08-03T18:52:59.545337652Z 
 43%|████▎     | 4059/9500 [13:55:29<18:44:05, 12.40s/it]08/03/2024 11:52:59 - INFO - __main__ -   Step: 4059, LR: 1.1808300294719136e-05, Loss: 589.93896484375
2024-08-03T18:53:12.080368791Z 
 43%|████▎     | 4060/9500 [13:55:42<18:47:40, 12.44s/it]08/03/2024 11:53:12 - INFO - __main__ -   Step: 4060, LR: 1.1806129751031856e-05, Loss: 721.674072265625
2024-08-03T18:53:24.115176188Z 
 43%|████▎     | 4061/9500 [13:55:54<18:36:30, 12.32s/it]08/03/2024 11:53:24 - INFO - __main__ -   Step: 4061, LR: 1.1803959207344577e-05, Loss: 646.0115356445312
2024-08-03T18:53:36.522012345Z 
 43%|████▎     | 4062/9500 [13:56:06<18:38:45, 12.34s/it]08/03/2024 11:53:36 - INFO - __main__ -   Step: 4062, LR: 1.1801788663657299e-05, Loss: 635.7259521484375
2024-08-03T18:53:48.434074950Z 
 43%|████▎     | 4063/9500 [13:56:18<18:26:48, 12.21s/it]08/03/2024 11:53:48 - INFO - __main__ -   Step: 4063, LR: 1.179961811997002e-05, Loss: 588.1465454101562
2024-08-03T18:54:00.277092094Z 
 43%|████▎     | 4064/9500 [13:56:30<18:16:30, 12.10s/it]08/03/2024 11:54:00 - INFO - __main__ -   Step: 4064, LR: 1.1797447576282742e-05, Loss: 455.10906982421875
2024-08-03T18:54:12.653700438Z 
 43%|████▎     | 4065/9500 [13:56:42<18:23:45, 12.19s/it]08/03/2024 11:54:12 - INFO - __main__ -   Step: 4065, LR: 1.1795277032595462e-05, Loss: 540.5696411132812
2024-08-03T18:54:25.006697188Z 
 43%|████▎     | 4066/9500 [13:56:54<18:28:05, 12.24s/it]08/03/2024 11:54:25 - INFO - __main__ -   Step: 4066, LR: 1.1793106488908183e-05, Loss: 685.0111694335938
2024-08-03T18:54:37.139336651Z 
 43%|████▎     | 4067/9500 [13:57:07<18:25:07, 12.20s/it]08/03/2024 11:54:37 - INFO - __main__ -   Step: 4067, LR: 1.1790935945220905e-05, Loss: 463.0715637207031
2024-08-03T18:54:49.466154636Z 
 43%|████▎     | 4068/9500 [13:57:19<18:28:15, 12.24s/it]08/03/2024 11:54:49 - INFO - __main__ -   Step: 4068, LR: 1.1788765401533626e-05, Loss: 612.5155029296875
2024-08-03T18:55:01.486286877Z 
 43%|████▎     | 4069/9500 [13:57:31<18:22:02, 12.18s/it]08/03/2024 11:55:01 - INFO - __main__ -   Step: 4069, LR: 1.1786594857846348e-05, Loss: 488.43463134765625
2024-08-03T18:55:13.718872814Z 
 43%|████▎     | 4070/9500 [13:57:43<18:23:24, 12.19s/it]08/03/2024 11:55:13 - INFO - __main__ -   Step: 4070, LR: 1.1784424314159068e-05, Loss: 513.206787109375
2024-08-03T18:55:26.283799809Z 
 43%|████▎     | 4071/9500 [13:57:56<18:33:18, 12.30s/it]08/03/2024 11:55:26 - INFO - __main__ -   Step: 4071, LR: 1.178225377047179e-05, Loss: 522.850830078125
2024-08-03T18:55:38.237180614Z 
 43%|████▎     | 4072/9500 [13:58:08<18:23:35, 12.20s/it]08/03/2024 11:55:38 - INFO - __main__ -   Step: 4072, LR: 1.1780083226784511e-05, Loss: 482.73846435546875
2024-08-03T18:55:50.473897862Z 
 43%|████▎     | 4073/9500 [13:58:20<18:24:23, 12.21s/it]08/03/2024 11:55:50 - INFO - __main__ -   Step: 4073, LR: 1.177791268309723e-05, Loss: 618.5863037109375
2024-08-03T18:56:02.788569065Z 
 43%|████▎     | 4074/9500 [13:58:32<18:27:02, 12.24s/it]08/03/2024 11:56:02 - INFO - __main__ -   Step: 4074, LR: 1.177574213940995e-05, Loss: 580.46533203125
2024-08-03T18:56:14.852362726Z 
 43%|████▎     | 4075/9500 [13:58:44<18:22:00, 12.19s/it]08/03/2024 11:56:14 - INFO - __main__ -   Step: 4075, LR: 1.1773571595722672e-05, Loss: 695.8219604492188
2024-08-03T18:56:26.863935915Z 
 43%|████▎     | 4076/9500 [13:58:56<18:17:01, 12.14s/it]08/03/2024 11:56:26 - INFO - __main__ -   Step: 4076, LR: 1.1771401052035394e-05, Loss: 522.3516845703125
2024-08-03T18:56:39.338055682Z 
 43%|████▎     | 4077/9500 [13:59:09<18:26:00, 12.24s/it]08/03/2024 11:56:39 - INFO - __main__ -   Step: 4077, LR: 1.1769230508348115e-05, Loss: 521.8873291015625
2024-08-03T18:56:51.440903168Z 
 43%|████▎     | 4078/9500 [13:59:21<18:22:10, 12.20s/it]08/03/2024 11:56:51 - INFO - __main__ -   Step: 4078, LR: 1.1767059964660837e-05, Loss: 603.2481689453125
2024-08-03T18:57:03.677992676Z 
 43%|████▎     | 4079/9500 [13:59:33<18:23:04, 12.21s/it]08/03/2024 11:57:03 - INFO - __main__ -   Step: 4079, LR: 1.1764889420973557e-05, Loss: 479.9330749511719
2024-08-03T18:57:15.647183187Z 
 43%|████▎     | 4080/9500 [13:59:45<18:16:21, 12.14s/it]08/03/2024 11:57:15 - INFO - __main__ -   Step: 4080, LR: 1.1762718877286278e-05, Loss: 477.36627197265625
2024-08-03T18:57:28.236817907Z 
 43%|████▎     | 4081/9500 [13:59:58<18:28:25, 12.27s/it]08/03/2024 11:57:28 - INFO - __main__ -   Step: 4081, LR: 1.1760548333599e-05, Loss: 646.5697021484375
2024-08-03T18:57:40.416989885Z 
 43%|████▎     | 4082/9500 [14:00:10<18:25:43, 12.24s/it]08/03/2024 11:57:40 - INFO - __main__ -   Step: 4082, LR: 1.1758377789911721e-05, Loss: 802.315673828125
2024-08-03T18:57:52.897218410Z 
 43%|████▎     | 4083/9500 [14:00:22<18:31:53, 12.32s/it]08/03/2024 11:57:52 - INFO - __main__ -   Step: 4083, LR: 1.1756207246224443e-05, Loss: 668.5625
2024-08-03T18:58:05.640678705Z 
 43%|████▎     | 4084/9500 [14:00:35<18:43:16, 12.44s/it]08/03/2024 11:58:05 - INFO - __main__ -   Step: 4084, LR: 1.1754036702537165e-05, Loss: 507.398681640625
2024-08-03T18:58:17.806369704Z 
 43%|████▎     | 4085/9500 [14:00:47<18:35:31, 12.36s/it]08/03/2024 11:58:17 - INFO - __main__ -   Step: 4085, LR: 1.1751866158849884e-05, Loss: 678.75048828125
2024-08-03T18:58:30.074224840Z 
 43%|████▎     | 4086/9500 [14:01:00<18:32:48, 12.33s/it]08/03/2024 11:58:30 - INFO - __main__ -   Step: 4086, LR: 1.1749695615162606e-05, Loss: 534.1854858398438
2024-08-03T18:58:42.597457393Z 
 43%|████▎     | 4087/9500 [14:01:12<18:37:46, 12.39s/it]08/03/2024 11:58:42 - INFO - __main__ -   Step: 4087, LR: 1.1747525071475326e-05, Loss: 463.7225341796875
2024-08-03T18:58:54.894153082Z 
 43%|████▎     | 4088/9500 [14:01:24<18:35:02, 12.36s/it]08/03/2024 11:58:54 - INFO - __main__ -   Step: 4088, LR: 1.1745354527788046e-05, Loss: 563.218505859375
2024-08-03T18:59:07.354469188Z 
 43%|████▎     | 4089/9500 [14:01:37<18:37:29, 12.39s/it]08/03/2024 11:59:07 - INFO - __main__ -   Step: 4089, LR: 1.1743183984100767e-05, Loss: 611.61181640625
2024-08-03T18:59:20.206647113Z 
 43%|████▎     | 4090/9500 [14:01:50<18:49:45, 12.53s/it]08/03/2024 11:59:20 - INFO - __main__ -   Step: 4090, LR: 1.1741013440413489e-05, Loss: 508.3453369140625
2024-08-03T18:59:32.753627403Z 
 43%|████▎     | 4091/9500 [14:02:02<18:50:00, 12.53s/it]08/03/2024 11:59:32 - INFO - __main__ -   Step: 4091, LR: 1.173884289672621e-05, Loss: 629.9349365234375
2024-08-03T18:59:45.169474955Z 
 43%|████▎     | 4092/9500 [14:02:15<18:46:34, 12.50s/it]08/03/2024 11:59:45 - INFO - __main__ -   Step: 4092, LR: 1.1736672353038932e-05, Loss: 672.015625
2024-08-03T18:59:57.794223485Z 
 43%|████▎     | 4093/9500 [14:02:27<18:49:46, 12.54s/it]08/03/2024 11:59:57 - INFO - __main__ -   Step: 4093, LR: 1.1734501809351654e-05, Loss: 628.53125
2024-08-03T19:00:09.795810876Z 
 43%|████▎     | 4094/9500 [14:02:39<18:35:05, 12.38s/it]08/03/2024 12:00:09 - INFO - __main__ -   Step: 4094, LR: 1.1732331265664373e-05, Loss: 614.2833251953125
2024-08-03T19:00:22.175144612Z 
 43%|████▎     | 4095/9500 [14:02:52<18:34:59, 12.38s/it]08/03/2024 12:00:22 - INFO - __main__ -   Step: 4095, LR: 1.1730160721977095e-05, Loss: 603.12451171875
2024-08-03T19:00:34.602775031Z 
 43%|████▎     | 4096/9500 [14:03:04<18:36:08, 12.39s/it]08/03/2024 12:00:34 - INFO - __main__ -   Step: 4096, LR: 1.1727990178289816e-05, Loss: 701.7598876953125
2024-08-03T19:00:46.811980766Z 
 43%|████▎     | 4097/9500 [14:03:16<18:30:59, 12.34s/it]08/03/2024 12:00:46 - INFO - __main__ -   Step: 4097, LR: 1.1725819634602538e-05, Loss: 608.8370361328125
2024-08-03T19:00:58.637724070Z 
 43%|████▎     | 4098/9500 [14:03:28<18:16:55, 12.18s/it]08/03/2024 12:00:58 - INFO - __main__ -   Step: 4098, LR: 1.172364909091526e-05, Loss: 441.28350830078125
2024-08-03T19:01:10.992785634Z 
 43%|████▎     | 4099/9500 [14:03:40<18:21:23, 12.24s/it]08/03/2024 12:01:10 - INFO - __main__ -   Step: 4099, LR: 1.172147854722798e-05, Loss: 626.1168212890625
2024-08-03T19:01:23.297428263Z 
 43%|████▎     | 4100/9500 [14:03:53<18:23:03, 12.26s/it]08/03/2024 12:01:23 - INFO - __main__ -   Step: 4100, LR: 1.1719308003540701e-05, Loss: 617.612548828125
2024-08-03T19:01:35.291801844Z 
 43%|████▎     | 4101/9500 [14:04:05<18:15:46, 12.18s/it]08/03/2024 12:01:35 - INFO - __main__ -   Step: 4101, LR: 1.1717137459853421e-05, Loss: 605.140869140625
2024-08-03T19:01:47.734236334Z 
 43%|████▎     | 4102/9500 [14:04:17<18:22:43, 12.26s/it]08/03/2024 12:01:47 - INFO - __main__ -   Step: 4102, LR: 1.1714966916166142e-05, Loss: 680.069580078125
2024-08-03T19:02:00.170711972Z 
 43%|████▎     | 4103/9500 [14:04:30<18:27:21, 12.31s/it]08/03/2024 12:02:00 - INFO - __main__ -   Step: 4103, LR: 1.1712796372478862e-05, Loss: 563.708251953125
2024-08-03T19:02:12.634021798Z 
 43%|████▎     | 4104/9500 [14:04:42<18:31:16, 12.36s/it]08/03/2024 12:02:12 - INFO - __main__ -   Step: 4104, LR: 1.1710625828791584e-05, Loss: 673.0621337890625
2024-08-03T19:02:25.423573798Z 
 43%|████▎     | 4105/9500 [14:04:55<18:42:44, 12.49s/it]08/03/2024 12:02:25 - INFO - __main__ -   Step: 4105, LR: 1.1708455285104305e-05, Loss: 728.4639282226562
2024-08-03T19:02:37.571178405Z 
 43%|████▎     | 4106/9500 [14:05:07<18:33:23, 12.38s/it]08/03/2024 12:02:37 - INFO - __main__ -   Step: 4106, LR: 1.1706284741417027e-05, Loss: 526.63037109375
2024-08-03T19:02:49.689009543Z 
 43%|████▎     | 4107/9500 [14:05:19<18:25:58, 12.30s/it]08/03/2024 12:02:49 - INFO - __main__ -   Step: 4107, LR: 1.1704114197729749e-05, Loss: 555.7838134765625
2024-08-03T19:03:02.542700945Z 
 43%|████▎     | 4108/9500 [14:05:32<18:40:35, 12.47s/it]08/03/2024 12:03:02 - INFO - __main__ -   Step: 4108, LR: 1.1701943654042468e-05, Loss: 627.9860229492188
2024-08-03T19:03:14.509508934Z 
 43%|████▎     | 4109/9500 [14:05:44<18:26:49, 12.32s/it]08/03/2024 12:03:14 - INFO - __main__ -   Step: 4109, LR: 1.169977311035519e-05, Loss: 630.9488525390625
2024-08-03T19:03:26.520668023Z 
 43%|████▎     | 4110/9500 [14:05:56<18:18:20, 12.23s/it]08/03/2024 12:03:26 - INFO - __main__ -   Step: 4110, LR: 1.1697602566667912e-05, Loss: 606.5355224609375
2024-08-03T19:03:38.971067190Z 
 43%|████▎     | 4111/9500 [14:06:08<18:24:10, 12.29s/it]08/03/2024 12:03:38 - INFO - __main__ -   Step: 4111, LR: 1.1695432022980633e-05, Loss: 693.6332397460938
2024-08-03T19:03:50.954736205Z 
 43%|████▎     | 4112/9500 [14:06:20<18:15:37, 12.20s/it]08/03/2024 12:03:50 - INFO - __main__ -   Step: 4112, LR: 1.1693261479293355e-05, Loss: 578.0003662109375
2024-08-03T19:04:03.466695675Z 
 43%|████▎     | 4113/9500 [14:06:33<18:23:48, 12.29s/it]08/03/2024 12:04:03 - INFO - __main__ -   Step: 4113, LR: 1.1691090935606075e-05, Loss: 837.6672973632812
2024-08-03T19:04:16.163222036Z 
 43%|████▎     | 4114/9500 [14:06:46<18:34:25, 12.41s/it]08/03/2024 12:04:16 - INFO - __main__ -   Step: 4114, LR: 1.1688920391918796e-05, Loss: 575.9210205078125
2024-08-03T19:04:28.432752633Z 
 43%|████▎     | 4115/9500 [14:06:58<18:30:19, 12.37s/it]08/03/2024 12:04:28 - INFO - __main__ -   Step: 4115, LR: 1.1686749848231516e-05, Loss: 879.0830688476562
2024-08-03T19:04:40.930256636Z 
 43%|████▎     | 4116/9500 [14:07:10<18:33:30, 12.41s/it]08/03/2024 12:04:40 - INFO - __main__ -   Step: 4116, LR: 1.1684579304544238e-05, Loss: 610.071044921875
2024-08-03T19:04:53.455896740Z 
 43%|████▎     | 4117/9500 [14:07:23<18:36:25, 12.44s/it]08/03/2024 12:04:53 - INFO - __main__ -   Step: 4117, LR: 1.1682408760856957e-05, Loss: 591.583984375
2024-08-03T19:05:05.623283492Z 
 43%|████▎     | 4118/9500 [14:07:35<18:28:47, 12.36s/it]08/03/2024 12:05:05 - INFO - __main__ -   Step: 4118, LR: 1.1680238217169679e-05, Loss: 602.1810302734375
2024-08-03T19:05:17.598163755Z 
 43%|████▎     | 4119/9500 [14:07:47<18:18:11, 12.25s/it]08/03/2024 12:05:17 - INFO - __main__ -   Step: 4119, LR: 1.16780676734824e-05, Loss: 572.9600219726562
2024-08-03T19:05:30.036801022Z 
 43%|████▎     | 4120/9500 [14:07:59<18:23:10, 12.30s/it]08/03/2024 12:05:30 - INFO - __main__ -   Step: 4120, LR: 1.1675897129795122e-05, Loss: 585.1375732421875
2024-08-03T19:05:42.171921753Z 
 43%|████▎     | 4121/9500 [14:08:12<18:18:27, 12.25s/it]08/03/2024 12:05:42 - INFO - __main__ -   Step: 4121, LR: 1.1673726586107844e-05, Loss: 611.4061889648438
2024-08-03T19:05:54.577266712Z 
 43%|████▎     | 4122/9500 [14:08:24<18:22:21, 12.30s/it]08/03/2024 12:05:54 - INFO - __main__ -   Step: 4122, LR: 1.1671556042420563e-05, Loss: 670.0564575195312
2024-08-03T19:06:06.612107709Z 
 43%|████▎     | 4123/9500 [14:08:36<18:15:03, 12.22s/it]08/03/2024 12:06:06 - INFO - __main__ -   Step: 4123, LR: 1.1669385498733285e-05, Loss: 561.5556030273438
2024-08-03T19:06:19.225595044Z 
 43%|████▎     | 4124/9500 [14:08:49<18:25:27, 12.34s/it]08/03/2024 12:06:19 - INFO - __main__ -   Step: 4124, LR: 1.1667214955046007e-05, Loss: 585.54931640625
2024-08-03T19:06:31.241984742Z 
 43%|████▎     | 4125/9500 [14:09:01<18:16:37, 12.24s/it]08/03/2024 12:06:31 - INFO - __main__ -   Step: 4125, LR: 1.1665044411358728e-05, Loss: 524.685791015625
2024-08-03T19:06:43.713985432Z 
 43%|████▎     | 4126/9500 [14:09:13<18:22:36, 12.31s/it]08/03/2024 12:06:43 - INFO - __main__ -   Step: 4126, LR: 1.166287386767145e-05, Loss: 583.020263671875
2024-08-03T19:06:56.605156245Z 
 43%|████▎     | 4127/9500 [14:09:26<18:37:59, 12.48s/it]08/03/2024 12:06:56 - INFO - __main__ -   Step: 4127, LR: 1.1660703323984171e-05, Loss: 661.3250732421875
2024-08-03T19:07:08.916874787Z 
 43%|████▎     | 4128/9500 [14:09:38<18:33:08, 12.43s/it]08/03/2024 12:07:08 - INFO - __main__ -   Step: 4128, LR: 1.1658532780296891e-05, Loss: 491.675537109375
2024-08-03T19:07:20.932528627Z 
 43%|████▎     | 4129/9500 [14:09:50<18:21:44, 12.31s/it]08/03/2024 12:07:20 - INFO - __main__ -   Step: 4129, LR: 1.1656362236609611e-05, Loss: 536.009521484375
2024-08-03T19:07:33.719386129Z 
 43%|████▎     | 4130/9500 [14:10:03<18:34:23, 12.45s/it]08/03/2024 12:07:33 - INFO - __main__ -   Step: 4130, LR: 1.1654191692922333e-05, Loss: 613.7484130859375
2024-08-03T19:07:45.816555628Z 
 43%|████▎     | 4131/9500 [14:10:15<18:24:41, 12.35s/it]08/03/2024 12:07:45 - INFO - __main__ -   Step: 4131, LR: 1.1652021149235052e-05, Loss: 615.2847290039062
2024-08-03T19:07:57.850945308Z 
 43%|████▎     | 4132/9500 [14:10:27<18:16:08, 12.25s/it]08/03/2024 12:07:57 - INFO - __main__ -   Step: 4132, LR: 1.1649850605547774e-05, Loss: 518.6995849609375
2024-08-03T19:08:10.123982450Z 
 44%|████▎     | 4133/9500 [14:10:40<18:16:30, 12.26s/it]08/03/2024 12:08:10 - INFO - __main__ -   Step: 4133, LR: 1.1647680061860496e-05, Loss: 497.02239990234375
2024-08-03T19:08:22.737516616Z 
 44%|████▎     | 4134/9500 [14:10:52<18:25:47, 12.36s/it]08/03/2024 12:08:22 - INFO - __main__ -   Step: 4134, LR: 1.1645509518173217e-05, Loss: 676.245849609375
2024-08-03T19:08:34.848236444Z 
 44%|████▎     | 4135/9500 [14:11:04<18:18:49, 12.29s/it]08/03/2024 12:08:34 - INFO - __main__ -   Step: 4135, LR: 1.1643338974485939e-05, Loss: 432.68701171875
2024-08-03T19:08:47.234088661Z 
 44%|████▎     | 4136/9500 [14:11:17<18:21:13, 12.32s/it]08/03/2024 12:08:47 - INFO - __main__ -   Step: 4136, LR: 1.164116843079866e-05, Loss: 548.562255859375
2024-08-03T19:08:59.463516155Z 
 44%|████▎     | 4137/9500 [14:11:29<18:18:38, 12.29s/it]08/03/2024 12:08:59 - INFO - __main__ -   Step: 4137, LR: 1.163899788711138e-05, Loss: 686.7855834960938
2024-08-03T19:09:11.556860630Z 
 44%|████▎     | 4138/9500 [14:11:41<18:13:07, 12.23s/it]08/03/2024 12:09:11 - INFO - __main__ -   Step: 4138, LR: 1.1636827343424102e-05, Loss: 556.220703125
2024-08-03T19:09:24.072364390Z 
 44%|████▎     | 4139/9500 [14:11:54<18:20:31, 12.32s/it]08/03/2024 12:09:24 - INFO - __main__ -   Step: 4139, LR: 1.1634656799736823e-05, Loss: 538.5490112304688
2024-08-03T19:09:36.271344533Z 
 44%|████▎     | 4140/9500 [14:12:06<18:17:09, 12.28s/it]08/03/2024 12:09:36 - INFO - __main__ -   Step: 4140, LR: 1.1632486256049545e-05, Loss: 529.30224609375
2024-08-03T19:09:48.498379048Z 
 44%|████▎     | 4141/9500 [14:12:18<18:15:29, 12.27s/it]08/03/2024 12:09:48 - INFO - __main__ -   Step: 4141, LR: 1.1630315712362266e-05, Loss: 495.18267822265625
2024-08-03T19:10:01.154852899Z 
 44%|████▎     | 4142/9500 [14:12:31<18:25:44, 12.38s/it]08/03/2024 12:10:01 - INFO - __main__ -   Step: 4142, LR: 1.1628145168674986e-05, Loss: 611.8233032226562
2024-08-03T19:10:13.096078448Z 
 44%|████▎     | 4143/9500 [14:12:43<18:13:44, 12.25s/it]08/03/2024 12:10:13 - INFO - __main__ -   Step: 4143, LR: 1.1625974624987706e-05, Loss: 557.427978515625
2024-08-03T19:10:25.621957214Z 
 44%|████▎     | 4144/9500 [14:12:55<18:20:54, 12.33s/it]08/03/2024 12:10:25 - INFO - __main__ -   Step: 4144, LR: 1.1623804081300428e-05, Loss: 636.4559326171875
2024-08-03T19:10:38.360175874Z 
 44%|████▎     | 4145/9500 [14:13:08<18:31:34, 12.45s/it]08/03/2024 12:10:38 - INFO - __main__ -   Step: 4145, LR: 1.162163353761315e-05, Loss: 582.7695922851562
2024-08-03T19:10:50.553502073Z 
 44%|████▎     | 4146/9500 [14:13:20<18:24:21, 12.38s/it]08/03/2024 12:10:50 - INFO - __main__ -   Step: 4146, LR: 1.1619462993925869e-05, Loss: 657.8309326171875
2024-08-03T19:11:02.942934807Z 
 44%|████▎     | 4147/9500 [14:13:32<18:24:30, 12.38s/it]08/03/2024 12:11:02 - INFO - __main__ -   Step: 4147, LR: 1.161729245023859e-05, Loss: 538.5714111328125
2024-08-03T19:11:15.634917708Z 
 44%|████▎     | 4148/9500 [14:13:45<18:32:38, 12.47s/it]08/03/2024 12:11:15 - INFO - __main__ -   Step: 4148, LR: 1.1615121906551312e-05, Loss: 602.8990478515625
2024-08-03T19:11:27.606523877Z 
 44%|████▎     | 4149/9500 [14:13:57<18:19:01, 12.32s/it]08/03/2024 12:11:27 - INFO - __main__ -   Step: 4149, LR: 1.1612951362864034e-05, Loss: 478.02392578125
2024-08-03T19:11:39.590415327Z 
 44%|████▎     | 4150/9500 [14:14:09<18:09:44, 12.22s/it]08/03/2024 12:11:39 - INFO - __main__ -   Step: 4150, LR: 1.1610780819176755e-05, Loss: 492.8087463378906
2024-08-03T19:11:51.940436034Z 
 44%|████▎     | 4151/9500 [14:14:21<18:12:58, 12.26s/it]08/03/2024 12:11:51 - INFO - __main__ -   Step: 4151, LR: 1.1608610275489475e-05, Loss: 517.317138671875
2024-08-03T19:12:03.826822634Z 
 44%|████▎     | 4152/9500 [14:14:33<18:02:46, 12.15s/it]08/03/2024 12:12:03 - INFO - __main__ -   Step: 4152, LR: 1.1606439731802197e-05, Loss: 500.16693115234375
2024-08-03T19:12:15.969042724Z 
 44%|████▎     | 4153/9500 [14:14:45<18:02:25, 12.15s/it]08/03/2024 12:12:15 - INFO - __main__ -   Step: 4153, LR: 1.1604269188114918e-05, Loss: 686.6678466796875
2024-08-03T19:12:28.614968449Z 
 44%|████▎     | 4154/9500 [14:14:58<18:15:35, 12.30s/it]08/03/2024 12:12:28 - INFO - __main__ -   Step: 4154, LR: 1.160209864442764e-05, Loss: 534.8785400390625
2024-08-03T19:12:40.611193577Z 
 44%|████▎     | 4155/9500 [14:15:10<18:07:21, 12.21s/it]08/03/2024 12:12:40 - INFO - __main__ -   Step: 4155, LR: 1.1599928100740361e-05, Loss: 495.9940185546875
2024-08-03T19:12:52.731175965Z 
 44%|████▎     | 4156/9500 [14:15:22<18:04:51, 12.18s/it]08/03/2024 12:12:52 - INFO - __main__ -   Step: 4156, LR: 1.159775755705308e-05, Loss: 445.8651123046875
2024-08-03T19:13:05.128307599Z 
 44%|████▍     | 4157/9500 [14:15:35<18:10:26, 12.25s/it]08/03/2024 12:13:05 - INFO - __main__ -   Step: 4157, LR: 1.1595587013365801e-05, Loss: 655.247314453125
2024-08-03T19:13:17.193709258Z 
 44%|████▍     | 4158/9500 [14:15:47<18:05:25, 12.19s/it]08/03/2024 12:13:17 - INFO - __main__ -   Step: 4158, LR: 1.1593416469678523e-05, Loss: 536.1713256835938
2024-08-03T19:13:28.902118660Z 
 44%|████▍     | 4159/9500 [14:15:58<17:52:20, 12.05s/it]08/03/2024 12:13:28 - INFO - __main__ -   Step: 4159, LR: 1.1591245925991244e-05, Loss: 458.54119873046875
2024-08-03T19:13:41.344676123Z 
 44%|████▍     | 4160/9500 [14:16:11<18:02:42, 12.17s/it]08/03/2024 12:13:41 - INFO - __main__ -   Step: 4160, LR: 1.1589075382303964e-05, Loss: 454.23699951171875
2024-08-03T19:13:53.614824774Z 
 44%|████▍     | 4161/9500 [14:16:23<18:05:18, 12.20s/it]08/03/2024 12:13:53 - INFO - __main__ -   Step: 4161, LR: 1.1586904838616686e-05, Loss: 564.5405883789062
2024-08-03T19:14:05.822616629Z 
 44%|████▍     | 4162/9500 [14:16:35<18:05:24, 12.20s/it]08/03/2024 12:14:05 - INFO - __main__ -   Step: 4162, LR: 1.1584734294929407e-05, Loss: 727.7572631835938
2024-08-03T19:14:18.180801607Z 
 44%|████▍     | 4163/9500 [14:16:48<18:09:25, 12.25s/it]08/03/2024 12:14:18 - INFO - __main__ -   Step: 4163, LR: 1.1582563751242129e-05, Loss: 587.1263427734375
2024-08-03T19:14:30.341450368Z 
 44%|████▍     | 4164/9500 [14:17:00<18:06:53, 12.22s/it]08/03/2024 12:14:30 - INFO - __main__ -   Step: 4164, LR: 1.158039320755485e-05, Loss: 489.3016357421875
2024-08-03T19:14:42.529496753Z 
 44%|████▍     | 4165/9500 [14:17:12<18:05:48, 12.21s/it]08/03/2024 12:14:42 - INFO - __main__ -   Step: 4165, LR: 1.157822266386757e-05, Loss: 546.5675048828125
2024-08-03T19:14:54.781562999Z 
 44%|████▍     | 4166/9500 [14:17:24<18:06:40, 12.22s/it]08/03/2024 12:14:54 - INFO - __main__ -   Step: 4166, LR: 1.1576052120180292e-05, Loss: 545.3421630859375
2024-08-03T19:15:07.345846830Z 
 44%|████▍     | 4167/9500 [14:17:37<18:15:33, 12.33s/it]08/03/2024 12:15:07 - INFO - __main__ -   Step: 4167, LR: 1.1573881576493013e-05, Loss: 719.7339477539062
2024-08-03T19:15:19.822285124Z 
 44%|████▍     | 4168/9500 [14:17:49<18:19:22, 12.37s/it]08/03/2024 12:15:19 - INFO - __main__ -   Step: 4168, LR: 1.1571711032805735e-05, Loss: 758.44970703125
2024-08-03T19:15:31.947894344Z 
 44%|████▍     | 4169/9500 [14:18:01<18:12:36, 12.30s/it]08/03/2024 12:15:31 - INFO - __main__ -   Step: 4169, LR: 1.1569540489118456e-05, Loss: 538.662353515625
2024-08-03T19:15:44.349837123Z 
 44%|████▍     | 4170/9500 [14:18:14<18:15:12, 12.33s/it]08/03/2024 12:15:44 - INFO - __main__ -   Step: 4170, LR: 1.1567369945431175e-05, Loss: 548.1891479492188
2024-08-03T19:15:56.736186243Z 
 44%|████▍     | 4171/9500 [14:18:26<18:16:31, 12.35s/it]08/03/2024 12:15:56 - INFO - __main__ -   Step: 4171, LR: 1.1565199401743896e-05, Loss: 572.6488037109375
2024-08-03T19:16:08.771631584Z 
 44%|████▍     | 4172/9500 [14:18:38<18:08:03, 12.25s/it]08/03/2024 12:16:08 - INFO - __main__ -   Step: 4172, LR: 1.1563028858056618e-05, Loss: 577.1212158203125
2024-08-03T19:16:21.439316816Z 
 44%|████▍     | 4173/9500 [14:18:51<18:18:54, 12.38s/it]08/03/2024 12:16:21 - INFO - __main__ -   Step: 4173, LR: 1.156085831436934e-05, Loss: 692.765869140625
2024-08-03T19:16:34.066846861Z 
 44%|████▍     | 4174/9500 [14:19:04<18:25:21, 12.45s/it]08/03/2024 12:16:34 - INFO - __main__ -   Step: 4174, LR: 1.155868777068206e-05, Loss: 572.4168701171875
2024-08-03T19:16:45.983564101Z 
 44%|████▍     | 4175/9500 [14:19:15<18:10:52, 12.29s/it]08/03/2024 12:16:45 - INFO - __main__ -   Step: 4175, LR: 1.155651722699478e-05, Loss: 456.5682373046875
2024-08-03T19:16:58.655283294Z 
 44%|████▍     | 4176/9500 [14:19:28<18:20:48, 12.41s/it]08/03/2024 12:16:58 - INFO - __main__ -   Step: 4176, LR: 1.1554346683307502e-05, Loss: 589.4029541015625
2024-08-03T19:17:10.835182034Z 
 44%|████▍     | 4177/9500 [14:19:40<18:14:35, 12.34s/it]08/03/2024 12:17:10 - INFO - __main__ -   Step: 4177, LR: 1.1552176139620224e-05, Loss: 630.0433349609375
2024-08-03T19:17:23.242162422Z 
 44%|████▍     | 4178/9500 [14:19:53<18:16:13, 12.36s/it]08/03/2024 12:17:23 - INFO - __main__ -   Step: 4178, LR: 1.1550005595932945e-05, Loss: 503.479248046875
2024-08-03T19:17:35.613682116Z 
 44%|████▍     | 4179/9500 [14:20:05<18:16:20, 12.36s/it]08/03/2024 12:17:35 - INFO - __main__ -   Step: 4179, LR: 1.1547835052245667e-05, Loss: 548.8837890625
2024-08-03T19:17:47.915482104Z 
 44%|████▍     | 4180/9500 [14:20:17<18:14:31, 12.34s/it]08/03/2024 12:17:47 - INFO - __main__ -   Step: 4180, LR: 1.1545664508558387e-05, Loss: 573.2872314453125
2024-08-03T19:18:00.016135634Z 
 44%|████▍     | 4181/9500 [14:20:29<18:07:50, 12.27s/it]08/03/2024 12:18:00 - INFO - __main__ -   Step: 4181, LR: 1.1543493964871108e-05, Loss: 467.578125
2024-08-03T19:18:12.750447219Z 
 44%|████▍     | 4182/9500 [14:20:42<18:19:57, 12.41s/it]08/03/2024 12:18:12 - INFO - __main__ -   Step: 4182, LR: 1.154132342118383e-05, Loss: 535.490234375
2024-08-03T19:18:24.853560430Z 
 44%|████▍     | 4183/9500 [14:20:54<18:11:35, 12.32s/it]08/03/2024 12:18:24 - INFO - __main__ -   Step: 4183, LR: 1.1539152877496552e-05, Loss: 578.57373046875
2024-08-03T19:18:37.082491007Z 
 44%|████▍     | 4184/9500 [14:21:07<18:08:59, 12.29s/it]08/03/2024 12:18:37 - INFO - __main__ -   Step: 4184, LR: 1.153698233380927e-05, Loss: 534.999267578125
2024-08-03T19:18:49.544482592Z 
 44%|████▍     | 4185/9500 [14:21:19<18:13:20, 12.34s/it]08/03/2024 12:18:49 - INFO - __main__ -   Step: 4185, LR: 1.1534811790121991e-05, Loss: 622.4558715820312
2024-08-03T19:19:01.374713886Z 
 44%|████▍     | 4186/9500 [14:21:31<17:59:31, 12.19s/it]08/03/2024 12:19:01 - INFO - __main__ -   Step: 4186, LR: 1.1532641246434713e-05, Loss: 584.4028930664062
2024-08-03T19:19:13.429590257Z 
 44%|████▍     | 4187/9500 [14:21:43<17:55:45, 12.15s/it]08/03/2024 12:19:13 - INFO - __main__ -   Step: 4187, LR: 1.1530470702747434e-05, Loss: 413.0798645019531
2024-08-03T19:19:26.153814161Z 
 44%|████▍     | 4188/9500 [14:21:56<18:10:50, 12.32s/it]08/03/2024 12:19:26 - INFO - __main__ -   Step: 4188, LR: 1.1528300159060156e-05, Loss: 510.12664794921875
2024-08-03T19:19:38.412371491Z 
 44%|████▍     | 4189/9500 [14:22:08<18:08:58, 12.30s/it]08/03/2024 12:19:38 - INFO - __main__ -   Step: 4189, LR: 1.1526129615372876e-05, Loss: 816.6862182617188
2024-08-03T19:19:50.467706616Z 
 44%|████▍     | 4190/9500 [14:22:20<18:02:12, 12.23s/it]08/03/2024 12:19:50 - INFO - __main__ -   Step: 4190, LR: 1.1523959071685597e-05, Loss: 506.2405700683594
2024-08-03T19:20:03.311895793Z 
 44%|████▍     | 4191/9500 [14:22:33<18:18:21, 12.41s/it]08/03/2024 12:20:03 - INFO - __main__ -   Step: 4191, LR: 1.1521788527998319e-05, Loss: 492.314697265625
2024-08-03T19:20:15.249308464Z 
 44%|████▍     | 4192/9500 [14:22:45<18:05:31, 12.27s/it]08/03/2024 12:20:15 - INFO - __main__ -   Step: 4192, LR: 1.151961798431104e-05, Loss: 658.9502563476562
2024-08-03T19:20:27.790717660Z 
 44%|████▍     | 4193/9500 [14:22:57<18:12:30, 12.35s/it]08/03/2024 12:20:27 - INFO - __main__ -   Step: 4193, LR: 1.1517447440623762e-05, Loss: 570.9993896484375
2024-08-03T19:20:40.459363238Z 
 44%|████▍     | 4194/9500 [14:23:10<18:20:42, 12.45s/it]08/03/2024 12:20:40 - INFO - __main__ -   Step: 4194, LR: 1.1515276896936482e-05, Loss: 608.6813354492188
2024-08-03T19:20:52.321585681Z 
 44%|████▍     | 4195/9500 [14:23:22<18:04:59, 12.27s/it]08/03/2024 12:20:52 - INFO - __main__ -   Step: 4195, LR: 1.1513106353249203e-05, Loss: 454.3174133300781
2024-08-03T19:21:04.553662131Z 
 44%|████▍     | 4196/9500 [14:23:34<18:03:44, 12.26s/it]08/03/2024 12:21:04 - INFO - __main__ -   Step: 4196, LR: 1.1510935809561925e-05, Loss: 652.1953125
2024-08-03T19:21:17.037118913Z 
 44%|████▍     | 4197/9500 [14:23:46<18:09:28, 12.33s/it]08/03/2024 12:21:17 - INFO - __main__ -   Step: 4197, LR: 1.1508765265874647e-05, Loss: 687.0105590820312
2024-08-03T19:21:29.036035925Z 
 44%|████▍     | 4198/9500 [14:23:58<18:00:35, 12.23s/it]08/03/2024 12:21:29 - INFO - __main__ -   Step: 4198, LR: 1.1506594722187365e-05, Loss: 716.2347412109375
2024-08-03T19:21:41.025388183Z 
 44%|████▍     | 4199/9500 [14:24:10<17:54:02, 12.16s/it]08/03/2024 12:21:41 - INFO - __main__ -   Step: 4199, LR: 1.1504424178500086e-05, Loss: 517.3140258789062
2024-08-03T19:21:53.831648860Z 
 44%|████▍     | 4200/9500 [14:24:23<18:11:03, 12.35s/it]08/03/2024 12:21:53 - INFO - __main__ -   Step: 4200, LR: 1.1502253634812808e-05, Loss: 666.1580810546875
2024-08-03T19:22:06.059937735Z 
 44%|████▍     | 4201/9500 [14:24:35<18:07:35, 12.31s/it]08/03/2024 12:22:06 - INFO - __main__ -   Step: 4201, LR: 1.150008309112553e-05, Loss: 620.0804443359375
2024-08-03T19:22:18.090275615Z 
 44%|████▍     | 4202/9500 [14:24:48<17:59:51, 12.23s/it]08/03/2024 12:22:18 - INFO - __main__ -   Step: 4202, LR: 1.1497912547438251e-05, Loss: 557.8380126953125
2024-08-03T19:22:30.366376832Z 
 44%|████▍     | 4203/9500 [14:25:00<18:00:52, 12.24s/it]08/03/2024 12:22:30 - INFO - __main__ -   Step: 4203, LR: 1.149574200375097e-05, Loss: 492.2638244628906
2024-08-03T19:22:42.366662935Z 
 44%|████▍     | 4204/9500 [14:25:12<17:54:14, 12.17s/it]08/03/2024 12:22:42 - INFO - __main__ -   Step: 4204, LR: 1.1493571460063692e-05, Loss: 534.0025634765625
2024-08-03T19:22:54.823250523Z 
 44%|████▍     | 4205/9500 [14:25:24<18:01:36, 12.26s/it]08/03/2024 12:22:54 - INFO - __main__ -   Step: 4205, LR: 1.1491400916376414e-05, Loss: 674.71435546875
2024-08-03T19:23:07.379040638Z 
 44%|████▍     | 4206/9500 [14:25:37<18:09:20, 12.35s/it]08/03/2024 12:23:07 - INFO - __main__ -   Step: 4206, LR: 1.1489230372689136e-05, Loss: 697.478515625
2024-08-03T19:23:19.579697603Z 
 44%|████▍     | 4207/9500 [14:25:49<18:05:17, 12.30s/it]08/03/2024 12:23:19 - INFO - __main__ -   Step: 4207, LR: 1.1487059829001857e-05, Loss: 858.2116088867188
2024-08-03T19:23:31.614184305Z 
 44%|████▍     | 4208/9500 [14:26:01<17:57:59, 12.22s/it]08/03/2024 12:23:31 - INFO - __main__ -   Step: 4208, LR: 1.1484889285314577e-05, Loss: 642.273681640625
2024-08-03T19:23:43.674616055Z 
 44%|████▍     | 4209/9500 [14:26:13<17:53:30, 12.17s/it]08/03/2024 12:23:43 - INFO - __main__ -   Step: 4209, LR: 1.1482718741627299e-05, Loss: 667.92919921875
2024-08-03T19:23:56.238208925Z 
 44%|████▍     | 4210/9500 [14:26:26<18:03:37, 12.29s/it]08/03/2024 12:23:56 - INFO - __main__ -   Step: 4210, LR: 1.148054819794002e-05, Loss: 634.1900634765625
2024-08-03T19:24:08.343022623Z 
 44%|████▍     | 4211/9500 [14:26:38<17:58:30, 12.23s/it]08/03/2024 12:24:08 - INFO - __main__ -   Step: 4211, LR: 1.1478377654252742e-05, Loss: 590.7266845703125
2024-08-03T19:24:20.332147684Z 
 44%|████▍     | 4212/9500 [14:26:50<17:51:48, 12.16s/it]08/03/2024 12:24:20 - INFO - __main__ -   Step: 4212, LR: 1.147620711056546e-05, Loss: 543.649658203125
2024-08-03T19:24:32.608372421Z 
 44%|████▍     | 4213/9500 [14:27:02<17:54:38, 12.20s/it]08/03/2024 12:24:32 - INFO - __main__ -   Step: 4213, LR: 1.1474036566878181e-05, Loss: 449.0509033203125
2024-08-03T19:24:44.904557806Z 
 44%|████▍     | 4214/9500 [14:27:14<17:57:06, 12.23s/it]08/03/2024 12:24:44 - INFO - __main__ -   Step: 4214, LR: 1.1471866023190903e-05, Loss: 564.6146240234375
2024-08-03T19:24:56.864476494Z 
 44%|████▍     | 4215/9500 [14:27:26<17:49:51, 12.15s/it]08/03/2024 12:24:56 - INFO - __main__ -   Step: 4215, LR: 1.1469695479503624e-05, Loss: 604.291015625
2024-08-03T19:25:09.469725451Z 
 44%|████▍     | 4216/9500 [14:27:39<18:01:47, 12.28s/it]08/03/2024 12:25:09 - INFO - __main__ -   Step: 4216, LR: 1.1467524935816346e-05, Loss: 676.892578125
2024-08-03T19:25:21.741383448Z 
 44%|████▍     | 4217/9500 [14:27:51<18:01:15, 12.28s/it]08/03/2024 12:25:21 - INFO - __main__ -   Step: 4217, LR: 1.1465354392129066e-05, Loss: 596.4542846679688
2024-08-03T19:25:33.841001256Z 
 44%|████▍     | 4218/9500 [14:28:03<17:56:18, 12.23s/it]08/03/2024 12:25:33 - INFO - __main__ -   Step: 4218, LR: 1.1463183848441787e-05, Loss: 573.2406616210938
2024-08-03T19:25:46.301649200Z 
 44%|████▍     | 4219/9500 [14:28:16<18:02:17, 12.30s/it]08/03/2024 12:25:46 - INFO - __main__ -   Step: 4219, LR: 1.1461013304754509e-05, Loss: 672.5430297851562
2024-08-03T19:25:58.751695292Z 
 44%|████▍     | 4220/9500 [14:28:28<18:06:07, 12.34s/it]08/03/2024 12:25:58 - INFO - __main__ -   Step: 4220, LR: 1.145884276106723e-05, Loss: 738.529541015625
2024-08-03T19:26:11.032806873Z 
 44%|████▍     | 4221/9500 [14:28:40<18:04:18, 12.32s/it]08/03/2024 12:26:11 - INFO - __main__ -   Step: 4221, LR: 1.1456672217379952e-05, Loss: 692.5952758789062
2024-08-03T19:26:23.921402034Z 
 44%|████▍     | 4222/9500 [14:28:53<18:19:00, 12.49s/it]08/03/2024 12:26:23 - INFO - __main__ -   Step: 4222, LR: 1.1454501673692672e-05, Loss: 597.5264892578125
2024-08-03T19:26:35.856581131Z 
 44%|████▍     | 4223/9500 [14:29:05<18:04:04, 12.33s/it]08/03/2024 12:26:35 - INFO - __main__ -   Step: 4223, LR: 1.1452331130005394e-05, Loss: 615.1287231445312
2024-08-03T19:26:48.019105138Z 
 44%|████▍     | 4224/9500 [14:29:17<17:59:33, 12.28s/it]08/03/2024 12:26:48 - INFO - __main__ -   Step: 4224, LR: 1.1450160586318115e-05, Loss: 487.4187316894531
2024-08-03T19:27:00.757153828Z 
 44%|████▍     | 4225/9500 [14:29:30<18:11:30, 12.42s/it]08/03/2024 12:27:00 - INFO - __main__ -   Step: 4225, LR: 1.1447990042630837e-05, Loss: 636.6943359375
2024-08-03T19:27:12.794133518Z 
 44%|████▍     | 4226/9500 [14:29:42<18:01:19, 12.30s/it]08/03/2024 12:27:12 - INFO - __main__ -   Step: 4226, LR: 1.1445819498943555e-05, Loss: 568.03173828125
2024-08-03T19:27:25.009486439Z 
 44%|████▍     | 4227/9500 [14:29:54<17:58:50, 12.28s/it]08/03/2024 12:27:25 - INFO - __main__ -   Step: 4227, LR: 1.1443648955256276e-05, Loss: 575.0233154296875
2024-08-03T19:27:37.367166411Z 
 45%|████▍     | 4228/9500 [14:30:07<18:00:47, 12.30s/it]08/03/2024 12:27:37 - INFO - __main__ -   Step: 4228, LR: 1.1441478411568998e-05, Loss: 490.3439025878906
2024-08-03T19:27:49.414651401Z 
 45%|████▍     | 4229/9500 [14:30:19<17:53:55, 12.22s/it]08/03/2024 12:27:49 - INFO - __main__ -   Step: 4229, LR: 1.143930786788172e-05, Loss: 505.1763916015625
2024-08-03T19:28:01.814309506Z 
 45%|████▍     | 4230/9500 [14:30:31<17:58:19, 12.28s/it]08/03/2024 12:28:01 - INFO - __main__ -   Step: 4230, LR: 1.1437137324194441e-05, Loss: 581.6923828125
2024-08-03T19:28:14.228833999Z 
 45%|████▍     | 4231/9500 [14:30:44<18:01:45, 12.32s/it]08/03/2024 12:28:14 - INFO - __main__ -   Step: 4231, LR: 1.1434966780507161e-05, Loss: 602.2453002929688
2024-08-03T19:28:26.294835994Z 
 45%|████▍     | 4232/9500 [14:30:56<17:54:54, 12.24s/it]08/03/2024 12:28:26 - INFO - __main__ -   Step: 4232, LR: 1.1432796236819883e-05, Loss: 437.007080078125
2024-08-03T19:28:38.782279843Z 
 45%|████▍     | 4233/9500 [14:31:08<18:01:08, 12.32s/it]08/03/2024 12:28:38 - INFO - __main__ -   Step: 4233, LR: 1.1430625693132604e-05, Loss: 491.9666748046875
2024-08-03T19:28:51.406658461Z 
 45%|████▍     | 4234/9500 [14:31:21<18:09:03, 12.41s/it]08/03/2024 12:28:51 - INFO - __main__ -   Step: 4234, LR: 1.1428455149445326e-05, Loss: 568.3533935546875
2024-08-03T19:29:03.650620814Z 
 45%|████▍     | 4235/9500 [14:31:33<18:04:31, 12.36s/it]08/03/2024 12:29:03 - INFO - __main__ -   Step: 4235, LR: 1.1426284605758047e-05, Loss: 630.947021484375
2024-08-03T19:29:15.692211765Z 
 45%|████▍     | 4236/9500 [14:31:45<17:55:57, 12.26s/it]08/03/2024 12:29:15 - INFO - __main__ -   Step: 4236, LR: 1.1424114062070769e-05, Loss: 624.3251953125
2024-08-03T19:29:28.272866226Z 
 45%|████▍     | 4237/9500 [14:31:58<18:04:04, 12.36s/it]08/03/2024 12:29:28 - INFO - __main__ -   Step: 4237, LR: 1.1421943518383489e-05, Loss: 751.2972412109375
2024-08-03T19:29:40.349679819Z 
 45%|████▍     | 4238/9500 [14:32:10<17:56:27, 12.27s/it]08/03/2024 12:29:40 - INFO - __main__ -   Step: 4238, LR: 1.141977297469621e-05, Loss: 621.339599609375
2024-08-03T19:29:52.664104030Z 
 45%|████▍     | 4239/9500 [14:32:22<17:57:18, 12.29s/it]08/03/2024 12:29:52 - INFO - __main__ -   Step: 4239, LR: 1.1417602431008932e-05, Loss: 591.0590209960938
2024-08-03T19:30:05.342995832Z 
 45%|████▍     | 4240/9500 [14:32:35<18:07:25, 12.40s/it]08/03/2024 12:30:05 - INFO - __main__ -   Step: 4240, LR: 1.141543188732165e-05, Loss: 496.6204833984375
2024-08-03T19:30:17.359483276Z 
 45%|████▍     | 4241/9500 [14:32:47<17:57:01, 12.29s/it]08/03/2024 12:30:17 - INFO - __main__ -   Step: 4241, LR: 1.1413261343634371e-05, Loss: 533.56494140625
2024-08-03T19:30:29.855783023Z 
 45%|████▍     | 4242/9500 [14:32:59<18:02:18, 12.35s/it]08/03/2024 12:30:29 - INFO - __main__ -   Step: 4242, LR: 1.1411090799947093e-05, Loss: 619.2181396484375
2024-08-03T19:30:42.207302648Z 
 45%|████▍     | 4243/9500 [14:33:12<18:02:07, 12.35s/it]08/03/2024 12:30:42 - INFO - __main__ -   Step: 4243, LR: 1.1408920256259815e-05, Loss: 521.5348510742188
2024-08-03T19:30:54.346988771Z 
 45%|████▍     | 4244/9500 [14:33:24<17:56:22, 12.29s/it]08/03/2024 12:30:54 - INFO - __main__ -   Step: 4244, LR: 1.1406749712572536e-05, Loss: 495.8558349609375
2024-08-03T19:31:06.717206065Z 
 45%|████▍     | 4245/9500 [14:33:36<17:58:20, 12.31s/it]08/03/2024 12:31:06 - INFO - __main__ -   Step: 4245, LR: 1.1404579168885258e-05, Loss: 608.2361450195312
2024-08-03T19:31:19.075165659Z 
 45%|████▍     | 4246/9500 [14:33:49<17:59:20, 12.33s/it]08/03/2024 12:31:19 - INFO - __main__ -   Step: 4246, LR: 1.1402408625197978e-05, Loss: 663.4915161132812
2024-08-03T19:31:31.280451233Z 
 45%|████▍     | 4247/9500 [14:34:01<17:55:58, 12.29s/it]08/03/2024 12:31:31 - INFO - __main__ -   Step: 4247, LR: 1.1400238081510699e-05, Loss: 667.916259765625
2024-08-03T19:31:43.540538107Z 
 45%|████▍     | 4248/9500 [14:34:13<17:54:59, 12.28s/it]08/03/2024 12:31:43 - INFO - __main__ -   Step: 4248, LR: 1.139806753782342e-05, Loss: 554.8436279296875
2024-08-03T19:31:55.974621648Z 
 45%|████▍     | 4249/9500 [14:34:25<17:58:48, 12.33s/it]08/03/2024 12:31:55 - INFO - __main__ -   Step: 4249, LR: 1.1395896994136142e-05, Loss: 561.1680908203125
2024-08-03T19:32:08.682515183Z 
 45%|████▍     | 4250/9500 [14:34:38<18:08:36, 12.44s/it]08/03/2024 12:32:08 - INFO - __main__ -   Step: 4250, LR: 1.1393726450448864e-05, Loss: 711.1699829101562
2024-08-03T19:32:20.836920660Z 
 45%|████▍     | 4251/9500 [14:34:50<18:00:52, 12.36s/it]08/03/2024 12:32:20 - INFO - __main__ -   Step: 4251, LR: 1.1391555906761584e-05, Loss: 507.9690246582031
2024-08-03T19:32:32.784362285Z 
 45%|████▍     | 4252/9500 [14:35:02<17:49:57, 12.23s/it]08/03/2024 12:32:32 - INFO - __main__ -   Step: 4252, LR: 1.1389385363074305e-05, Loss: 639.0242919921875
2024-08-03T19:32:45.536781201Z 
 45%|████▍     | 4253/9500 [14:35:15<18:03:23, 12.39s/it]08/03/2024 12:32:45 - INFO - __main__ -   Step: 4253, LR: 1.1387214819387027e-05, Loss: 609.6463012695312
2024-08-03T19:32:57.768531092Z 
 45%|████▍     | 4254/9500 [14:35:27<17:59:04, 12.34s/it]08/03/2024 12:32:57 - INFO - __main__ -   Step: 4254, LR: 1.1385044275699747e-05, Loss: 573.6401977539062
2024-08-03T19:33:09.970233002Z 
 45%|████▍     | 4255/9500 [14:35:39<17:55:11, 12.30s/it]08/03/2024 12:33:09 - INFO - __main__ -   Step: 4255, LR: 1.1382873732012467e-05, Loss: 658.7272338867188
2024-08-03T19:33:22.457823303Z 
 45%|████▍     | 4256/9500 [14:35:52<17:59:55, 12.36s/it]08/03/2024 12:33:22 - INFO - __main__ -   Step: 4256, LR: 1.1380703188325188e-05, Loss: 476.56317138671875
2024-08-03T19:33:34.903763286Z 
 45%|████▍     | 4257/9500 [14:36:04<18:02:04, 12.38s/it]08/03/2024 12:33:34 - INFO - __main__ -   Step: 4257, LR: 1.137853264463791e-05, Loss: 574.711181640625
2024-08-03T19:33:46.898424395Z 
 45%|████▍     | 4258/9500 [14:36:16<17:51:40, 12.27s/it]08/03/2024 12:33:46 - INFO - __main__ -   Step: 4258, LR: 1.1376362100950631e-05, Loss: 587.7552490234375
2024-08-03T19:33:59.634164304Z 
 45%|████▍     | 4259/9500 [14:36:29<18:03:46, 12.41s/it]08/03/2024 12:33:59 - INFO - __main__ -   Step: 4259, LR: 1.1374191557263353e-05, Loss: 457.537109375
2024-08-03T19:34:11.670053297Z 
 45%|████▍     | 4260/9500 [14:36:41<17:53:50, 12.30s/it]08/03/2024 12:34:11 - INFO - __main__ -   Step: 4260, LR: 1.1372021013576073e-05, Loss: 516.4569091796875
2024-08-03T19:34:23.658083112Z 
 45%|████▍     | 4261/9500 [14:36:53<17:45:34, 12.20s/it]08/03/2024 12:34:23 - INFO - __main__ -   Step: 4261, LR: 1.1369850469888794e-05, Loss: 578.2039184570312
2024-08-03T19:34:36.090186476Z 
 45%|████▍     | 4262/9500 [14:37:06<17:51:21, 12.27s/it]08/03/2024 12:34:36 - INFO - __main__ -   Step: 4262, LR: 1.1367679926201516e-05, Loss: 523.5438232421875
2024-08-03T19:34:48.469630758Z 
 45%|████▍     | 4263/9500 [14:37:18<17:53:57, 12.30s/it]08/03/2024 12:34:48 - INFO - __main__ -   Step: 4263, LR: 1.1365509382514237e-05, Loss: 610.284423828125
2024-08-03T19:35:00.713457956Z 
 45%|████▍     | 4264/9500 [14:37:30<17:52:10, 12.29s/it]08/03/2024 12:35:00 - INFO - __main__ -   Step: 4264, LR: 1.1363338838826959e-05, Loss: 744.392333984375
2024-08-03T19:35:13.157558968Z 
 45%|████▍     | 4265/9500 [14:37:43<17:56:05, 12.33s/it]08/03/2024 12:35:13 - INFO - __main__ -   Step: 4265, LR: 1.1361168295139679e-05, Loss: 475.4706115722656
2024-08-03T19:35:25.630636004Z 
 45%|████▍     | 4266/9500 [14:37:55<17:59:32, 12.38s/it]08/03/2024 12:35:25 - INFO - __main__ -   Step: 4266, LR: 1.13589977514524e-05, Loss: 476.417236328125
2024-08-03T19:35:38.036183734Z 
 45%|████▍     | 4267/9500 [14:38:07<18:00:08, 12.38s/it]08/03/2024 12:35:38 - INFO - __main__ -   Step: 4267, LR: 1.1356827207765122e-05, Loss: 598.72998046875
2024-08-03T19:35:50.644048179Z 
 45%|████▍     | 4268/9500 [14:38:20<18:05:45, 12.45s/it]08/03/2024 12:35:50 - INFO - __main__ -   Step: 4268, LR: 1.1354656664077842e-05, Loss: 755.8934326171875
2024-08-03T19:36:02.823734631Z 
 45%|████▍     | 4269/9500 [14:38:32<17:58:26, 12.37s/it]08/03/2024 12:36:02 - INFO - __main__ -   Step: 4269, LR: 1.1352486120390562e-05, Loss: 639.467529296875
2024-08-03T19:36:15.393458731Z 
 45%|████▍     | 4270/9500 [14:38:45<18:03:28, 12.43s/it]08/03/2024 12:36:15 - INFO - __main__ -   Step: 4270, LR: 1.1350315576703283e-05, Loss: 677.6959838867188
2024-08-03T19:36:28.222204531Z 
 45%|████▍     | 4271/9500 [14:38:58<18:13:36, 12.55s/it]08/03/2024 12:36:28 - INFO - __main__ -   Step: 4271, LR: 1.1348145033016005e-05, Loss: 738.17236328125
2024-08-03T19:36:40.317199745Z 
 45%|████▍     | 4272/9500 [14:39:10<18:01:37, 12.41s/it]08/03/2024 12:36:40 - INFO - __main__ -   Step: 4272, LR: 1.1345974489328726e-05, Loss: 547.3338623046875
2024-08-03T19:36:52.567028725Z 
 45%|████▍     | 4273/9500 [14:39:22<17:57:08, 12.36s/it]08/03/2024 12:36:52 - INFO - __main__ -   Step: 4273, LR: 1.1343803945641448e-05, Loss: 618.22900390625
2024-08-03T19:37:04.952666052Z 
 45%|████▍     | 4274/9500 [14:39:34<17:57:29, 12.37s/it]08/03/2024 12:37:04 - INFO - __main__ -   Step: 4274, LR: 1.1341633401954168e-05, Loss: 532.9364013671875
2024-08-03T19:37:17.094500780Z 
 45%|████▌     | 4275/9500 [14:39:47<17:51:17, 12.30s/it]08/03/2024 12:37:17 - INFO - __main__ -   Step: 4275, LR: 1.133946285826689e-05, Loss: 539.0235595703125
2024-08-03T19:37:29.426062510Z 
 45%|████▌     | 4276/9500 [14:39:59<17:51:51, 12.31s/it]08/03/2024 12:37:29 - INFO - __main__ -   Step: 4276, LR: 1.133729231457961e-05, Loss: 644.539306640625
2024-08-03T19:37:42.051304099Z 
 45%|████▌     | 4277/9500 [14:40:11<17:59:52, 12.41s/it]08/03/2024 12:37:42 - INFO - __main__ -   Step: 4277, LR: 1.1335121770892332e-05, Loss: 614.8500366210938
2024-08-03T19:37:54.063088395Z 
 45%|████▌     | 4278/9500 [14:40:23<17:49:23, 12.29s/it]08/03/2024 12:37:54 - INFO - __main__ -   Step: 4278, LR: 1.1332951227205054e-05, Loss: 717.3499755859375
2024-08-03T19:38:06.516733830Z 
 45%|████▌     | 4279/9500 [14:40:36<17:53:32, 12.34s/it]08/03/2024 12:38:06 - INFO - __main__ -   Step: 4279, LR: 1.1330780683517775e-05, Loss: 685.2403564453125
2024-08-03T19:38:19.017008296Z 
 45%|████▌     | 4280/9500 [14:40:48<17:57:35, 12.39s/it]08/03/2024 12:38:19 - INFO - __main__ -   Step: 4280, LR: 1.1328610139830495e-05, Loss: 525.736572265625
2024-08-03T19:38:31.364301527Z 
 45%|████▌     | 4281/9500 [14:41:01<17:56:21, 12.37s/it]08/03/2024 12:38:31 - INFO - __main__ -   Step: 4281, LR: 1.1326439596143217e-05, Loss: 772.9742431640625
2024-08-03T19:38:43.582081614Z 
 45%|████▌     | 4282/9500 [14:41:13<17:52:04, 12.33s/it]08/03/2024 12:38:43 - INFO - __main__ -   Step: 4282, LR: 1.1324269052455937e-05, Loss: 472.5849914550781
2024-08-03T19:38:56.365346835Z 
 45%|████▌     | 4283/9500 [14:41:26<18:03:45, 12.46s/it]08/03/2024 12:38:56 - INFO - __main__ -   Step: 4283, LR: 1.1322098508768657e-05, Loss: 685.9852294921875
2024-08-03T19:39:08.606341832Z 
 45%|████▌     | 4284/9500 [14:41:38<17:57:44, 12.40s/it]08/03/2024 12:39:08 - INFO - __main__ -   Step: 4284, LR: 1.1319927965081378e-05, Loss: 656.82666015625
2024-08-03T19:39:20.806973020Z 
 45%|████▌     | 4285/9500 [14:41:50<17:52:24, 12.34s/it]08/03/2024 12:39:20 - INFO - __main__ -   Step: 4285, LR: 1.13177574213941e-05, Loss: 575.454345703125
2024-08-03T19:39:33.256003232Z 
 45%|████▌     | 4286/9500 [14:42:03<17:55:04, 12.37s/it]08/03/2024 12:39:33 - INFO - __main__ -   Step: 4286, LR: 1.1315586877706821e-05, Loss: 475.4676513671875
2024-08-03T19:39:45.217683686Z 
 45%|████▌     | 4287/9500 [14:42:15<17:44:11, 12.25s/it]08/03/2024 12:39:45 - INFO - __main__ -   Step: 4287, LR: 1.1313416334019543e-05, Loss: 649.767578125
2024-08-03T19:39:57.357228733Z 
 45%|████▌     | 4288/9500 [14:42:27<17:41:09, 12.22s/it]08/03/2024 12:39:57 - INFO - __main__ -   Step: 4288, LR: 1.1311245790332264e-05, Loss: 574.123046875
2024-08-03T19:40:09.781128519Z 
 45%|████▌     | 4289/9500 [14:42:39<17:46:22, 12.28s/it]08/03/2024 12:40:09 - INFO - __main__ -   Step: 4289, LR: 1.1309075246644984e-05, Loss: 589.771484375
2024-08-03T19:40:22.102658027Z 
 45%|████▌     | 4290/9500 [14:42:52<17:47:16, 12.29s/it]08/03/2024 12:40:22 - INFO - __main__ -   Step: 4290, LR: 1.1306904702957706e-05, Loss: 633.6712646484375
2024-08-03T19:40:34.222108181Z 
 45%|████▌     | 4291/9500 [14:43:04<17:42:37, 12.24s/it]08/03/2024 12:40:34 - INFO - __main__ -   Step: 4291, LR: 1.1304734159270427e-05, Loss: 769.871337890625
2024-08-03T19:40:46.896787946Z 
 45%|████▌     | 4292/9500 [14:43:16<17:53:44, 12.37s/it]08/03/2024 12:40:46 - INFO - __main__ -   Step: 4292, LR: 1.1302563615583149e-05, Loss: 686.5474853515625
2024-08-03T19:40:59.468466928Z 
 45%|████▌     | 4293/9500 [14:43:29<17:58:46, 12.43s/it]08/03/2024 12:40:59 - INFO - __main__ -   Step: 4293, LR: 1.130039307189587e-05, Loss: 686.035400390625
2024-08-03T19:41:11.729058633Z 
 45%|████▌     | 4294/9500 [14:43:41<17:54:08, 12.38s/it]08/03/2024 12:41:11 - INFO - __main__ -   Step: 4294, LR: 1.129822252820859e-05, Loss: 555.35302734375
2024-08-03T19:41:23.808174473Z 
 45%|████▌     | 4295/9500 [14:43:53<17:46:06, 12.29s/it]08/03/2024 12:41:23 - INFO - __main__ -   Step: 4295, LR: 1.1296051984521312e-05, Loss: 694.1566162109375
2024-08-03T19:41:36.548441814Z 
 45%|████▌     | 4296/9500 [14:44:06<17:57:38, 12.42s/it]08/03/2024 12:41:36 - INFO - __main__ -   Step: 4296, LR: 1.1293881440834032e-05, Loss: 695.4900512695312
2024-08-03T19:41:49.218601434Z 
 45%|████▌     | 4297/9500 [14:44:19<18:03:48, 12.50s/it]08/03/2024 12:41:49 - INFO - __main__ -   Step: 4297, LR: 1.1291710897146753e-05, Loss: 615.8701171875
2024-08-03T19:42:01.249257490Z 
 45%|████▌     | 4298/9500 [14:44:31<17:51:26, 12.36s/it]08/03/2024 12:42:01 - INFO - __main__ -   Step: 4298, LR: 1.1289540353459473e-05, Loss: 645.8423461914062
2024-08-03T19:42:13.689885482Z 
 45%|████▌     | 4299/9500 [14:44:43<17:53:22, 12.38s/it]08/03/2024 12:42:13 - INFO - __main__ -   Step: 4299, LR: 1.1287369809772195e-05, Loss: 576.9354858398438
2024-08-03T19:42:25.873433116Z 
 45%|████▌     | 4300/9500 [14:44:55<17:48:00, 12.32s/it]08/03/2024 12:42:25 - INFO - __main__ -   Step: 4300, LR: 1.1285199266084916e-05, Loss: 620.4005126953125
2024-08-03T19:42:37.841619511Z 
 45%|████▌     | 4301/9500 [14:45:07<17:38:34, 12.22s/it]08/03/2024 12:42:37 - INFO - __main__ -   Step: 4301, LR: 1.1283028722397638e-05, Loss: 525.4144287109375
2024-08-03T19:42:50.419300220Z 
 45%|████▌     | 4302/9500 [14:45:20<17:47:44, 12.32s/it]08/03/2024 12:42:50 - INFO - __main__ -   Step: 4302, LR: 1.128085817871036e-05, Loss: 671.9218139648438
2024-08-03T19:43:02.867251669Z 
 45%|████▌     | 4303/9500 [14:45:32<17:50:44, 12.36s/it]08/03/2024 12:43:02 - INFO - __main__ -   Step: 4303, LR: 1.127868763502308e-05, Loss: 605.447998046875
2024-08-03T19:43:14.959744019Z 
 45%|████▌     | 4304/9500 [14:45:44<17:43:31, 12.28s/it]08/03/2024 12:43:14 - INFO - __main__ -   Step: 4304, LR: 1.1276517091335801e-05, Loss: 747.33154296875
2024-08-03T19:43:27.427437235Z 
 45%|████▌     | 4305/9500 [14:45:57<17:48:10, 12.34s/it]08/03/2024 12:43:27 - INFO - __main__ -   Step: 4305, LR: 1.1274346547648522e-05, Loss: 539.522216796875
2024-08-03T19:43:39.734911659Z 
 45%|████▌     | 4306/9500 [14:46:09<17:47:12, 12.33s/it]08/03/2024 12:43:39 - INFO - __main__ -   Step: 4306, LR: 1.1272176003961244e-05, Loss: 569.183837890625
2024-08-03T19:43:51.875303321Z 
 45%|████▌     | 4307/9500 [14:46:21<17:42:07, 12.27s/it]08/03/2024 12:43:51 - INFO - __main__ -   Step: 4307, LR: 1.1270005460273966e-05, Loss: 691.296142578125
2024-08-03T19:44:04.263681268Z 
 45%|████▌     | 4308/9500 [14:46:34<17:44:57, 12.31s/it]08/03/2024 12:44:04 - INFO - __main__ -   Step: 4308, LR: 1.1267834916586685e-05, Loss: 589.80859375
2024-08-03T19:44:16.706036216Z 
 45%|████▌     | 4309/9500 [14:46:46<17:48:15, 12.35s/it]08/03/2024 12:44:16 - INFO - __main__ -   Step: 4309, LR: 1.1265664372899405e-05, Loss: 638.297119140625
2024-08-03T19:44:28.929272819Z 
 45%|████▌     | 4310/9500 [14:46:58<17:44:49, 12.31s/it]08/03/2024 12:44:28 - INFO - __main__ -   Step: 4310, LR: 1.1263493829212127e-05, Loss: 709.8904418945312
2024-08-03T19:44:41.758722176Z 
 45%|████▌     | 4311/9500 [14:47:11<17:58:05, 12.47s/it]08/03/2024 12:44:41 - INFO - __main__ -   Step: 4311, LR: 1.1261323285524848e-05, Loss: 526.425537109375
2024-08-03T19:44:53.979235248Z 
 45%|████▌     | 4312/9500 [14:47:23<17:51:31, 12.39s/it]08/03/2024 12:44:53 - INFO - __main__ -   Step: 4312, LR: 1.1259152741837568e-05, Loss: 637.2145385742188
2024-08-03T19:45:06.020178655Z 
 45%|████▌     | 4313/9500 [14:47:35<17:42:12, 12.29s/it]08/03/2024 12:45:06 - INFO - __main__ -   Step: 4313, LR: 1.125698219815029e-05, Loss: 492.0415344238281
2024-08-03T19:45:18.625862441Z 
 45%|████▌     | 4314/9500 [14:47:48<17:50:15, 12.38s/it]08/03/2024 12:45:18 - INFO - __main__ -   Step: 4314, LR: 1.1254811654463011e-05, Loss: 703.2606201171875
2024-08-03T19:45:30.632695050Z 
 45%|████▌     | 4315/9500 [14:48:00<17:40:18, 12.27s/it]08/03/2024 12:45:30 - INFO - __main__ -   Step: 4315, LR: 1.1252641110775733e-05, Loss: 609.8245849609375
2024-08-03T19:45:42.585344837Z 
 45%|████▌     | 4316/9500 [14:48:12<17:31:53, 12.17s/it]08/03/2024 12:45:42 - INFO - __main__ -   Step: 4316, LR: 1.1250470567088455e-05, Loss: 525.1193237304688
2024-08-03T19:45:54.908306481Z 
 45%|████▌     | 4317/9500 [14:48:24<17:35:31, 12.22s/it]08/03/2024 12:45:54 - INFO - __main__ -   Step: 4317, LR: 1.1248300023401174e-05, Loss: 490.95831298828125
2024-08-03T19:46:07.051678713Z 
 45%|████▌     | 4318/9500 [14:48:36<17:33:22, 12.20s/it]08/03/2024 12:46:07 - INFO - __main__ -   Step: 4318, LR: 1.1246129479713896e-05, Loss: 592.23046875
2024-08-03T19:46:19.530112271Z 
 45%|████▌     | 4319/9500 [14:48:49<17:40:28, 12.28s/it]08/03/2024 12:46:19 - INFO - __main__ -   Step: 4319, LR: 1.1243958936026618e-05, Loss: 492.9532470703125
2024-08-03T19:46:32.161830592Z 
 45%|████▌     | 4320/9500 [14:49:02<17:49:20, 12.39s/it]08/03/2024 12:46:32 - INFO - __main__ -   Step: 4320, LR: 1.1241788392339339e-05, Loss: 667.754150390625
2024-08-03T19:46:44.184310647Z 
 45%|████▌     | 4321/9500 [14:49:14<17:39:43, 12.28s/it]08/03/2024 12:46:44 - INFO - __main__ -   Step: 4321, LR: 1.123961784865206e-05, Loss: 555.6897583007812
2024-08-03T19:46:56.680504885Z 
 45%|████▌     | 4322/9500 [14:49:26<17:45:11, 12.34s/it]08/03/2024 12:46:56 - INFO - __main__ -   Step: 4322, LR: 1.1237447304964782e-05, Loss: 509.65972900390625
2024-08-03T19:47:09.044464081Z 
 46%|████▌     | 4323/9500 [14:49:38<17:45:29, 12.35s/it]08/03/2024 12:47:09 - INFO - __main__ -   Step: 4323, LR: 1.12352767612775e-05, Loss: 419.048583984375
2024-08-03T19:47:21.075279009Z 
 46%|████▌     | 4324/9500 [14:49:51<17:37:05, 12.25s/it]08/03/2024 12:47:21 - INFO - __main__ -   Step: 4324, LR: 1.1233106217590222e-05, Loss: 544.0538330078125
2024-08-03T19:47:33.290834457Z 
 46%|████▌     | 4325/9500 [14:50:03<17:35:53, 12.24s/it]08/03/2024 12:47:33 - INFO - __main__ -   Step: 4325, LR: 1.1230935673902944e-05, Loss: 617.3200073242188
2024-08-03T19:47:45.835077859Z 
 46%|████▌     | 4326/9500 [14:50:15<17:43:30, 12.33s/it]08/03/2024 12:47:45 - INFO - __main__ -   Step: 4326, LR: 1.1228765130215663e-05, Loss: 451.01824951171875
2024-08-03T19:47:58.154573414Z 
 46%|████▌     | 4327/9500 [14:50:28<17:42:57, 12.33s/it]08/03/2024 12:47:58 - INFO - __main__ -   Step: 4327, LR: 1.1226594586528385e-05, Loss: 470.9215393066406
2024-08-03T19:48:10.456555825Z 
 46%|████▌     | 4328/9500 [14:50:40<17:42:02, 12.32s/it]08/03/2024 12:48:10 - INFO - __main__ -   Step: 4328, LR: 1.1224424042841106e-05, Loss: 754.0083618164062
2024-08-03T19:48:23.666342391Z 
 46%|████▌     | 4329/9500 [14:50:53<18:04:50, 12.59s/it]08/03/2024 12:48:23 - INFO - __main__ -   Step: 4329, LR: 1.1222253499153828e-05, Loss: 665.5113525390625
2024-08-03T19:48:35.654128577Z 
 46%|████▌     | 4330/9500 [14:51:05<17:49:07, 12.41s/it]08/03/2024 12:48:35 - INFO - __main__ -   Step: 4330, LR: 1.122008295546655e-05, Loss: 659.4932861328125
2024-08-03T19:48:47.847506244Z 
 46%|████▌     | 4331/9500 [14:51:17<17:43:23, 12.34s/it]08/03/2024 12:48:47 - INFO - __main__ -   Step: 4331, LR: 1.1217912411779271e-05, Loss: 490.20953369140625
2024-08-03T19:49:00.304849039Z 
 46%|████▌     | 4332/9500 [14:51:30<17:46:07, 12.38s/it]08/03/2024 12:49:00 - INFO - __main__ -   Step: 4332, LR: 1.1215741868091991e-05, Loss: 472.7148132324219
2024-08-03T19:49:12.456593530Z 
 46%|████▌     | 4333/9500 [14:51:42<17:40:04, 12.31s/it]08/03/2024 12:49:12 - INFO - __main__ -   Step: 4333, LR: 1.1213571324404713e-05, Loss: 534.859130859375
2024-08-03T19:49:24.417293779Z 
 46%|████▌     | 4334/9500 [14:51:54<17:30:51, 12.21s/it]08/03/2024 12:49:24 - INFO - __main__ -   Step: 4334, LR: 1.1211400780717434e-05, Loss: 614.066650390625
2024-08-03T19:49:36.917321015Z 
 46%|████▌     | 4335/9500 [14:52:06<17:38:16, 12.29s/it]08/03/2024 12:49:36 - INFO - __main__ -   Step: 4335, LR: 1.1209230237030156e-05, Loss: 590.864990234375
2024-08-03T19:49:49.043737756Z 
 46%|████▌     | 4336/9500 [14:52:18<17:33:44, 12.24s/it]08/03/2024 12:49:49 - INFO - __main__ -   Step: 4336, LR: 1.1207059693342877e-05, Loss: 615.1054077148438
2024-08-03T19:50:01.122365339Z 
 46%|████▌     | 4337/9500 [14:52:31<17:29:16, 12.19s/it]08/03/2024 12:50:01 - INFO - __main__ -   Step: 4337, LR: 1.1204889149655595e-05, Loss: 512.1947631835938
2024-08-03T19:50:13.130833633Z 
 46%|████▌     | 4338/9500 [14:52:43<17:24:17, 12.14s/it]08/03/2024 12:50:13 - INFO - __main__ -   Step: 4338, LR: 1.1202718605968317e-05, Loss: 602.4713745117188
2024-08-03T19:50:25.500133871Z 
 46%|████▌     | 4339/9500 [14:52:55<17:30:03, 12.21s/it]08/03/2024 12:50:25 - INFO - __main__ -   Step: 4339, LR: 1.1200548062281039e-05, Loss: 530.2078857421875
2024-08-03T19:50:37.350274218Z 
 46%|████▌     | 4340/9500 [14:53:07<17:20:38, 12.10s/it]08/03/2024 12:50:37 - INFO - __main__ -   Step: 4340, LR: 1.119837751859376e-05, Loss: 445.50189208984375
2024-08-03T19:50:49.650868375Z 
 46%|████▌     | 4341/9500 [14:53:19<17:25:35, 12.16s/it]08/03/2024 12:50:49 - INFO - __main__ -   Step: 4341, LR: 1.119620697490648e-05, Loss: 586.70361328125
2024-08-03T19:51:02.335477725Z 
 46%|████▌     | 4342/9500 [14:53:32<17:38:54, 12.32s/it]08/03/2024 12:51:02 - INFO - __main__ -   Step: 4342, LR: 1.1194036431219202e-05, Loss: 711.3753662109375
2024-08-03T19:51:14.381500383Z 
 46%|████▌     | 4343/9500 [14:53:44<17:31:42, 12.24s/it]08/03/2024 12:51:14 - INFO - __main__ -   Step: 4343, LR: 1.1191865887531923e-05, Loss: 561.2489013671875
2024-08-03T19:51:26.708866980Z 
 46%|████▌     | 4344/9500 [14:53:56<17:33:50, 12.26s/it]08/03/2024 12:51:26 - INFO - __main__ -   Step: 4344, LR: 1.1189695343844645e-05, Loss: 661.1795654296875
2024-08-03T19:51:39.104307106Z 
 46%|████▌     | 4345/9500 [14:54:09<17:37:03, 12.30s/it]08/03/2024 12:51:39 - INFO - __main__ -   Step: 4345, LR: 1.1187524800157366e-05, Loss: 547.6939697265625
2024-08-03T19:51:51.218643743Z 
 46%|████▌     | 4346/9500 [14:54:21<17:31:58, 12.25s/it]08/03/2024 12:51:51 - INFO - __main__ -   Step: 4346, LR: 1.1185354256470086e-05, Loss: 656.641845703125
2024-08-03T19:52:03.462988329Z 
 46%|████▌     | 4347/9500 [14:54:33<17:31:42, 12.25s/it]08/03/2024 12:52:03 - INFO - __main__ -   Step: 4347, LR: 1.1183183712782808e-05, Loss: 618.806396484375
2024-08-03T19:52:16.361458210Z 
 46%|████▌     | 4348/9500 [14:54:46<17:48:19, 12.44s/it]08/03/2024 12:52:16 - INFO - __main__ -   Step: 4348, LR: 1.118101316909553e-05, Loss: 603.42919921875
2024-08-03T19:52:28.423828627Z 
 46%|████▌     | 4349/9500 [14:54:58<17:38:20, 12.33s/it]08/03/2024 12:52:28 - INFO - __main__ -   Step: 4349, LR: 1.117884262540825e-05, Loss: 587.7247924804688
2024-08-03T19:52:40.470273627Z 
 46%|████▌     | 4350/9500 [14:55:10<17:30:53, 12.24s/it]08/03/2024 12:52:40 - INFO - __main__ -   Step: 4350, LR: 1.1176672081720972e-05, Loss: 490.94952392578125
2024-08-03T19:52:53.262802109Z 
 46%|████▌     | 4351/9500 [14:55:23<17:44:48, 12.41s/it]08/03/2024 12:52:53 - INFO - __main__ -   Step: 4351, LR: 1.117450153803369e-05, Loss: 981.88330078125
2024-08-03T19:53:05.315575998Z 
 46%|████▌     | 4352/9500 [14:55:35<17:35:28, 12.30s/it]08/03/2024 12:53:05 - INFO - __main__ -   Step: 4352, LR: 1.1172330994346412e-05, Loss: 467.5057373046875
2024-08-03T19:53:17.441837696Z 
 46%|████▌     | 4353/9500 [14:55:47<17:30:45, 12.25s/it]08/03/2024 12:53:17 - INFO - __main__ -   Step: 4353, LR: 1.1170160450659134e-05, Loss: 503.8791809082031
2024-08-03T19:53:29.966226979Z 
 46%|████▌     | 4354/9500 [14:55:59<17:37:38, 12.33s/it]08/03/2024 12:53:29 - INFO - __main__ -   Step: 4354, LR: 1.1167989906971855e-05, Loss: 525.578125
2024-08-03T19:53:42.357181880Z 
 46%|████▌     | 4355/9500 [14:56:12<17:38:57, 12.35s/it]08/03/2024 12:53:42 - INFO - __main__ -   Step: 4355, LR: 1.1165819363284575e-05, Loss: 615.2930908203125
2024-08-03T19:53:54.476658721Z 
 46%|████▌     | 4356/9500 [14:56:24<17:32:50, 12.28s/it]08/03/2024 12:53:54 - INFO - __main__ -   Step: 4356, LR: 1.1163648819597297e-05, Loss: 605.3572998046875
2024-08-03T19:54:07.083647753Z 
 46%|████▌     | 4357/9500 [14:56:37<17:41:01, 12.38s/it]08/03/2024 12:54:07 - INFO - __main__ -   Step: 4357, LR: 1.1161478275910018e-05, Loss: 637.1536865234375
2024-08-03T19:54:19.429340252Z 
 46%|████▌     | 4358/9500 [14:56:49<17:39:58, 12.37s/it]08/03/2024 12:54:19 - INFO - __main__ -   Step: 4358, LR: 1.115930773222274e-05, Loss: 636.1409912109375
2024-08-03T19:54:31.549398895Z 
 46%|████▌     | 4359/9500 [14:57:01<17:33:23, 12.29s/it]08/03/2024 12:54:31 - INFO - __main__ -   Step: 4359, LR: 1.1157137188535461e-05, Loss: 547.2993774414062
2024-08-03T19:54:43.951168778Z 
 46%|████▌     | 4360/9500 [14:57:13<17:35:57, 12.33s/it]08/03/2024 12:54:43 - INFO - __main__ -   Step: 4360, LR: 1.1154966644848181e-05, Loss: 489.7392272949219
2024-08-03T19:54:56.085497492Z 
 46%|████▌     | 4361/9500 [14:57:26<17:30:48, 12.27s/it]08/03/2024 12:54:56 - INFO - __main__ -   Step: 4361, LR: 1.1152796101160903e-05, Loss: 622.8851318359375
2024-08-03T19:55:09.048297280Z 
 46%|████▌     | 4362/9500 [14:57:38<17:48:24, 12.48s/it]08/03/2024 12:55:09 - INFO - __main__ -   Step: 4362, LR: 1.1150625557473624e-05, Loss: 736.099853515625
2024-08-03T19:55:21.467799558Z 
 46%|████▌     | 4363/9500 [14:57:51<17:46:46, 12.46s/it]08/03/2024 12:55:21 - INFO - __main__ -   Step: 4363, LR: 1.1148455013786346e-05, Loss: 534.775634765625
2024-08-03T19:55:33.576979417Z 
 46%|████▌     | 4364/9500 [14:58:03<17:37:33, 12.35s/it]08/03/2024 12:55:33 - INFO - __main__ -   Step: 4364, LR: 1.1146284470099067e-05, Loss: 617.1661987304688
2024-08-03T19:55:45.852118530Z 
 46%|████▌     | 4365/9500 [14:58:15<17:35:18, 12.33s/it]08/03/2024 12:55:45 - INFO - __main__ -   Step: 4365, LR: 1.1144113926411786e-05, Loss: 567.4029541015625
2024-08-03T19:55:58.428212515Z 
 46%|████▌     | 4366/9500 [14:58:28<17:41:24, 12.40s/it]08/03/2024 12:55:58 - INFO - __main__ -   Step: 4366, LR: 1.1141943382724507e-05, Loss: 579.2025146484375
2024-08-03T19:56:10.520609848Z 
 46%|████▌     | 4367/9500 [14:58:40<17:33:10, 12.31s/it]08/03/2024 12:56:10 - INFO - __main__ -   Step: 4367, LR: 1.1139772839037229e-05, Loss: 649.4227294921875
2024-08-03T19:56:23.037809191Z 
 46%|████▌     | 4368/9500 [14:58:52<17:38:17, 12.37s/it]08/03/2024 12:56:23 - INFO - __main__ -   Step: 4368, LR: 1.113760229534995e-05, Loss: 816.2408447265625
2024-08-03T19:56:35.760636844Z 
 46%|████▌     | 4369/9500 [14:59:05<17:47:03, 12.48s/it]08/03/2024 12:56:35 - INFO - __main__ -   Step: 4369, LR: 1.113543175166267e-05, Loss: 548.744140625
2024-08-03T19:56:47.790949498Z 
 46%|████▌     | 4370/9500 [14:59:17<17:35:22, 12.34s/it]08/03/2024 12:56:47 - INFO - __main__ -   Step: 4370, LR: 1.1133261207975392e-05, Loss: 691.8804931640625
2024-08-03T19:56:59.815145035Z 
 46%|████▌     | 4371/9500 [14:59:29<17:26:58, 12.25s/it]08/03/2024 12:56:59 - INFO - __main__ -   Step: 4371, LR: 1.1131090664288113e-05, Loss: 544.8134765625
2024-08-03T19:57:12.490230114Z 
 46%|████▌     | 4372/9500 [14:59:42<17:37:43, 12.38s/it]08/03/2024 12:57:12 - INFO - __main__ -   Step: 4372, LR: 1.1128920120600835e-05, Loss: 742.302001953125
2024-08-03T19:57:24.639919955Z 
 46%|████▌     | 4373/9500 [14:59:54<17:31:42, 12.31s/it]08/03/2024 12:57:24 - INFO - __main__ -   Step: 4373, LR: 1.1126749576913556e-05, Loss: 489.66546630859375
2024-08-03T19:57:36.964480095Z 
 46%|████▌     | 4374/9500 [15:00:06<17:31:56, 12.31s/it]08/03/2024 12:57:36 - INFO - __main__ -   Step: 4374, LR: 1.1124579033226278e-05, Loss: 710.46337890625
2024-08-03T19:57:49.474662599Z 
 46%|████▌     | 4375/9500 [15:00:19<17:36:47, 12.37s/it]08/03/2024 12:57:49 - INFO - __main__ -   Step: 4375, LR: 1.1122408489538998e-05, Loss: 655.371826171875
2024-08-03T19:58:01.462579153Z 
 46%|████▌     | 4376/9500 [15:00:31<17:26:44, 12.26s/it]08/03/2024 12:58:01 - INFO - __main__ -   Step: 4376, LR: 1.112023794585172e-05, Loss: 511.2392883300781
2024-08-03T19:58:13.851043920Z 
 46%|████▌     | 4377/9500 [15:00:43<17:29:53, 12.30s/it]08/03/2024 12:58:13 - INFO - __main__ -   Step: 4377, LR: 1.1118067402164441e-05, Loss: 592.364013671875
2024-08-03T19:58:26.215254577Z 
 46%|████▌     | 4378/9500 [15:00:56<17:31:26, 12.32s/it]08/03/2024 12:58:26 - INFO - __main__ -   Step: 4378, LR: 1.1115896858477162e-05, Loss: 701.137451171875
2024-08-03T19:58:38.595043692Z 
 46%|████▌     | 4379/9500 [15:01:08<17:32:50, 12.34s/it]08/03/2024 12:58:38 - INFO - __main__ -   Step: 4379, LR: 1.111372631478988e-05, Loss: 615.088623046875
2024-08-03T19:58:50.730696452Z 
 46%|████▌     | 4380/9500 [15:01:20<17:27:31, 12.28s/it]08/03/2024 12:58:50 - INFO - __main__ -   Step: 4380, LR: 1.1111555771102602e-05, Loss: 696.5924072265625
2024-08-03T19:59:02.829777007Z 
 46%|████▌     | 4381/9500 [15:01:32<17:22:47, 12.22s/it]08/03/2024 12:59:02 - INFO - __main__ -   Step: 4381, LR: 1.1109385227415324e-05, Loss: 630.2467041015625
2024-08-03T19:59:15.556589324Z 
 46%|████▌     | 4382/9500 [15:01:45<17:35:29, 12.37s/it]08/03/2024 12:59:15 - INFO - __main__ -   Step: 4382, LR: 1.1107214683728045e-05, Loss: 578.978271484375
2024-08-03T19:59:27.521383828Z 
 46%|████▌     | 4383/9500 [15:01:57<17:24:49, 12.25s/it]08/03/2024 12:59:27 - INFO - __main__ -   Step: 4383, LR: 1.1105044140040767e-05, Loss: 542.484130859375
2024-08-03T19:59:40.060403434Z 
 46%|████▌     | 4384/9500 [15:02:09<17:31:59, 12.34s/it]08/03/2024 12:59:40 - INFO - __main__ -   Step: 4384, LR: 1.1102873596353487e-05, Loss: 610.6925048828125
2024-08-03T19:59:52.741517644Z 
 46%|████▌     | 4385/9500 [15:02:22<17:40:33, 12.44s/it]08/03/2024 12:59:52 - INFO - __main__ -   Step: 4385, LR: 1.1100703052666208e-05, Loss: 741.2328491210938
2024-08-03T20:00:04.954732633Z 
 46%|████▌     | 4386/9500 [15:02:34<17:34:32, 12.37s/it]08/03/2024 13:00:04 - INFO - __main__ -   Step: 4386, LR: 1.109853250897893e-05, Loss: 617.7493286132812
2024-08-03T20:00:17.439931017Z 
 46%|████▌     | 4387/9500 [15:02:47<17:37:12, 12.41s/it]08/03/2024 13:00:17 - INFO - __main__ -   Step: 4387, LR: 1.1096361965291651e-05, Loss: 618.1680297851562
2024-08-03T20:00:29.769892362Z 
 46%|████▌     | 4388/9500 [15:02:59<17:35:03, 12.38s/it]08/03/2024 13:00:29 - INFO - __main__ -   Step: 4388, LR: 1.1094191421604373e-05, Loss: 606.7416381835938
2024-08-03T20:00:41.802149481Z 
 46%|████▌     | 4389/9500 [15:03:11<17:25:52, 12.28s/it]08/03/2024 13:00:41 - INFO - __main__ -   Step: 4389, LR: 1.1092020877917093e-05, Loss: 540.1102294921875
2024-08-03T20:00:54.145795738Z 
 46%|████▌     | 4390/9500 [15:03:24<17:27:21, 12.30s/it]08/03/2024 13:00:54 - INFO - __main__ -   Step: 4390, LR: 1.1089850334229814e-05, Loss: 586.385986328125
2024-08-03T20:01:06.769189158Z 
 46%|████▌     | 4391/9500 [15:03:36<17:35:28, 12.40s/it]08/03/2024 13:01:06 - INFO - __main__ -   Step: 4391, LR: 1.1087679790542536e-05, Loss: 697.143310546875
2024-08-03T20:01:19.287950505Z 
 46%|████▌     | 4392/9500 [15:03:49<17:38:24, 12.43s/it]08/03/2024 13:01:19 - INFO - __main__ -   Step: 4392, LR: 1.1085509246855258e-05, Loss: 628.9593505859375
2024-08-03T20:01:31.490759011Z 
 46%|████▌     | 4393/9500 [15:04:01<17:32:20, 12.36s/it]08/03/2024 13:01:31 - INFO - __main__ -   Step: 4393, LR: 1.1083338703167976e-05, Loss: 663.752197265625
2024-08-03T20:01:44.148402181Z 
 46%|████▋     | 4394/9500 [15:04:14<17:39:38, 12.45s/it]08/03/2024 13:01:44 - INFO - __main__ -   Step: 4394, LR: 1.1081168159480697e-05, Loss: 479.1072082519531
2024-08-03T20:01:56.231660316Z 
 46%|████▋     | 4395/9500 [15:04:26<17:30:01, 12.34s/it]08/03/2024 13:01:56 - INFO - __main__ -   Step: 4395, LR: 1.1078997615793419e-05, Loss: 658.002685546875
2024-08-03T20:02:08.161147461Z 
 46%|████▋     | 4396/9500 [15:04:38<17:19:19, 12.22s/it]08/03/2024 13:02:08 - INFO - __main__ -   Step: 4396, LR: 1.107682707210614e-05, Loss: 573.4766845703125
2024-08-03T20:02:20.642244490Z 
 46%|████▋     | 4397/9500 [15:04:50<17:25:50, 12.30s/it]08/03/2024 13:02:20 - INFO - __main__ -   Step: 4397, LR: 1.1074656528418862e-05, Loss: 726.9752197265625
2024-08-03T20:02:32.749907221Z 
 46%|████▋     | 4398/9500 [15:05:02<17:20:48, 12.24s/it]08/03/2024 13:02:32 - INFO - __main__ -   Step: 4398, LR: 1.1072485984731582e-05, Loss: 582.7171020507812
2024-08-03T20:02:45.065585354Z 
 46%|████▋     | 4399/9500 [15:05:15<17:22:32, 12.26s/it]08/03/2024 13:02:45 - INFO - __main__ -   Step: 4399, LR: 1.1070315441044303e-05, Loss: 576.906005859375
2024-08-03T20:02:57.225516936Z 
 46%|████▋     | 4400/9500 [15:05:27<17:19:42, 12.23s/it]08/03/2024 13:02:57 - INFO - __main__ -   Step: 4400, LR: 1.1068144897357025e-05, Loss: 400.667724609375
2024-08-03T20:03:09.106716073Z 
 46%|████▋     | 4401/9500 [15:05:39<17:10:33, 12.13s/it]08/03/2024 13:03:09 - INFO - __main__ -   Step: 4401, LR: 1.1065974353669746e-05, Loss: 525.5103149414062
2024-08-03T20:03:21.480287502Z 
 46%|████▋     | 4402/9500 [15:05:51<17:16:39, 12.20s/it]08/03/2024 13:03:21 - INFO - __main__ -   Step: 4402, LR: 1.1063803809982468e-05, Loss: 592.9451293945312
2024-08-03T20:03:33.776599309Z 
 46%|████▋     | 4403/9500 [15:06:03<17:18:53, 12.23s/it]08/03/2024 13:03:33 - INFO - __main__ -   Step: 4403, LR: 1.1061633266295188e-05, Loss: 512.7271118164062
2024-08-03T20:03:46.091548826Z 
 46%|████▋     | 4404/9500 [15:06:16<17:20:51, 12.26s/it]08/03/2024 13:03:46 - INFO - __main__ -   Step: 4404, LR: 1.105946272260791e-05, Loss: 568.382080078125
2024-08-03T20:03:57.905476957Z 
 46%|████▋     | 4405/9500 [15:06:27<17:09:25, 12.12s/it]08/03/2024 13:03:57 - INFO - __main__ -   Step: 4405, LR: 1.1057292178920631e-05, Loss: 642.0512084960938
2024-08-03T20:04:10.765252728Z 
 46%|████▋     | 4406/9500 [15:06:40<17:27:59, 12.34s/it]08/03/2024 13:04:10 - INFO - __main__ -   Step: 4406, LR: 1.1055121635233353e-05, Loss: 696.6338500976562
2024-08-03T20:04:23.393880577Z 
 46%|████▋     | 4407/9500 [15:06:53<17:35:02, 12.43s/it]08/03/2024 13:04:23 - INFO - __main__ -   Step: 4407, LR: 1.105295109154607e-05, Loss: 631.3241577148438
2024-08-03T20:04:35.998721443Z 
 46%|████▋     | 4408/9500 [15:07:05<17:39:18, 12.48s/it]08/03/2024 13:04:35 - INFO - __main__ -   Step: 4408, LR: 1.1050780547858792e-05, Loss: 616.996337890625
2024-08-03T20:04:48.422500712Z 
 46%|████▋     | 4409/9500 [15:07:18<17:37:36, 12.46s/it]08/03/2024 13:04:48 - INFO - __main__ -   Step: 4409, LR: 1.1048610004171514e-05, Loss: 599.2298583984375
2024-08-03T20:05:00.629594054Z 
 46%|████▋     | 4410/9500 [15:07:30<17:30:50, 12.39s/it]08/03/2024 13:05:00 - INFO - __main__ -   Step: 4410, LR: 1.1046439460484235e-05, Loss: 611.5017700195312
2024-08-03T20:05:12.631508068Z 
 46%|████▋     | 4411/9500 [15:07:42<17:20:50, 12.27s/it]08/03/2024 13:05:12 - INFO - __main__ -   Step: 4411, LR: 1.1044268916796957e-05, Loss: 604.30859375
2024-08-03T20:05:25.705100117Z 
 46%|████▋     | 4412/9500 [15:07:55<17:41:02, 12.51s/it]08/03/2024 13:05:25 - INFO - __main__ -   Step: 4412, LR: 1.1042098373109677e-05, Loss: 638.0003662109375
2024-08-03T20:05:37.781103070Z 
 46%|████▋     | 4413/9500 [15:08:07<17:29:44, 12.38s/it]08/03/2024 13:05:37 - INFO - __main__ -   Step: 4413, LR: 1.1039927829422398e-05, Loss: 521.266357421875
2024-08-03T20:05:49.815419673Z 
 46%|████▋     | 4414/9500 [15:08:19<17:20:42, 12.28s/it]08/03/2024 13:05:49 - INFO - __main__ -   Step: 4414, LR: 1.103775728573512e-05, Loss: 504.2174072265625
2024-08-03T20:06:02.341683532Z 
 46%|████▋     | 4415/9500 [15:08:32<17:26:49, 12.35s/it]08/03/2024 13:06:02 - INFO - __main__ -   Step: 4415, LR: 1.1035586742047842e-05, Loss: 541.1397705078125
2024-08-03T20:06:14.536065979Z 
 46%|████▋     | 4416/9500 [15:08:44<17:22:36, 12.30s/it]08/03/2024 13:06:14 - INFO - __main__ -   Step: 4416, LR: 1.1033416198360563e-05, Loss: 613.4342041015625
2024-08-03T20:06:27.308010514Z 
 46%|████▋     | 4417/9500 [15:08:57<17:34:17, 12.44s/it]08/03/2024 13:06:27 - INFO - __main__ -   Step: 4417, LR: 1.1031245654673285e-05, Loss: 669.2415771484375
2024-08-03T20:06:39.885301667Z 
 47%|████▋     | 4418/9500 [15:09:09<17:37:26, 12.48s/it]08/03/2024 13:06:39 - INFO - __main__ -   Step: 4418, LR: 1.1029075110986004e-05, Loss: 603.380859375
2024-08-03T20:06:51.902075972Z 
 47%|████▋     | 4419/9500 [15:09:21<17:25:21, 12.34s/it]08/03/2024 13:06:51 - INFO - __main__ -   Step: 4419, LR: 1.1026904567298726e-05, Loss: 615.7437744140625
2024-08-03T20:07:04.252903247Z 
 47%|████▋     | 4420/9500 [15:09:34<17:25:18, 12.35s/it]08/03/2024 13:07:04 - INFO - __main__ -   Step: 4420, LR: 1.1024734023611448e-05, Loss: 598.890625
2024-08-03T20:07:16.941029442Z 
 47%|████▋     | 4421/9500 [15:09:46<17:33:47, 12.45s/it]08/03/2024 13:07:16 - INFO - __main__ -   Step: 4421, LR: 1.1022563479924166e-05, Loss: 576.66064453125
2024-08-03T20:07:29.451516549Z 
 47%|████▋     | 4422/9500 [15:09:59<17:35:09, 12.47s/it]08/03/2024 13:07:29 - INFO - __main__ -   Step: 4422, LR: 1.1020392936236887e-05, Loss: 664.55615234375
2024-08-03T20:07:41.691077244Z 
 47%|████▋     | 4423/9500 [15:10:11<17:29:09, 12.40s/it]08/03/2024 13:07:41 - INFO - __main__ -   Step: 4423, LR: 1.1018222392549609e-05, Loss: 728.604736328125
2024-08-03T20:07:54.053932587Z 
 47%|████▋     | 4424/9500 [15:10:23<17:28:02, 12.39s/it]08/03/2024 13:07:54 - INFO - __main__ -   Step: 4424, LR: 1.101605184886233e-05, Loss: 726.202392578125
2024-08-03T20:08:06.554002219Z 
 47%|████▋     | 4425/9500 [15:10:36<17:30:40, 12.42s/it]08/03/2024 13:08:06 - INFO - __main__ -   Step: 4425, LR: 1.1013881305175052e-05, Loss: 700.55810546875
2024-08-03T20:08:18.580913748Z 
 47%|████▋     | 4426/9500 [15:10:48<17:20:26, 12.30s/it]08/03/2024 13:08:18 - INFO - __main__ -   Step: 4426, LR: 1.1011710761487772e-05, Loss: 528.93017578125
2024-08-03T20:08:30.552007676Z 
 47%|████▋     | 4427/9500 [15:11:00<17:11:48, 12.20s/it]08/03/2024 13:08:30 - INFO - __main__ -   Step: 4427, LR: 1.1009540217800493e-05, Loss: 528.45654296875
2024-08-03T20:08:43.027674392Z 
 47%|████▋     | 4428/9500 [15:11:12<17:18:31, 12.29s/it]08/03/2024 13:08:43 - INFO - __main__ -   Step: 4428, LR: 1.1007369674113215e-05, Loss: 554.2574462890625
2024-08-03T20:08:55.158020224Z 
 47%|████▋     | 4429/9500 [15:11:25<17:14:22, 12.24s/it]08/03/2024 13:08:55 - INFO - __main__ -   Step: 4429, LR: 1.1005199130425937e-05, Loss: 691.641357421875
2024-08-03T20:09:07.120006784Z 
 47%|████▋     | 4430/9500 [15:11:37<17:07:09, 12.16s/it]08/03/2024 13:09:07 - INFO - __main__ -   Step: 4430, LR: 1.1003028586738658e-05, Loss: 549.8505859375
2024-08-03T20:09:19.726073362Z 
 47%|████▋     | 4431/9500 [15:11:49<17:18:21, 12.29s/it]08/03/2024 13:09:19 - INFO - __main__ -   Step: 4431, LR: 1.100085804305138e-05, Loss: 576.6448364257812
2024-08-03T20:09:31.910096653Z 
 47%|████▋     | 4432/9500 [15:12:01<17:15:27, 12.26s/it]08/03/2024 13:09:31 - INFO - __main__ -   Step: 4432, LR: 1.09986874993641e-05, Loss: 637.4461669921875
2024-08-03T20:09:44.234039444Z 
 47%|████▋     | 4433/9500 [15:12:14<17:16:53, 12.28s/it]08/03/2024 13:09:44 - INFO - __main__ -   Step: 4433, LR: 1.0996516955676821e-05, Loss: 709.0604248046875
2024-08-03T20:09:56.632588208Z 
 47%|████▋     | 4434/9500 [15:12:26<17:19:44, 12.31s/it]08/03/2024 13:09:56 - INFO - __main__ -   Step: 4434, LR: 1.0994346411989543e-05, Loss: 591.2814331054688
2024-08-03T20:10:08.778406235Z 
 47%|████▋     | 4435/9500 [15:12:38<17:15:16, 12.26s/it]08/03/2024 13:10:08 - INFO - __main__ -   Step: 4435, LR: 1.099217586830226e-05, Loss: 632.4616088867188
2024-08-03T20:10:21.454054432Z 
 47%|████▋     | 4436/9500 [15:12:51<17:25:29, 12.39s/it]08/03/2024 13:10:21 - INFO - __main__ -   Step: 4436, LR: 1.0990005324614982e-05, Loss: 628.9517822265625
2024-08-03T20:10:33.838587761Z 
 47%|████▋     | 4437/9500 [15:13:03<17:25:13, 12.39s/it]08/03/2024 13:10:33 - INFO - __main__ -   Step: 4437, LR: 1.0987834780927704e-05, Loss: 564.1882934570312
2024-08-03T20:10:46.048382028Z 
 47%|████▋     | 4438/9500 [15:13:15<17:20:32, 12.33s/it]08/03/2024 13:10:46 - INFO - __main__ -   Step: 4438, LR: 1.0985664237240426e-05, Loss: 587.1314697265625
2024-08-03T20:10:58.251296711Z 
 47%|████▋     | 4439/9500 [15:13:28<17:17:01, 12.29s/it]08/03/2024 13:10:58 - INFO - __main__ -   Step: 4439, LR: 1.0983493693553147e-05, Loss: 493.1768493652344
2024-08-03T20:11:10.609094429Z 
 47%|████▋     | 4440/9500 [15:13:40<17:18:25, 12.31s/it]08/03/2024 13:11:10 - INFO - __main__ -   Step: 4440, LR: 1.0981323149865869e-05, Loss: 619.909912109375
2024-08-03T20:11:22.709792534Z 
 47%|████▋     | 4441/9500 [15:13:52<17:12:50, 12.25s/it]08/03/2024 13:11:22 - INFO - __main__ -   Step: 4441, LR: 1.0979152606178589e-05, Loss: 597.5656127929688
2024-08-03T20:11:35.153185595Z 
 47%|████▋     | 4442/9500 [15:14:05<17:17:32, 12.31s/it]08/03/2024 13:11:35 - INFO - __main__ -   Step: 4442, LR: 1.097698206249131e-05, Loss: 559.4339599609375
2024-08-03T20:11:47.719945070Z 
 47%|████▋     | 4443/9500 [15:14:17<17:23:53, 12.39s/it]08/03/2024 13:11:47 - INFO - __main__ -   Step: 4443, LR: 1.0974811518804032e-05, Loss: 517.463623046875
2024-08-03T20:11:59.609268839Z 
 47%|████▋     | 4444/9500 [15:14:29<17:11:08, 12.24s/it]08/03/2024 13:11:59 - INFO - __main__ -   Step: 4444, LR: 1.0972640975116753e-05, Loss: 542.7332763671875
2024-08-03T20:12:11.660447349Z 
 47%|████▋     | 4445/9500 [15:14:41<17:06:15, 12.18s/it]08/03/2024 13:12:11 - INFO - __main__ -   Step: 4445, LR: 1.0970470431429475e-05, Loss: 497.28472900390625
2024-08-03T20:12:23.921841591Z 
 47%|████▋     | 4446/9500 [15:14:53<17:08:03, 12.20s/it]08/03/2024 13:12:23 - INFO - __main__ -   Step: 4446, LR: 1.0968299887742195e-05, Loss: 580.6751098632812
2024-08-03T20:12:35.952762322Z 
 47%|████▋     | 4447/9500 [15:15:05<17:03:28, 12.15s/it]08/03/2024 13:12:35 - INFO - __main__ -   Step: 4447, LR: 1.0966129344054916e-05, Loss: 513.5166015625
2024-08-03T20:12:48.381847608Z 
 47%|████▋     | 4448/9500 [15:15:18<17:10:15, 12.24s/it]08/03/2024 13:12:48 - INFO - __main__ -   Step: 4448, LR: 1.0963958800367638e-05, Loss: 648.4432373046875
2024-08-03T20:13:00.874055882Z 
 47%|████▋     | 4449/9500 [15:15:30<17:16:31, 12.31s/it]08/03/2024 13:13:00 - INFO - __main__ -   Step: 4449, LR: 1.0961788256680358e-05, Loss: 647.7613525390625
2024-08-03T20:13:13.150189867Z 
 47%|████▋     | 4450/9500 [15:15:43<17:15:23, 12.30s/it]08/03/2024 13:13:13 - INFO - __main__ -   Step: 4450, LR: 1.0959617712993077e-05, Loss: 692.46142578125
2024-08-03T20:13:25.143380253Z 
 47%|████▋     | 4451/9500 [15:15:55<17:07:23, 12.21s/it]08/03/2024 13:13:25 - INFO - __main__ -   Step: 4451, LR: 1.0957447169305799e-05, Loss: 549.017822265625
2024-08-03T20:13:37.760275935Z 
 47%|████▋     | 4452/9500 [15:16:07<17:17:29, 12.33s/it]08/03/2024 13:13:37 - INFO - __main__ -   Step: 4452, LR: 1.095527662561852e-05, Loss: 606.1917114257812
2024-08-03T20:13:49.940487725Z 
 47%|████▋     | 4453/9500 [15:16:19<17:13:28, 12.29s/it]08/03/2024 13:13:49 - INFO - __main__ -   Step: 4453, LR: 1.0953106081931242e-05, Loss: 632.1959228515625
2024-08-03T20:14:02.380670809Z 
 47%|████▋     | 4454/9500 [15:16:32<17:17:09, 12.33s/it]08/03/2024 13:14:02 - INFO - __main__ -   Step: 4454, LR: 1.0950935538243964e-05, Loss: 575.1757202148438
2024-08-03T20:14:14.794813662Z 
 47%|████▋     | 4455/9500 [15:16:44<17:19:00, 12.36s/it]08/03/2024 13:14:14 - INFO - __main__ -   Step: 4455, LR: 1.0948764994556684e-05, Loss: 557.4431762695312
2024-08-03T20:14:26.549451454Z 
 47%|████▋     | 4456/9500 [15:16:56<17:03:36, 12.18s/it]08/03/2024 13:14:26 - INFO - __main__ -   Step: 4456, LR: 1.0946594450869405e-05, Loss: 560.9139404296875
2024-08-03T20:14:38.627696954Z 
 47%|████▋     | 4457/9500 [15:17:08<17:00:56, 12.15s/it]08/03/2024 13:14:38 - INFO - __main__ -   Step: 4457, LR: 1.0944423907182127e-05, Loss: 496.88873291015625
2024-08-03T20:14:51.237839690Z 
 47%|████▋     | 4458/9500 [15:17:21<17:12:25, 12.29s/it]08/03/2024 13:14:51 - INFO - __main__ -   Step: 4458, LR: 1.0942253363494848e-05, Loss: 612.5994873046875
2024-08-03T20:15:03.552166150Z 
 47%|████▋     | 4459/9500 [15:17:33<17:12:55, 12.29s/it]08/03/2024 13:15:03 - INFO - __main__ -   Step: 4459, LR: 1.094008281980757e-05, Loss: 604.4090576171875
2024-08-03T20:15:15.601402513Z 
 47%|████▋     | 4460/9500 [15:17:45<17:06:32, 12.22s/it]08/03/2024 13:15:15 - INFO - __main__ -   Step: 4460, LR: 1.093791227612029e-05, Loss: 607.0059204101562
2024-08-03T20:15:28.351406318Z 
 47%|████▋     | 4461/9500 [15:17:58<17:19:40, 12.38s/it]08/03/2024 13:15:28 - INFO - __main__ -   Step: 4461, LR: 1.0935741732433011e-05, Loss: 660.8565673828125
2024-08-03T20:15:40.543365535Z 
 47%|████▋     | 4462/9500 [15:18:10<17:14:45, 12.32s/it]08/03/2024 13:15:40 - INFO - __main__ -   Step: 4462, LR: 1.0933571188745733e-05, Loss: 579.30224609375
2024-08-03T20:15:52.327441808Z 
 47%|████▋     | 4463/9500 [15:18:22<17:00:57, 12.16s/it]08/03/2024 13:15:52 - INFO - __main__ -   Step: 4463, LR: 1.0931400645058453e-05, Loss: 530.4229736328125
2024-08-03T20:16:04.830242146Z 
 47%|████▋     | 4464/9500 [15:18:34<17:09:21, 12.26s/it]08/03/2024 13:16:04 - INFO - __main__ -   Step: 4464, LR: 1.0929230101371173e-05, Loss: 661.0133056640625
2024-08-03T20:16:16.688301228Z 
 47%|████▋     | 4465/9500 [15:18:46<16:58:55, 12.14s/it]08/03/2024 13:16:16 - INFO - __main__ -   Step: 4465, LR: 1.0927059557683894e-05, Loss: 453.21807861328125
2024-08-03T20:16:28.552258383Z 
 47%|████▋     | 4466/9500 [15:18:58<16:51:43, 12.06s/it]08/03/2024 13:16:28 - INFO - __main__ -   Step: 4466, LR: 1.0924889013996616e-05, Loss: 475.916015625
2024-08-03T20:16:40.782525179Z 
 47%|████▋     | 4467/9500 [15:19:10<16:55:50, 12.11s/it]08/03/2024 13:16:40 - INFO - __main__ -   Step: 4467, LR: 1.0922718470309337e-05, Loss: 687.31982421875
2024-08-03T20:16:53.635558647Z 
 47%|████▋     | 4468/9500 [15:19:23<17:14:20, 12.33s/it]08/03/2024 13:16:53 - INFO - __main__ -   Step: 4468, LR: 1.0920547926622059e-05, Loss: 613.0390625
2024-08-03T20:17:06.124809414Z 
 47%|████▋     | 4469/9500 [15:19:36<17:18:03, 12.38s/it]08/03/2024 13:17:06 - INFO - __main__ -   Step: 4469, LR: 1.0918377382934779e-05, Loss: 597.4984130859375
2024-08-03T20:17:18.879756030Z 
 47%|████▋     | 4470/9500 [15:19:48<17:27:16, 12.49s/it]08/03/2024 13:17:18 - INFO - __main__ -   Step: 4470, LR: 1.09162068392475e-05, Loss: 898.31591796875
2024-08-03T20:17:31.277267445Z 
 47%|████▋     | 4471/9500 [15:20:01<17:24:40, 12.46s/it]08/03/2024 13:17:31 - INFO - __main__ -   Step: 4471, LR: 1.0914036295560222e-05, Loss: 489.60247802734375
2024-08-03T20:17:43.363931264Z 
 47%|████▋     | 4472/9500 [15:20:13<17:14:59, 12.35s/it]08/03/2024 13:17:43 - INFO - __main__ -   Step: 4472, LR: 1.0911865751872943e-05, Loss: 657.0699462890625
2024-08-03T20:17:55.700383636Z 
 47%|████▋     | 4473/9500 [15:20:25<17:14:25, 12.35s/it]08/03/2024 13:17:55 - INFO - __main__ -   Step: 4473, LR: 1.0909695208185665e-05, Loss: 575.1104125976562
2024-08-03T20:18:08.092550434Z 
 47%|████▋     | 4474/9500 [15:20:38<17:15:22, 12.36s/it]08/03/2024 13:18:08 - INFO - __main__ -   Step: 4474, LR: 1.0907524664498386e-05, Loss: 477.06396484375
2024-08-03T20:18:20.300524003Z 
 47%|████▋     | 4475/9500 [15:20:50<17:11:20, 12.31s/it]08/03/2024 13:18:20 - INFO - __main__ -   Step: 4475, LR: 1.0905354120811106e-05, Loss: 441.0130615234375
2024-08-03T20:18:32.734923503Z 
 47%|████▋     | 4476/9500 [15:21:02<17:14:07, 12.35s/it]08/03/2024 13:18:32 - INFO - __main__ -   Step: 4476, LR: 1.0903183577123826e-05, Loss: 729.7681884765625
2024-08-03T20:18:45.261983224Z 
 47%|████▋     | 4477/9500 [15:21:15<17:18:22, 12.40s/it]08/03/2024 13:18:45 - INFO - __main__ -   Step: 4477, LR: 1.0901013033436548e-05, Loss: 741.5704956054688
2024-08-03T20:18:57.321643044Z 
 47%|████▋     | 4478/9500 [15:21:27<17:09:31, 12.30s/it]08/03/2024 13:18:57 - INFO - __main__ -   Step: 4478, LR: 1.0898842489749268e-05, Loss: 557.642333984375
2024-08-03T20:19:09.135730419Z 
 47%|████▋     | 4479/9500 [15:21:39<16:57:07, 12.15s/it]08/03/2024 13:19:09 - INFO - __main__ -   Step: 4479, LR: 1.0896671946061989e-05, Loss: 452.5363464355469
2024-08-03T20:19:21.659546106Z 
 47%|████▋     | 4480/9500 [15:21:51<17:06:12, 12.27s/it]08/03/2024 13:19:21 - INFO - __main__ -   Step: 4480, LR: 1.089450140237471e-05, Loss: 670.7674560546875
2024-08-03T20:19:33.748155341Z 
 47%|████▋     | 4481/9500 [15:22:03<17:01:33, 12.21s/it]08/03/2024 13:19:33 - INFO - __main__ -   Step: 4481, LR: 1.0892330858687432e-05, Loss: 417.86376953125
2024-08-03T20:19:45.997414850Z 
 47%|████▋     | 4482/9500 [15:22:15<17:02:16, 12.22s/it]08/03/2024 13:19:45 - INFO - __main__ -   Step: 4482, LR: 1.0890160315000154e-05, Loss: 675.232666015625
2024-08-03T20:19:58.989805198Z 
 47%|████▋     | 4483/9500 [15:22:28<17:21:21, 12.45s/it]08/03/2024 13:19:58 - INFO - __main__ -   Step: 4483, LR: 1.0887989771312875e-05, Loss: 532.7984008789062
2024-08-03T20:20:11.118512767Z 
 47%|████▋     | 4484/9500 [15:22:41<17:13:00, 12.36s/it]08/03/2024 13:20:11 - INFO - __main__ -   Step: 4484, LR: 1.0885819227625595e-05, Loss: 480.6024169921875
2024-08-03T20:20:23.145627852Z 
 47%|████▋     | 4485/9500 [15:22:53<17:04:32, 12.26s/it]08/03/2024 13:20:23 - INFO - __main__ -   Step: 4485, LR: 1.0883648683938317e-05, Loss: 586.9351196289062
2024-08-03T20:20:36.008897425Z 
 47%|████▋     | 4486/9500 [15:23:05<17:19:30, 12.44s/it]08/03/2024 13:20:36 - INFO - __main__ -   Step: 4486, LR: 1.0881478140251038e-05, Loss: 641.065673828125
2024-08-03T20:20:48.110921121Z 
 47%|████▋     | 4487/9500 [15:23:18<17:10:51, 12.34s/it]08/03/2024 13:20:48 - INFO - __main__ -   Step: 4487, LR: 1.087930759656376e-05, Loss: 693.1233520507812
2024-08-03T20:21:00.090602706Z 
 47%|████▋     | 4488/9500 [15:23:30<17:01:39, 12.23s/it]08/03/2024 13:21:00 - INFO - __main__ -   Step: 4488, LR: 1.0877137052876481e-05, Loss: 622.1860961914062
2024-08-03T20:21:12.651305316Z 
 47%|████▋     | 4489/9500 [15:23:42<17:09:43, 12.33s/it]08/03/2024 13:21:12 - INFO - __main__ -   Step: 4489, LR: 1.0874966509189201e-05, Loss: 661.1302490234375
2024-08-03T20:21:24.551201434Z 
 47%|████▋     | 4490/9500 [15:23:54<16:58:45, 12.20s/it]08/03/2024 13:21:24 - INFO - __main__ -   Step: 4490, LR: 1.0872795965501921e-05, Loss: 539.6373291015625
2024-08-03T20:21:36.911805425Z 
 47%|████▋     | 4491/9500 [15:24:06<17:02:33, 12.25s/it]08/03/2024 13:21:36 - INFO - __main__ -   Step: 4491, LR: 1.0870625421814643e-05, Loss: 701.3259887695312
2024-08-03T20:21:49.713775643Z 
 47%|████▋     | 4492/9500 [15:24:19<17:16:13, 12.41s/it]08/03/2024 13:21:49 - INFO - __main__ -   Step: 4492, LR: 1.0868454878127364e-05, Loss: 609.960693359375
2024-08-03T20:22:01.484622408Z 
 47%|████▋     | 4493/9500 [15:24:31<16:59:52, 12.22s/it]08/03/2024 13:22:01 - INFO - __main__ -   Step: 4493, LR: 1.0866284334440084e-05, Loss: 445.768310546875
2024-08-03T20:22:13.439176054Z 
 47%|████▋     | 4494/9500 [15:24:43<16:53:00, 12.14s/it]08/03/2024 13:22:13 - INFO - __main__ -   Step: 4494, LR: 1.0864113790752806e-05, Loss: 486.7420959472656
2024-08-03T20:22:25.901975116Z 
 47%|████▋     | 4495/9500 [15:24:55<17:00:50, 12.24s/it]08/03/2024 13:22:25 - INFO - __main__ -   Step: 4495, LR: 1.0861943247065527e-05, Loss: 727.768310546875
2024-08-03T20:22:38.177247338Z 
 47%|████▋     | 4496/9500 [15:25:08<17:01:33, 12.25s/it]08/03/2024 13:22:38 - INFO - __main__ -   Step: 4496, LR: 1.0859772703378249e-05, Loss: 552.0556030273438
2024-08-03T20:22:50.826803818Z 
 47%|████▋     | 4497/9500 [15:25:20<17:11:23, 12.37s/it]08/03/2024 13:22:50 - INFO - __main__ -   Step: 4497, LR: 1.085760215969097e-05, Loss: 776.3190307617188
2024-08-03T20:23:03.739680668Z 
 47%|████▋     | 4498/9500 [15:25:33<17:24:46, 12.53s/it]08/03/2024 13:23:03 - INFO - __main__ -   Step: 4498, LR: 1.085543161600369e-05, Loss: 536.1473999023438
2024-08-03T20:23:15.893063802Z 
 47%|████▋     | 4499/9500 [15:25:45<17:15:05, 12.42s/it]08/03/2024 13:23:15 - INFO - __main__ -   Step: 4499, LR: 1.0853261072316412e-05, Loss: 497.8409423828125
2024-08-03T20:23:28.235853036Z 
 47%|████▋     | 4500/9500 [15:25:58<17:12:59, 12.40s/it]08/03/2024 13:23:28 - INFO - __main__ -   Step: 4500, LR: 1.0851090528629133e-05, Loss: 630.713623046875
2024-08-03T20:23:40.948079059Z 
 47%|████▋     | 4501/9500 [15:26:10<17:20:41, 12.49s/it]08/03/2024 13:23:40 - INFO - __main__ -   Step: 4501, LR: 1.0848919984941855e-05, Loss: 694.7938842773438
2024-08-03T20:23:53.026800783Z 
 47%|████▋     | 4502/9500 [15:26:22<17:10:11, 12.37s/it]08/03/2024 13:23:53 - INFO - __main__ -   Step: 4502, LR: 1.0846749441254577e-05, Loss: 520.3040161132812
2024-08-03T20:24:05.073469172Z 
 47%|████▋     | 4503/9500 [15:26:35<17:01:58, 12.27s/it]08/03/2024 13:24:05 - INFO - __main__ -   Step: 4503, LR: 1.0844578897567296e-05, Loss: 563.095947265625
2024-08-03T20:24:18.204421712Z 
 47%|████▋     | 4504/9500 [15:26:48<17:23:14, 12.53s/it]08/03/2024 13:24:18 - INFO - __main__ -   Step: 4504, LR: 1.0842408353880016e-05, Loss: 532.0099487304688
2024-08-03T20:24:30.658657297Z 
 47%|████▋     | 4505/9500 [15:27:00<17:21:09, 12.51s/it]08/03/2024 13:24:30 - INFO - __main__ -   Step: 4505, LR: 1.0840237810192738e-05, Loss: 685.53662109375
2024-08-03T20:24:42.747305574Z 
 47%|████▋     | 4506/9500 [15:27:12<17:10:31, 12.38s/it]08/03/2024 13:24:42 - INFO - __main__ -   Step: 4506, LR: 1.083806726650546e-05, Loss: 689.943603515625
2024-08-03T20:24:55.360416784Z 
 47%|████▋     | 4507/9500 [15:27:25<17:16:06, 12.45s/it]08/03/2024 13:24:55 - INFO - __main__ -   Step: 4507, LR: 1.083589672281818e-05, Loss: 587.0589599609375
2024-08-03T20:25:07.746038909Z 
 47%|████▋     | 4508/9500 [15:27:37<17:14:16, 12.43s/it]08/03/2024 13:25:07 - INFO - __main__ -   Step: 4508, LR: 1.08337261791309e-05, Loss: 608.666748046875
2024-08-03T20:25:19.732306333Z 
 47%|████▋     | 4509/9500 [15:27:49<17:02:57, 12.30s/it]08/03/2024 13:25:19 - INFO - __main__ -   Step: 4509, LR: 1.0831555635443622e-05, Loss: 517.8974609375
2024-08-03T20:25:31.901172314Z 
 47%|████▋     | 4510/9500 [15:28:01<16:59:33, 12.26s/it]08/03/2024 13:25:31 - INFO - __main__ -   Step: 4510, LR: 1.0829385091756344e-05, Loss: 536.5860595703125
2024-08-03T20:25:44.447052356Z 
 47%|████▋     | 4511/9500 [15:28:14<17:06:29, 12.35s/it]08/03/2024 13:25:44 - INFO - __main__ -   Step: 4511, LR: 1.0827214548069065e-05, Loss: 560.8897705078125
2024-08-03T20:25:56.700070813Z 
 47%|████▋     | 4512/9500 [15:28:26<17:03:59, 12.32s/it]08/03/2024 13:25:56 - INFO - __main__ -   Step: 4512, LR: 1.0825044004381785e-05, Loss: 609.8690185546875
2024-08-03T20:26:09.051698255Z 
 48%|████▊     | 4513/9500 [15:28:38<17:04:38, 12.33s/it]08/03/2024 13:26:09 - INFO - __main__ -   Step: 4513, LR: 1.0822873460694507e-05, Loss: 639.2056884765625
2024-08-03T20:26:22.030739280Z 
 48%|████▊     | 4514/9500 [15:28:51<17:20:40, 12.52s/it]08/03/2024 13:26:22 - INFO - __main__ -   Step: 4514, LR: 1.0820702917007228e-05, Loss: 787.3326416015625
2024-08-03T20:26:34.229763105Z 
 48%|████▊     | 4515/9500 [15:29:04<17:12:23, 12.43s/it]08/03/2024 13:26:34 - INFO - __main__ -   Step: 4515, LR: 1.081853237331995e-05, Loss: 725.4208984375
2024-08-03T20:26:46.483123888Z 
 48%|████▊     | 4516/9500 [15:29:16<17:07:52, 12.37s/it]08/03/2024 13:26:46 - INFO - __main__ -   Step: 4516, LR: 1.0816361829632672e-05, Loss: 615.662109375
2024-08-03T20:26:59.389246553Z 
 48%|████▊     | 4517/9500 [15:29:29<17:20:55, 12.53s/it]08/03/2024 13:26:59 - INFO - __main__ -   Step: 4517, LR: 1.0814191285945393e-05, Loss: 415.1011657714844
2024-08-03T20:27:11.441205531Z 
 48%|████▊     | 4518/9500 [15:29:41<17:08:43, 12.39s/it]08/03/2024 13:27:11 - INFO - __main__ -   Step: 4518, LR: 1.0812020742258111e-05, Loss: 423.28863525390625
2024-08-03T20:27:23.504793264Z 
 48%|████▊     | 4519/9500 [15:29:53<17:00:24, 12.29s/it]08/03/2024 13:27:23 - INFO - __main__ -   Step: 4519, LR: 1.0809850198570833e-05, Loss: 559.2234497070312
2024-08-03T20:27:36.013280130Z 
 48%|████▊     | 4520/9500 [15:30:05<17:05:35, 12.36s/it]08/03/2024 13:27:36 - INFO - __main__ -   Step: 4520, LR: 1.0807679654883554e-05, Loss: 421.567138671875
2024-08-03T20:27:48.205503791Z 
 48%|████▊     | 4521/9500 [15:30:18<17:01:18, 12.31s/it]08/03/2024 13:27:48 - INFO - __main__ -   Step: 4521, LR: 1.0805509111196274e-05, Loss: 376.84149169921875
2024-08-03T20:28:00.698652457Z 
 48%|████▊     | 4522/9500 [15:30:30<17:05:43, 12.36s/it]08/03/2024 13:28:00 - INFO - __main__ -   Step: 4522, LR: 1.0803338567508996e-05, Loss: 685.2570190429688
2024-08-03T20:28:13.269419770Z 
 48%|████▊     | 4523/9500 [15:30:43<17:10:41, 12.43s/it]08/03/2024 13:28:13 - INFO - __main__ -   Step: 4523, LR: 1.0801168023821717e-05, Loss: 526.6751708984375
2024-08-03T20:28:25.514019650Z 
 48%|████▊     | 4524/9500 [15:30:55<17:05:58, 12.37s/it]08/03/2024 13:28:25 - INFO - __main__ -   Step: 4524, LR: 1.0798997480134439e-05, Loss: 541.878662109375
2024-08-03T20:28:37.577590208Z 
 48%|████▊     | 4525/9500 [15:31:07<16:58:07, 12.28s/it]08/03/2024 13:28:37 - INFO - __main__ -   Step: 4525, LR: 1.079682693644716e-05, Loss: 580.6351318359375
2024-08-03T20:28:50.171544920Z 
 48%|████▊     | 4526/9500 [15:31:20<17:05:45, 12.37s/it]08/03/2024 13:28:50 - INFO - __main__ -   Step: 4526, LR: 1.0794656392759882e-05, Loss: 743.719482421875
2024-08-03T20:29:02.420425639Z 
 48%|████▊     | 4527/9500 [15:31:32<17:02:26, 12.34s/it]08/03/2024 13:29:02 - INFO - __main__ -   Step: 4527, LR: 1.0792485849072602e-05, Loss: 671.97802734375
2024-08-03T20:29:14.460927042Z 
 48%|████▊     | 4528/9500 [15:31:44<16:54:53, 12.25s/it]08/03/2024 13:29:14 - INFO - __main__ -   Step: 4528, LR: 1.0790315305385324e-05, Loss: 645.4596557617188
2024-08-03T20:29:26.982479632Z 
 48%|████▊     | 4529/9500 [15:31:56<17:01:30, 12.33s/it]08/03/2024 13:29:26 - INFO - __main__ -   Step: 4529, LR: 1.0788144761698045e-05, Loss: 587.8880615234375
2024-08-03T20:29:39.378221770Z 
 48%|████▊     | 4530/9500 [15:32:09<17:02:57, 12.35s/it]08/03/2024 13:29:39 - INFO - __main__ -   Step: 4530, LR: 1.0785974218010767e-05, Loss: 650.7095947265625
2024-08-03T20:29:51.472305256Z 
 48%|████▊     | 4531/9500 [15:32:21<16:56:23, 12.27s/it]08/03/2024 13:29:51 - INFO - __main__ -   Step: 4531, LR: 1.0783803674323488e-05, Loss: 483.9983215332031
2024-08-03T20:30:04.175805663Z 
 48%|████▊     | 4532/9500 [15:32:34<17:06:53, 12.40s/it]08/03/2024 13:30:04 - INFO - __main__ -   Step: 4532, LR: 1.0781633130636206e-05, Loss: 469.8567199707031
2024-08-03T20:30:16.866001182Z 
 48%|████▊     | 4533/9500 [15:32:46<17:13:50, 12.49s/it]08/03/2024 13:30:16 - INFO - __main__ -   Step: 4533, LR: 1.0779462586948928e-05, Loss: 557.00048828125
2024-08-03T20:30:28.992434051Z 
 48%|████▊     | 4534/9500 [15:32:58<17:04:38, 12.38s/it]08/03/2024 13:30:28 - INFO - __main__ -   Step: 4534, LR: 1.077729204326165e-05, Loss: 563.9765625
2024-08-03T20:30:41.745100201Z 
 48%|████▊     | 4535/9500 [15:33:11<17:13:41, 12.49s/it]08/03/2024 13:30:41 - INFO - __main__ -   Step: 4535, LR: 1.0775121499574371e-05, Loss: 650.8615112304688
2024-08-03T20:30:53.918246148Z 
 48%|████▊     | 4536/9500 [15:33:23<17:05:34, 12.40s/it]08/03/2024 13:30:53 - INFO - __main__ -   Step: 4536, LR: 1.0772950955887091e-05, Loss: 510.4083251953125
2024-08-03T20:31:06.244700394Z 
 48%|████▊     | 4537/9500 [15:33:36<17:03:38, 12.38s/it]08/03/2024 13:31:06 - INFO - __main__ -   Step: 4537, LR: 1.0770780412199812e-05, Loss: 561.3807373046875
2024-08-03T20:31:18.837493118Z 
 48%|████▊     | 4538/9500 [15:33:48<17:08:49, 12.44s/it]08/03/2024 13:31:18 - INFO - __main__ -   Step: 4538, LR: 1.0768609868512534e-05, Loss: 638.1209106445312
2024-08-03T20:31:31.225140099Z 
 48%|████▊     | 4539/9500 [15:34:01<17:07:17, 12.42s/it]08/03/2024 13:31:31 - INFO - __main__ -   Step: 4539, LR: 1.0766439324825256e-05, Loss: 639.8470458984375
2024-08-03T20:31:43.157601319Z 
 48%|████▊     | 4540/9500 [15:34:13<16:54:53, 12.28s/it]08/03/2024 13:31:43 - INFO - __main__ -   Step: 4540, LR: 1.0764268781137977e-05, Loss: 475.1461181640625
2024-08-03T20:31:55.670614965Z 
 48%|████▊     | 4541/9500 [15:34:25<17:00:32, 12.35s/it]08/03/2024 13:31:55 - INFO - __main__ -   Step: 4541, LR: 1.0762098237450697e-05, Loss: 582.802490234375
2024-08-03T20:32:08.118883553Z 
 48%|████▊     | 4542/9500 [15:34:38<17:02:49, 12.38s/it]08/03/2024 13:32:08 - INFO - __main__ -   Step: 4542, LR: 1.0759927693763419e-05, Loss: 672.6739501953125
2024-08-03T20:32:20.092741366Z 
 48%|████▊     | 4543/9500 [15:34:50<16:52:36, 12.26s/it]08/03/2024 13:32:20 - INFO - __main__ -   Step: 4543, LR: 1.075775715007614e-05, Loss: 586.7247314453125
2024-08-03T20:32:33.005334988Z 
 48%|████▊     | 4544/9500 [15:35:02<17:08:39, 12.45s/it]08/03/2024 13:32:33 - INFO - __main__ -   Step: 4544, LR: 1.0755586606388862e-05, Loss: 748.4597778320312
2024-08-03T20:32:44.953724034Z 
 48%|████▊     | 4545/9500 [15:35:14<16:55:56, 12.30s/it]08/03/2024 13:32:44 - INFO - __main__ -   Step: 4545, LR: 1.0753416062701583e-05, Loss: 602.5660400390625
2024-08-03T20:32:57.477023463Z 
 48%|████▊     | 4546/9500 [15:35:27<17:01:12, 12.37s/it]08/03/2024 13:32:57 - INFO - __main__ -   Step: 4546, LR: 1.0751245519014301e-05, Loss: 530.46826171875
2024-08-03T20:33:09.912338511Z 
 48%|████▊     | 4547/9500 [15:35:39<17:02:39, 12.39s/it]08/03/2024 13:33:09 - INFO - __main__ -   Step: 4547, LR: 1.0749074975327023e-05, Loss: 604.1103515625
2024-08-03T20:33:22.014193408Z 
 48%|████▊     | 4548/9500 [15:35:51<16:55:21, 12.30s/it]08/03/2024 13:33:22 - INFO - __main__ -   Step: 4548, LR: 1.0746904431639745e-05, Loss: 684.69775390625
2024-08-03T20:33:34.037463806Z 
 48%|████▊     | 4549/9500 [15:36:03<16:48:14, 12.22s/it]08/03/2024 13:33:34 - INFO - __main__ -   Step: 4549, LR: 1.0744733887952466e-05, Loss: 480.4202880859375
2024-08-03T20:33:46.745512089Z 
 48%|████▊     | 4550/9500 [15:36:16<17:00:08, 12.37s/it]08/03/2024 13:33:46 - INFO - __main__ -   Step: 4550, LR: 1.0742563344265186e-05, Loss: 677.3980102539062
2024-08-03T20:33:59.093355487Z 
 48%|████▊     | 4551/9500 [15:36:29<16:59:30, 12.36s/it]08/03/2024 13:33:59 - INFO - __main__ -   Step: 4551, LR: 1.0740392800577908e-05, Loss: 606.4386596679688
2024-08-03T20:34:11.058630262Z 
 48%|████▊     | 4552/9500 [15:36:40<16:49:32, 12.24s/it]08/03/2024 13:34:11 - INFO - __main__ -   Step: 4552, LR: 1.0738222256890629e-05, Loss: 579.5210571289062
2024-08-03T20:34:23.436459915Z 
 48%|████▊     | 4553/9500 [15:36:53<16:52:41, 12.28s/it]08/03/2024 13:34:23 - INFO - __main__ -   Step: 4553, LR: 1.073605171320335e-05, Loss: 603.7123413085938
2024-08-03T20:34:35.858926373Z 
 48%|████▊     | 4554/9500 [15:37:05<16:55:57, 12.32s/it]08/03/2024 13:34:35 - INFO - __main__ -   Step: 4554, LR: 1.0733881169516072e-05, Loss: 575.503173828125
2024-08-03T20:34:47.738556829Z 
 48%|████▊     | 4555/9500 [15:37:17<16:44:44, 12.19s/it]08/03/2024 13:34:47 - INFO - __main__ -   Step: 4555, LR: 1.0731710625828792e-05, Loss: 490.5921630859375
2024-08-03T20:34:59.903061657Z 
 48%|████▊     | 4556/9500 [15:37:29<16:43:52, 12.18s/it]08/03/2024 13:34:59 - INFO - __main__ -   Step: 4556, LR: 1.0729540082141514e-05, Loss: 518.3568725585938
2024-08-03T20:35:12.764467481Z 
 48%|████▊     | 4557/9500 [15:37:42<17:00:27, 12.39s/it]08/03/2024 13:35:12 - INFO - __main__ -   Step: 4557, LR: 1.0727369538454235e-05, Loss: 504.30035400390625
2024-08-03T20:35:24.872216394Z 
 48%|████▊     | 4558/9500 [15:37:54<16:53:20, 12.30s/it]08/03/2024 13:35:24 - INFO - __main__ -   Step: 4558, LR: 1.0725198994766957e-05, Loss: 646.83349609375
2024-08-03T20:35:37.437506803Z 
 48%|████▊     | 4559/9500 [15:38:07<16:59:37, 12.38s/it]08/03/2024 13:35:37 - INFO - __main__ -   Step: 4559, LR: 1.0723028451079678e-05, Loss: 594.5259399414062
2024-08-03T20:35:49.986297758Z 
 48%|████▊     | 4560/9500 [15:38:19<17:03:32, 12.43s/it]08/03/2024 13:35:49 - INFO - __main__ -   Step: 4560, LR: 1.0720857907392396e-05, Loss: 518.8118896484375
2024-08-03T20:36:02.217671267Z 
 48%|████▊     | 4561/9500 [15:38:32<16:58:24, 12.37s/it]08/03/2024 13:36:02 - INFO - __main__ -   Step: 4561, LR: 1.0718687363705118e-05, Loss: 629.99658203125
2024-08-03T20:36:14.410220113Z 
 48%|████▊     | 4562/9500 [15:38:44<16:53:45, 12.32s/it]08/03/2024 13:36:14 - INFO - __main__ -   Step: 4562, LR: 1.071651682001784e-05, Loss: 678.8238525390625
2024-08-03T20:36:26.787264562Z 
 48%|████▊     | 4563/9500 [15:38:56<16:55:01, 12.34s/it]08/03/2024 13:36:26 - INFO - __main__ -   Step: 4563, LR: 1.0714346276330561e-05, Loss: 488.4982604980469
2024-08-03T20:36:38.871497516Z 
 48%|████▊     | 4564/9500 [15:39:08<16:48:36, 12.26s/it]08/03/2024 13:36:38 - INFO - __main__ -   Step: 4564, LR: 1.0712175732643281e-05, Loss: 631.97705078125
2024-08-03T20:36:50.971053974Z 
 48%|████▊     | 4565/9500 [15:39:20<16:44:26, 12.21s/it]08/03/2024 13:36:50 - INFO - __main__ -   Step: 4565, LR: 1.0710005188956003e-05, Loss: 491.00042724609375
2024-08-03T20:37:03.688659432Z 
 48%|████▊     | 4566/9500 [15:39:33<16:56:42, 12.36s/it]08/03/2024 13:37:03 - INFO - __main__ -   Step: 4566, LR: 1.0707834645268724e-05, Loss: 598.9452514648438
2024-08-03T20:37:15.836747575Z 
 48%|████▊     | 4567/9500 [15:39:45<16:51:11, 12.30s/it]08/03/2024 13:37:15 - INFO - __main__ -   Step: 4567, LR: 1.0705664101581446e-05, Loss: 451.7103271484375
2024-08-03T20:37:27.869875714Z 
 48%|████▊     | 4568/9500 [15:39:57<16:44:25, 12.22s/it]08/03/2024 13:37:27 - INFO - __main__ -   Step: 4568, LR: 1.0703493557894167e-05, Loss: 729.940185546875
2024-08-03T20:37:40.297933255Z 
 48%|████▊     | 4569/9500 [15:40:10<16:49:22, 12.28s/it]08/03/2024 13:37:40 - INFO - __main__ -   Step: 4569, LR: 1.0701323014206889e-05, Loss: 585.4178466796875
2024-08-03T20:37:52.513764872Z 
 48%|████▊     | 4570/9500 [15:40:22<16:47:31, 12.26s/it]08/03/2024 13:37:52 - INFO - __main__ -   Step: 4570, LR: 1.0699152470519609e-05, Loss: 511.7063903808594
2024-08-03T20:38:04.577829274Z 
 48%|████▊     | 4571/9500 [15:40:34<16:42:25, 12.20s/it]08/03/2024 13:38:04 - INFO - __main__ -   Step: 4571, LR: 1.069698192683233e-05, Loss: 558.8558959960938
2024-08-03T20:38:16.956441496Z 
 48%|████▊     | 4572/9500 [15:40:46<16:46:35, 12.26s/it]08/03/2024 13:38:16 - INFO - __main__ -   Step: 4572, LR: 1.0694811383145052e-05, Loss: 596.8880615234375
2024-08-03T20:38:29.454962658Z 
 48%|████▊     | 4573/9500 [15:40:59<16:52:22, 12.33s/it]08/03/2024 13:38:29 - INFO - __main__ -   Step: 4573, LR: 1.0692640839457773e-05, Loss: 505.0638122558594
2024-08-03T20:38:41.639289506Z 
 48%|████▊     | 4574/9500 [15:41:11<16:48:36, 12.29s/it]08/03/2024 13:38:41 - INFO - __main__ -   Step: 4574, LR: 1.0690470295770492e-05, Loss: 519.529296875
2024-08-03T20:38:54.160286991Z 
 48%|████▊     | 4575/9500 [15:41:24<16:54:12, 12.36s/it]08/03/2024 13:38:54 - INFO - __main__ -   Step: 4575, LR: 1.0688299752083213e-05, Loss: 457.7127380371094
2024-08-03T20:39:06.791225575Z 
 48%|████▊     | 4576/9500 [15:41:36<17:00:46, 12.44s/it]08/03/2024 13:39:06 - INFO - __main__ -   Step: 4576, LR: 1.0686129208395935e-05, Loss: 612.765869140625
2024-08-03T20:39:19.000722292Z 
 48%|████▊     | 4577/9500 [15:41:48<16:54:56, 12.37s/it]08/03/2024 13:39:19 - INFO - __main__ -   Step: 4577, LR: 1.0683958664708656e-05, Loss: 643.5162963867188
2024-08-03T20:39:31.675753606Z 
 48%|████▊     | 4578/9500 [15:42:01<17:02:14, 12.46s/it]08/03/2024 13:39:31 - INFO - __main__ -   Step: 4578, LR: 1.0681788121021378e-05, Loss: 495.2686462402344
2024-08-03T20:39:43.570671271Z 
 48%|████▊     | 4579/9500 [15:42:13<16:48:06, 12.29s/it]08/03/2024 13:39:43 - INFO - __main__ -   Step: 4579, LR: 1.0679617577334098e-05, Loss: 565.77392578125
2024-08-03T20:39:55.821186490Z 
 48%|████▊     | 4580/9500 [15:42:25<16:46:53, 12.28s/it]08/03/2024 13:39:55 - INFO - __main__ -   Step: 4580, LR: 1.067744703364682e-05, Loss: 601.4342041015625
2024-08-03T20:40:08.430329697Z 
 48%|████▊     | 4581/9500 [15:42:38<16:54:48, 12.38s/it]08/03/2024 13:40:08 - INFO - __main__ -   Step: 4581, LR: 1.067527648995954e-05, Loss: 651.734375
2024-08-03T20:40:20.671423368Z 
 48%|████▊     | 4582/9500 [15:42:50<16:51:13, 12.34s/it]08/03/2024 13:40:20 - INFO - __main__ -   Step: 4582, LR: 1.0673105946272262e-05, Loss: 562.7088623046875
2024-08-03T20:40:32.630706597Z 
 48%|████▊     | 4583/9500 [15:43:02<16:41:44, 12.22s/it]08/03/2024 13:40:32 - INFO - __main__ -   Step: 4583, LR: 1.0670935402584984e-05, Loss: 598.4703369140625
2024-08-03T20:40:45.095123079Z 
 48%|████▊     | 4584/9500 [15:43:15<16:47:26, 12.30s/it]08/03/2024 13:40:45 - INFO - __main__ -   Step: 4584, LR: 1.0668764858897704e-05, Loss: 466.55743408203125
2024-08-03T20:40:57.060966190Z 
 48%|████▊     | 4585/9500 [15:43:26<16:39:08, 12.20s/it]08/03/2024 13:40:57 - INFO - __main__ -   Step: 4585, LR: 1.0666594315210425e-05, Loss: 514.6844482421875
2024-08-03T20:41:09.667356002Z 
 48%|████▊     | 4586/9500 [15:43:39<16:48:59, 12.32s/it]08/03/2024 13:41:09 - INFO - __main__ -   Step: 4586, LR: 1.0664423771523147e-05, Loss: 723.6138916015625
2024-08-03T20:41:22.121340163Z 
 48%|████▊     | 4587/9500 [15:43:52<16:52:04, 12.36s/it]08/03/2024 13:41:22 - INFO - __main__ -   Step: 4587, LR: 1.0662253227835868e-05, Loss: 535.487548828125
2024-08-03T20:41:34.442153416Z 
 48%|████▊     | 4588/9500 [15:44:04<16:50:54, 12.35s/it]08/03/2024 13:41:34 - INFO - __main__ -   Step: 4588, LR: 1.0660082684148587e-05, Loss: 701.1231079101562
2024-08-03T20:41:46.804147079Z 
 48%|████▊     | 4589/9500 [15:44:16<16:51:02, 12.35s/it]08/03/2024 13:41:46 - INFO - __main__ -   Step: 4589, LR: 1.0657912140461308e-05, Loss: 504.9549560546875
2024-08-03T20:41:59.692249361Z 
 48%|████▊     | 4590/9500 [15:44:29<17:03:58, 12.51s/it]08/03/2024 13:41:59 - INFO - __main__ -   Step: 4590, LR: 1.065574159677403e-05, Loss: 622.0328369140625
2024-08-03T20:42:12.177303846Z 
 48%|████▊     | 4591/9500 [15:44:42<17:03:05, 12.50s/it]08/03/2024 13:42:12 - INFO - __main__ -   Step: 4591, LR: 1.0653571053086751e-05, Loss: 807.4732055664062
2024-08-03T20:42:24.319307789Z 
 48%|████▊     | 4592/9500 [15:44:54<16:53:58, 12.40s/it]08/03/2024 13:42:24 - INFO - __main__ -   Step: 4592, LR: 1.0651400509399473e-05, Loss: 568.5772094726562
2024-08-03T20:42:36.854489010Z 
 48%|████▊     | 4593/9500 [15:45:06<16:57:12, 12.44s/it]08/03/2024 13:42:36 - INFO - __main__ -   Step: 4593, LR: 1.0649229965712193e-05, Loss: 595.4530029296875
2024-08-03T20:42:49.384758046Z 
 48%|████▊     | 4594/9500 [15:45:19<16:59:15, 12.47s/it]08/03/2024 13:42:49 - INFO - __main__ -   Step: 4594, LR: 1.0647059422024914e-05, Loss: 631.230712890625
2024-08-03T20:43:01.665510867Z 
 48%|████▊     | 4595/9500 [15:45:31<16:54:31, 12.41s/it]08/03/2024 13:43:01 - INFO - __main__ -   Step: 4595, LR: 1.0644888878337636e-05, Loss: 765.71875
2024-08-03T20:43:13.741885893Z 
 48%|████▊     | 4596/9500 [15:45:43<16:46:07, 12.31s/it]08/03/2024 13:43:13 - INFO - __main__ -   Step: 4596, LR: 1.0642718334650357e-05, Loss: 519.4130249023438
2024-08-03T20:43:26.179791283Z 
 48%|████▊     | 4597/9500 [15:45:56<16:49:03, 12.35s/it]08/03/2024 13:43:26 - INFO - __main__ -   Step: 4597, LR: 1.0640547790963079e-05, Loss: 601.8002319335938
2024-08-03T20:43:38.536557903Z 
 48%|████▊     | 4598/9500 [15:46:08<16:49:03, 12.35s/it]08/03/2024 13:43:38 - INFO - __main__ -   Step: 4598, LR: 1.0638377247275799e-05, Loss: 623.2559814453125
2024-08-03T20:43:50.950830830Z 
 48%|████▊     | 4599/9500 [15:46:20<16:50:24, 12.37s/it]08/03/2024 13:43:50 - INFO - __main__ -   Step: 4599, LR: 1.063620670358852e-05, Loss: 651.3262939453125
2024-08-03T20:44:03.650741226Z 
 48%|████▊     | 4600/9500 [15:46:33<16:58:17, 12.47s/it]08/03/2024 13:44:03 - INFO - __main__ -   Step: 4600, LR: 1.0634036159901242e-05, Loss: 667.78662109375
2024-08-03T20:44:15.850325555Z 
 48%|████▊     | 4601/9500 [15:46:45<16:51:29, 12.39s/it]08/03/2024 13:44:15 - INFO - __main__ -   Step: 4601, LR: 1.0631865616213963e-05, Loss: 481.8863830566406
2024-08-03T20:44:28.231700967Z 
 48%|████▊     | 4602/9500 [15:46:58<16:51:06, 12.39s/it]08/03/2024 13:44:28 - INFO - __main__ -   Step: 4602, LR: 1.0629695072526682e-05, Loss: 760.0479736328125
2024-08-03T20:44:40.873368209Z 
 48%|████▊     | 4603/9500 [15:47:10<16:57:10, 12.46s/it]08/03/2024 13:44:40 - INFO - __main__ -   Step: 4603, LR: 1.0627524528839403e-05, Loss: 855.9755859375
2024-08-03T20:44:52.957092495Z 
 48%|████▊     | 4604/9500 [15:47:22<16:47:41, 12.35s/it]08/03/2024 13:44:52 - INFO - __main__ -   Step: 4604, LR: 1.0625353985152125e-05, Loss: 584.30029296875
2024-08-03T20:45:05.040902209Z 
 48%|████▊     | 4605/9500 [15:47:34<16:40:59, 12.27s/it]08/03/2024 13:45:05 - INFO - __main__ -   Step: 4605, LR: 1.0623183441464846e-05, Loss: 540.1021118164062
2024-08-03T20:45:17.608140882Z 
 48%|████▊     | 4606/9500 [15:47:47<16:48:04, 12.36s/it]08/03/2024 13:45:17 - INFO - __main__ -   Step: 4606, LR: 1.0621012897777568e-05, Loss: 578.8704833984375
2024-08-03T20:45:29.586059298Z 
 48%|████▊     | 4607/9500 [15:47:59<16:38:32, 12.24s/it]08/03/2024 13:45:29 - INFO - __main__ -   Step: 4607, LR: 1.0618842354090288e-05, Loss: 471.6352844238281
2024-08-03T20:45:41.568326345Z 
 49%|████▊     | 4608/9500 [15:48:11<16:31:55, 12.17s/it]08/03/2024 13:45:41 - INFO - __main__ -   Step: 4608, LR: 1.061667181040301e-05, Loss: 555.5958251953125
2024-08-03T20:45:54.479376216Z 
 49%|████▊     | 4609/9500 [15:48:24<16:49:56, 12.39s/it]08/03/2024 13:45:54 - INFO - __main__ -   Step: 4609, LR: 1.0614501266715731e-05, Loss: 709.1776123046875
2024-08-03T20:46:06.620278613Z 
 49%|████▊     | 4610/9500 [15:48:36<16:43:39, 12.31s/it]08/03/2024 13:46:06 - INFO - __main__ -   Step: 4610, LR: 1.0612330723028452e-05, Loss: 419.8444519042969
2024-08-03T20:46:18.632050901Z 
 49%|████▊     | 4611/9500 [15:48:48<16:36:02, 12.22s/it]08/03/2024 13:46:18 - INFO - __main__ -   Step: 4611, LR: 1.0610160179341174e-05, Loss: 706.84423828125
2024-08-03T20:46:31.289478750Z 
 49%|████▊     | 4612/9500 [15:49:01<16:46:26, 12.35s/it]08/03/2024 13:46:31 - INFO - __main__ -   Step: 4612, LR: 1.0607989635653896e-05, Loss: 458.4019470214844
2024-08-03T20:46:43.304661409Z 
 49%|████▊     | 4613/9500 [15:49:13<16:37:57, 12.25s/it]08/03/2024 13:46:43 - INFO - __main__ -   Step: 4613, LR: 1.0605819091966615e-05, Loss: 608.8594970703125
2024-08-03T20:46:55.482229865Z 
 49%|████▊     | 4614/9500 [15:49:25<16:35:55, 12.23s/it]08/03/2024 13:46:55 - INFO - __main__ -   Step: 4614, LR: 1.0603648548279337e-05, Loss: 619.765625
2024-08-03T20:47:08.333252309Z 
 49%|████▊     | 4615/9500 [15:49:38<16:50:53, 12.42s/it]08/03/2024 13:47:08 - INFO - __main__ -   Step: 4615, LR: 1.0601478004592059e-05, Loss: 581.2091674804688
2024-08-03T20:47:20.382548152Z 
 49%|████▊     | 4616/9500 [15:49:50<16:41:43, 12.31s/it]08/03/2024 13:47:20 - INFO - __main__ -   Step: 4616, LR: 1.0599307460904777e-05, Loss: 498.25750732421875
2024-08-03T20:47:32.417768611Z 
 49%|████▊     | 4617/9500 [15:50:02<16:34:53, 12.22s/it]08/03/2024 13:47:32 - INFO - __main__ -   Step: 4617, LR: 1.0597136917217498e-05, Loss: 593.151611328125
2024-08-03T20:47:44.847242590Z 
 49%|████▊     | 4618/9500 [15:50:14<16:39:41, 12.29s/it]08/03/2024 13:47:44 - INFO - __main__ -   Step: 4618, LR: 1.059496637353022e-05, Loss: 502.46722412109375
2024-08-03T20:47:57.383057861Z 
 49%|████▊     | 4619/9500 [15:50:27<16:45:34, 12.36s/it]08/03/2024 13:47:57 - INFO - __main__ -   Step: 4619, LR: 1.0592795829842941e-05, Loss: 625.1489868164062
2024-08-03T20:48:09.679972494Z 
 49%|████▊     | 4620/9500 [15:50:39<16:43:47, 12.34s/it]08/03/2024 13:48:09 - INFO - __main__ -   Step: 4620, LR: 1.0590625286155663e-05, Loss: 523.0374755859375
2024-08-03T20:48:22.028435674Z 
 49%|████▊     | 4621/9500 [15:50:51<16:43:45, 12.34s/it]08/03/2024 13:48:22 - INFO - __main__ -   Step: 4621, LR: 1.0588454742468383e-05, Loss: 493.7652282714844
2024-08-03T20:48:34.059527749Z 
 49%|████▊     | 4622/9500 [15:51:03<16:35:55, 12.25s/it]08/03/2024 13:48:34 - INFO - __main__ -   Step: 4622, LR: 1.0586284198781104e-05, Loss: 622.359375
2024-08-03T20:48:46.411218499Z 
 49%|████▊     | 4623/9500 [15:51:16<16:38:12, 12.28s/it]08/03/2024 13:48:46 - INFO - __main__ -   Step: 4623, LR: 1.0584113655093826e-05, Loss: 689.1034545898438
2024-08-03T20:48:59.008510889Z 
 49%|████▊     | 4624/9500 [15:51:28<16:45:43, 12.38s/it]08/03/2024 13:48:59 - INFO - __main__ -   Step: 4624, LR: 1.0581943111406548e-05, Loss: 694.2056884765625
2024-08-03T20:49:11.392857390Z 
 49%|████▊     | 4625/9500 [15:51:41<16:45:43, 12.38s/it]08/03/2024 13:49:11 - INFO - __main__ -   Step: 4625, LR: 1.0579772567719269e-05, Loss: 565.281494140625
2024-08-03T20:49:23.620629806Z 
 49%|████▊     | 4626/9500 [15:51:53<16:41:51, 12.33s/it]08/03/2024 13:49:23 - INFO - __main__ -   Step: 4626, LR: 1.057760202403199e-05, Loss: 735.18603515625
2024-08-03T20:49:35.946915414Z 
 49%|████▊     | 4627/9500 [15:52:05<16:41:29, 12.33s/it]08/03/2024 13:49:35 - INFO - __main__ -   Step: 4627, LR: 1.057543148034471e-05, Loss: 539.0145874023438
2024-08-03T20:49:47.947379979Z 
 49%|████▊     | 4628/9500 [15:52:17<16:33:13, 12.23s/it]08/03/2024 13:49:47 - INFO - __main__ -   Step: 4628, LR: 1.0573260936657432e-05, Loss: 573.7216796875
2024-08-03T20:50:00.248800477Z 
 49%|████▊     | 4629/9500 [15:52:30<16:34:42, 12.25s/it]08/03/2024 13:50:00 - INFO - __main__ -   Step: 4629, LR: 1.0571090392970154e-05, Loss: 456.17987060546875
2024-08-03T20:50:12.718171516Z 
 49%|████▊     | 4630/9500 [15:52:42<16:39:46, 12.32s/it]08/03/2024 13:50:12 - INFO - __main__ -   Step: 4630, LR: 1.0568919849282872e-05, Loss: 489.31805419921875
2024-08-03T20:50:24.853879701Z 
 49%|████▊     | 4631/9500 [15:52:54<16:35:09, 12.26s/it]08/03/2024 13:50:24 - INFO - __main__ -   Step: 4631, LR: 1.0566749305595593e-05, Loss: 645.44287109375
2024-08-03T20:50:36.900249271Z 
 49%|████▉     | 4632/9500 [15:53:06<16:29:40, 12.20s/it]08/03/2024 13:50:36 - INFO - __main__ -   Step: 4632, LR: 1.0564578761908315e-05, Loss: 478.74652099609375
2024-08-03T20:50:49.361285197Z 
 49%|████▉     | 4633/9500 [15:53:19<16:35:51, 12.28s/it]08/03/2024 13:50:49 - INFO - __main__ -   Step: 4633, LR: 1.0562408218221036e-05, Loss: 674.1528930664062
2024-08-03T20:51:01.561140235Z 
 49%|████▉     | 4634/9500 [15:53:31<16:33:47, 12.25s/it]08/03/2024 13:51:01 - INFO - __main__ -   Step: 4634, LR: 1.0560237674533758e-05, Loss: 583.333740234375
2024-08-03T20:51:13.393142840Z 
 49%|████▉     | 4635/9500 [15:53:43<16:23:19, 12.13s/it]08/03/2024 13:51:13 - INFO - __main__ -   Step: 4635, LR: 1.055806713084648e-05, Loss: 427.86004638671875
2024-08-03T20:51:26.379742052Z 
 49%|████▉     | 4636/9500 [15:53:56<16:44:00, 12.39s/it]08/03/2024 13:51:26 - INFO - __main__ -   Step: 4636, LR: 1.05558965871592e-05, Loss: 562.1959228515625
2024-08-03T20:51:38.415065875Z 
 49%|████▉     | 4637/9500 [15:54:08<16:35:18, 12.28s/it]08/03/2024 13:51:38 - INFO - __main__ -   Step: 4637, LR: 1.0553726043471921e-05, Loss: 524.9227294921875
2024-08-03T20:51:50.457413255Z 
 49%|████▉     | 4638/9500 [15:54:20<16:29:18, 12.21s/it]08/03/2024 13:51:50 - INFO - __main__ -   Step: 4638, LR: 1.0551555499784643e-05, Loss: 510.52081298828125
2024-08-03T20:52:02.569596667Z 
 49%|████▉     | 4639/9500 [15:54:32<16:26:46, 12.18s/it]08/03/2024 13:52:02 - INFO - __main__ -   Step: 4639, LR: 1.0549384956097364e-05, Loss: 665.3724365234375
2024-08-03T20:52:15.225033926Z 
 49%|████▉     | 4640/9500 [15:54:45<16:38:07, 12.32s/it]08/03/2024 13:52:15 - INFO - __main__ -   Step: 4640, LR: 1.0547214412410086e-05, Loss: 622.4390869140625
2024-08-03T20:52:27.178505207Z 
 49%|████▉     | 4641/9500 [15:54:57<16:28:57, 12.21s/it]08/03/2024 13:52:27 - INFO - __main__ -   Step: 4641, LR: 1.0545043868722806e-05, Loss: 619.4137573242188
2024-08-03T20:52:39.181404187Z 
 49%|████▉     | 4642/9500 [15:55:09<16:23:40, 12.15s/it]08/03/2024 13:52:39 - INFO - __main__ -   Step: 4642, LR: 1.0542873325035527e-05, Loss: 545.479736328125
2024-08-03T20:52:51.799046462Z 
 49%|████▉     | 4643/9500 [15:55:21<16:34:51, 12.29s/it]08/03/2024 13:52:51 - INFO - __main__ -   Step: 4643, LR: 1.0540702781348247e-05, Loss: 521.6551513671875
2024-08-03T20:53:03.951337601Z 
 49%|████▉     | 4644/9500 [15:55:33<16:31:18, 12.25s/it]08/03/2024 13:53:03 - INFO - __main__ -   Step: 4644, LR: 1.0538532237660969e-05, Loss: 456.779541015625
2024-08-03T20:53:16.230571290Z 
 49%|████▉     | 4645/9500 [15:55:46<16:31:51, 12.26s/it]08/03/2024 13:53:16 - INFO - __main__ -   Step: 4645, LR: 1.0536361693973688e-05, Loss: 577.626708984375
2024-08-03T20:53:28.785431906Z 
 49%|████▉     | 4646/9500 [15:55:58<16:38:51, 12.35s/it]08/03/2024 13:53:28 - INFO - __main__ -   Step: 4646, LR: 1.053419115028641e-05, Loss: 551.020751953125
2024-08-03T20:53:40.844653085Z 
 49%|████▉     | 4647/9500 [15:56:10<16:31:40, 12.26s/it]08/03/2024 13:53:40 - INFO - __main__ -   Step: 4647, LR: 1.0532020606599132e-05, Loss: 585.4099731445312
2024-08-03T20:53:53.056722327Z 
 49%|████▉     | 4648/9500 [15:56:22<16:30:16, 12.25s/it]08/03/2024 13:53:53 - INFO - __main__ -   Step: 4648, LR: 1.0529850062911853e-05, Loss: 524.640380859375
2024-08-03T20:54:05.477180428Z 
 49%|████▉     | 4649/9500 [15:56:35<16:34:19, 12.30s/it]08/03/2024 13:54:05 - INFO - __main__ -   Step: 4649, LR: 1.0527679519224575e-05, Loss: 436.4625549316406
2024-08-03T20:54:17.377764465Z 
 49%|████▉     | 4650/9500 [15:56:47<16:24:28, 12.18s/it]08/03/2024 13:54:17 - INFO - __main__ -   Step: 4650, LR: 1.0525508975537295e-05, Loss: 556.199462890625
2024-08-03T20:54:29.165364075Z 
 49%|████▉     | 4651/9500 [15:56:59<16:14:46, 12.06s/it]08/03/2024 13:54:29 - INFO - __main__ -   Step: 4651, LR: 1.0523338431850016e-05, Loss: 449.23583984375
2024-08-03T20:54:41.547548216Z 
 49%|████▉     | 4652/9500 [15:57:11<16:22:20, 12.16s/it]08/03/2024 13:54:41 - INFO - __main__ -   Step: 4652, LR: 1.0521167888162738e-05, Loss: 468.06884765625
2024-08-03T20:54:53.744602328Z 
 49%|████▉     | 4653/9500 [15:57:23<16:23:06, 12.17s/it]08/03/2024 13:54:53 - INFO - __main__ -   Step: 4653, LR: 1.051899734447546e-05, Loss: 610.6021728515625
2024-08-03T20:55:05.631568405Z 
 49%|████▉     | 4654/9500 [15:57:35<16:16:02, 12.08s/it]08/03/2024 13:55:05 - INFO - __main__ -   Step: 4654, LR: 1.051682680078818e-05, Loss: 475.65301513671875
2024-08-03T20:55:18.271252609Z 
 49%|████▉     | 4655/9500 [15:57:48<16:29:17, 12.25s/it]08/03/2024 13:55:18 - INFO - __main__ -   Step: 4655, LR: 1.05146562571009e-05, Loss: 681.9085693359375
2024-08-03T20:55:30.402990110Z 
 49%|████▉     | 4656/9500 [15:58:00<16:26:11, 12.22s/it]08/03/2024 13:55:30 - INFO - __main__ -   Step: 4656, LR: 1.0512485713413622e-05, Loss: 557.1528930664062
2024-08-03T20:55:42.667998441Z 
 49%|████▉     | 4657/9500 [15:58:12<16:27:10, 12.23s/it]08/03/2024 13:55:42 - INFO - __main__ -   Step: 4657, LR: 1.0510315169726342e-05, Loss: 523.2127075195312
2024-08-03T20:55:55.666054554Z 
 49%|████▉     | 4658/9500 [15:58:25<16:45:34, 12.46s/it]08/03/2024 13:55:55 - INFO - __main__ -   Step: 4658, LR: 1.0508144626039064e-05, Loss: 669.4652099609375
2024-08-03T20:56:07.585441236Z 
 49%|████▉     | 4659/9500 [15:58:37<16:32:15, 12.30s/it]08/03/2024 13:56:07 - INFO - __main__ -   Step: 4659, LR: 1.0505974082351783e-05, Loss: 489.66705322265625
2024-08-03T20:56:19.753014913Z 
 49%|████▉     | 4660/9500 [15:58:49<16:28:51, 12.26s/it]08/03/2024 13:56:19 - INFO - __main__ -   Step: 4660, LR: 1.0503803538664505e-05, Loss: 657.4849853515625
2024-08-03T20:56:32.195880520Z 
 49%|████▉     | 4661/9500 [15:59:02<16:33:09, 12.31s/it]08/03/2024 13:56:32 - INFO - __main__ -   Step: 4661, LR: 1.0501632994977227e-05, Loss: 547.8482666015625
2024-08-03T20:56:44.369343337Z 
 49%|████▉     | 4662/9500 [15:59:14<16:29:32, 12.27s/it]08/03/2024 13:56:44 - INFO - __main__ -   Step: 4662, LR: 1.0499462451289948e-05, Loss: 630.9190063476562
2024-08-03T20:56:56.330793341Z 
 49%|████▉     | 4663/9500 [15:59:26<16:21:49, 12.18s/it]08/03/2024 13:56:56 - INFO - __main__ -   Step: 4663, LR: 1.049729190760267e-05, Loss: 653.7965087890625
2024-08-03T20:57:09.207268355Z 
 49%|████▉     | 4664/9500 [15:59:39<16:38:28, 12.39s/it]08/03/2024 13:57:09 - INFO - __main__ -   Step: 4664, LR: 1.049512136391539e-05, Loss: 542.2852783203125
2024-08-03T20:57:21.264245149Z 
 49%|████▉     | 4665/9500 [15:59:51<16:30:16, 12.29s/it]08/03/2024 13:57:21 - INFO - __main__ -   Step: 4665, LR: 1.0492950820228111e-05, Loss: 519.933837890625
2024-08-03T20:57:33.445527213Z 
 49%|████▉     | 4666/9500 [16:00:03<16:27:28, 12.26s/it]08/03/2024 13:57:33 - INFO - __main__ -   Step: 4666, LR: 1.0490780276540833e-05, Loss: 525.15380859375
2024-08-03T20:57:45.994042652Z 
 49%|████▉     | 4667/9500 [16:00:15<16:34:19, 12.34s/it]08/03/2024 13:57:45 - INFO - __main__ -   Step: 4667, LR: 1.0488609732853554e-05, Loss: 716.125244140625
2024-08-03T20:57:57.736888831Z 
 49%|████▉     | 4668/9500 [16:00:27<16:19:34, 12.16s/it]08/03/2024 13:57:57 - INFO - __main__ -   Step: 4668, LR: 1.0486439189166276e-05, Loss: 514.6763916015625
2024-08-03T20:58:09.966356440Z 
 49%|████▉     | 4669/9500 [16:00:39<16:20:58, 12.18s/it]08/03/2024 13:58:09 - INFO - __main__ -   Step: 4669, LR: 1.0484268645478997e-05, Loss: 637.4482421875
2024-08-03T20:58:22.752593859Z 
 49%|████▉     | 4670/9500 [16:00:52<16:35:18, 12.36s/it]08/03/2024 13:58:22 - INFO - __main__ -   Step: 4670, LR: 1.0482098101791717e-05, Loss: 491.07080078125
2024-08-03T20:58:34.635058717Z 
 49%|████▉     | 4671/9500 [16:01:04<16:23:29, 12.22s/it]08/03/2024 13:58:34 - INFO - __main__ -   Step: 4671, LR: 1.0479927558104437e-05, Loss: 505.40673828125
2024-08-03T20:58:46.805716802Z 
 49%|████▉     | 4672/9500 [16:01:16<16:22:05, 12.20s/it]08/03/2024 13:58:46 - INFO - __main__ -   Step: 4672, LR: 1.0477757014417159e-05, Loss: 530.2366333007812
2024-08-03T20:58:59.440627124Z 
 49%|████▉     | 4673/9500 [16:01:29<16:32:16, 12.33s/it]08/03/2024 13:58:59 - INFO - __main__ -   Step: 4673, LR: 1.0475586470729879e-05, Loss: 630.3950805664062
2024-08-03T20:59:11.634855254Z 
 49%|████▉     | 4674/9500 [16:01:41<16:28:41, 12.29s/it]08/03/2024 13:59:11 - INFO - __main__ -   Step: 4674, LR: 1.04734159270426e-05, Loss: 622.060791015625
2024-08-03T20:59:23.638366043Z 
 49%|████▉     | 4675/9500 [16:01:53<16:21:31, 12.21s/it]08/03/2024 13:59:23 - INFO - __main__ -   Step: 4675, LR: 1.0471245383355322e-05, Loss: 633.8867797851562
2024-08-03T20:59:36.532003135Z 
 49%|████▉     | 4676/9500 [16:02:06<16:37:55, 12.41s/it]08/03/2024 13:59:36 - INFO - __main__ -   Step: 4676, LR: 1.0469074839668043e-05, Loss: 648.9559326171875
2024-08-03T20:59:48.663422484Z 
 49%|████▉     | 4677/9500 [16:02:18<16:30:56, 12.33s/it]08/03/2024 13:59:48 - INFO - __main__ -   Step: 4677, LR: 1.0466904295980765e-05, Loss: 568.5396728515625
2024-08-03T21:00:00.790463082Z 
 49%|████▉     | 4678/9500 [16:02:30<16:25:54, 12.27s/it]08/03/2024 14:00:00 - INFO - __main__ -   Step: 4678, LR: 1.0464733752293486e-05, Loss: 459.92034912109375
2024-08-03T21:00:13.097705624Z 
 49%|████▉     | 4679/9500 [16:02:43<16:26:39, 12.28s/it]08/03/2024 14:00:13 - INFO - __main__ -   Step: 4679, LR: 1.0462563208606206e-05, Loss: 618.6880493164062
2024-08-03T21:00:25.632706618Z 
 49%|████▉     | 4680/9500 [16:02:55<16:32:36, 12.36s/it]08/03/2024 14:00:25 - INFO - __main__ -   Step: 4680, LR: 1.0460392664918928e-05, Loss: 584.0574951171875
2024-08-03T21:00:38.023430136Z 
 49%|████▉     | 4681/9500 [16:03:07<16:33:13, 12.37s/it]08/03/2024 14:00:38 - INFO - __main__ -   Step: 4681, LR: 1.045822212123165e-05, Loss: 671.3689575195312
2024-08-03T21:00:50.230104948Z 
 49%|████▉     | 4682/9500 [16:03:20<16:29:10, 12.32s/it]08/03/2024 14:00:50 - INFO - __main__ -   Step: 4682, LR: 1.0456051577544371e-05, Loss: 602.83935546875
2024-08-03T21:01:03.305847149Z 
 49%|████▉     | 4683/9500 [16:03:33<16:47:12, 12.55s/it]08/03/2024 14:01:03 - INFO - __main__ -   Step: 4683, LR: 1.0453881033857092e-05, Loss: 645.038330078125
2024-08-03T21:01:15.691270256Z 
 49%|████▉     | 4684/9500 [16:03:45<16:43:08, 12.50s/it]08/03/2024 14:01:15 - INFO - __main__ -   Step: 4684, LR: 1.0451710490169812e-05, Loss: 659.9864501953125
2024-08-03T21:01:27.811276230Z 
 49%|████▉     | 4685/9500 [16:03:57<16:33:50, 12.38s/it]08/03/2024 14:01:27 - INFO - __main__ -   Step: 4685, LR: 1.0449539946482532e-05, Loss: 615.8656005859375
2024-08-03T21:01:40.395135728Z 
 49%|████▉     | 4686/9500 [16:04:10<16:38:26, 12.44s/it]08/03/2024 14:01:40 - INFO - __main__ -   Step: 4686, LR: 1.0447369402795254e-05, Loss: 458.85894775390625
2024-08-03T21:01:52.374353007Z 
 49%|████▉     | 4687/9500 [16:04:22<16:27:02, 12.30s/it]08/03/2024 14:01:52 - INFO - __main__ -   Step: 4687, LR: 1.0445198859107975e-05, Loss: 358.7278747558594
2024-08-03T21:02:04.604786603Z 
 49%|████▉     | 4688/9500 [16:04:34<16:25:02, 12.28s/it]08/03/2024 14:02:04 - INFO - __main__ -   Step: 4688, LR: 1.0443028315420695e-05, Loss: 655.0518798828125
2024-08-03T21:02:17.131744063Z 
 49%|████▉     | 4689/9500 [16:04:47<16:30:43, 12.36s/it]08/03/2024 14:02:17 - INFO - __main__ -   Step: 4689, LR: 1.0440857771733417e-05, Loss: 515.0364990234375
2024-08-03T21:02:29.520677659Z 
 49%|████▉     | 4690/9500 [16:04:59<16:31:19, 12.37s/it]08/03/2024 14:02:29 - INFO - __main__ -   Step: 4690, LR: 1.0438687228046138e-05, Loss: 727.3802490234375
2024-08-03T21:02:41.850554006Z 
 49%|████▉     | 4691/9500 [16:05:11<16:30:15, 12.36s/it]08/03/2024 14:02:41 - INFO - __main__ -   Step: 4691, LR: 1.043651668435886e-05, Loss: 592.2822265625
2024-08-03T21:02:54.716695750Z 
 49%|████▉     | 4692/9500 [16:05:24<16:42:20, 12.51s/it]08/03/2024 14:02:54 - INFO - __main__ -   Step: 4692, LR: 1.0434346140671581e-05, Loss: 637.916748046875
2024-08-03T21:03:06.304304455Z 
 49%|████▉     | 4693/9500 [16:05:36<16:19:59, 12.23s/it]08/03/2024 14:03:06 - INFO - __main__ -   Step: 4693, LR: 1.0432175596984301e-05, Loss: 470.5774841308594
2024-08-03T21:03:18.769905709Z 
 49%|████▉     | 4694/9500 [16:05:48<16:25:24, 12.30s/it]08/03/2024 14:03:18 - INFO - __main__ -   Step: 4694, LR: 1.0430005053297023e-05, Loss: 487.19342041015625
2024-08-03T21:03:31.388457823Z 
 49%|████▉     | 4695/9500 [16:06:01<16:32:47, 12.40s/it]08/03/2024 14:03:31 - INFO - __main__ -   Step: 4695, LR: 1.0427834509609744e-05, Loss: 523.6962890625
2024-08-03T21:03:43.300575907Z 
 49%|████▉     | 4696/9500 [16:06:13<16:20:56, 12.25s/it]08/03/2024 14:03:43 - INFO - __main__ -   Step: 4696, LR: 1.0425663965922466e-05, Loss: 586.329345703125
2024-08-03T21:03:55.419291970Z 
 49%|████▉     | 4697/9500 [16:06:25<16:17:32, 12.21s/it]08/03/2024 14:03:55 - INFO - __main__ -   Step: 4697, LR: 1.0423493422235187e-05, Loss: 740.818115234375
2024-08-03T21:04:07.849546667Z 
 49%|████▉     | 4698/9500 [16:06:37<16:22:35, 12.28s/it]08/03/2024 14:04:07 - INFO - __main__ -   Step: 4698, LR: 1.0421322878547907e-05, Loss: 622.9464721679688
2024-08-03T21:04:20.031822416Z 
 49%|████▉     | 4699/9500 [16:06:49<16:20:06, 12.25s/it]08/03/2024 14:04:20 - INFO - __main__ -   Step: 4699, LR: 1.0419152334860627e-05, Loss: 576.1353759765625
2024-08-03T21:04:31.904790724Z 
 49%|████▉     | 4700/9500 [16:07:01<16:10:53, 12.14s/it]08/03/2024 14:04:31 - INFO - __main__ -   Step: 4700, LR: 1.0416981791173349e-05, Loss: 604.113037109375
2024-08-03T21:04:44.435151144Z 
 49%|████▉     | 4701/9500 [16:07:14<16:20:08, 12.25s/it]08/03/2024 14:04:44 - INFO - __main__ -   Step: 4701, LR: 1.041481124748607e-05, Loss: 703.5751953125
2024-08-03T21:04:56.409171394Z 
 49%|████▉     | 4702/9500 [16:07:26<16:13:12, 12.17s/it]08/03/2024 14:04:56 - INFO - __main__ -   Step: 4702, LR: 1.041264070379879e-05, Loss: 468.9067077636719
2024-08-03T21:05:08.641812013Z 
 50%|████▉     | 4703/9500 [16:07:38<16:14:29, 12.19s/it]08/03/2024 14:05:08 - INFO - __main__ -   Step: 4703, LR: 1.0410470160111512e-05, Loss: 564.6847534179688
2024-08-03T21:05:21.397231728Z 
 50%|████▉     | 4704/9500 [16:07:51<16:27:51, 12.36s/it]08/03/2024 14:05:21 - INFO - __main__ -   Step: 4704, LR: 1.0408299616424233e-05, Loss: 565.3406982421875
2024-08-03T21:05:33.714682077Z 
 50%|████▉     | 4705/9500 [16:08:03<16:26:41, 12.35s/it]08/03/2024 14:05:33 - INFO - __main__ -   Step: 4705, LR: 1.0406129072736955e-05, Loss: 529.0740966796875
2024-08-03T21:05:45.787041280Z 
 50%|████▉     | 4706/9500 [16:08:15<16:19:55, 12.26s/it]08/03/2024 14:05:45 - INFO - __main__ -   Step: 4706, LR: 1.0403958529049676e-05, Loss: 519.0106201171875
2024-08-03T21:05:58.470769824Z 
 50%|████▉     | 4707/9500 [16:08:28<16:29:46, 12.39s/it]08/03/2024 14:05:58 - INFO - __main__ -   Step: 4707, LR: 1.0401787985362396e-05, Loss: 562.046875
2024-08-03T21:06:10.610966567Z 
 50%|████▉     | 4708/9500 [16:08:40<16:23:34, 12.32s/it]08/03/2024 14:06:10 - INFO - __main__ -   Step: 4708, LR: 1.0399617441675118e-05, Loss: 551.9700927734375
2024-08-03T21:06:23.133726005Z 
 50%|████▉     | 4709/9500 [16:08:53<16:28:20, 12.38s/it]08/03/2024 14:06:23 - INFO - __main__ -   Step: 4709, LR: 1.039744689798784e-05, Loss: 605.773681640625
2024-08-03T21:06:35.767080876Z 
 50%|████▉     | 4710/9500 [16:09:05<16:34:15, 12.45s/it]08/03/2024 14:06:35 - INFO - __main__ -   Step: 4710, LR: 1.0395276354300561e-05, Loss: 605.90869140625
2024-08-03T21:06:47.797259604Z 
 50%|████▉     | 4711/9500 [16:09:17<16:23:53, 12.33s/it]08/03/2024 14:06:47 - INFO - __main__ -   Step: 4711, LR: 1.0393105810613283e-05, Loss: 429.4432067871094
2024-08-03T21:06:59.797864983Z 
 50%|████▉     | 4712/9500 [16:09:29<16:15:52, 12.23s/it]08/03/2024 14:06:59 - INFO - __main__ -   Step: 4712, LR: 1.0390935266926004e-05, Loss: 525.2481689453125
2024-08-03T21:07:12.092716322Z 
 50%|████▉     | 4713/9500 [16:09:42<16:17:15, 12.25s/it]08/03/2024 14:07:12 - INFO - __main__ -   Step: 4713, LR: 1.0388764723238722e-05, Loss: 459.1097412109375
2024-08-03T21:07:24.475498324Z 
 50%|████▉     | 4714/9500 [16:09:54<16:20:15, 12.29s/it]08/03/2024 14:07:24 - INFO - __main__ -   Step: 4714, LR: 1.0386594179551444e-05, Loss: 500.697021484375
2024-08-03T21:07:36.510111371Z 
 50%|████▉     | 4715/9500 [16:10:06<16:13:57, 12.21s/it]08/03/2024 14:07:36 - INFO - __main__ -   Step: 4715, LR: 1.0384423635864165e-05, Loss: 503.66607666015625
2024-08-03T21:07:49.049423736Z 
 50%|████▉     | 4716/9500 [16:10:18<16:21:34, 12.31s/it]08/03/2024 14:07:49 - INFO - __main__ -   Step: 4716, LR: 1.0382253092176885e-05, Loss: 535.1116943359375
2024-08-03T21:08:01.372311502Z 
 50%|████▉     | 4717/9500 [16:10:31<16:21:39, 12.31s/it]08/03/2024 14:08:01 - INFO - __main__ -   Step: 4717, LR: 1.0380082548489607e-05, Loss: 643.77734375
2024-08-03T21:08:13.300922284Z 
 50%|████▉     | 4718/9500 [16:10:43<16:12:13, 12.20s/it]08/03/2024 14:08:13 - INFO - __main__ -   Step: 4718, LR: 1.0377912004802328e-05, Loss: 424.0888671875
2024-08-03T21:08:25.802570695Z 
 50%|████▉     | 4719/9500 [16:10:55<16:19:16, 12.29s/it]08/03/2024 14:08:25 - INFO - __main__ -   Step: 4719, LR: 1.037574146111505e-05, Loss: 780.2987060546875
2024-08-03T21:08:37.876274156Z 
 50%|████▉     | 4720/9500 [16:11:07<16:13:54, 12.22s/it]08/03/2024 14:08:37 - INFO - __main__ -   Step: 4720, LR: 1.0373570917427771e-05, Loss: 470.19989013671875
2024-08-03T21:08:49.898908505Z 
 50%|████▉     | 4721/9500 [16:11:19<16:08:52, 12.16s/it]08/03/2024 14:08:49 - INFO - __main__ -   Step: 4721, LR: 1.0371400373740493e-05, Loss: 511.0289611816406
2024-08-03T21:09:02.147994473Z 
 50%|████▉     | 4722/9500 [16:11:32<16:10:42, 12.19s/it]08/03/2024 14:09:02 - INFO - __main__ -   Step: 4722, LR: 1.0369229830053213e-05, Loss: 610.1083374023438
2024-08-03T21:09:14.295202204Z 
 50%|████▉     | 4723/9500 [16:11:44<16:09:29, 12.18s/it]08/03/2024 14:09:14 - INFO - __main__ -   Step: 4723, LR: 1.0367059286365934e-05, Loss: 542.7539672851562
2024-08-03T21:09:26.479656728Z 
 50%|████▉     | 4724/9500 [16:11:56<16:09:27, 12.18s/it]08/03/2024 14:09:26 - INFO - __main__ -   Step: 4724, LR: 1.0364888742678656e-05, Loss: 457.02801513671875
2024-08-03T21:09:38.601869742Z 
 50%|████▉     | 4725/9500 [16:12:08<16:07:54, 12.16s/it]08/03/2024 14:09:38 - INFO - __main__ -   Step: 4725, LR: 1.0362718198991378e-05, Loss: 390.06829833984375
2024-08-03T21:09:51.486779875Z 
 50%|████▉     | 4726/9500 [16:12:21<16:24:56, 12.38s/it]08/03/2024 14:09:51 - INFO - __main__ -   Step: 4726, LR: 1.0360547655304099e-05, Loss: 506.8336181640625
2024-08-03T21:10:03.627213075Z 
 50%|████▉     | 4727/9500 [16:12:33<16:19:03, 12.31s/it]08/03/2024 14:10:03 - INFO - __main__ -   Step: 4727, LR: 1.0358377111616817e-05, Loss: 546.2725830078125
2024-08-03T21:10:15.423321759Z 
 50%|████▉     | 4728/9500 [16:12:45<16:06:39, 12.15s/it]08/03/2024 14:10:15 - INFO - __main__ -   Step: 4728, LR: 1.0356206567929539e-05, Loss: 585.0744018554688
2024-08-03T21:10:28.125709449Z 
 50%|████▉     | 4729/9500 [16:12:58<16:19:31, 12.32s/it]08/03/2024 14:10:28 - INFO - __main__ -   Step: 4729, LR: 1.035403602424226e-05, Loss: 473.4066162109375
2024-08-03T21:10:40.098235217Z 
 50%|████▉     | 4730/9500 [16:13:10<16:11:04, 12.21s/it]08/03/2024 14:10:40 - INFO - __main__ -   Step: 4730, LR: 1.0351865480554982e-05, Loss: 518.646728515625
2024-08-03T21:10:52.048187146Z 
 50%|████▉     | 4731/9500 [16:13:21<16:04:32, 12.14s/it]08/03/2024 14:10:52 - INFO - __main__ -   Step: 4731, LR: 1.0349694936867702e-05, Loss: 442.1405029296875
2024-08-03T21:11:04.614468916Z 
 50%|████▉     | 4732/9500 [16:13:34<16:14:37, 12.26s/it]08/03/2024 14:11:04 - INFO - __main__ -   Step: 4732, LR: 1.0347524393180423e-05, Loss: 574.8514404296875
2024-08-03T21:11:16.527079857Z 
 50%|████▉     | 4733/9500 [16:13:46<16:06:02, 12.16s/it]08/03/2024 14:11:16 - INFO - __main__ -   Step: 4733, LR: 1.0345353849493145e-05, Loss: 587.3372192382812
2024-08-03T21:11:28.676725653Z 
 50%|████▉     | 4734/9500 [16:13:58<16:05:35, 12.16s/it]08/03/2024 14:11:28 - INFO - __main__ -   Step: 4734, LR: 1.0343183305805867e-05, Loss: 658.9241943359375
2024-08-03T21:11:41.178823217Z 
 50%|████▉     | 4735/9500 [16:14:11<16:13:38, 12.26s/it]08/03/2024 14:11:41 - INFO - __main__ -   Step: 4735, LR: 1.0341012762118588e-05, Loss: 549.362060546875
2024-08-03T21:11:53.552439422Z 
 50%|████▉     | 4736/9500 [16:14:23<16:16:09, 12.29s/it]08/03/2024 14:11:53 - INFO - __main__ -   Step: 4736, LR: 1.0338842218431308e-05, Loss: 577.32373046875
2024-08-03T21:12:05.931410235Z 
 50%|████▉     | 4737/9500 [16:14:35<16:17:58, 12.32s/it]08/03/2024 14:12:05 - INFO - __main__ -   Step: 4737, LR: 1.033667167474403e-05, Loss: 652.4701538085938
2024-08-03T21:12:18.258023424Z 
 50%|████▉     | 4738/9500 [16:14:48<16:17:55, 12.32s/it]08/03/2024 14:12:18 - INFO - __main__ -   Step: 4738, LR: 1.0334501131056751e-05, Loss: 637.78466796875
2024-08-03T21:12:30.625682225Z 
 50%|████▉     | 4739/9500 [16:15:00<16:18:49, 12.34s/it]08/03/2024 14:12:30 - INFO - __main__ -   Step: 4739, LR: 1.0332330587369473e-05, Loss: 569.1536254882812
2024-08-03T21:12:43.059481835Z 
 50%|████▉     | 4740/9500 [16:15:12<16:20:57, 12.36s/it]08/03/2024 14:12:43 - INFO - __main__ -   Step: 4740, LR: 1.0330160043682194e-05, Loss: 614.970458984375
2024-08-03T21:12:55.634741971Z 
 50%|████▉     | 4741/9500 [16:15:25<16:25:45, 12.43s/it]08/03/2024 14:12:55 - INFO - __main__ -   Step: 4741, LR: 1.0327989499994912e-05, Loss: 662.4839477539062
2024-08-03T21:13:07.883821500Z 
 50%|████▉     | 4742/9500 [16:15:37<16:21:16, 12.37s/it]08/03/2024 14:13:07 - INFO - __main__ -   Step: 4742, LR: 1.0325818956307634e-05, Loss: 502.5831298828125
2024-08-03T21:13:20.281931443Z 
 50%|████▉     | 4743/9500 [16:15:50<16:21:38, 12.38s/it]08/03/2024 14:13:20 - INFO - __main__ -   Step: 4743, LR: 1.0323648412620355e-05, Loss: 505.7378234863281
2024-08-03T21:13:32.953125250Z 
 50%|████▉     | 4744/9500 [16:16:02<16:28:19, 12.47s/it]08/03/2024 14:13:32 - INFO - __main__ -   Step: 4744, LR: 1.0321477868933077e-05, Loss: 653.2735595703125
2024-08-03T21:13:45.195023603Z 
 50%|████▉     | 4745/9500 [16:16:15<16:22:43, 12.40s/it]08/03/2024 14:13:45 - INFO - __main__ -   Step: 4745, LR: 1.0319307325245797e-05, Loss: 836.872314453125
2024-08-03T21:13:57.276541384Z 
 50%|████▉     | 4746/9500 [16:16:27<16:14:56, 12.30s/it]08/03/2024 14:13:57 - INFO - __main__ -   Step: 4746, LR: 1.0317136781558518e-05, Loss: 720.2276000976562
2024-08-03T21:14:09.800303614Z 
 50%|████▉     | 4747/9500 [16:16:39<16:19:57, 12.37s/it]08/03/2024 14:14:09 - INFO - __main__ -   Step: 4747, LR: 1.031496623787124e-05, Loss: 558.6318969726562
2024-08-03T21:14:21.886140525Z 
 50%|████▉     | 4748/9500 [16:16:51<16:12:58, 12.29s/it]08/03/2024 14:14:21 - INFO - __main__ -   Step: 4748, LR: 1.0312795694183962e-05, Loss: 519.0830688476562
2024-08-03T21:14:33.836417107Z 
 50%|████▉     | 4749/9500 [16:17:03<16:04:48, 12.18s/it]08/03/2024 14:14:33 - INFO - __main__ -   Step: 4749, LR: 1.0310625150496683e-05, Loss: 491.5986328125
2024-08-03T21:14:46.718295647Z 
 50%|█████     | 4750/9500 [16:17:16<16:21:10, 12.39s/it]08/03/2024 14:14:46 - INFO - __main__ -   Step: 4750, LR: 1.0308454606809403e-05, Loss: 465.30511474609375
2024-08-03T21:14:58.866571446Z 
 50%|█████     | 4751/9500 [16:17:28<16:15:08, 12.32s/it]08/03/2024 14:14:58 - INFO - __main__ -   Step: 4751, LR: 1.0306284063122125e-05, Loss: 429.3454284667969
2024-08-03T21:15:11.395502223Z 
 50%|█████     | 4752/9500 [16:17:41<16:19:53, 12.38s/it]08/03/2024 14:15:11 - INFO - __main__ -   Step: 4752, LR: 1.0304113519434846e-05, Loss: 490.564453125
2024-08-03T21:15:23.989660691Z 
 50%|█████     | 4753/9500 [16:17:53<16:24:42, 12.45s/it]08/03/2024 14:15:23 - INFO - __main__ -   Step: 4753, LR: 1.0301942975747568e-05, Loss: 404.6873779296875
2024-08-03T21:15:36.197473755Z 
 50%|█████     | 4754/9500 [16:18:06<16:18:50, 12.37s/it]08/03/2024 14:15:36 - INFO - __main__ -   Step: 4754, LR: 1.029977243206029e-05, Loss: 413.6280517578125
2024-08-03T21:15:48.402865699Z 
 50%|█████     | 4755/9500 [16:18:18<16:14:37, 12.32s/it]08/03/2024 14:15:48 - INFO - __main__ -   Step: 4755, LR: 1.0297601888373007e-05, Loss: 422.58392333984375
2024-08-03T21:16:01.340165097Z 
 50%|█████     | 4756/9500 [16:18:31<16:28:57, 12.51s/it]08/03/2024 14:16:01 - INFO - __main__ -   Step: 4756, LR: 1.0295431344685729e-05, Loss: 424.349853515625
2024-08-03T21:16:13.649148800Z 
 50%|█████     | 4757/9500 [16:18:43<16:24:02, 12.45s/it]08/03/2024 14:16:13 - INFO - __main__ -   Step: 4757, LR: 1.029326080099845e-05, Loss: 410.84283447265625
2024-08-03T21:16:25.814019573Z 
 50%|█████     | 4758/9500 [16:18:55<16:17:06, 12.36s/it]08/03/2024 14:16:25 - INFO - __main__ -   Step: 4758, LR: 1.0291090257311172e-05, Loss: 476.2625732421875
2024-08-03T21:16:38.462338215Z 
 50%|█████     | 4759/9500 [16:19:08<16:23:39, 12.45s/it]08/03/2024 14:16:38 - INFO - __main__ -   Step: 4759, LR: 1.0288919713623892e-05, Loss: 340.8231201171875
2024-08-03T21:16:50.766150373Z 
 50%|█████     | 4760/9500 [16:19:20<16:20:01, 12.41s/it]08/03/2024 14:16:50 - INFO - __main__ -   Step: 4760, LR: 1.0286749169936614e-05, Loss: 462.93609619140625
2024-08-03T21:17:03.191972741Z 
 50%|█████     | 4761/9500 [16:19:33<16:20:17, 12.41s/it]08/03/2024 14:17:03 - INFO - __main__ -   Step: 4761, LR: 1.0284578626249335e-05, Loss: 501.2066650390625
2024-08-03T21:17:15.783931196Z 
 50%|█████     | 4762/9500 [16:19:45<16:24:22, 12.47s/it]08/03/2024 14:17:15 - INFO - __main__ -   Step: 4762, LR: 1.0282408082562057e-05, Loss: 411.9332275390625
2024-08-03T21:17:28.234301602Z 
 50%|█████     | 4763/9500 [16:19:58<16:23:47, 12.46s/it]08/03/2024 14:17:28 - INFO - __main__ -   Step: 4763, LR: 1.0280237538874778e-05, Loss: 481.849609375
2024-08-03T21:17:40.600364761Z 
 50%|█████     | 4764/9500 [16:20:10<16:21:20, 12.43s/it]08/03/2024 14:17:40 - INFO - __main__ -   Step: 4764, LR: 1.02780669951875e-05, Loss: 524.929443359375
2024-08-03T21:17:53.216092099Z 
 50%|█████     | 4765/9500 [16:20:23<16:25:28, 12.49s/it]08/03/2024 14:17:53 - INFO - __main__ -   Step: 4765, LR: 1.027589645150022e-05, Loss: 414.29705810546875
2024-08-03T21:18:05.318602190Z 
 50%|█████     | 4766/9500 [16:20:35<16:16:08, 12.37s/it]08/03/2024 14:18:05 - INFO - __main__ -   Step: 4766, LR: 1.0273725907812941e-05, Loss: 366.49761962890625
2024-08-03T21:18:17.352430380Z 
 50%|█████     | 4767/9500 [16:20:47<16:07:56, 12.27s/it]08/03/2024 14:18:17 - INFO - __main__ -   Step: 4767, LR: 1.0271555364125663e-05, Loss: 310.5792541503906
2024-08-03T21:18:29.465588595Z 
 50%|█████     | 4768/9500 [16:20:59<16:04:00, 12.22s/it]08/03/2024 14:18:29 - INFO - __main__ -   Step: 4768, LR: 1.0269384820438384e-05, Loss: 506.8317565917969
2024-08-03T21:18:41.953789181Z 
 50%|█████     | 4769/9500 [16:21:11<16:10:04, 12.30s/it]08/03/2024 14:18:41 - INFO - __main__ -   Step: 4769, LR: 1.0267214276751102e-05, Loss: 387.2008056640625
2024-08-03T21:18:54.157238535Z 
 50%|█████     | 4770/9500 [16:21:24<16:07:30, 12.27s/it]08/03/2024 14:18:54 - INFO - __main__ -   Step: 4770, LR: 1.0265043733063824e-05, Loss: 440.981689453125
2024-08-03T21:19:06.234312754Z 
 50%|█████     | 4771/9500 [16:21:36<16:02:41, 12.21s/it]08/03/2024 14:19:06 - INFO - __main__ -   Step: 4771, LR: 1.0262873189376546e-05, Loss: 428.37310791015625
2024-08-03T21:19:18.896381791Z 
 50%|█████     | 4772/9500 [16:21:48<16:13:04, 12.35s/it]08/03/2024 14:19:18 - INFO - __main__ -   Step: 4772, LR: 1.0260702645689267e-05, Loss: 501.80853271484375
2024-08-03T21:19:31.201739669Z 
 50%|█████     | 4773/9500 [16:22:01<16:11:50, 12.34s/it]08/03/2024 14:19:31 - INFO - __main__ -   Step: 4773, LR: 1.0258532102001989e-05, Loss: 264.362060546875
2024-08-03T21:19:43.622049749Z 
 50%|█████     | 4774/9500 [16:22:13<16:13:38, 12.36s/it]08/03/2024 14:19:43 - INFO - __main__ -   Step: 4774, LR: 1.0256361558314709e-05, Loss: 370.43218994140625
2024-08-03T21:19:56.179038841Z 
 50%|█████     | 4775/9500 [16:22:26<16:18:03, 12.42s/it]08/03/2024 14:19:56 - INFO - __main__ -   Step: 4775, LR: 1.025419101462743e-05, Loss: 552.846435546875
2024-08-03T21:20:08.069639606Z 
 50%|█████     | 4776/9500 [16:22:38<16:05:21, 12.26s/it]08/03/2024 14:20:08 - INFO - __main__ -   Step: 4776, LR: 1.0252020470940152e-05, Loss: 430.15753173828125
2024-08-03T21:20:20.418773057Z 
 50%|█████     | 4777/9500 [16:22:50<16:07:13, 12.29s/it]08/03/2024 14:20:20 - INFO - __main__ -   Step: 4777, LR: 1.0249849927252873e-05, Loss: 417.97998046875
2024-08-03T21:20:32.891335367Z 
 50%|█████     | 4778/9500 [16:23:02<16:11:23, 12.34s/it]08/03/2024 14:20:32 - INFO - __main__ -   Step: 4778, LR: 1.0247679383565595e-05, Loss: 439.3720703125
2024-08-03T21:20:45.143455378Z 
 50%|█████     | 4779/9500 [16:23:15<16:09:02, 12.32s/it]08/03/2024 14:20:45 - INFO - __main__ -   Step: 4779, LR: 1.0245508839878315e-05, Loss: 524.933837890625
2024-08-03T21:20:57.164298301Z 
 50%|█████     | 4780/9500 [16:23:27<16:01:52, 12.23s/it]08/03/2024 14:20:57 - INFO - __main__ -   Step: 4780, LR: 1.0243338296191036e-05, Loss: 340.4206237792969
2024-08-03T21:21:09.950187293Z 
 50%|█████     | 4781/9500 [16:23:39<16:14:51, 12.39s/it]08/03/2024 14:21:09 - INFO - __main__ -   Step: 4781, LR: 1.0241167752503758e-05, Loss: 548.7039794921875
2024-08-03T21:21:22.340706119Z 
 50%|█████     | 4782/9500 [16:23:52<16:14:32, 12.39s/it]08/03/2024 14:21:22 - INFO - __main__ -   Step: 4782, LR: 1.023899720881648e-05, Loss: 451.5361022949219
2024-08-03T21:21:34.614793148Z 
 50%|█████     | 4783/9500 [16:24:04<16:11:31, 12.36s/it]08/03/2024 14:21:34 - INFO - __main__ -   Step: 4783, LR: 1.0236826665129198e-05, Loss: 353.3633728027344
2024-08-03T21:21:47.279384915Z 
 50%|█████     | 4784/9500 [16:24:17<16:18:32, 12.45s/it]08/03/2024 14:21:47 - INFO - __main__ -   Step: 4784, LR: 1.0234656121441919e-05, Loss: 428.9302062988281
2024-08-03T21:21:59.277557622Z 
 50%|█████     | 4785/9500 [16:24:29<16:07:41, 12.31s/it]08/03/2024 14:21:59 - INFO - __main__ -   Step: 4785, LR: 1.023248557775464e-05, Loss: 387.41839599609375
2024-08-03T21:22:11.577819826Z 
 50%|█████     | 4786/9500 [16:24:41<16:07:10, 12.31s/it]08/03/2024 14:22:11 - INFO - __main__ -   Step: 4786, LR: 1.0230315034067362e-05, Loss: 527.9840087890625
2024-08-03T21:22:24.022862881Z 
 50%|█████     | 4787/9500 [16:24:53<16:10:08, 12.35s/it]08/03/2024 14:22:24 - INFO - __main__ -   Step: 4787, LR: 1.0228144490380084e-05, Loss: 372.3883056640625
2024-08-03T21:22:36.195803811Z 
 50%|█████     | 4788/9500 [16:25:06<16:05:44, 12.30s/it]08/03/2024 14:22:36 - INFO - __main__ -   Step: 4788, LR: 1.0225973946692804e-05, Loss: 474.2613525390625
2024-08-03T21:22:48.155429648Z 
 50%|█████     | 4789/9500 [16:25:18<15:57:35, 12.20s/it]08/03/2024 14:22:48 - INFO - __main__ -   Step: 4789, LR: 1.0223803403005525e-05, Loss: 469.46807861328125
2024-08-03T21:23:00.723559176Z 
 50%|█████     | 4790/9500 [16:25:30<16:06:07, 12.31s/it]08/03/2024 14:23:00 - INFO - __main__ -   Step: 4790, LR: 1.0221632859318247e-05, Loss: 467.6131591796875
2024-08-03T21:23:13.278631284Z 
 50%|█████     | 4791/9500 [16:25:43<16:11:46, 12.38s/it]08/03/2024 14:23:13 - INFO - __main__ -   Step: 4791, LR: 1.0219462315630968e-05, Loss: 415.00946044921875
2024-08-03T21:23:25.410722100Z 
 50%|█████     | 4792/9500 [16:25:55<16:05:41, 12.31s/it]08/03/2024 14:23:25 - INFO - __main__ -   Step: 4792, LR: 1.021729177194369e-05, Loss: 496.55938720703125
2024-08-03T21:23:38.176412287Z 
 50%|█████     | 4793/9500 [16:26:08<16:16:17, 12.44s/it]08/03/2024 14:23:38 - INFO - __main__ -   Step: 4793, LR: 1.021512122825641e-05, Loss: 379.76715087890625
2024-08-03T21:23:50.697060386Z 
 50%|█████     | 4794/9500 [16:26:20<16:17:51, 12.47s/it]08/03/2024 14:23:50 - INFO - __main__ -   Step: 4794, LR: 1.0212950684569131e-05, Loss: 544.450927734375
2024-08-03T21:24:02.688673288Z 
 50%|█████     | 4795/9500 [16:26:32<16:06:27, 12.32s/it]08/03/2024 14:24:02 - INFO - __main__ -   Step: 4795, LR: 1.0210780140881853e-05, Loss: 411.00457763671875
2024-08-03T21:24:15.357497046Z 
 50%|█████     | 4796/9500 [16:26:45<16:14:20, 12.43s/it]08/03/2024 14:24:15 - INFO - __main__ -   Step: 4796, LR: 1.0208609597194574e-05, Loss: 418.34149169921875
2024-08-03T21:24:27.764708073Z 
 50%|█████     | 4797/9500 [16:26:57<16:13:39, 12.42s/it]08/03/2024 14:24:27 - INFO - __main__ -   Step: 4797, LR: 1.0206439053507293e-05, Loss: 453.2077941894531
2024-08-03T21:24:39.986042754Z 
 51%|█████     | 4798/9500 [16:27:09<16:08:44, 12.36s/it]08/03/2024 14:24:39 - INFO - __main__ -   Step: 4798, LR: 1.0204268509820014e-05, Loss: 333.48858642578125
2024-08-03T21:24:52.654686793Z 
 51%|█████     | 4799/9500 [16:27:22<16:15:44, 12.45s/it]08/03/2024 14:24:52 - INFO - __main__ -   Step: 4799, LR: 1.0202097966132736e-05, Loss: 394.926025390625
2024-08-03T21:25:04.795247963Z 
 51%|█████     | 4800/9500 [16:27:34<16:08:11, 12.36s/it]08/03/2024 14:25:04 - INFO - __main__ -   Step: 4800, LR: 1.0199927422445457e-05, Loss: 392.21160888671875
2024-08-03T21:25:17.199843808Z 
 51%|█████     | 4801/9500 [16:27:47<16:09:01, 12.37s/it]08/03/2024 14:25:17 - INFO - __main__ -   Step: 4801, LR: 1.0197756878758179e-05, Loss: 339.9950256347656
2024-08-03T21:25:29.797149926Z 
 51%|█████     | 4802/9500 [16:27:59<16:14:05, 12.44s/it]08/03/2024 14:25:29 - INFO - __main__ -   Step: 4802, LR: 1.0195586335070899e-05, Loss: 338.18597412109375
2024-08-03T21:25:41.951822383Z 
 51%|█████     | 4803/9500 [16:28:11<16:07:09, 12.35s/it]08/03/2024 14:25:41 - INFO - __main__ -   Step: 4803, LR: 1.019341579138362e-05, Loss: 390.29351806640625
2024-08-03T21:25:54.404704440Z 
 51%|█████     | 4804/9500 [16:28:24<16:09:15, 12.38s/it]08/03/2024 14:25:54 - INFO - __main__ -   Step: 4804, LR: 1.0191245247696342e-05, Loss: 446.7116394042969
2024-08-03T21:26:06.898433434Z 
 51%|█████     | 4805/9500 [16:28:36<16:11:38, 12.42s/it]08/03/2024 14:26:06 - INFO - __main__ -   Step: 4805, LR: 1.0189074704009063e-05, Loss: 381.83184814453125
2024-08-03T21:26:19.076765599Z 
 51%|█████     | 4806/9500 [16:28:49<16:05:49, 12.35s/it]08/03/2024 14:26:19 - INFO - __main__ -   Step: 4806, LR: 1.0186904160321785e-05, Loss: 394.09320068359375
2024-08-03T21:26:31.239191028Z 
 51%|█████     | 4807/9500 [16:29:01<16:01:19, 12.29s/it]08/03/2024 14:26:31 - INFO - __main__ -   Step: 4807, LR: 1.0184733616634507e-05, Loss: 413.27825927734375
2024-08-03T21:26:43.962045280Z 
 51%|█████     | 4808/9500 [16:29:13<16:11:15, 12.42s/it]08/03/2024 14:26:43 - INFO - __main__ -   Step: 4808, LR: 1.0182563072947226e-05, Loss: 566.7191162109375
2024-08-03T21:26:56.498799359Z 
 51%|█████     | 4809/9500 [16:29:26<16:13:47, 12.46s/it]08/03/2024 14:26:56 - INFO - __main__ -   Step: 4809, LR: 1.0180392529259948e-05, Loss: 486.0235290527344
2024-08-03T21:27:08.756367118Z 
 51%|█████     | 4810/9500 [16:29:38<16:08:57, 12.40s/it]08/03/2024 14:27:08 - INFO - __main__ -   Step: 4810, LR: 1.0178221985572668e-05, Loss: 386.9651794433594
2024-08-03T21:27:20.928554898Z 
 51%|█████     | 4811/9500 [16:29:50<16:03:29, 12.33s/it]08/03/2024 14:27:20 - INFO - __main__ -   Step: 4811, LR: 1.0176051441885388e-05, Loss: 394.0804748535156
2024-08-03T21:27:33.543857302Z 
 51%|█████     | 4812/9500 [16:30:03<16:10:00, 12.41s/it]08/03/2024 14:27:33 - INFO - __main__ -   Step: 4812, LR: 1.017388089819811e-05, Loss: 474.4390869140625
2024-08-03T21:27:45.722768107Z 
 51%|█████     | 4813/9500 [16:30:15<16:04:16, 12.34s/it]08/03/2024 14:27:45 - INFO - __main__ -   Step: 4813, LR: 1.017171035451083e-05, Loss: 545.9036865234375
2024-08-03T21:27:57.933385370Z 
 51%|█████     | 4814/9500 [16:30:27<16:00:56, 12.30s/it]08/03/2024 14:27:57 - INFO - __main__ -   Step: 4814, LR: 1.0169539810823552e-05, Loss: 435.92486572265625
2024-08-03T21:28:10.795024262Z 
 51%|█████     | 4815/9500 [16:30:40<16:13:48, 12.47s/it]08/03/2024 14:28:10 - INFO - __main__ -   Step: 4815, LR: 1.0167369267136274e-05, Loss: 465.204833984375
2024-08-03T21:28:22.883285184Z 
 51%|█████     | 4816/9500 [16:30:52<16:04:37, 12.36s/it]08/03/2024 14:28:22 - INFO - __main__ -   Step: 4816, LR: 1.0165198723448994e-05, Loss: 564.816650390625
2024-08-03T21:28:35.203279360Z 
 51%|█████     | 4817/9500 [16:31:05<16:03:33, 12.35s/it]08/03/2024 14:28:35 - INFO - __main__ -   Step: 4817, LR: 1.0163028179761715e-05, Loss: 299.0638427734375
2024-08-03T21:28:48.183627910Z 
 51%|█████     | 4818/9500 [16:31:18<16:18:11, 12.54s/it]08/03/2024 14:28:48 - INFO - __main__ -   Step: 4818, LR: 1.0160857636074437e-05, Loss: 512.5934448242188
2024-08-03T21:29:00.296168606Z 
 51%|█████     | 4819/9500 [16:31:30<16:08:06, 12.41s/it]08/03/2024 14:29:00 - INFO - __main__ -   Step: 4819, LR: 1.0158687092387158e-05, Loss: 446.25946044921875
2024-08-03T21:29:12.729617013Z 
 51%|█████     | 4820/9500 [16:31:42<16:08:28, 12.42s/it]08/03/2024 14:29:12 - INFO - __main__ -   Step: 4820, LR: 1.015651654869988e-05, Loss: 414.8257751464844
2024-08-03T21:29:25.585632446Z 
 51%|█████     | 4821/9500 [16:31:55<16:18:33, 12.55s/it]08/03/2024 14:29:25 - INFO - __main__ -   Step: 4821, LR: 1.0154346005012602e-05, Loss: 379.16278076171875
2024-08-03T21:29:37.759603509Z 
 51%|█████     | 4822/9500 [16:32:07<16:09:35, 12.44s/it]08/03/2024 14:29:37 - INFO - __main__ -   Step: 4822, LR: 1.0152175461325321e-05, Loss: 443.11480712890625
2024-08-03T21:29:49.802718042Z 
 51%|█████     | 4823/9500 [16:32:19<16:00:12, 12.32s/it]08/03/2024 14:29:49 - INFO - __main__ -   Step: 4823, LR: 1.0150004917638043e-05, Loss: 476.55914306640625
2024-08-03T21:30:02.355932422Z 
 51%|█████     | 4824/9500 [16:32:32<16:05:29, 12.39s/it]08/03/2024 14:30:02 - INFO - __main__ -   Step: 4824, LR: 1.0147834373950763e-05, Loss: 444.6247253417969
2024-08-03T21:30:14.655171805Z 
 51%|█████     | 4825/9500 [16:32:44<16:03:11, 12.36s/it]08/03/2024 14:30:14 - INFO - __main__ -   Step: 4825, LR: 1.0145663830263483e-05, Loss: 355.4815368652344
2024-08-03T21:30:27.010117870Z 
 51%|█████     | 4826/9500 [16:32:56<16:02:49, 12.36s/it]08/03/2024 14:30:27 - INFO - __main__ -   Step: 4826, LR: 1.0143493286576204e-05, Loss: 490.3376159667969
2024-08-03T21:30:39.597809438Z 
 51%|█████     | 4827/9500 [16:33:09<16:07:56, 12.43s/it]08/03/2024 14:30:39 - INFO - __main__ -   Step: 4827, LR: 1.0141322742888926e-05, Loss: 413.79998779296875
2024-08-03T21:30:51.906231617Z 
 51%|█████     | 4828/9500 [16:33:21<16:04:56, 12.39s/it]08/03/2024 14:30:51 - INFO - __main__ -   Step: 4828, LR: 1.0139152199201647e-05, Loss: 315.1439208984375
2024-08-03T21:31:04.060688632Z 
 51%|█████     | 4829/9500 [16:33:33<15:59:10, 12.32s/it]08/03/2024 14:31:04 - INFO - __main__ -   Step: 4829, LR: 1.0136981655514369e-05, Loss: 408.22723388671875
2024-08-03T21:31:16.468094609Z 
 51%|█████     | 4830/9500 [16:33:46<16:00:57, 12.35s/it]08/03/2024 14:31:16 - INFO - __main__ -   Step: 4830, LR: 1.013481111182709e-05, Loss: 500.45880126953125
2024-08-03T21:31:28.489768321Z 
 51%|█████     | 4831/9500 [16:33:58<15:53:12, 12.25s/it]08/03/2024 14:31:28 - INFO - __main__ -   Step: 4831, LR: 1.013264056813981e-05, Loss: 376.2130432128906
2024-08-03T21:31:40.890378433Z 
 51%|█████     | 4832/9500 [16:34:10<15:56:32, 12.29s/it]08/03/2024 14:31:40 - INFO - __main__ -   Step: 4832, LR: 1.0130470024452532e-05, Loss: 444.5518798828125
2024-08-03T21:31:53.080276986Z 
 51%|█████     | 4833/9500 [16:34:23<15:53:52, 12.26s/it]08/03/2024 14:31:53 - INFO - __main__ -   Step: 4833, LR: 1.0128299480765253e-05, Loss: 395.235595703125
2024-08-03T21:32:05.612243481Z 
 51%|█████     | 4834/9500 [16:34:35<15:59:56, 12.34s/it]08/03/2024 14:32:05 - INFO - __main__ -   Step: 4834, LR: 1.0126128937077975e-05, Loss: 396.51092529296875
2024-08-03T21:32:17.635982550Z 
 51%|█████     | 4835/9500 [16:34:47<15:52:16, 12.25s/it]08/03/2024 14:32:17 - INFO - __main__ -   Step: 4835, LR: 1.0123958393390697e-05, Loss: 448.1776123046875
2024-08-03T21:32:30.470542766Z 
 51%|█████     | 4836/9500 [16:35:00<16:05:44, 12.42s/it]08/03/2024 14:32:30 - INFO - __main__ -   Step: 4836, LR: 1.0121787849703416e-05, Loss: 402.24749755859375
2024-08-03T21:32:42.888662096Z 
 51%|█████     | 4837/9500 [16:35:12<16:05:24, 12.42s/it]08/03/2024 14:32:42 - INFO - __main__ -   Step: 4837, LR: 1.0119617306016138e-05, Loss: 528.0595092773438
2024-08-03T21:32:54.783238940Z 
 51%|█████     | 4838/9500 [16:35:24<15:52:53, 12.26s/it]08/03/2024 14:32:54 - INFO - __main__ -   Step: 4838, LR: 1.0117446762328858e-05, Loss: 379.9451904296875
2024-08-03T21:33:07.502789108Z 
 51%|█████     | 4839/9500 [16:35:37<16:03:19, 12.40s/it]08/03/2024 14:33:07 - INFO - __main__ -   Step: 4839, LR: 1.011527621864158e-05, Loss: 464.2506408691406
2024-08-03T21:33:20.030490740Z 
 51%|█████     | 4840/9500 [16:35:49<16:06:04, 12.44s/it]08/03/2024 14:33:20 - INFO - __main__ -   Step: 4840, LR: 1.01131056749543e-05, Loss: 537.1431884765625
2024-08-03T21:33:32.102175179Z 
 51%|█████     | 4841/9500 [16:36:02<15:57:19, 12.33s/it]08/03/2024 14:33:32 - INFO - __main__ -   Step: 4841, LR: 1.0110935131267021e-05, Loss: 468.2579650878906
2024-08-03T21:33:44.520852642Z 
 51%|█████     | 4842/9500 [16:36:14<15:59:12, 12.36s/it]08/03/2024 14:33:44 - INFO - __main__ -   Step: 4842, LR: 1.0108764587579742e-05, Loss: 363.07598876953125
2024-08-03T21:33:56.922341002Z 
 51%|█████     | 4843/9500 [16:36:26<16:00:04, 12.37s/it]08/03/2024 14:33:56 - INFO - __main__ -   Step: 4843, LR: 1.0106594043892464e-05, Loss: 414.15472412109375
2024-08-03T21:34:09.243971394Z 
 51%|█████     | 4844/9500 [16:36:39<15:58:45, 12.36s/it]08/03/2024 14:34:09 - INFO - __main__ -   Step: 4844, LR: 1.0104423500205186e-05, Loss: 375.73876953125
2024-08-03T21:34:21.714781265Z 
 51%|█████     | 4845/9500 [16:36:51<16:01:14, 12.39s/it]08/03/2024 14:34:21 - INFO - __main__ -   Step: 4845, LR: 1.0102252956517905e-05, Loss: 411.25439453125
2024-08-03T21:34:33.730831733Z 
 51%|█████     | 4846/9500 [16:37:03<15:52:20, 12.28s/it]08/03/2024 14:34:33 - INFO - __main__ -   Step: 4846, LR: 1.0100082412830627e-05, Loss: 355.80255126953125
2024-08-03T21:34:46.009192696Z 
 51%|█████     | 4847/9500 [16:37:15<15:52:08, 12.28s/it]08/03/2024 14:34:46 - INFO - __main__ -   Step: 4847, LR: 1.0097911869143349e-05, Loss: 503.724365234375
2024-08-03T21:34:58.495693645Z 
 51%|█████     | 4848/9500 [16:37:28<15:56:47, 12.34s/it]08/03/2024 14:34:58 - INFO - __main__ -   Step: 4848, LR: 1.009574132545607e-05, Loss: 439.91900634765625
2024-08-03T21:35:11.229699948Z 
 51%|█████     | 4849/9500 [16:37:41<16:05:44, 12.46s/it]08/03/2024 14:35:11 - INFO - __main__ -   Step: 4849, LR: 1.0093570781768792e-05, Loss: 405.62652587890625
2024-08-03T21:35:23.671493396Z 
 51%|█████     | 4850/9500 [16:37:53<16:05:09, 12.45s/it]08/03/2024 14:35:23 - INFO - __main__ -   Step: 4850, LR: 1.0091400238081512e-05, Loss: 421.7899169921875
2024-08-03T21:35:36.222287668Z 
 51%|█████     | 4851/9500 [16:38:06<16:07:11, 12.48s/it]08/03/2024 14:35:36 - INFO - __main__ -   Step: 4851, LR: 1.0089229694394233e-05, Loss: 426.75921630859375
2024-08-03T21:35:48.439611120Z 
 51%|█████     | 4852/9500 [16:38:18<16:00:49, 12.40s/it]08/03/2024 14:35:48 - INFO - __main__ -   Step: 4852, LR: 1.0087059150706953e-05, Loss: 381.0748291015625
2024-08-03T21:36:00.487186149Z 
 51%|█████     | 4853/9500 [16:38:30<15:52:21, 12.30s/it]08/03/2024 14:36:00 - INFO - __main__ -   Step: 4853, LR: 1.0084888607019675e-05, Loss: 350.1763916015625
2024-08-03T21:36:12.639579191Z 
 51%|█████     | 4854/9500 [16:38:42<15:48:47, 12.25s/it]08/03/2024 14:36:12 - INFO - __main__ -   Step: 4854, LR: 1.0082718063332394e-05, Loss: 418.5489501953125
2024-08-03T21:36:25.565045370Z 
 51%|█████     | 4855/9500 [16:38:55<16:04:13, 12.46s/it]08/03/2024 14:36:25 - INFO - __main__ -   Step: 4855, LR: 1.0080547519645116e-05, Loss: 470.734130859375
2024-08-03T21:36:37.958035491Z 
 51%|█████     | 4856/9500 [16:39:07<16:02:34, 12.44s/it]08/03/2024 14:36:37 - INFO - __main__ -   Step: 4856, LR: 1.0078376975957838e-05, Loss: 365.4860534667969
2024-08-03T21:36:50.054455191Z 
 51%|█████     | 4857/9500 [16:39:19<15:54:28, 12.33s/it]08/03/2024 14:36:50 - INFO - __main__ -   Step: 4857, LR: 1.0076206432270559e-05, Loss: 461.2852478027344
2024-08-03T21:37:02.942631059Z 
 51%|█████     | 4858/9500 [16:39:32<16:07:07, 12.50s/it]08/03/2024 14:37:02 - INFO - __main__ -   Step: 4858, LR: 1.007403588858328e-05, Loss: 425.76715087890625
2024-08-03T21:37:14.964116359Z 
 51%|█████     | 4859/9500 [16:39:44<15:55:48, 12.36s/it]08/03/2024 14:37:14 - INFO - __main__ -   Step: 4859, LR: 1.0071865344896e-05, Loss: 489.52288818359375
2024-08-03T21:37:27.048439883Z 
 51%|█████     | 4860/9500 [16:39:56<15:49:16, 12.28s/it]08/03/2024 14:37:27 - INFO - __main__ -   Step: 4860, LR: 1.0069694801208722e-05, Loss: 317.78826904296875
2024-08-03T21:37:39.356997372Z 
 51%|█████     | 4861/9500 [16:40:09<15:49:50, 12.28s/it]08/03/2024 14:37:39 - INFO - __main__ -   Step: 4861, LR: 1.0067524257521444e-05, Loss: 309.61456298828125
2024-08-03T21:37:51.649664992Z 
 51%|█████     | 4862/9500 [16:40:21<15:49:48, 12.29s/it]08/03/2024 14:37:51 - INFO - __main__ -   Step: 4862, LR: 1.0065353713834165e-05, Loss: 395.469482421875
2024-08-03T21:38:03.785697093Z 
 51%|█████     | 4863/9500 [16:40:33<15:46:05, 12.24s/it]08/03/2024 14:38:03 - INFO - __main__ -   Step: 4863, LR: 1.0063183170146887e-05, Loss: 412.5941162109375
2024-08-03T21:38:16.854321084Z 
 51%|█████     | 4864/9500 [16:40:46<16:05:03, 12.49s/it]08/03/2024 14:38:16 - INFO - __main__ -   Step: 4864, LR: 1.0061012626459608e-05, Loss: 399.2677001953125
2024-08-03T21:38:28.999758811Z 
 51%|█████     | 4865/9500 [16:40:58<15:56:51, 12.39s/it]08/03/2024 14:38:28 - INFO - __main__ -   Step: 4865, LR: 1.0058842082772328e-05, Loss: 489.78326416015625
2024-08-03T21:38:41.339197100Z 
 51%|█████     | 4866/9500 [16:41:11<15:55:33, 12.37s/it]08/03/2024 14:38:41 - INFO - __main__ -   Step: 4866, LR: 1.0056671539085048e-05, Loss: 540.0342407226562
2024-08-03T21:38:54.297905871Z 
 51%|█████     | 4867/9500 [16:41:24<16:08:56, 12.55s/it]08/03/2024 14:38:54 - INFO - __main__ -   Step: 4867, LR: 1.005450099539777e-05, Loss: 468.1866149902344
2024-08-03T21:39:06.470889472Z 
 51%|█████     | 4868/9500 [16:41:36<16:00:02, 12.44s/it]08/03/2024 14:39:06 - INFO - __main__ -   Step: 4868, LR: 1.005233045171049e-05, Loss: 441.72113037109375
2024-08-03T21:39:18.709092480Z 
 51%|█████▏    | 4869/9500 [16:41:48<15:55:15, 12.38s/it]08/03/2024 14:39:18 - INFO - __main__ -   Step: 4869, LR: 1.0050159908023211e-05, Loss: 400.2337646484375
2024-08-03T21:39:31.612330889Z 
 51%|█████▏    | 4870/9500 [16:42:01<16:07:14, 12.53s/it]08/03/2024 14:39:31 - INFO - __main__ -   Step: 4870, LR: 1.0047989364335933e-05, Loss: 417.04498291015625
2024-08-03T21:39:43.702661457Z 
 51%|█████▏    | 4871/9500 [16:42:13<15:56:45, 12.40s/it]08/03/2024 14:39:43 - INFO - __main__ -   Step: 4871, LR: 1.0045818820648654e-05, Loss: 437.0533447265625
2024-08-03T21:39:55.698567356Z 
 51%|█████▏    | 4872/9500 [16:42:25<15:47:10, 12.28s/it]08/03/2024 14:39:55 - INFO - __main__ -   Step: 4872, LR: 1.0043648276961376e-05, Loss: 375.3026428222656
2024-08-03T21:40:08.736421012Z 
 51%|█████▏    | 4873/9500 [16:42:38<16:04:30, 12.51s/it]08/03/2024 14:40:08 - INFO - __main__ -   Step: 4873, LR: 1.0041477733274097e-05, Loss: 362.2744445800781
2024-08-03T21:40:21.178894509Z 
 51%|█████▏    | 4874/9500 [16:42:51<16:02:47, 12.49s/it]08/03/2024 14:40:21 - INFO - __main__ -   Step: 4874, LR: 1.0039307189586817e-05, Loss: 432.0477600097656
2024-08-03T21:40:33.251797299Z 
 51%|█████▏    | 4875/9500 [16:43:03<15:52:59, 12.36s/it]08/03/2024 14:40:33 - INFO - __main__ -   Step: 4875, LR: 1.0037136645899539e-05, Loss: 382.7181091308594
2024-08-03T21:40:46.015532948Z 
 51%|█████▏    | 4876/9500 [16:43:15<16:02:03, 12.48s/it]08/03/2024 14:40:46 - INFO - __main__ -   Step: 4876, LR: 1.003496610221226e-05, Loss: 478.2093505859375
2024-08-03T21:40:58.171609869Z 
 51%|█████▏    | 4877/9500 [16:43:28<15:54:17, 12.39s/it]08/03/2024 14:40:58 - INFO - __main__ -   Step: 4877, LR: 1.0032795558524982e-05, Loss: 454.63702392578125
2024-08-03T21:41:09.983241779Z 
 51%|█████▏    | 4878/9500 [16:43:39<15:40:49, 12.21s/it]08/03/2024 14:41:09 - INFO - __main__ -   Step: 4878, LR: 1.0030625014837703e-05, Loss: 349.4506530761719
2024-08-03T21:41:22.546377863Z 
 51%|█████▏    | 4879/9500 [16:43:52<15:48:41, 12.32s/it]08/03/2024 14:41:22 - INFO - __main__ -   Step: 4879, LR: 1.0028454471150423e-05, Loss: 321.02801513671875
2024-08-03T21:41:34.848625019Z 
 51%|█████▏    | 4880/9500 [16:44:04<15:48:07, 12.31s/it]08/03/2024 14:41:34 - INFO - __main__ -   Step: 4880, LR: 1.0026283927463143e-05, Loss: 469.4208984375
2024-08-03T21:41:46.806651683Z 
 51%|█████▏    | 4881/9500 [16:44:16<15:39:43, 12.21s/it]08/03/2024 14:41:46 - INFO - __main__ -   Step: 4881, LR: 1.0024113383775865e-05, Loss: 391.177978515625
2024-08-03T21:41:59.417475290Z 
 51%|█████▏    | 4882/9500 [16:44:29<15:48:49, 12.33s/it]08/03/2024 14:41:59 - INFO - __main__ -   Step: 4882, LR: 1.0021942840088586e-05, Loss: 397.50274658203125
2024-08-03T21:42:12.120715563Z 
 51%|█████▏    | 4883/9500 [16:44:42<15:57:18, 12.44s/it]08/03/2024 14:42:12 - INFO - __main__ -   Step: 4883, LR: 1.0019772296401306e-05, Loss: 563.701416015625
2024-08-03T21:42:24.448039025Z 
 51%|█████▏    | 4884/9500 [16:44:54<15:54:28, 12.41s/it]08/03/2024 14:42:24 - INFO - __main__ -   Step: 4884, LR: 1.0017601752714028e-05, Loss: 515.8181762695312
2024-08-03T21:42:37.095971046Z 
 51%|█████▏    | 4885/9500 [16:45:07<15:59:50, 12.48s/it]08/03/2024 14:42:37 - INFO - __main__ -   Step: 4885, LR: 1.001543120902675e-05, Loss: 450.2780456542969
2024-08-03T21:42:49.604117036Z 
 51%|█████▏    | 4886/9500 [16:45:19<16:00:18, 12.49s/it]08/03/2024 14:42:49 - INFO - __main__ -   Step: 4886, LR: 1.001326066533947e-05, Loss: 377.2232971191406
2024-08-03T21:43:01.647744475Z 
 51%|█████▏    | 4887/9500 [16:45:31<15:49:51, 12.35s/it]08/03/2024 14:43:01 - INFO - __main__ -   Step: 4887, LR: 1.0011090121652192e-05, Loss: 473.385986328125
2024-08-03T21:43:14.194576155Z 
 51%|█████▏    | 4888/9500 [16:45:44<15:54:05, 12.41s/it]08/03/2024 14:43:14 - INFO - __main__ -   Step: 4888, LR: 1.0008919577964912e-05, Loss: 581.1839599609375
2024-08-03T21:43:26.496144504Z 
 51%|█████▏    | 4889/9500 [16:45:56<15:51:19, 12.38s/it]08/03/2024 14:43:26 - INFO - __main__ -   Step: 4889, LR: 1.0006749034277634e-05, Loss: 377.78936767578125
2024-08-03T21:43:38.734389421Z 
 51%|█████▏    | 4890/9500 [16:46:08<15:47:52, 12.34s/it]08/03/2024 14:43:38 - INFO - __main__ -   Step: 4890, LR: 1.0004578490590355e-05, Loss: 514.8863525390625
2024-08-03T21:43:51.118677183Z 
 51%|█████▏    | 4891/9500 [16:46:21<15:48:45, 12.35s/it]08/03/2024 14:43:51 - INFO - __main__ -   Step: 4891, LR: 1.0002407946903077e-05, Loss: 382.9641418457031
2024-08-03T21:44:03.494925657Z 
 51%|█████▏    | 4892/9500 [16:46:33<15:49:08, 12.36s/it]08/03/2024 14:44:03 - INFO - __main__ -   Step: 4892, LR: 1.0000237403215798e-05, Loss: 462.826171875
2024-08-03T21:44:15.377640823Z 
 52%|█████▏    | 4893/9500 [16:46:45<15:37:58, 12.22s/it]08/03/2024 14:44:15 - INFO - __main__ -   Step: 4893, LR: 9.998066859528518e-06, Loss: 425.8421325683594
2024-08-03T21:44:28.219168992Z 
 52%|█████▏    | 4894/9500 [16:46:58<15:52:10, 12.40s/it]08/03/2024 14:44:28 - INFO - __main__ -   Step: 4894, LR: 9.99589631584124e-06, Loss: 485.6290588378906
2024-08-03T21:44:40.322334311Z 
 52%|█████▏    | 4895/9500 [16:47:10<15:45:03, 12.31s/it]08/03/2024 14:44:40 - INFO - __main__ -   Step: 4895, LR: 9.99372577215396e-06, Loss: 467.4732971191406
2024-08-03T21:44:52.649348915Z 
 52%|█████▏    | 4896/9500 [16:47:22<15:45:09, 12.32s/it]08/03/2024 14:44:52 - INFO - __main__ -   Step: 4896, LR: 9.991555228466681e-06, Loss: 459.82763671875
2024-08-03T21:45:04.625113137Z 
 52%|█████▏    | 4897/9500 [16:47:34<15:37:05, 12.22s/it]08/03/2024 14:45:04 - INFO - __main__ -   Step: 4897, LR: 9.989384684779401e-06, Loss: 457.49786376953125
2024-08-03T21:45:17.287639082Z 
 52%|█████▏    | 4898/9500 [16:47:47<15:47:10, 12.35s/it]08/03/2024 14:45:17 - INFO - __main__ -   Step: 4898, LR: 9.987214141092123e-06, Loss: 327.331787109375
2024-08-03T21:45:29.082218410Z 
 52%|█████▏    | 4899/9500 [16:47:59<15:34:13, 12.18s/it]08/03/2024 14:45:29 - INFO - __main__ -   Step: 4899, LR: 9.985043597404844e-06, Loss: 309.54071044921875
2024-08-03T21:45:41.150914865Z 
 52%|█████▏    | 4900/9500 [16:48:11<15:31:23, 12.15s/it]08/03/2024 14:45:41 - INFO - __main__ -   Step: 4900, LR: 9.982873053717566e-06, Loss: 445.30657958984375
2024-08-03T21:45:54.585362376Z 
 52%|█████▏    | 4901/9500 [16:48:24<16:00:46, 12.53s/it]08/03/2024 14:45:54 - INFO - __main__ -   Step: 4901, LR: 9.980702510030287e-06, Loss: 457.13922119140625
2024-08-03T21:46:07.006544840Z 
 52%|█████▏    | 4902/9500 [16:48:36<15:57:56, 12.50s/it]08/03/2024 14:46:07 - INFO - __main__ -   Step: 4902, LR: 9.978531966343007e-06, Loss: 468.5211181640625
2024-08-03T21:46:19.176987939Z 
 52%|█████▏    | 4903/9500 [16:48:49<15:50:09, 12.40s/it]08/03/2024 14:46:19 - INFO - __main__ -   Step: 4903, LR: 9.976361422655729e-06, Loss: 392.85565185546875
2024-08-03T21:46:31.476346104Z 
 52%|█████▏    | 4904/9500 [16:49:01<15:47:36, 12.37s/it]08/03/2024 14:46:31 - INFO - __main__ -   Step: 4904, LR: 9.974190878968449e-06, Loss: 378.45843505859375
2024-08-03T21:46:43.769845359Z 
 52%|█████▏    | 4905/9500 [16:49:13<15:45:36, 12.35s/it]08/03/2024 14:46:43 - INFO - __main__ -   Step: 4905, LR: 9.97202033528117e-06, Loss: 441.60595703125
2024-08-03T21:46:55.968989748Z 
 52%|█████▏    | 4906/9500 [16:49:25<15:42:00, 12.30s/it]08/03/2024 14:46:55 - INFO - __main__ -   Step: 4906, LR: 9.969849791593892e-06, Loss: 327.63458251953125
2024-08-03T21:47:08.659886017Z 
 52%|█████▏    | 4907/9500 [16:49:38<15:50:42, 12.42s/it]08/03/2024 14:47:08 - INFO - __main__ -   Step: 4907, LR: 9.967679247906613e-06, Loss: 368.967041015625
2024-08-03T21:47:21.229188945Z 
 52%|█████▏    | 4908/9500 [16:49:51<15:53:56, 12.46s/it]08/03/2024 14:47:21 - INFO - __main__ -   Step: 4908, LR: 9.965508704219335e-06, Loss: 433.0684814453125
2024-08-03T21:47:33.663730271Z 
 52%|█████▏    | 4909/9500 [16:50:03<15:53:03, 12.46s/it]08/03/2024 14:47:33 - INFO - __main__ -   Step: 4909, LR: 9.963338160532056e-06, Loss: 366.913818359375
2024-08-03T21:47:46.243674076Z 
 52%|█████▏    | 4910/9500 [16:50:16<15:55:40, 12.49s/it]08/03/2024 14:47:46 - INFO - __main__ -   Step: 4910, LR: 9.961167616844776e-06, Loss: 367.3520812988281
2024-08-03T21:47:58.590179712Z 
 52%|█████▏    | 4911/9500 [16:50:28<15:52:08, 12.45s/it]08/03/2024 14:47:58 - INFO - __main__ -   Step: 4911, LR: 9.958997073157496e-06, Loss: 493.6631774902344
2024-08-03T21:48:10.727405573Z 
 52%|█████▏    | 4912/9500 [16:50:40<15:44:46, 12.36s/it]08/03/2024 14:48:10 - INFO - __main__ -   Step: 4912, LR: 9.956826529470218e-06, Loss: 478.49261474609375
2024-08-03T21:48:23.265476560Z 
 52%|█████▏    | 4913/9500 [16:50:53<15:48:46, 12.41s/it]08/03/2024 14:48:23 - INFO - __main__ -   Step: 4913, LR: 9.95465598578294e-06, Loss: 412.71807861328125
2024-08-03T21:48:35.516018806Z 
 52%|█████▏    | 4914/9500 [16:51:05<15:44:53, 12.36s/it]08/03/2024 14:48:35 - INFO - __main__ -   Step: 4914, LR: 9.952485442095661e-06, Loss: 552.4935302734375
2024-08-03T21:48:47.878100766Z 
 52%|█████▏    | 4915/9500 [16:51:17<15:44:40, 12.36s/it]08/03/2024 14:48:47 - INFO - __main__ -   Step: 4915, LR: 9.950314898408382e-06, Loss: 422.06634521484375
2024-08-03T21:49:00.598202483Z 
 52%|█████▏    | 4916/9500 [16:51:30<15:52:40, 12.47s/it]08/03/2024 14:49:00 - INFO - __main__ -   Step: 4916, LR: 9.948144354721104e-06, Loss: 512.882568359375
2024-08-03T21:49:12.487394317Z 
 52%|█████▏    | 4917/9500 [16:51:42<15:39:10, 12.30s/it]08/03/2024 14:49:12 - INFO - __main__ -   Step: 4917, LR: 9.945973811033824e-06, Loss: 316.99468994140625
2024-08-03T21:49:24.942504651Z 
 52%|█████▏    | 4918/9500 [16:51:54<15:42:36, 12.34s/it]08/03/2024 14:49:24 - INFO - __main__ -   Step: 4918, LR: 9.943803267346544e-06, Loss: 532.7220458984375
2024-08-03T21:49:37.511500644Z 
 52%|█████▏    | 4919/9500 [16:52:07<15:47:34, 12.41s/it]08/03/2024 14:49:37 - INFO - __main__ -   Step: 4919, LR: 9.941632723659265e-06, Loss: 437.8063659667969
2024-08-03T21:49:49.883655792Z 
 52%|█████▏    | 4920/9500 [16:52:19<15:46:29, 12.40s/it]08/03/2024 14:49:49 - INFO - __main__ -   Step: 4920, LR: 9.939462179971987e-06, Loss: 421.5905456542969
2024-08-03T21:50:01.902332269Z 
 52%|█████▏    | 4921/9500 [16:52:31<15:37:33, 12.29s/it]08/03/2024 14:50:01 - INFO - __main__ -   Step: 4921, LR: 9.937291636284708e-06, Loss: 428.1048583984375
2024-08-03T21:50:14.669923204Z 
 52%|█████▏    | 4922/9500 [16:52:44<15:48:24, 12.43s/it]08/03/2024 14:50:14 - INFO - __main__ -   Step: 4922, LR: 9.93512109259743e-06, Loss: 335.26019287109375
2024-08-03T21:50:26.925969604Z 
 52%|█████▏    | 4923/9500 [16:52:56<15:44:12, 12.38s/it]08/03/2024 14:50:26 - INFO - __main__ -   Step: 4923, LR: 9.932950548910152e-06, Loss: 485.9586181640625
2024-08-03T21:50:38.859032270Z 
 52%|█████▏    | 4924/9500 [16:53:08<15:33:49, 12.24s/it]08/03/2024 14:50:38 - INFO - __main__ -   Step: 4924, LR: 9.930780005222871e-06, Loss: 435.17523193359375
2024-08-03T21:50:51.257476702Z 
 52%|█████▏    | 4925/9500 [16:53:21<15:37:09, 12.29s/it]08/03/2024 14:50:51 - INFO - __main__ -   Step: 4925, LR: 9.928609461535593e-06, Loss: 291.3671875
2024-08-03T21:51:03.705724078Z 
 52%|█████▏    | 4926/9500 [16:53:33<15:40:33, 12.34s/it]08/03/2024 14:51:03 - INFO - __main__ -   Step: 4926, LR: 9.926438917848313e-06, Loss: 506.3831787109375
2024-08-03T21:51:16.159537996Z 
 52%|█████▏    | 4927/9500 [16:53:46<15:43:00, 12.37s/it]08/03/2024 14:51:16 - INFO - __main__ -   Step: 4927, LR: 9.924268374161034e-06, Loss: 519.1016845703125
2024-08-03T21:51:28.667289239Z 
 52%|█████▏    | 4928/9500 [16:53:58<15:45:52, 12.41s/it]08/03/2024 14:51:28 - INFO - __main__ -   Step: 4928, LR: 9.922097830473756e-06, Loss: 496.9654235839844
2024-08-03T21:51:40.927941569Z 
 52%|█████▏    | 4929/9500 [16:54:10<15:42:11, 12.37s/it]08/03/2024 14:51:40 - INFO - __main__ -   Step: 4929, LR: 9.919927286786477e-06, Loss: 420.10064697265625
2024-08-03T21:51:53.164033386Z 
 52%|█████▏    | 4930/9500 [16:54:23<15:38:58, 12.33s/it]08/03/2024 14:51:53 - INFO - __main__ -   Step: 4930, LR: 9.917756743099199e-06, Loss: 457.6851501464844
2024-08-03T21:52:05.811420413Z 
 52%|█████▏    | 4931/9500 [16:54:35<15:46:04, 12.42s/it]08/03/2024 14:52:05 - INFO - __main__ -   Step: 4931, LR: 9.915586199411919e-06, Loss: 604.4700317382812
2024-08-03T21:52:18.198432086Z 
 52%|█████▏    | 4932/9500 [16:54:48<15:44:59, 12.41s/it]08/03/2024 14:52:18 - INFO - __main__ -   Step: 4932, LR: 9.91341565572464e-06, Loss: 395.3406066894531
2024-08-03T21:52:30.109041898Z 
 52%|█████▏    | 4933/9500 [16:55:00<15:33:21, 12.26s/it]08/03/2024 14:52:30 - INFO - __main__ -   Step: 4933, LR: 9.91124511203736e-06, Loss: 295.0246887207031
2024-08-03T21:52:42.672053873Z 
 52%|█████▏    | 4934/9500 [16:55:12<15:40:01, 12.35s/it]08/03/2024 14:52:42 - INFO - __main__ -   Step: 4934, LR: 9.909074568350082e-06, Loss: 419.08203125
2024-08-03T21:52:55.280448370Z 
 52%|█████▏    | 4935/9500 [16:55:25<15:45:39, 12.43s/it]08/03/2024 14:52:55 - INFO - __main__ -   Step: 4935, LR: 9.906904024662803e-06, Loss: 691.9842529296875
2024-08-03T21:53:07.358057911Z 
 52%|█████▏    | 4936/9500 [16:55:37<15:37:25, 12.32s/it]08/03/2024 14:53:07 - INFO - __main__ -   Step: 4936, LR: 9.904733480975525e-06, Loss: 429.115478515625
2024-08-03T21:53:19.861450177Z 
 52%|█████▏    | 4937/9500 [16:55:49<15:41:19, 12.38s/it]08/03/2024 14:53:19 - INFO - __main__ -   Step: 4937, LR: 9.902562937288247e-06, Loss: 353.0767822265625
2024-08-03T21:53:31.945181917Z 
 52%|█████▏    | 4938/9500 [16:56:01<15:34:24, 12.29s/it]08/03/2024 14:53:31 - INFO - __main__ -   Step: 4938, LR: 9.900392393600966e-06, Loss: 463.48541259765625
2024-08-03T21:53:44.047001675Z 
 52%|█████▏    | 4939/9500 [16:56:13<15:29:55, 12.23s/it]08/03/2024 14:53:44 - INFO - __main__ -   Step: 4939, LR: 9.898221849913688e-06, Loss: 514.6196899414062
2024-08-03T21:53:56.221206793Z 
 52%|█████▏    | 4940/9500 [16:56:26<15:28:22, 12.22s/it]08/03/2024 14:53:56 - INFO - __main__ -   Step: 4940, LR: 9.896051306226408e-06, Loss: 334.41522216796875
2024-08-03T21:54:08.591770073Z 
 52%|█████▏    | 4941/9500 [16:56:38<15:31:42, 12.26s/it]08/03/2024 14:54:08 - INFO - __main__ -   Step: 4941, LR: 9.89388076253913e-06, Loss: 415.18035888671875
2024-08-03T21:54:20.860310104Z 
 52%|█████▏    | 4942/9500 [16:56:50<15:31:39, 12.26s/it]08/03/2024 14:54:20 - INFO - __main__ -   Step: 4942, LR: 9.891710218851851e-06, Loss: 428.742919921875
2024-08-03T21:54:33.177574638Z 
 52%|█████▏    | 4943/9500 [16:57:03<15:32:39, 12.28s/it]08/03/2024 14:54:33 - INFO - __main__ -   Step: 4943, LR: 9.889539675164573e-06, Loss: 417.360107421875
2024-08-03T21:54:45.675412676Z 
 52%|█████▏    | 4944/9500 [16:57:15<15:37:25, 12.35s/it]08/03/2024 14:54:45 - INFO - __main__ -   Step: 4944, LR: 9.887369131477294e-06, Loss: 440.97991943359375
2024-08-03T21:54:58.082003134Z 
 52%|█████▏    | 4945/9500 [16:57:28<15:38:36, 12.36s/it]08/03/2024 14:54:58 - INFO - __main__ -   Step: 4945, LR: 9.885198587790014e-06, Loss: 376.21221923828125
2024-08-03T21:55:10.150814538Z 
 52%|█████▏    | 4946/9500 [16:57:40<15:31:41, 12.28s/it]08/03/2024 14:55:10 - INFO - __main__ -   Step: 4946, LR: 9.883028044102736e-06, Loss: 440.59814453125
2024-08-03T21:55:22.956776306Z 
 52%|█████▏    | 4947/9500 [16:57:52<15:43:34, 12.43s/it]08/03/2024 14:55:22 - INFO - __main__ -   Step: 4947, LR: 9.880857500415455e-06, Loss: 324.17584228515625
2024-08-03T21:55:35.397754641Z 
 52%|█████▏    | 4948/9500 [16:58:05<15:43:30, 12.44s/it]08/03/2024 14:55:35 - INFO - __main__ -   Step: 4948, LR: 9.878686956728177e-06, Loss: 439.997802734375
2024-08-03T21:55:47.598354679Z 
 52%|█████▏    | 4949/9500 [16:58:17<15:37:55, 12.37s/it]08/03/2024 14:55:47 - INFO - __main__ -   Step: 4949, LR: 9.876516413040899e-06, Loss: 293.2774658203125
2024-08-03T21:56:00.288054522Z 
 52%|█████▏    | 4950/9500 [16:58:30<15:45:06, 12.46s/it]08/03/2024 14:56:00 - INFO - __main__ -   Step: 4950, LR: 9.87434586935362e-06, Loss: 565.75
2024-08-03T21:56:12.424684217Z 
 52%|█████▏    | 4951/9500 [16:58:42<15:37:28, 12.36s/it]08/03/2024 14:56:12 - INFO - __main__ -   Step: 4951, LR: 9.872175325666342e-06, Loss: 364.8769226074219
2024-08-03T21:56:24.558111329Z 
 52%|█████▏    | 4952/9500 [16:58:54<15:31:59, 12.30s/it]08/03/2024 14:56:24 - INFO - __main__ -   Step: 4952, LR: 9.870004781979061e-06, Loss: 409.71026611328125
2024-08-03T21:56:36.941592330Z 
 52%|█████▏    | 4953/9500 [16:59:06<15:33:47, 12.32s/it]08/03/2024 14:56:36 - INFO - __main__ -   Step: 4953, LR: 9.867834238291783e-06, Loss: 332.381103515625
2024-08-03T21:56:49.232740483Z 
 52%|█████▏    | 4954/9500 [16:59:19<15:32:53, 12.31s/it]08/03/2024 14:56:49 - INFO - __main__ -   Step: 4954, LR: 9.865663694604503e-06, Loss: 369.71160888671875
2024-08-03T21:57:01.378834020Z 
 52%|█████▏    | 4955/9500 [16:59:31<15:28:53, 12.26s/it]08/03/2024 14:57:01 - INFO - __main__ -   Step: 4955, LR: 9.863493150917224e-06, Loss: 452.40093994140625
2024-08-03T21:57:13.581072152Z 
 52%|█████▏    | 4956/9500 [16:59:43<15:27:19, 12.24s/it]08/03/2024 14:57:13 - INFO - __main__ -   Step: 4956, LR: 9.861322607229946e-06, Loss: 442.4913024902344
2024-08-03T21:57:26.103407535Z 
 52%|█████▏    | 4957/9500 [16:59:56<15:33:25, 12.33s/it]08/03/2024 14:57:26 - INFO - __main__ -   Step: 4957, LR: 9.859152063542668e-06, Loss: 491.57232666015625
2024-08-03T21:57:38.121931832Z 
 52%|█████▏    | 4958/9500 [17:00:08<15:26:11, 12.24s/it]08/03/2024 14:57:38 - INFO - __main__ -   Step: 4958, LR: 9.856981519855389e-06, Loss: 407.5662841796875
2024-08-03T21:57:50.748443116Z 
 52%|█████▏    | 4959/9500 [17:00:20<15:34:53, 12.35s/it]08/03/2024 14:57:50 - INFO - __main__ -   Step: 4959, LR: 9.85481097616811e-06, Loss: 396.1379089355469
2024-08-03T21:58:03.194632675Z 
 52%|█████▏    | 4960/9500 [17:00:33<15:36:48, 12.38s/it]08/03/2024 14:58:03 - INFO - __main__ -   Step: 4960, LR: 9.85264043248083e-06, Loss: 453.4283142089844
2024-08-03T21:58:15.479410148Z 
 52%|█████▏    | 4961/9500 [17:00:45<15:34:25, 12.35s/it]08/03/2024 14:58:15 - INFO - __main__ -   Step: 4961, LR: 9.85046988879355e-06, Loss: 634.6375732421875
2024-08-03T21:58:28.119083279Z 
 52%|█████▏    | 4962/9500 [17:00:58<15:40:44, 12.44s/it]08/03/2024 14:58:28 - INFO - __main__ -   Step: 4962, LR: 9.848299345106272e-06, Loss: 457.71148681640625
2024-08-03T21:58:40.422673787Z 
 52%|█████▏    | 4963/9500 [17:01:10<15:37:28, 12.40s/it]08/03/2024 14:58:40 - INFO - __main__ -   Step: 4963, LR: 9.846128801418994e-06, Loss: 335.79791259765625
2024-08-03T21:58:52.566090290Z 
 52%|█████▏    | 4964/9500 [17:01:22<15:31:30, 12.32s/it]08/03/2024 14:58:52 - INFO - __main__ -   Step: 4964, LR: 9.843958257731715e-06, Loss: 386.606689453125
2024-08-03T21:59:05.002149427Z 
 52%|█████▏    | 4965/9500 [17:01:34<15:33:54, 12.36s/it]08/03/2024 14:59:05 - INFO - __main__ -   Step: 4965, LR: 9.841787714044437e-06, Loss: 490.4940185546875
2024-08-03T21:59:17.188489118Z 
 52%|█████▏    | 4966/9500 [17:01:47<15:29:50, 12.30s/it]08/03/2024 14:59:17 - INFO - __main__ -   Step: 4966, LR: 9.839617170357158e-06, Loss: 525.484375
2024-08-03T21:59:29.314570159Z 
 52%|█████▏    | 4967/9500 [17:01:59<15:25:35, 12.25s/it]08/03/2024 14:59:29 - INFO - __main__ -   Step: 4967, LR: 9.837446626669878e-06, Loss: 523.3646240234375
2024-08-03T21:59:41.787177624Z 
 52%|█████▏    | 4968/9500 [17:02:11<15:30:23, 12.32s/it]08/03/2024 14:59:41 - INFO - __main__ -   Step: 4968, LR: 9.8352760829826e-06, Loss: 313.13580322265625
2024-08-03T21:59:54.186129667Z 
 52%|█████▏    | 4969/9500 [17:02:24<15:32:02, 12.34s/it]08/03/2024 14:59:54 - INFO - __main__ -   Step: 4969, LR: 9.83310553929532e-06, Loss: 510.7505798339844
2024-08-03T22:00:06.147220281Z 
 52%|█████▏    | 4970/9500 [17:02:36<15:23:11, 12.23s/it]08/03/2024 15:00:06 - INFO - __main__ -   Step: 4970, LR: 9.830934995608041e-06, Loss: 440.80914306640625
2024-08-03T22:00:18.985421137Z 
 52%|█████▏    | 4971/9500 [17:02:48<15:36:49, 12.41s/it]08/03/2024 15:00:18 - INFO - __main__ -   Step: 4971, LR: 9.828764451920763e-06, Loss: 477.93768310546875
2024-08-03T22:00:31.177142752Z 
 52%|█████▏    | 4972/9500 [17:03:01<15:31:38, 12.35s/it]08/03/2024 15:00:31 - INFO - __main__ -   Step: 4972, LR: 9.826593908233484e-06, Loss: 406.21527099609375
2024-08-03T22:00:43.088806623Z 
 52%|█████▏    | 4973/9500 [17:03:13<15:21:37, 12.22s/it]08/03/2024 15:00:43 - INFO - __main__ -   Step: 4973, LR: 9.824423364546206e-06, Loss: 342.8856506347656
2024-08-03T22:00:55.835353637Z 
 52%|█████▏    | 4974/9500 [17:03:25<15:33:27, 12.37s/it]08/03/2024 15:00:55 - INFO - __main__ -   Step: 4974, LR: 9.822252820858926e-06, Loss: 444.6219787597656
2024-08-03T22:01:07.724266266Z 
 52%|█████▏    | 4975/9500 [17:03:37<15:22:15, 12.23s/it]08/03/2024 15:01:07 - INFO - __main__ -   Step: 4975, LR: 9.820082277171647e-06, Loss: 391.16900634765625
2024-08-03T22:01:19.835211785Z 
 52%|█████▏    | 4976/9500 [17:03:49<15:19:23, 12.19s/it]08/03/2024 15:01:19 - INFO - __main__ -   Step: 4976, LR: 9.817911733484367e-06, Loss: 340.5842590332031
2024-08-03T22:01:32.163022550Z 
 52%|█████▏    | 4977/9500 [17:04:02<15:22:13, 12.23s/it]08/03/2024 15:01:32 - INFO - __main__ -   Step: 4977, LR: 9.815741189797089e-06, Loss: 541.6632080078125
2024-08-03T22:01:44.641039085Z 
 52%|█████▏    | 4978/9500 [17:04:14<15:27:32, 12.31s/it]08/03/2024 15:01:44 - INFO - __main__ -   Step: 4978, LR: 9.81357064610981e-06, Loss: 355.2742919921875
2024-08-03T22:01:56.816897689Z 
 52%|█████▏    | 4979/9500 [17:04:26<15:24:22, 12.27s/it]08/03/2024 15:01:56 - INFO - __main__ -   Step: 4979, LR: 9.811400102422532e-06, Loss: 449.97869873046875
2024-08-03T22:02:09.116406393Z 
 52%|█████▏    | 4980/9500 [17:04:39<15:24:52, 12.28s/it]08/03/2024 15:02:09 - INFO - __main__ -   Step: 4980, LR: 9.809229558735253e-06, Loss: 452.6910400390625
2024-08-03T22:02:21.073114523Z 
 52%|█████▏    | 4981/9500 [17:04:51<15:17:26, 12.18s/it]08/03/2024 15:02:21 - INFO - __main__ -   Step: 4981, LR: 9.807059015047973e-06, Loss: 330.66204833984375
2024-08-03T22:02:33.342874928Z 
 52%|█████▏    | 4982/9500 [17:05:03<15:19:14, 12.21s/it]08/03/2024 15:02:33 - INFO - __main__ -   Step: 4982, LR: 9.804888471360695e-06, Loss: 334.94964599609375
2024-08-03T22:02:45.592519520Z 
 52%|█████▏    | 4983/9500 [17:05:15<15:19:58, 12.22s/it]08/03/2024 15:02:45 - INFO - __main__ -   Step: 4983, LR: 9.802717927673415e-06, Loss: 519.43896484375
2024-08-03T22:02:58.187519520Z 
 52%|█████▏    | 4984/9500 [17:05:28<15:28:14, 12.33s/it]08/03/2024 15:02:58 - INFO - __main__ -   Step: 4984, LR: 9.800547383986136e-06, Loss: 487.57452392578125
2024-08-03T22:03:10.555222856Z 
 52%|█████▏    | 4985/9500 [17:05:40<15:28:49, 12.34s/it]08/03/2024 15:03:10 - INFO - __main__ -   Step: 4985, LR: 9.798376840298858e-06, Loss: 403.85986328125
2024-08-03T22:03:22.677070528Z 
 52%|█████▏    | 4986/9500 [17:05:52<15:23:38, 12.28s/it]08/03/2024 15:03:22 - INFO - __main__ -   Step: 4986, LR: 9.79620629661158e-06, Loss: 387.7705078125
2024-08-03T22:03:35.288603179Z 
 52%|█████▏    | 4987/9500 [17:06:05<15:30:58, 12.38s/it]08/03/2024 15:03:35 - INFO - __main__ -   Step: 4987, LR: 9.7940357529243e-06, Loss: 406.3675537109375
2024-08-03T22:03:47.305206653Z 
 53%|█████▎    | 4988/9500 [17:06:17<15:22:37, 12.27s/it]08/03/2024 15:03:47 - INFO - __main__ -   Step: 4988, LR: 9.79186520923702e-06, Loss: 411.5872497558594
2024-08-03T22:03:59.466743833Z 
 53%|█████▎    | 4989/9500 [17:06:29<15:19:59, 12.24s/it]08/03/2024 15:03:59 - INFO - __main__ -   Step: 4989, LR: 9.789694665549742e-06, Loss: 447.6180725097656
2024-08-03T22:04:12.034495389Z 
 53%|█████▎    | 4990/9500 [17:06:41<15:27:16, 12.34s/it]08/03/2024 15:04:12 - INFO - __main__ -   Step: 4990, LR: 9.787524121862462e-06, Loss: 417.50103759765625
2024-08-03T22:04:24.256420350Z 
 53%|█████▎    | 4991/9500 [17:06:54<15:24:29, 12.30s/it]08/03/2024 15:04:24 - INFO - __main__ -   Step: 4991, LR: 9.785353578175184e-06, Loss: 409.1997985839844
2024-08-03T22:04:36.326037079Z 
 53%|█████▎    | 4992/9500 [17:07:06<15:19:02, 12.23s/it]08/03/2024 15:04:36 - INFO - __main__ -   Step: 4992, LR: 9.783183034487905e-06, Loss: 420.1269836425781
2024-08-03T22:04:48.733649638Z 
 53%|█████▎    | 4993/9500 [17:07:18<15:22:47, 12.28s/it]08/03/2024 15:04:48 - INFO - __main__ -   Step: 4993, LR: 9.781012490800627e-06, Loss: 454.81524658203125
2024-08-03T22:05:01.232304146Z 
 53%|█████▎    | 4994/9500 [17:07:31<15:27:24, 12.35s/it]08/03/2024 15:05:01 - INFO - __main__ -   Step: 4994, LR: 9.778841947113348e-06, Loss: 465.20318603515625
2024-08-03T22:05:13.507997765Z 
 53%|█████▎    | 4995/9500 [17:07:43<15:25:33, 12.33s/it]08/03/2024 15:05:13 - INFO - __main__ -   Step: 4995, LR: 9.776671403426068e-06, Loss: 406.00775146484375
2024-08-03T22:05:26.046415954Z 
 53%|█████▎    | 4996/9500 [17:07:55<15:30:06, 12.39s/it]08/03/2024 15:05:26 - INFO - __main__ -   Step: 4996, LR: 9.77450085973879e-06, Loss: 445.59814453125
2024-08-03T22:05:38.185252591Z 
 53%|█████▎    | 4997/9500 [17:08:08<15:24:14, 12.31s/it]08/03/2024 15:05:38 - INFO - __main__ -   Step: 4997, LR: 9.77233031605151e-06, Loss: 281.6214599609375
2024-08-03T22:05:50.545792898Z 
 53%|█████▎    | 4998/9500 [17:08:20<15:25:03, 12.33s/it]08/03/2024 15:05:50 - INFO - __main__ -   Step: 4998, LR: 9.770159772364231e-06, Loss: 479.3547668457031
2024-08-03T22:06:03.070432920Z 
 53%|█████▎    | 4999/9500 [17:08:33<15:29:16, 12.39s/it]08/03/2024 15:06:03 - INFO - __main__ -   Step: 4999, LR: 9.767989228676953e-06, Loss: 419.936279296875
2024-08-03T22:06:15.297243004Z 
 53%|█████▎    | 5000/9500 [17:08:45<15:25:26, 12.34s/it]08/03/2024 15:06:15 - INFO - __main__ -   Step: 5000, LR: 9.765818684989674e-06, Loss: 450.9425354003906
2024-08-03T22:06:27.791648767Z 
 53%|█████▎    | 5001/9500 [17:08:57<15:28:43, 12.39s/it]08/03/2024 15:06:27 - INFO - __main__ -   Step: 5001, LR: 9.763648141302396e-06, Loss: 450.06134033203125
2024-08-03T22:06:40.644819297Z 
 53%|█████▎    | 5002/9500 [17:09:10<15:39:02, 12.53s/it]08/03/2024 15:06:40 - INFO - __main__ -   Step: 5002, LR: 9.761477597615116e-06, Loss: 463.58251953125
2024-08-03T22:06:52.798112028Z 
 53%|█████▎    | 5003/9500 [17:09:22<15:30:26, 12.41s/it]08/03/2024 15:06:52 - INFO - __main__ -   Step: 5003, LR: 9.759307053927837e-06, Loss: 485.3280334472656
2024-08-03T22:07:05.227981354Z 
 53%|█████▎    | 5004/9500 [17:09:35<15:30:35, 12.42s/it]08/03/2024 15:07:05 - INFO - __main__ -   Step: 5004, LR: 9.757136510240557e-06, Loss: 408.00140380859375
2024-08-03T22:07:17.801693228Z 
 53%|█████▎    | 5005/9500 [17:09:47<15:33:51, 12.47s/it]08/03/2024 15:07:17 - INFO - __main__ -   Step: 5005, LR: 9.754965966553279e-06, Loss: 425.91705322265625
2024-08-03T22:07:30.354272136Z 
 53%|█████▎    | 5006/9500 [17:10:00<15:35:36, 12.49s/it]08/03/2024 15:07:30 - INFO - __main__ -   Step: 5006, LR: 9.752795422866e-06, Loss: 523.86376953125
2024-08-03T22:07:42.442294209Z 
 53%|█████▎    | 5007/9500 [17:10:12<15:26:20, 12.37s/it]08/03/2024 15:07:42 - INFO - __main__ -   Step: 5007, LR: 9.750624879178722e-06, Loss: 362.56817626953125
2024-08-03T22:07:55.021462445Z 
 53%|█████▎    | 5008/9500 [17:10:24<15:30:49, 12.43s/it]08/03/2024 15:07:55 - INFO - __main__ -   Step: 5008, LR: 9.748454335491443e-06, Loss: 453.51837158203125
2024-08-03T22:08:07.334100087Z 
 53%|█████▎    | 5009/9500 [17:10:37<15:27:54, 12.40s/it]08/03/2024 15:08:07 - INFO - __main__ -   Step: 5009, LR: 9.746283791804163e-06, Loss: 500.2447204589844
2024-08-03T22:08:19.529703726Z 
 53%|█████▎    | 5010/9500 [17:10:49<15:23:10, 12.34s/it]08/03/2024 15:08:19 - INFO - __main__ -   Step: 5010, LR: 9.744113248116885e-06, Loss: 498.24005126953125
2024-08-03T22:08:32.445418079Z 
 53%|█████▎    | 5011/9500 [17:11:02<15:35:58, 12.51s/it]08/03/2024 15:08:32 - INFO - __main__ -   Step: 5011, LR: 9.741942704429606e-06, Loss: 419.4917907714844
2024-08-03T22:08:44.724636555Z 
 53%|█████▎    | 5012/9500 [17:11:14<15:30:35, 12.44s/it]08/03/2024 15:08:44 - INFO - __main__ -   Step: 5012, LR: 9.739772160742326e-06, Loss: 407.06536865234375
2024-08-03T22:08:56.713245681Z 
 53%|█████▎    | 5013/9500 [17:11:26<15:20:13, 12.31s/it]08/03/2024 15:08:56 - INFO - __main__ -   Step: 5013, LR: 9.737601617055048e-06, Loss: 330.1858215332031
2024-08-03T22:09:09.133550247Z 
 53%|█████▎    | 5014/9500 [17:11:39<15:22:36, 12.34s/it]08/03/2024 15:09:09 - INFO - __main__ -   Step: 5014, LR: 9.73543107336777e-06, Loss: 431.2021484375
2024-08-03T22:09:21.689985998Z 
 53%|█████▎    | 5015/9500 [17:11:51<15:27:14, 12.40s/it]08/03/2024 15:09:21 - INFO - __main__ -   Step: 5015, LR: 9.733260529680491e-06, Loss: 456.3260498046875
2024-08-03T22:09:33.803513435Z 
 53%|█████▎    | 5016/9500 [17:12:03<15:20:31, 12.32s/it]08/03/2024 15:09:33 - INFO - __main__ -   Step: 5016, LR: 9.73108998599321e-06, Loss: 335.61810302734375
2024-08-03T22:09:46.108291362Z 
 53%|█████▎    | 5017/9500 [17:12:16<15:20:02, 12.31s/it]08/03/2024 15:09:46 - INFO - __main__ -   Step: 5017, LR: 9.728919442305932e-06, Loss: 366.21337890625
2024-08-03T22:09:58.536459702Z 
 53%|█████▎    | 5018/9500 [17:12:28<15:22:23, 12.35s/it]08/03/2024 15:09:58 - INFO - __main__ -   Step: 5018, LR: 9.726748898618654e-06, Loss: 593.158447265625
2024-08-03T22:10:10.612146449Z 
 53%|█████▎    | 5019/9500 [17:12:40<15:16:05, 12.27s/it]08/03/2024 15:10:10 - INFO - __main__ -   Step: 5019, LR: 9.724578354931374e-06, Loss: 453.4776611328125
2024-08-03T22:10:23.064835373Z 
 53%|█████▎    | 5020/9500 [17:12:53<15:20:03, 12.32s/it]08/03/2024 15:10:23 - INFO - __main__ -   Step: 5020, LR: 9.722407811244095e-06, Loss: 392.2245178222656
2024-08-03T22:10:35.020423067Z 
 53%|█████▎    | 5021/9500 [17:13:04<15:11:38, 12.21s/it]08/03/2024 15:10:35 - INFO - __main__ -   Step: 5021, LR: 9.720237267556817e-06, Loss: 403.1329650878906
2024-08-03T22:10:47.217720281Z 
 53%|█████▎    | 5022/9500 [17:13:17<15:11:06, 12.21s/it]08/03/2024 15:10:47 - INFO - __main__ -   Step: 5022, LR: 9.718066723869538e-06, Loss: 455.2414245605469
2024-08-03T22:10:59.499290277Z 
 53%|█████▎    | 5023/9500 [17:13:29<15:12:33, 12.23s/it]08/03/2024 15:10:59 - INFO - __main__ -   Step: 5023, LR: 9.715896180182258e-06, Loss: 411.6966552734375
2024-08-03T22:11:11.686706327Z 
 53%|█████▎    | 5024/9500 [17:13:41<15:11:23, 12.22s/it]08/03/2024 15:11:11 - INFO - __main__ -   Step: 5024, LR: 9.71372563649498e-06, Loss: 379.8944091796875
2024-08-03T22:11:23.843389509Z 
 53%|█████▎    | 5025/9500 [17:13:53<15:09:50, 12.20s/it]08/03/2024 15:11:23 - INFO - __main__ -   Step: 5025, LR: 9.711555092807701e-06, Loss: 340.8298034667969
2024-08-03T22:11:35.877817639Z 
 53%|█████▎    | 5026/9500 [17:14:05<15:05:57, 12.15s/it]08/03/2024 15:11:35 - INFO - __main__ -   Step: 5026, LR: 9.709384549120421e-06, Loss: 493.63885498046875
2024-08-03T22:11:48.468051020Z 
 53%|█████▎    | 5027/9500 [17:14:18<15:15:36, 12.28s/it]08/03/2024 15:11:48 - INFO - __main__ -   Step: 5027, LR: 9.707214005433143e-06, Loss: 384.27825927734375
2024-08-03T22:12:01.065831139Z 
 53%|█████▎    | 5028/9500 [17:14:31<15:22:28, 12.38s/it]08/03/2024 15:12:01 - INFO - __main__ -   Step: 5028, LR: 9.705043461745864e-06, Loss: 534.008056640625
2024-08-03T22:12:13.189163651Z 
 53%|█████▎    | 5029/9500 [17:14:43<15:16:35, 12.30s/it]08/03/2024 15:12:13 - INFO - __main__ -   Step: 5029, LR: 9.702872918058586e-06, Loss: 374.42431640625
2024-08-03T22:12:25.737823487Z 
 53%|█████▎    | 5030/9500 [17:14:55<15:21:56, 12.38s/it]08/03/2024 15:12:25 - INFO - __main__ -   Step: 5030, LR: 9.700702374371306e-06, Loss: 399.0707092285156
2024-08-03T22:12:37.835647580Z 
 53%|█████▎    | 5031/9500 [17:15:07<15:15:32, 12.29s/it]08/03/2024 15:12:37 - INFO - __main__ -   Step: 5031, LR: 9.698531830684027e-06, Loss: 350.14337158203125
2024-08-03T22:12:50.107330629Z 
 53%|█████▎    | 5032/9500 [17:15:20<15:14:53, 12.29s/it]08/03/2024 15:12:50 - INFO - __main__ -   Step: 5032, LR: 9.696361286996749e-06, Loss: 324.00146484375
2024-08-03T22:13:02.608592601Z 
 53%|█████▎    | 5033/9500 [17:15:32<15:19:28, 12.35s/it]08/03/2024 15:13:02 - INFO - __main__ -   Step: 5033, LR: 9.694190743309469e-06, Loss: 455.4969482421875
2024-08-03T22:13:14.990443436Z 
 53%|█████▎    | 5034/9500 [17:15:44<15:19:58, 12.36s/it]08/03/2024 15:13:14 - INFO - __main__ -   Step: 5034, LR: 9.69202019962219e-06, Loss: 327.9158630371094
2024-08-03T22:13:26.837195404Z 
 53%|█████▎    | 5035/9500 [17:15:56<15:08:19, 12.21s/it]08/03/2024 15:13:26 - INFO - __main__ -   Step: 5035, LR: 9.689849655934912e-06, Loss: 380.1837463378906
2024-08-03T22:13:39.470709713Z 
 53%|█████▎    | 5036/9500 [17:16:09<15:17:39, 12.33s/it]08/03/2024 15:13:39 - INFO - __main__ -   Step: 5036, LR: 9.687679112247634e-06, Loss: 519.2005615234375
2024-08-03T22:13:52.089009142Z 
 53%|█████▎    | 5037/9500 [17:16:22<15:23:48, 12.42s/it]08/03/2024 15:13:52 - INFO - __main__ -   Step: 5037, LR: 9.685508568560353e-06, Loss: 484.8294372558594
2024-08-03T22:14:04.099571760Z 
 53%|█████▎    | 5038/9500 [17:16:34<15:14:28, 12.30s/it]08/03/2024 15:14:04 - INFO - __main__ -   Step: 5038, LR: 9.683338024873075e-06, Loss: 331.04437255859375
2024-08-03T22:14:16.631814394Z 
 53%|█████▎    | 5039/9500 [17:16:46<15:19:31, 12.37s/it]08/03/2024 15:14:16 - INFO - __main__ -   Step: 5039, LR: 9.681167481185797e-06, Loss: 432.20721435546875
2024-08-03T22:14:28.612513413Z 
 53%|█████▎    | 5040/9500 [17:16:58<15:10:41, 12.25s/it]08/03/2024 15:14:28 - INFO - __main__ -   Step: 5040, LR: 9.678996937498516e-06, Loss: 341.34539794921875
2024-08-03T22:14:40.580857573Z 
 53%|█████▎    | 5041/9500 [17:17:10<15:04:10, 12.17s/it]08/03/2024 15:14:40 - INFO - __main__ -   Step: 5041, LR: 9.676826393811238e-06, Loss: 386.0113525390625
2024-08-03T22:14:53.367298750Z 
 53%|█████▎    | 5042/9500 [17:17:23<15:17:47, 12.35s/it]08/03/2024 15:14:53 - INFO - __main__ -   Step: 5042, LR: 9.67465585012396e-06, Loss: 335.5653076171875
2024-08-03T22:15:05.660524697Z 
 53%|█████▎    | 5043/9500 [17:17:35<15:16:15, 12.33s/it]08/03/2024 15:15:05 - INFO - __main__ -   Step: 5043, LR: 9.67248530643668e-06, Loss: 477.12353515625
2024-08-03T22:15:17.970494323Z 
 53%|█████▎    | 5044/9500 [17:17:47<15:15:30, 12.33s/it]08/03/2024 15:15:17 - INFO - __main__ -   Step: 5044, LR: 9.670314762749401e-06, Loss: 457.5635681152344
2024-08-03T22:15:30.613245976Z 
 53%|█████▎    | 5045/9500 [17:18:00<15:22:19, 12.42s/it]08/03/2024 15:15:30 - INFO - __main__ -   Step: 5045, LR: 9.668144219062122e-06, Loss: 405.6638488769531
2024-08-03T22:15:42.931587211Z 
 53%|█████▎    | 5046/9500 [17:18:12<15:19:48, 12.39s/it]08/03/2024 15:15:42 - INFO - __main__ -   Step: 5046, LR: 9.665973675374844e-06, Loss: 390.83697509765625
2024-08-03T22:15:55.058131463Z 
 53%|█████▎    | 5047/9500 [17:18:24<15:13:41, 12.31s/it]08/03/2024 15:15:55 - INFO - __main__ -   Step: 5047, LR: 9.663803131687564e-06, Loss: 422.40313720703125
2024-08-03T22:16:07.716217016Z 
 53%|█████▎    | 5048/9500 [17:18:37<15:21:14, 12.42s/it]08/03/2024 15:16:07 - INFO - __main__ -   Step: 5048, LR: 9.661632588000285e-06, Loss: 426.03094482421875
2024-08-03T22:16:20.392345997Z 
 53%|█████▎    | 5049/9500 [17:18:50<15:26:49, 12.49s/it]08/03/2024 15:16:20 - INFO - __main__ -   Step: 5049, LR: 9.659462044313007e-06, Loss: 468.82244873046875
2024-08-03T22:16:32.306128306Z 
 53%|█████▎    | 5050/9500 [17:19:02<15:13:43, 12.32s/it]08/03/2024 15:16:32 - INFO - __main__ -   Step: 5050, LR: 9.657291500625727e-06, Loss: 329.749755859375
2024-08-03T22:16:44.805143551Z 
 53%|█████▎    | 5051/9500 [17:19:14<15:17:29, 12.37s/it]08/03/2024 15:16:44 - INFO - __main__ -   Step: 5051, LR: 9.655120956938448e-06, Loss: 393.120361328125
2024-08-03T22:16:57.094886648Z 
 53%|█████▎    | 5052/9500 [17:19:27<15:15:25, 12.35s/it]08/03/2024 15:16:57 - INFO - __main__ -   Step: 5052, LR: 9.65295041325117e-06, Loss: 397.71337890625
2024-08-03T22:17:09.113650652Z 
 53%|█████▎    | 5053/9500 [17:19:39<15:07:53, 12.25s/it]08/03/2024 15:17:09 - INFO - __main__ -   Step: 5053, LR: 9.650779869563892e-06, Loss: 398.037353515625
2024-08-03T22:17:21.692272534Z 
 53%|█████▎    | 5054/9500 [17:19:51<15:15:00, 12.35s/it]08/03/2024 15:17:21 - INFO - __main__ -   Step: 5054, LR: 9.648609325876611e-06, Loss: 426.50543212890625
2024-08-03T22:17:33.862766037Z 
 53%|█████▎    | 5055/9500 [17:20:03<15:10:50, 12.29s/it]08/03/2024 15:17:33 - INFO - __main__ -   Step: 5055, LR: 9.646438782189333e-06, Loss: 446.8644104003906
2024-08-03T22:17:45.947628499Z 
 53%|█████▎    | 5056/9500 [17:20:15<15:05:58, 12.23s/it]08/03/2024 15:17:45 - INFO - __main__ -   Step: 5056, LR: 9.644268238502055e-06, Loss: 429.41522216796875
2024-08-03T22:17:58.406892506Z 
 53%|█████▎    | 5057/9500 [17:20:28<15:10:49, 12.30s/it]08/03/2024 15:17:58 - INFO - __main__ -   Step: 5057, LR: 9.642097694814774e-06, Loss: 411.6007385253906
2024-08-03T22:18:10.638157581Z 
 53%|█████▎    | 5058/9500 [17:20:40<15:09:05, 12.28s/it]08/03/2024 15:18:10 - INFO - __main__ -   Step: 5058, LR: 9.639927151127496e-06, Loss: 329.3659362792969
2024-08-03T22:18:22.828976570Z 
 53%|█████▎    | 5059/9500 [17:20:52<15:06:54, 12.25s/it]08/03/2024 15:18:22 - INFO - __main__ -   Step: 5059, LR: 9.637756607440218e-06, Loss: 502.7664794921875
2024-08-03T22:18:35.310983953Z 
 53%|█████▎    | 5060/9500 [17:21:05<15:11:47, 12.32s/it]08/03/2024 15:18:35 - INFO - __main__ -   Step: 5060, LR: 9.635586063752939e-06, Loss: 507.4689636230469
2024-08-03T22:18:47.339513960Z 
 53%|█████▎    | 5061/9500 [17:21:17<15:05:04, 12.23s/it]08/03/2024 15:18:47 - INFO - __main__ -   Step: 5061, LR: 9.63341552006566e-06, Loss: 387.117431640625
2024-08-03T22:18:59.599922401Z 
 53%|█████▎    | 5062/9500 [17:21:29<15:05:29, 12.24s/it]08/03/2024 15:18:59 - INFO - __main__ -   Step: 5062, LR: 9.63124497637838e-06, Loss: 424.54803466796875
2024-08-03T22:19:11.960851935Z 
 53%|█████▎    | 5063/9500 [17:21:41<15:07:55, 12.28s/it]08/03/2024 15:19:11 - INFO - __main__ -   Step: 5063, LR: 9.629074432691102e-06, Loss: 333.65545654296875
2024-08-03T22:19:24.050437897Z 
 53%|█████▎    | 5064/9500 [17:21:53<15:03:32, 12.22s/it]08/03/2024 15:19:24 - INFO - __main__ -   Step: 5064, LR: 9.626903889003822e-06, Loss: 547.0352172851562
2024-08-03T22:19:36.172373576Z 
 53%|█████▎    | 5065/9500 [17:22:06<15:01:08, 12.19s/it]08/03/2024 15:19:36 - INFO - __main__ -   Step: 5065, LR: 9.624733345316544e-06, Loss: 395.6197814941406
2024-08-03T22:19:48.570176573Z 
 53%|█████▎    | 5066/9500 [17:22:18<15:05:31, 12.25s/it]08/03/2024 15:19:48 - INFO - __main__ -   Step: 5066, LR: 9.622562801629265e-06, Loss: 467.46331787109375
2024-08-03T22:20:00.762995725Z 
 53%|█████▎    | 5067/9500 [17:22:30<15:03:58, 12.24s/it]08/03/2024 15:20:00 - INFO - __main__ -   Step: 5067, LR: 9.620392257941987e-06, Loss: 383.89056396484375
2024-08-03T22:20:13.345091718Z 
 53%|█████▎    | 5068/9500 [17:22:43<15:11:27, 12.34s/it]08/03/2024 15:20:13 - INFO - __main__ -   Step: 5068, LR: 9.618221714254708e-06, Loss: 467.3959655761719
2024-08-03T22:20:25.388660809Z 
 53%|█████▎    | 5069/9500 [17:22:55<15:04:41, 12.25s/it]08/03/2024 15:20:25 - INFO - __main__ -   Step: 5069, LR: 9.616051170567428e-06, Loss: 497.6953125
2024-08-03T22:20:37.908608690Z 
 53%|█████▎    | 5070/9500 [17:23:07<15:10:28, 12.33s/it]08/03/2024 15:20:37 - INFO - __main__ -   Step: 5070, LR: 9.61388062688015e-06, Loss: 329.98175048828125
2024-08-03T22:20:49.868972421Z 
 53%|█████▎    | 5071/9500 [17:23:19<15:02:02, 12.22s/it]08/03/2024 15:20:49 - INFO - __main__ -   Step: 5071, LR: 9.61171008319287e-06, Loss: 372.54364013671875
2024-08-03T22:21:01.966894509Z 
 53%|█████▎    | 5072/9500 [17:23:31<14:59:08, 12.18s/it]08/03/2024 15:21:01 - INFO - __main__ -   Step: 5072, LR: 9.609539539505591e-06, Loss: 408.40606689453125
2024-08-03T22:21:14.531490194Z 
 53%|█████▎    | 5073/9500 [17:23:44<15:07:21, 12.30s/it]08/03/2024 15:21:14 - INFO - __main__ -   Step: 5073, LR: 9.607368995818313e-06, Loss: 383.28851318359375
2024-08-03T22:21:27.063361116Z 
 53%|█████▎    | 5074/9500 [17:23:57<15:12:20, 12.37s/it]08/03/2024 15:21:27 - INFO - __main__ -   Step: 5074, LR: 9.605198452131034e-06, Loss: 591.4042358398438
2024-08-03T22:21:39.114647454Z 
 53%|█████▎    | 5075/9500 [17:24:09<15:05:07, 12.27s/it]08/03/2024 15:21:39 - INFO - __main__ -   Step: 5075, LR: 9.603027908443756e-06, Loss: 379.1675109863281
2024-08-03T22:21:51.768767679Z 
 53%|█████▎    | 5076/9500 [17:24:21<15:13:21, 12.39s/it]08/03/2024 15:21:51 - INFO - __main__ -   Step: 5076, LR: 9.600857364756476e-06, Loss: 429.64996337890625
2024-08-03T22:22:04.226732403Z 
 53%|█████▎    | 5077/9500 [17:24:34<15:14:42, 12.41s/it]08/03/2024 15:22:04 - INFO - __main__ -   Step: 5077, LR: 9.598686821069197e-06, Loss: 503.86175537109375
2024-08-03T22:22:16.351824845Z 
 53%|█████▎    | 5078/9500 [17:24:46<15:08:14, 12.32s/it]08/03/2024 15:22:16 - INFO - __main__ -   Step: 5078, LR: 9.596516277381917e-06, Loss: 361.2012939453125
2024-08-03T22:22:28.943658904Z 
 53%|█████▎    | 5079/9500 [17:24:58<15:13:58, 12.40s/it]08/03/2024 15:22:28 - INFO - __main__ -   Step: 5079, LR: 9.594345733694639e-06, Loss: 377.82965087890625
2024-08-03T22:22:41.204229541Z 
 53%|█████▎    | 5080/9500 [17:25:11<15:10:35, 12.36s/it]08/03/2024 15:22:41 - INFO - __main__ -   Step: 5080, LR: 9.59217519000736e-06, Loss: 530.4345703125
2024-08-03T22:22:53.441145432Z 
 53%|█████▎    | 5081/9500 [17:25:23<15:07:38, 12.32s/it]08/03/2024 15:22:53 - INFO - __main__ -   Step: 5081, LR: 9.590004646320082e-06, Loss: 426.52642822265625
2024-08-03T22:23:05.990139544Z 
 53%|█████▎    | 5082/9500 [17:25:35<15:12:24, 12.39s/it]08/03/2024 15:23:05 - INFO - __main__ -   Step: 5082, LR: 9.587834102632803e-06, Loss: 300.17669677734375
2024-08-03T22:23:18.229963535Z 
 54%|█████▎    | 5083/9500 [17:25:48<15:08:51, 12.35s/it]08/03/2024 15:23:18 - INFO - __main__ -   Step: 5083, LR: 9.585663558945523e-06, Loss: 422.87457275390625
2024-08-03T22:23:30.245080120Z 
 54%|█████▎    | 5084/9500 [17:26:00<15:01:21, 12.25s/it]08/03/2024 15:23:30 - INFO - __main__ -   Step: 5084, LR: 9.583493015258245e-06, Loss: 331.9542541503906
2024-08-03T22:23:43.101984696Z 
 54%|█████▎    | 5085/9500 [17:26:13<15:14:37, 12.43s/it]08/03/2024 15:23:43 - INFO - __main__ -   Step: 5085, LR: 9.581322471570965e-06, Loss: 524.5518798828125
2024-08-03T22:23:55.192256577Z 
 54%|█████▎    | 5086/9500 [17:26:25<15:06:55, 12.33s/it]08/03/2024 15:23:55 - INFO - __main__ -   Step: 5086, LR: 9.579151927883686e-06, Loss: 340.80572509765625
2024-08-03T22:24:07.292094075Z 
 54%|█████▎    | 5087/9500 [17:26:37<15:01:38, 12.26s/it]08/03/2024 15:24:07 - INFO - __main__ -   Step: 5087, LR: 9.576981384196408e-06, Loss: 447.0014953613281
2024-08-03T22:24:20.417547619Z 
 54%|█████▎    | 5088/9500 [17:26:50<15:20:35, 12.52s/it]08/03/2024 15:24:20 - INFO - __main__ -   Step: 5088, LR: 9.57481084050913e-06, Loss: 456.12847900390625
2024-08-03T22:24:32.580968540Z 
 54%|█████▎    | 5089/9500 [17:27:02<15:12:32, 12.41s/it]08/03/2024 15:24:32 - INFO - __main__ -   Step: 5089, LR: 9.57264029682185e-06, Loss: 467.57733154296875
2024-08-03T22:24:45.182648797Z 
 54%|█████▎    | 5090/9500 [17:27:15<15:16:29, 12.47s/it]08/03/2024 15:24:45 - INFO - __main__ -   Step: 5090, LR: 9.57046975313457e-06, Loss: 441.689208984375
2024-08-03T22:24:57.981628573Z 
 54%|█████▎    | 5091/9500 [17:27:27<15:23:33, 12.57s/it]08/03/2024 15:24:57 - INFO - __main__ -   Step: 5091, LR: 9.568299209447292e-06, Loss: 466.1520690917969
2024-08-03T22:25:10.426137399Z 
 54%|█████▎    | 5092/9500 [17:27:40<15:20:37, 12.53s/it]08/03/2024 15:25:10 - INFO - __main__ -   Step: 5092, LR: 9.566128665760012e-06, Loss: 447.0947265625
2024-08-03T22:25:22.526448729Z 
 54%|█████▎    | 5093/9500 [17:27:52<15:10:55, 12.40s/it]08/03/2024 15:25:22 - INFO - __main__ -   Step: 5093, LR: 9.563958122072734e-06, Loss: 406.0005798339844
2024-08-03T22:25:35.329499278Z 
 54%|█████▎    | 5094/9500 [17:28:05<15:19:32, 12.52s/it]08/03/2024 15:25:35 - INFO - __main__ -   Step: 5094, LR: 9.561787578385455e-06, Loss: 363.33770751953125
2024-08-03T22:25:47.485139132Z 
 54%|█████▎    | 5095/9500 [17:28:17<15:11:15, 12.41s/it]08/03/2024 15:25:47 - INFO - __main__ -   Step: 5095, LR: 9.559617034698177e-06, Loss: 348.1158447265625
2024-08-03T22:25:59.874673089Z 
 54%|█████▎    | 5096/9500 [17:28:29<15:10:33, 12.41s/it]08/03/2024 15:25:59 - INFO - __main__ -   Step: 5096, LR: 9.557446491010898e-06, Loss: 424.414794921875
2024-08-03T22:26:12.094220614Z 
 54%|█████▎    | 5097/9500 [17:28:42<15:06:15, 12.35s/it]08/03/2024 15:26:12 - INFO - __main__ -   Step: 5097, LR: 9.555275947323618e-06, Loss: 372.427490234375
2024-08-03T22:26:24.001231138Z 
 54%|█████▎    | 5098/9500 [17:28:53<14:56:19, 12.22s/it]08/03/2024 15:26:24 - INFO - __main__ -   Step: 5098, LR: 9.55310540363634e-06, Loss: 359.46490478515625
2024-08-03T22:26:36.229837967Z 
 54%|█████▎    | 5099/9500 [17:29:06<14:56:21, 12.22s/it]08/03/2024 15:26:36 - INFO - __main__ -   Step: 5099, LR: 9.55093485994906e-06, Loss: 421.1378173828125
2024-08-03T22:26:48.737919414Z 
 54%|█████▎    | 5100/9500 [17:29:18<15:02:29, 12.31s/it]08/03/2024 15:26:48 - INFO - __main__ -   Step: 5100, LR: 9.548764316261781e-06, Loss: 469.2420959472656
2024-08-03T22:27:00.884468542Z 
 54%|█████▎    | 5101/9500 [17:29:30<14:58:45, 12.26s/it]08/03/2024 15:27:00 - INFO - __main__ -   Step: 5101, LR: 9.546593772574503e-06, Loss: 438.8193664550781
2024-08-03T22:27:12.848603160Z 
 54%|█████▎    | 5102/9500 [17:29:42<14:52:05, 12.17s/it]08/03/2024 15:27:12 - INFO - __main__ -   Step: 5102, LR: 9.544423228887224e-06, Loss: 405.99029541015625
2024-08-03T22:27:25.565889009Z 
 54%|█████▎    | 5103/9500 [17:29:55<15:03:54, 12.33s/it]08/03/2024 15:27:25 - INFO - __main__ -   Step: 5103, LR: 9.542252685199946e-06, Loss: 364.9964294433594
2024-08-03T22:27:37.876874664Z 
 54%|█████▎    | 5104/9500 [17:30:07<15:03:11, 12.33s/it]08/03/2024 15:27:37 - INFO - __main__ -   Step: 5104, LR: 9.540082141512667e-06, Loss: 560.2652587890625
2024-08-03T22:27:50.368399107Z 
 54%|█████▎    | 5105/9500 [17:30:20<15:06:35, 12.38s/it]08/03/2024 15:27:50 - INFO - __main__ -   Step: 5105, LR: 9.537911597825387e-06, Loss: 499.22125244140625
2024-08-03T22:28:03.102508609Z 
 54%|█████▎    | 5106/9500 [17:30:33<15:14:14, 12.48s/it]08/03/2024 15:28:03 - INFO - __main__ -   Step: 5106, LR: 9.535741054138107e-06, Loss: 474.15020751953125
2024-08-03T22:28:15.069573051Z 
 54%|█████▍    | 5107/9500 [17:30:45<15:02:40, 12.33s/it]08/03/2024 15:28:15 - INFO - __main__ -   Step: 5107, LR: 9.533570510450829e-06, Loss: 403.3562316894531
2024-08-03T22:28:27.616375159Z 
 54%|█████▍    | 5108/9500 [17:30:57<15:07:15, 12.39s/it]08/03/2024 15:28:27 - INFO - __main__ -   Step: 5108, LR: 9.53139996676355e-06, Loss: 454.658203125
2024-08-03T22:28:39.791711698Z 
 54%|█████▍    | 5109/9500 [17:31:09<15:02:14, 12.33s/it]08/03/2024 15:28:39 - INFO - __main__ -   Step: 5109, LR: 9.529229423076272e-06, Loss: 363.71697998046875
2024-08-03T22:28:51.924318167Z 
 54%|█████▍    | 5110/9500 [17:31:21<14:57:44, 12.27s/it]08/03/2024 15:28:51 - INFO - __main__ -   Step: 5110, LR: 9.527058879388993e-06, Loss: 436.57489013671875
2024-08-03T22:29:04.060116146Z 
 54%|█████▍    | 5111/9500 [17:31:33<14:54:35, 12.23s/it]08/03/2024 15:29:04 - INFO - __main__ -   Step: 5111, LR: 9.524888335701715e-06, Loss: 391.3408203125
2024-08-03T22:29:16.360430694Z 
 54%|█████▍    | 5112/9500 [17:31:46<14:55:56, 12.25s/it]08/03/2024 15:29:16 - INFO - __main__ -   Step: 5112, LR: 9.522717792014435e-06, Loss: 412.20135498046875
2024-08-03T22:29:28.796813733Z 
 54%|█████▍    | 5113/9500 [17:31:58<14:59:48, 12.31s/it]08/03/2024 15:29:28 - INFO - __main__ -   Step: 5113, LR: 9.520547248327155e-06, Loss: 447.52294921875
2024-08-03T22:29:41.484355048Z 
 54%|█████▍    | 5114/9500 [17:32:11<15:07:57, 12.42s/it]08/03/2024 15:29:41 - INFO - __main__ -   Step: 5114, LR: 9.518376704639876e-06, Loss: 447.2279357910156
2024-08-03T22:29:53.858442167Z 
 54%|█████▍    | 5115/9500 [17:32:23<15:06:43, 12.41s/it]08/03/2024 15:29:53 - INFO - __main__ -   Step: 5115, LR: 9.516206160952598e-06, Loss: 388.4337158203125
2024-08-03T22:30:06.720008441Z 
 54%|█████▍    | 5116/9500 [17:32:36<15:16:29, 12.54s/it]08/03/2024 15:30:06 - INFO - __main__ -   Step: 5116, LR: 9.51403561726532e-06, Loss: 500.71441650390625
2024-08-03T22:30:19.151064661Z 
 54%|█████▍    | 5117/9500 [17:32:49<15:13:49, 12.51s/it]08/03/2024 15:30:19 - INFO - __main__ -   Step: 5117, LR: 9.511865073578041e-06, Loss: 423.13751220703125
2024-08-03T22:30:31.273072557Z 
 54%|█████▍    | 5118/9500 [17:33:01<15:05:07, 12.39s/it]08/03/2024 15:30:31 - INFO - __main__ -   Step: 5118, LR: 9.509694529890762e-06, Loss: 378.86480712890625
2024-08-03T22:30:43.760140760Z 
 54%|█████▍    | 5119/9500 [17:33:13<15:06:58, 12.42s/it]08/03/2024 15:30:43 - INFO - __main__ -   Step: 5119, LR: 9.507523986203482e-06, Loss: 421.5003662109375
2024-08-03T22:30:56.162639572Z 
 54%|█████▍    | 5120/9500 [17:33:26<15:06:20, 12.42s/it]08/03/2024 15:30:56 - INFO - __main__ -   Step: 5120, LR: 9.505353442516204e-06, Loss: 395.408447265625
2024-08-03T22:31:08.305047984Z 
 54%|█████▍    | 5121/9500 [17:33:38<15:00:09, 12.33s/it]08/03/2024 15:31:08 - INFO - __main__ -   Step: 5121, LR: 9.503182898828924e-06, Loss: 466.1988525390625
2024-08-03T22:31:20.877290805Z 
 54%|█████▍    | 5122/9500 [17:33:50<15:05:10, 12.41s/it]08/03/2024 15:31:20 - INFO - __main__ -   Step: 5122, LR: 9.501012355141645e-06, Loss: 461.6646728515625
2024-08-03T22:31:33.374187663Z 
 54%|█████▍    | 5123/9500 [17:34:03<15:06:58, 12.43s/it]08/03/2024 15:31:33 - INFO - __main__ -   Step: 5123, LR: 9.498841811454367e-06, Loss: 392.60650634765625
2024-08-03T22:31:45.491225240Z 
 54%|█████▍    | 5124/9500 [17:34:15<14:59:51, 12.34s/it]08/03/2024 15:31:45 - INFO - __main__ -   Step: 5124, LR: 9.496671267767088e-06, Loss: 363.9598388671875
2024-08-03T22:31:58.244785229Z 
 54%|█████▍    | 5125/9500 [17:34:28<15:08:44, 12.46s/it]08/03/2024 15:31:58 - INFO - __main__ -   Step: 5125, LR: 9.49450072407981e-06, Loss: 512.57275390625
2024-08-03T22:32:10.819954918Z 
 54%|█████▍    | 5126/9500 [17:34:40<15:10:59, 12.50s/it]08/03/2024 15:32:10 - INFO - __main__ -   Step: 5126, LR: 9.49233018039253e-06, Loss: 415.1673889160156
2024-08-03T22:32:23.025571216Z 
 54%|█████▍    | 5127/9500 [17:34:52<15:04:25, 12.41s/it]08/03/2024 15:32:23 - INFO - __main__ -   Step: 5127, LR: 9.490159636705251e-06, Loss: 478.4876708984375
2024-08-03T22:32:35.501501085Z 
 54%|█████▍    | 5128/9500 [17:35:05<15:05:40, 12.43s/it]08/03/2024 15:32:35 - INFO - __main__ -   Step: 5128, LR: 9.487989093017971e-06, Loss: 433.2787780761719
2024-08-03T22:32:47.626347701Z 
 54%|█████▍    | 5129/9500 [17:35:17<14:58:48, 12.34s/it]08/03/2024 15:32:47 - INFO - __main__ -   Step: 5129, LR: 9.485818549330693e-06, Loss: 450.72894287109375
2024-08-03T22:32:59.837947143Z 
 54%|█████▍    | 5130/9500 [17:35:29<14:55:51, 12.30s/it]08/03/2024 15:32:59 - INFO - __main__ -   Step: 5130, LR: 9.483648005643414e-06, Loss: 373.6745910644531
2024-08-03T22:33:12.472331301Z 
 54%|█████▍    | 5131/9500 [17:35:42<15:02:56, 12.40s/it]08/03/2024 15:33:12 - INFO - __main__ -   Step: 5131, LR: 9.481477461956136e-06, Loss: 391.3478088378906
2024-08-03T22:33:24.580685583Z 
 54%|█████▍    | 5132/9500 [17:35:54<14:56:22, 12.31s/it]08/03/2024 15:33:24 - INFO - __main__ -   Step: 5132, LR: 9.479306918268857e-06, Loss: 393.54742431640625
2024-08-03T22:33:36.591199080Z 
 54%|█████▍    | 5133/9500 [17:36:06<14:49:33, 12.22s/it]08/03/2024 15:33:36 - INFO - __main__ -   Step: 5133, LR: 9.477136374581577e-06, Loss: 328.86932373046875
2024-08-03T22:33:49.068728229Z 
 54%|█████▍    | 5134/9500 [17:36:19<14:54:55, 12.30s/it]08/03/2024 15:33:49 - INFO - __main__ -   Step: 5134, LR: 9.474965830894299e-06, Loss: 438.6847229003906
2024-08-03T22:34:01.334973179Z 
 54%|█████▍    | 5135/9500 [17:36:31<14:54:01, 12.29s/it]08/03/2024 15:34:01 - INFO - __main__ -   Step: 5135, LR: 9.472795287207019e-06, Loss: 415.5980529785156
2024-08-03T22:34:13.586232715Z 
 54%|█████▍    | 5136/9500 [17:36:43<14:52:59, 12.28s/it]08/03/2024 15:34:13 - INFO - __main__ -   Step: 5136, LR: 9.47062474351974e-06, Loss: 465.7348937988281
2024-08-03T22:34:26.415135945Z 
 54%|█████▍    | 5137/9500 [17:36:56<15:04:48, 12.44s/it]08/03/2024 15:34:26 - INFO - __main__ -   Step: 5137, LR: 9.468454199832462e-06, Loss: 478.17974853515625
2024-08-03T22:34:38.553120799Z 
 54%|█████▍    | 5138/9500 [17:37:08<14:57:57, 12.35s/it]08/03/2024 15:34:38 - INFO - __main__ -   Step: 5138, LR: 9.466283656145183e-06, Loss: 338.28125
2024-08-03T22:34:51.051861411Z 
 54%|█████▍    | 5139/9500 [17:37:20<15:00:57, 12.40s/it]08/03/2024 15:34:51 - INFO - __main__ -   Step: 5139, LR: 9.464113112457905e-06, Loss: 434.31036376953125
2024-08-03T22:35:03.659516506Z 
 54%|█████▍    | 5140/9500 [17:37:33<15:05:22, 12.46s/it]08/03/2024 15:35:03 - INFO - __main__ -   Step: 5140, LR: 9.461942568770625e-06, Loss: 479.9691467285156
2024-08-03T22:35:15.870336669Z 
 54%|█████▍    | 5141/9500 [17:37:45<14:59:44, 12.38s/it]08/03/2024 15:35:15 - INFO - __main__ -   Step: 5141, LR: 9.459772025083346e-06, Loss: 454.7337646484375
2024-08-03T22:35:28.368948542Z 
 54%|█████▍    | 5142/9500 [17:37:58<15:02:01, 12.42s/it]08/03/2024 15:35:28 - INFO - __main__ -   Step: 5142, LR: 9.457601481396066e-06, Loss: 435.98736572265625
2024-08-03T22:35:40.812516890Z 
 54%|█████▍    | 5143/9500 [17:38:10<15:02:21, 12.43s/it]08/03/2024 15:35:40 - INFO - __main__ -   Step: 5143, LR: 9.455430937708788e-06, Loss: 353.4293212890625
2024-08-03T22:35:53.133445557Z 
 54%|█████▍    | 5144/9500 [17:38:23<14:59:51, 12.39s/it]08/03/2024 15:35:53 - INFO - __main__ -   Step: 5144, LR: 9.45326039402151e-06, Loss: 377.1300048828125
2024-08-03T22:36:06.005163888Z 
 54%|█████▍    | 5145/9500 [17:38:35<15:10:02, 12.54s/it]08/03/2024 15:36:06 - INFO - __main__ -   Step: 5145, LR: 9.451089850334231e-06, Loss: 412.56561279296875
2024-08-03T22:36:18.372934468Z 
 54%|█████▍    | 5146/9500 [17:38:48<15:06:07, 12.49s/it]08/03/2024 15:36:18 - INFO - __main__ -   Step: 5146, LR: 9.448919306646953e-06, Loss: 431.2901611328125
2024-08-03T22:36:30.502453691Z 
 54%|█████▍    | 5147/9500 [17:39:00<14:58:08, 12.38s/it]08/03/2024 15:36:30 - INFO - __main__ -   Step: 5147, LR: 9.446748762959672e-06, Loss: 581.6389770507812
2024-08-03T22:36:42.758781277Z 
 54%|█████▍    | 5148/9500 [17:39:12<14:55:15, 12.34s/it]08/03/2024 15:36:42 - INFO - __main__ -   Step: 5148, LR: 9.444578219272394e-06, Loss: 350.24920654296875
2024-08-03T22:36:55.292984854Z 
 54%|█████▍    | 5149/9500 [17:39:25<14:59:13, 12.40s/it]08/03/2024 15:36:55 - INFO - __main__ -   Step: 5149, LR: 9.442407675585114e-06, Loss: 389.97283935546875
2024-08-03T22:37:07.514724462Z 
 54%|█████▍    | 5150/9500 [17:39:37<14:55:07, 12.35s/it]08/03/2024 15:37:07 - INFO - __main__ -   Step: 5150, LR: 9.440237131897835e-06, Loss: 470.6168212890625
2024-08-03T22:37:19.699137059Z 
 54%|█████▍    | 5151/9500 [17:39:49<14:51:23, 12.30s/it]08/03/2024 15:37:19 - INFO - __main__ -   Step: 5151, LR: 9.438066588210557e-06, Loss: 469.1407470703125
2024-08-03T22:37:32.023237636Z 
 54%|█████▍    | 5152/9500 [17:40:01<14:51:45, 12.31s/it]08/03/2024 15:37:32 - INFO - __main__ -   Step: 5152, LR: 9.435896044523279e-06, Loss: 339.3269348144531
2024-08-03T22:37:44.502983420Z 
 54%|█████▍    | 5153/9500 [17:40:14<14:55:20, 12.36s/it]08/03/2024 15:37:44 - INFO - __main__ -   Step: 5153, LR: 9.433725500836e-06, Loss: 479.2546081542969
2024-08-03T22:37:56.572316931Z 
 54%|█████▍    | 5154/9500 [17:40:26<14:48:51, 12.27s/it]08/03/2024 15:37:56 - INFO - __main__ -   Step: 5154, LR: 9.431554957148722e-06, Loss: 349.908447265625
2024-08-03T22:38:08.725346491Z 
 54%|█████▍    | 5155/9500 [17:40:38<14:46:05, 12.24s/it]08/03/2024 15:38:08 - INFO - __main__ -   Step: 5155, LR: 9.429384413461442e-06, Loss: 436.24908447265625
2024-08-03T22:38:21.096474503Z 
 54%|█████▍    | 5156/9500 [17:40:51<14:48:48, 12.28s/it]08/03/2024 15:38:21 - INFO - __main__ -   Step: 5156, LR: 9.427213869774161e-06, Loss: 367.84857177734375
2024-08-03T22:38:33.118344607Z 
 54%|█████▍    | 5157/9500 [17:41:03<14:43:05, 12.20s/it]08/03/2024 15:38:33 - INFO - __main__ -   Step: 5157, LR: 9.425043326086883e-06, Loss: 385.07110595703125
2024-08-03T22:38:45.331560943Z 
 54%|█████▍    | 5158/9500 [17:41:15<14:43:10, 12.20s/it]08/03/2024 15:38:45 - INFO - __main__ -   Step: 5158, LR: 9.422872782399604e-06, Loss: 389.214111328125
2024-08-03T22:38:57.704493677Z 
 54%|█████▍    | 5159/9500 [17:41:27<14:46:37, 12.25s/it]08/03/2024 15:38:57 - INFO - __main__ -   Step: 5159, LR: 9.420702238712326e-06, Loss: 381.77783203125
2024-08-03T22:39:10.038049002Z 
 54%|█████▍    | 5160/9500 [17:41:39<14:48:07, 12.28s/it]08/03/2024 15:39:10 - INFO - __main__ -   Step: 5160, LR: 9.418531695025048e-06, Loss: 313.8582763671875
2024-08-03T22:39:22.289540567Z 
 54%|█████▍    | 5161/9500 [17:41:52<14:47:20, 12.27s/it]08/03/2024 15:39:22 - INFO - __main__ -   Step: 5161, LR: 9.41636115133777e-06, Loss: 440.04840087890625
2024-08-03T22:39:34.766408923Z 
 54%|█████▍    | 5162/9500 [17:42:04<14:51:37, 12.33s/it]08/03/2024 15:39:34 - INFO - __main__ -   Step: 5162, LR: 9.414190607650489e-06, Loss: 458.0298156738281
2024-08-03T22:39:47.015391296Z 
 54%|█████▍    | 5163/9500 [17:42:16<14:49:36, 12.31s/it]08/03/2024 15:39:47 - INFO - __main__ -   Step: 5163, LR: 9.41202006396321e-06, Loss: 415.09368896484375
2024-08-03T22:39:59.031584404Z 
 54%|█████▍    | 5164/9500 [17:42:28<14:43:06, 12.22s/it]08/03/2024 15:39:59 - INFO - __main__ -   Step: 5164, LR: 9.40984952027593e-06, Loss: 334.7608947753906
2024-08-03T22:40:11.894879017Z 
 54%|█████▍    | 5165/9500 [17:42:41<14:56:50, 12.41s/it]08/03/2024 15:40:11 - INFO - __main__ -   Step: 5165, LR: 9.407678976588652e-06, Loss: 392.58953857421875
2024-08-03T22:40:23.951227068Z 
 54%|█████▍    | 5166/9500 [17:42:53<14:48:54, 12.31s/it]08/03/2024 15:40:23 - INFO - __main__ -   Step: 5166, LR: 9.405508432901374e-06, Loss: 409.0545654296875
2024-08-03T22:40:36.450855606Z 
 54%|█████▍    | 5167/9500 [17:43:06<14:52:53, 12.36s/it]08/03/2024 15:40:36 - INFO - __main__ -   Step: 5167, LR: 9.403337889214095e-06, Loss: 425.02655029296875
2024-08-03T22:40:49.011470979Z 
 54%|█████▍    | 5168/9500 [17:43:18<14:56:56, 12.42s/it]08/03/2024 15:40:49 - INFO - __main__ -   Step: 5168, LR: 9.401167345526817e-06, Loss: 421.62432861328125
2024-08-03T22:41:01.146466729Z 
 54%|█████▍    | 5169/9500 [17:43:31<14:50:30, 12.34s/it]08/03/2024 15:41:01 - INFO - __main__ -   Step: 5169, LR: 9.398996801839537e-06, Loss: 406.1114196777344
2024-08-03T22:41:13.893652272Z 
 54%|█████▍    | 5170/9500 [17:43:43<14:59:11, 12.46s/it]08/03/2024 15:41:13 - INFO - __main__ -   Step: 5170, LR: 9.396826258152258e-06, Loss: 364.23480224609375
2024-08-03T22:41:26.351086575Z 
 54%|█████▍    | 5171/9500 [17:43:56<14:58:54, 12.46s/it]08/03/2024 15:41:26 - INFO - __main__ -   Step: 5171, LR: 9.394655714464978e-06, Loss: 443.78076171875
2024-08-03T22:41:39.049853855Z 
 54%|█████▍    | 5172/9500 [17:44:08<15:03:54, 12.53s/it]08/03/2024 15:41:39 - INFO - __main__ -   Step: 5172, LR: 9.3924851707777e-06, Loss: 480.223388671875
2024-08-03T22:41:51.212984591Z 
 54%|█████▍    | 5173/9500 [17:44:21<14:55:44, 12.42s/it]08/03/2024 15:41:51 - INFO - __main__ -   Step: 5173, LR: 9.390314627090421e-06, Loss: 427.52008056640625
2024-08-03T22:42:04.144152015Z 
 54%|█████▍    | 5174/9500 [17:44:34<15:06:34, 12.57s/it]08/03/2024 15:42:04 - INFO - __main__ -   Step: 5174, LR: 9.388144083403143e-06, Loss: 433.7327880859375
2024-08-03T22:42:16.309378952Z 
 54%|█████▍    | 5175/9500 [17:44:46<14:57:31, 12.45s/it]08/03/2024 15:42:16 - INFO - __main__ -   Step: 5175, LR: 9.385973539715864e-06, Loss: 399.60174560546875
2024-08-03T22:42:28.926960234Z 
 54%|█████▍    | 5176/9500 [17:44:58<15:00:55, 12.50s/it]08/03/2024 15:42:28 - INFO - __main__ -   Step: 5176, LR: 9.383802996028584e-06, Loss: 511.7161560058594
2024-08-03T22:42:41.429263974Z 
 54%|█████▍    | 5177/9500 [17:45:11<15:00:43, 12.50s/it]08/03/2024 15:42:41 - INFO - __main__ -   Step: 5177, LR: 9.381632452341306e-06, Loss: 417.7991027832031
2024-08-03T22:42:53.644259500Z 
 55%|█████▍    | 5178/9500 [17:45:23<14:54:19, 12.42s/it]08/03/2024 15:42:53 - INFO - __main__ -   Step: 5178, LR: 9.379461908654026e-06, Loss: 429.30645751953125
2024-08-03T22:43:05.803578421Z 
 55%|█████▍    | 5179/9500 [17:45:35<14:48:35, 12.34s/it]08/03/2024 15:43:05 - INFO - __main__ -   Step: 5179, LR: 9.377291364966747e-06, Loss: 565.4168701171875
2024-08-03T22:43:18.376793193Z 
 55%|█████▍    | 5180/9500 [17:45:48<14:53:27, 12.41s/it]08/03/2024 15:43:18 - INFO - __main__ -   Step: 5180, LR: 9.375120821279469e-06, Loss: 345.3341369628906
2024-08-03T22:43:30.638685619Z 
 55%|█████▍    | 5181/9500 [17:46:00<14:50:04, 12.36s/it]08/03/2024 15:43:30 - INFO - __main__ -   Step: 5181, LR: 9.37295027759219e-06, Loss: 442.367919921875
2024-08-03T22:43:43.089883922Z 
 55%|█████▍    | 5182/9500 [17:46:13<14:51:43, 12.39s/it]08/03/2024 15:43:43 - INFO - __main__ -   Step: 5182, LR: 9.370779733904912e-06, Loss: 435.868408203125
2024-08-03T22:43:56.172073753Z 
 55%|█████▍    | 5183/9500 [17:46:26<15:06:26, 12.60s/it]08/03/2024 15:43:56 - INFO - __main__ -   Step: 5183, LR: 9.368609190217632e-06, Loss: 505.29998779296875
2024-08-03T22:44:08.313658009Z 
 55%|█████▍    | 5184/9500 [17:46:38<14:56:22, 12.46s/it]08/03/2024 15:44:08 - INFO - __main__ -   Step: 5184, LR: 9.366438646530353e-06, Loss: 417.749267578125
2024-08-03T22:44:21.009536218Z 
 55%|█████▍    | 5185/9500 [17:46:50<15:01:13, 12.53s/it]08/03/2024 15:44:21 - INFO - __main__ -   Step: 5185, LR: 9.364268102843073e-06, Loss: 511.8350524902344
2024-08-03T22:44:33.830063088Z 
 55%|█████▍    | 5186/9500 [17:47:03<15:07:15, 12.62s/it]08/03/2024 15:44:33 - INFO - __main__ -   Step: 5186, LR: 9.362097559155795e-06, Loss: 372.18927001953125
2024-08-03T22:44:45.902571130Z 
 55%|█████▍    | 5187/9500 [17:47:15<14:55:16, 12.45s/it]08/03/2024 15:44:45 - INFO - __main__ -   Step: 5187, LR: 9.359927015468516e-06, Loss: 387.73590087890625
2024-08-03T22:44:58.034146291Z 
 55%|█████▍    | 5188/9500 [17:47:27<14:48:06, 12.36s/it]08/03/2024 15:44:58 - INFO - __main__ -   Step: 5188, LR: 9.357756471781238e-06, Loss: 370.21917724609375
2024-08-03T22:45:10.440263957Z 
 55%|█████▍    | 5189/9500 [17:47:40<14:48:56, 12.37s/it]08/03/2024 15:45:10 - INFO - __main__ -   Step: 5189, LR: 9.35558592809396e-06, Loss: 397.08056640625
2024-08-03T22:45:22.726126109Z 
 55%|█████▍    | 5190/9500 [17:47:52<14:46:52, 12.35s/it]08/03/2024 15:45:22 - INFO - __main__ -   Step: 5190, LR: 9.353415384406679e-06, Loss: 412.84173583984375
2024-08-03T22:45:34.730182086Z 
 55%|█████▍    | 5191/9500 [17:48:04<14:39:17, 12.24s/it]08/03/2024 15:45:34 - INFO - __main__ -   Step: 5191, LR: 9.3512448407194e-06, Loss: 520.7351684570312
2024-08-03T22:45:47.023415271Z 
 55%|█████▍    | 5192/9500 [17:48:16<14:40:10, 12.26s/it]08/03/2024 15:45:47 - INFO - __main__ -   Step: 5192, LR: 9.34907429703212e-06, Loss: 379.66729736328125
2024-08-03T22:45:58.982466624Z 
 55%|█████▍    | 5193/9500 [17:48:28<14:33:30, 12.17s/it]08/03/2024 15:45:58 - INFO - __main__ -   Step: 5193, LR: 9.346903753344842e-06, Loss: 419.3948669433594
2024-08-03T22:46:11.084169955Z 
 55%|█████▍    | 5194/9500 [17:48:41<14:31:52, 12.15s/it]08/03/2024 15:46:11 - INFO - __main__ -   Step: 5194, LR: 9.344733209657564e-06, Loss: 497.0777587890625
2024-08-03T22:46:23.517721485Z 
 55%|█████▍    | 5195/9500 [17:48:53<14:37:47, 12.23s/it]08/03/2024 15:46:23 - INFO - __main__ -   Step: 5195, LR: 9.342562665970285e-06, Loss: 343.4552001953125
2024-08-03T22:46:35.747932235Z 
 55%|█████▍    | 5196/9500 [17:49:05<14:37:30, 12.23s/it]08/03/2024 15:46:35 - INFO - __main__ -   Step: 5196, LR: 9.340392122283007e-06, Loss: 377.2735595703125
2024-08-03T22:46:48.300196876Z 
 55%|█████▍    | 5197/9500 [17:49:18<14:44:10, 12.33s/it]08/03/2024 15:46:48 - INFO - __main__ -   Step: 5197, LR: 9.338221578595727e-06, Loss: 573.3995361328125
2024-08-03T22:47:00.384260345Z 
 55%|█████▍    | 5198/9500 [17:49:30<14:38:41, 12.26s/it]08/03/2024 15:47:00 - INFO - __main__ -   Step: 5198, LR: 9.336051034908448e-06, Loss: 374.491943359375
2024-08-03T22:47:12.943437065Z 
 55%|█████▍    | 5199/9500 [17:49:42<14:45:01, 12.35s/it]08/03/2024 15:47:12 - INFO - __main__ -   Step: 5199, LR: 9.333880491221168e-06, Loss: 418.3564453125
2024-08-03T22:47:25.072142852Z 
 55%|█████▍    | 5200/9500 [17:49:55<14:40:09, 12.28s/it]08/03/2024 15:47:25 - INFO - __main__ -   Step: 5200, LR: 9.33170994753389e-06, Loss: 386.3979797363281
2024-08-03T22:47:37.262845141Z 
 55%|█████▍    | 5201/9500 [17:50:07<14:37:59, 12.25s/it]08/03/2024 15:47:37 - INFO - __main__ -   Step: 5201, LR: 9.329539403846611e-06, Loss: 429.0849609375
2024-08-03T22:47:49.890503579Z 
 55%|█████▍    | 5202/9500 [17:50:19<14:45:49, 12.37s/it]08/03/2024 15:47:49 - INFO - __main__ -   Step: 5202, LR: 9.327368860159333e-06, Loss: 494.695556640625
2024-08-03T22:48:01.956154095Z 
 55%|█████▍    | 5203/9500 [17:50:31<14:39:09, 12.28s/it]08/03/2024 15:48:01 - INFO - __main__ -   Step: 5203, LR: 9.325198316472054e-06, Loss: 437.0791015625
2024-08-03T22:48:14.300526362Z 
 55%|█████▍    | 5204/9500 [17:50:44<14:40:25, 12.30s/it]08/03/2024 15:48:14 - INFO - __main__ -   Step: 5204, LR: 9.323027772784774e-06, Loss: 527.623779296875
2024-08-03T22:48:26.917877277Z 
 55%|█████▍    | 5205/9500 [17:50:56<14:47:07, 12.39s/it]08/03/2024 15:48:26 - INFO - __main__ -   Step: 5205, LR: 9.320857229097496e-06, Loss: 460.92333984375
2024-08-03T22:48:39.505251455Z 
 55%|█████▍    | 5206/9500 [17:51:09<14:51:05, 12.45s/it]08/03/2024 15:48:39 - INFO - __main__ -   Step: 5206, LR: 9.318686685410217e-06, Loss: 467.75341796875
2024-08-03T22:48:51.761010627Z 
 55%|█████▍    | 5207/9500 [17:51:21<14:46:41, 12.39s/it]08/03/2024 15:48:51 - INFO - __main__ -   Step: 5207, LR: 9.316516141722937e-06, Loss: 493.78948974609375
2024-08-03T22:49:04.426623798Z 
 55%|█████▍    | 5208/9500 [17:51:34<14:52:20, 12.47s/it]08/03/2024 15:49:04 - INFO - __main__ -   Step: 5208, LR: 9.314345598035659e-06, Loss: 392.8208923339844
2024-08-03T22:49:16.500479420Z 
 55%|█████▍    | 5209/9500 [17:51:46<14:43:32, 12.35s/it]08/03/2024 15:49:16 - INFO - __main__ -   Step: 5209, LR: 9.31217505434838e-06, Loss: 424.8749084472656
2024-08-03T22:49:28.757327154Z 
 55%|█████▍    | 5210/9500 [17:51:58<14:41:14, 12.33s/it]08/03/2024 15:49:28 - INFO - __main__ -   Step: 5210, LR: 9.3100045106611e-06, Loss: 468.81634521484375
2024-08-03T22:49:41.078896717Z 
 55%|█████▍    | 5211/9500 [17:52:11<14:40:57, 12.32s/it]08/03/2024 15:49:41 - INFO - __main__ -   Step: 5211, LR: 9.307833966973822e-06, Loss: 361.5556640625
2024-08-03T22:49:53.276623264Z 
 55%|█████▍    | 5212/9500 [17:52:23<14:38:01, 12.29s/it]08/03/2024 15:49:53 - INFO - __main__ -   Step: 5212, LR: 9.305663423286543e-06, Loss: 418.0160827636719
2024-08-03T22:50:05.784466540Z 
 55%|█████▍    | 5213/9500 [17:52:35<14:42:36, 12.35s/it]08/03/2024 15:50:05 - INFO - __main__ -   Step: 5213, LR: 9.303492879599265e-06, Loss: 492.33349609375
2024-08-03T22:50:18.487831417Z 
 55%|█████▍    | 5214/9500 [17:52:48<14:49:54, 12.46s/it]08/03/2024 15:50:18 - INFO - __main__ -   Step: 5214, LR: 9.301322335911985e-06, Loss: 554.89453125
2024-08-03T22:50:30.712593708Z 
 55%|█████▍    | 5215/9500 [17:53:00<14:44:42, 12.39s/it]08/03/2024 15:50:30 - INFO - __main__ -   Step: 5215, LR: 9.299151792224706e-06, Loss: 399.15777587890625
2024-08-03T22:50:43.032298956Z 
 55%|█████▍    | 5216/9500 [17:53:12<14:43:02, 12.37s/it]08/03/2024 15:50:43 - INFO - __main__ -   Step: 5216, LR: 9.296981248537428e-06, Loss: 381.02685546875
2024-08-03T22:50:55.993861061Z 
 55%|█████▍    | 5217/9500 [17:53:25<14:55:33, 12.55s/it]08/03/2024 15:50:55 - INFO - __main__ -   Step: 5217, LR: 9.294810704850148e-06, Loss: 438.0633544921875
2024-08-03T22:51:08.165098782Z 
 55%|█████▍    | 5218/9500 [17:53:38<14:47:19, 12.43s/it]08/03/2024 15:51:08 - INFO - __main__ -   Step: 5218, LR: 9.29264016116287e-06, Loss: 419.2997741699219
2024-08-03T22:51:20.636951442Z 
 55%|█████▍    | 5219/9500 [17:53:50<14:47:56, 12.44s/it]08/03/2024 15:51:20 - INFO - __main__ -   Step: 5219, LR: 9.29046961747559e-06, Loss: 386.2865905761719
2024-08-03T22:51:32.922794718Z 
 55%|█████▍    | 5220/9500 [17:54:02<14:44:19, 12.40s/it]08/03/2024 15:51:32 - INFO - __main__ -   Step: 5220, LR: 9.288299073788312e-06, Loss: 282.74249267578125
2024-08-03T22:51:44.944494190Z 
 55%|█████▍    | 5221/9500 [17:54:14<14:36:04, 12.28s/it]08/03/2024 15:51:44 - INFO - __main__ -   Step: 5221, LR: 9.286128530101032e-06, Loss: 437.47283935546875
2024-08-03T22:51:57.102939041Z 
 55%|█████▍    | 5222/9500 [17:54:27<14:33:11, 12.25s/it]08/03/2024 15:51:57 - INFO - __main__ -   Step: 5222, LR: 9.283957986413754e-06, Loss: 376.021240234375
2024-08-03T22:52:10.061169096Z 
 55%|█████▍    | 5223/9500 [17:54:39<14:48:12, 12.46s/it]08/03/2024 15:52:10 - INFO - __main__ -   Step: 5223, LR: 9.281787442726475e-06, Loss: 523.53173828125
2024-08-03T22:52:22.017302125Z 
 55%|█████▍    | 5224/9500 [17:54:51<14:37:13, 12.31s/it]08/03/2024 15:52:22 - INFO - __main__ -   Step: 5224, LR: 9.279616899039195e-06, Loss: 399.51416015625
2024-08-03T22:52:34.223526545Z 
 55%|█████▌    | 5225/9500 [17:55:04<14:34:48, 12.28s/it]08/03/2024 15:52:34 - INFO - __main__ -   Step: 5225, LR: 9.277446355351917e-06, Loss: 471.4084777832031
2024-08-03T22:52:46.441172613Z 
 55%|█████▌    | 5226/9500 [17:55:16<14:33:19, 12.26s/it]08/03/2024 15:52:46 - INFO - __main__ -   Step: 5226, LR: 9.275275811664638e-06, Loss: 494.13525390625
2024-08-03T22:52:58.640272154Z 
 55%|█████▌    | 5227/9500 [17:55:28<14:31:48, 12.24s/it]08/03/2024 15:52:58 - INFO - __main__ -   Step: 5227, LR: 9.27310526797736e-06, Loss: 428.67694091796875
2024-08-03T22:53:11.142337005Z 
 55%|█████▌    | 5228/9500 [17:55:41<14:37:10, 12.32s/it]08/03/2024 15:53:11 - INFO - __main__ -   Step: 5228, LR: 9.27093472429008e-06, Loss: 434.041748046875
2024-08-03T22:53:23.354503418Z 
 55%|█████▌    | 5229/9500 [17:55:53<14:34:40, 12.29s/it]08/03/2024 15:53:23 - INFO - __main__ -   Step: 5229, LR: 9.268764180602801e-06, Loss: 312.4660339355469
2024-08-03T22:53:35.668285997Z 
 55%|█████▌    | 5230/9500 [17:56:05<14:35:01, 12.30s/it]08/03/2024 15:53:35 - INFO - __main__ -   Step: 5230, LR: 9.266593636915523e-06, Loss: 363.91796875
2024-08-03T22:53:47.777866910Z 
 55%|█████▌    | 5231/9500 [17:56:17<14:30:51, 12.24s/it]08/03/2024 15:53:47 - INFO - __main__ -   Step: 5231, LR: 9.264423093228243e-06, Loss: 419.4934997558594
2024-08-03T22:54:00.394422882Z 
 55%|█████▌    | 5232/9500 [17:56:30<14:38:40, 12.35s/it]08/03/2024 15:54:00 - INFO - __main__ -   Step: 5232, LR: 9.262252549540964e-06, Loss: 414.52001953125
2024-08-03T22:54:12.515052067Z 
 55%|█████▌    | 5233/9500 [17:56:42<14:33:31, 12.28s/it]08/03/2024 15:54:12 - INFO - __main__ -   Step: 5233, LR: 9.260082005853686e-06, Loss: 425.86859130859375
2024-08-03T22:54:24.736991829Z 
 55%|█████▌    | 5234/9500 [17:56:54<14:32:01, 12.26s/it]08/03/2024 15:54:24 - INFO - __main__ -   Step: 5234, LR: 9.257911462166407e-06, Loss: 461.2528381347656
2024-08-03T22:54:37.312999183Z 
 55%|█████▌    | 5235/9500 [17:57:07<14:38:27, 12.36s/it]08/03/2024 15:54:37 - INFO - __main__ -   Step: 5235, LR: 9.255740918479127e-06, Loss: 374.9554443359375
2024-08-03T22:54:49.340719114Z 
 55%|█████▌    | 5236/9500 [17:57:19<14:31:12, 12.26s/it]08/03/2024 15:54:49 - INFO - __main__ -   Step: 5236, LR: 9.253570374791849e-06, Loss: 371.0638732910156
2024-08-03T22:55:01.928850517Z 
 55%|█████▌    | 5237/9500 [17:57:31<14:38:01, 12.36s/it]08/03/2024 15:55:01 - INFO - __main__ -   Step: 5237, LR: 9.25139983110457e-06, Loss: 360.90740966796875
2024-08-03T22:55:14.131835287Z 
 55%|█████▌    | 5238/9500 [17:57:44<14:34:31, 12.31s/it]08/03/2024 15:55:14 - INFO - __main__ -   Step: 5238, LR: 9.24922928741729e-06, Loss: 351.9222717285156
2024-08-03T22:55:26.327399757Z 
 55%|█████▌    | 5239/9500 [17:57:56<14:31:50, 12.28s/it]08/03/2024 15:55:26 - INFO - __main__ -   Step: 5239, LR: 9.247058743730012e-06, Loss: 387.927978515625
2024-08-03T22:55:38.348011563Z 
 55%|█████▌    | 5240/9500 [17:58:08<14:26:11, 12.20s/it]08/03/2024 15:55:38 - INFO - __main__ -   Step: 5240, LR: 9.244888200042733e-06, Loss: 474.39825439453125
2024-08-03T22:55:50.372544217Z 
 55%|█████▌    | 5241/9500 [17:58:20<14:22:14, 12.15s/it]08/03/2024 15:55:50 - INFO - __main__ -   Step: 5241, LR: 9.242717656355455e-06, Loss: 404.7189025878906
2024-08-03T22:56:02.782181720Z 
 55%|█████▌    | 5242/9500 [17:58:32<14:27:38, 12.23s/it]08/03/2024 15:56:02 - INFO - __main__ -   Step: 5242, LR: 9.240547112668175e-06, Loss: 337.4849853515625
2024-08-03T22:56:15.197418544Z 
 55%|█████▌    | 5243/9500 [17:58:45<14:31:27, 12.28s/it]08/03/2024 15:56:15 - INFO - __main__ -   Step: 5243, LR: 9.238376568980896e-06, Loss: 465.86785888671875
2024-08-03T22:56:27.404508751Z 
 55%|█████▌    | 5244/9500 [17:58:57<14:29:38, 12.26s/it]08/03/2024 15:56:27 - INFO - __main__ -   Step: 5244, LR: 9.236206025293618e-06, Loss: 409.2003173828125
2024-08-03T22:56:39.873898321Z 
 55%|█████▌    | 5245/9500 [17:59:09<14:33:54, 12.32s/it]08/03/2024 15:56:39 - INFO - __main__ -   Step: 5245, LR: 9.234035481606338e-06, Loss: 488.2349853515625
2024-08-03T22:56:52.210283286Z 
 55%|█████▌    | 5246/9500 [17:59:22<14:33:58, 12.33s/it]08/03/2024 15:56:52 - INFO - __main__ -   Step: 5246, LR: 9.23186493791906e-06, Loss: 486.892578125
2024-08-03T22:57:04.620372764Z 
 55%|█████▌    | 5247/9500 [17:59:34<14:35:32, 12.35s/it]08/03/2024 15:57:04 - INFO - __main__ -   Step: 5247, LR: 9.229694394231781e-06, Loss: 479.5169677734375
2024-08-03T22:57:17.145556795Z 
 55%|█████▌    | 5248/9500 [17:59:47<14:39:01, 12.40s/it]08/03/2024 15:57:17 - INFO - __main__ -   Step: 5248, LR: 9.227523850544502e-06, Loss: 388.1455383300781
2024-08-03T22:57:29.159740651Z 
 55%|█████▌    | 5249/9500 [17:59:59<14:30:31, 12.29s/it]08/03/2024 15:57:29 - INFO - __main__ -   Step: 5249, LR: 9.225353306857222e-06, Loss: 354.6321716308594
2024-08-03T22:57:41.696487454Z 
 55%|█████▌    | 5250/9500 [18:00:11<14:35:38, 12.36s/it]08/03/2024 15:57:41 - INFO - __main__ -   Step: 5250, LR: 9.223182763169944e-06, Loss: 412.4815368652344
2024-08-03T22:57:54.060645166Z 
 55%|█████▌    | 5251/9500 [18:00:23<14:35:28, 12.36s/it]08/03/2024 15:57:54 - INFO - __main__ -   Step: 5251, LR: 9.221012219482665e-06, Loss: 395.2576904296875
2024-08-03T22:58:06.031273255Z 
 55%|█████▌    | 5252/9500 [18:00:35<14:26:56, 12.25s/it]08/03/2024 15:58:06 - INFO - __main__ -   Step: 5252, LR: 9.218841675795385e-06, Loss: 526.705078125
2024-08-03T22:58:18.902383721Z 
 55%|█████▌    | 5253/9500 [18:00:48<14:40:02, 12.43s/it]08/03/2024 15:58:18 - INFO - __main__ -   Step: 5253, LR: 9.216671132108107e-06, Loss: 406.4274597167969
2024-08-03T22:58:31.468082366Z 
 55%|█████▌    | 5254/9500 [18:01:01<14:42:38, 12.47s/it]08/03/2024 15:58:31 - INFO - __main__ -   Step: 5254, LR: 9.214500588420828e-06, Loss: 422.47247314453125
2024-08-03T22:58:43.561615221Z 
 55%|█████▌    | 5255/9500 [18:01:13<14:34:23, 12.36s/it]08/03/2024 15:58:43 - INFO - __main__ -   Step: 5255, LR: 9.21233004473355e-06, Loss: 492.5401916503906
2024-08-03T22:58:55.874447107Z 
 55%|█████▌    | 5256/9500 [18:01:25<14:33:12, 12.35s/it]08/03/2024 15:58:55 - INFO - __main__ -   Step: 5256, LR: 9.210159501046272e-06, Loss: 526.84033203125
2024-08-03T22:59:08.407819117Z 
 55%|█████▌    | 5257/9500 [18:01:38<14:37:00, 12.40s/it]08/03/2024 15:59:08 - INFO - __main__ -   Step: 5257, LR: 9.207988957358991e-06, Loss: 435.7080383300781
2024-08-03T22:59:20.392612216Z 
 55%|█████▌    | 5258/9500 [18:01:50<14:27:57, 12.28s/it]08/03/2024 15:59:20 - INFO - __main__ -   Step: 5258, LR: 9.205818413671713e-06, Loss: 341.693603515625
2024-08-03T22:59:32.966059233Z 
 55%|█████▌    | 5259/9500 [18:02:02<14:34:02, 12.37s/it]08/03/2024 15:59:32 - INFO - __main__ -   Step: 5259, LR: 9.203647869984433e-06, Loss: 394.99481201171875
2024-08-03T22:59:45.349969483Z 
 55%|█████▌    | 5260/9500 [18:02:15<14:34:12, 12.37s/it]08/03/2024 15:59:45 - INFO - __main__ -   Step: 5260, LR: 9.201477326297154e-06, Loss: 468.04510498046875
2024-08-03T22:59:57.498862935Z 
 55%|█████▌    | 5261/9500 [18:02:27<14:29:18, 12.30s/it]08/03/2024 15:59:57 - INFO - __main__ -   Step: 5261, LR: 9.199306782609876e-06, Loss: 398.62493896484375
2024-08-03T23:00:09.737057490Z 
 55%|█████▌    | 5262/9500 [18:02:39<14:27:42, 12.28s/it]08/03/2024 16:00:09 - INFO - __main__ -   Step: 5262, LR: 9.197136238922598e-06, Loss: 437.2274169921875
2024-08-03T23:00:22.296915707Z 
 55%|█████▌    | 5263/9500 [18:02:52<14:33:19, 12.37s/it]08/03/2024 16:00:22 - INFO - __main__ -   Step: 5263, LR: 9.194965695235319e-06, Loss: 472.1858215332031
2024-08-03T23:00:34.591754334Z 
 55%|█████▌    | 5264/9500 [18:03:04<14:31:35, 12.35s/it]08/03/2024 16:00:34 - INFO - __main__ -   Step: 5264, LR: 9.192795151548039e-06, Loss: 373.90771484375
2024-08-03T23:00:46.655953639Z 
 55%|█████▌    | 5265/9500 [18:03:16<14:25:26, 12.26s/it]08/03/2024 16:00:46 - INFO - __main__ -   Step: 5265, LR: 9.19062460786076e-06, Loss: 400.3287353515625
2024-08-03T23:00:59.392316794Z 
 55%|█████▌    | 5266/9500 [18:03:29<14:35:16, 12.40s/it]08/03/2024 16:00:59 - INFO - __main__ -   Step: 5266, LR: 9.18845406417348e-06, Loss: 410.5079040527344
2024-08-03T23:01:11.339704117Z 
 55%|█████▌    | 5267/9500 [18:03:41<14:25:25, 12.27s/it]08/03/2024 16:01:11 - INFO - __main__ -   Step: 5267, LR: 9.186283520486202e-06, Loss: 352.9237976074219
2024-08-03T23:01:23.352993692Z 
 55%|█████▌    | 5268/9500 [18:03:53<14:19:51, 12.19s/it]08/03/2024 16:01:23 - INFO - __main__ -   Step: 5268, LR: 9.184112976798924e-06, Loss: 446.50732421875
2024-08-03T23:01:35.755066703Z 
 55%|█████▌    | 5269/9500 [18:04:05<14:24:07, 12.25s/it]08/03/2024 16:01:35 - INFO - __main__ -   Step: 5269, LR: 9.181942433111645e-06, Loss: 480.6473388671875
2024-08-03T23:01:47.854033349Z 
 55%|█████▌    | 5270/9500 [18:04:17<14:20:37, 12.21s/it]08/03/2024 16:01:47 - INFO - __main__ -   Step: 5270, LR: 9.179771889424367e-06, Loss: 608.1111450195312
2024-08-03T23:02:00.134754727Z 
 55%|█████▌    | 5271/9500 [18:04:30<14:21:58, 12.23s/it]08/03/2024 16:02:00 - INFO - __main__ -   Step: 5271, LR: 9.177601345737087e-06, Loss: 459.1235046386719
2024-08-03T23:02:12.616787087Z 
 55%|█████▌    | 5272/9500 [18:04:42<14:27:06, 12.31s/it]08/03/2024 16:02:12 - INFO - __main__ -   Step: 5272, LR: 9.175430802049808e-06, Loss: 428.4969787597656
2024-08-03T23:02:24.831718301Z 
 56%|█████▌    | 5273/9500 [18:04:54<14:24:59, 12.28s/it]08/03/2024 16:02:24 - INFO - __main__ -   Step: 5273, LR: 9.173260258362528e-06, Loss: 409.442138671875
2024-08-03T23:02:37.388940447Z 
 56%|█████▌    | 5274/9500 [18:05:07<14:30:40, 12.36s/it]08/03/2024 16:02:37 - INFO - __main__ -   Step: 5274, LR: 9.17108971467525e-06, Loss: 422.33807373046875
2024-08-03T23:02:49.802198267Z 
 56%|█████▌    | 5275/9500 [18:05:19<14:31:34, 12.38s/it]08/03/2024 16:02:49 - INFO - __main__ -   Step: 5275, LR: 9.168919170987971e-06, Loss: 350.32537841796875
2024-08-03T23:03:02.203905126Z 
 56%|█████▌    | 5276/9500 [18:05:32<14:31:52, 12.38s/it]08/03/2024 16:03:02 - INFO - __main__ -   Step: 5276, LR: 9.166748627300693e-06, Loss: 455.7287292480469
2024-08-03T23:03:14.750511309Z 
 56%|█████▌    | 5277/9500 [18:05:44<14:35:05, 12.43s/it]08/03/2024 16:03:14 - INFO - __main__ -   Step: 5277, LR: 9.164578083613414e-06, Loss: 504.10552978515625
2024-08-03T23:03:27.138183415Z 
 56%|█████▌    | 5278/9500 [18:05:57<14:33:55, 12.42s/it]08/03/2024 16:03:27 - INFO - __main__ -   Step: 5278, LR: 9.162407539926134e-06, Loss: 338.37713623046875
2024-08-03T23:03:39.252522579Z 
 56%|█████▌    | 5279/9500 [18:06:09<14:27:16, 12.33s/it]08/03/2024 16:03:39 - INFO - __main__ -   Step: 5279, LR: 9.160236996238856e-06, Loss: 482.35601806640625
2024-08-03T23:03:51.250928001Z 
 56%|█████▌    | 5280/9500 [18:06:21<14:20:06, 12.23s/it]08/03/2024 16:03:51 - INFO - __main__ -   Step: 5280, LR: 9.158066452551575e-06, Loss: 478.8742370605469
2024-08-03T23:04:03.836855485Z 
 56%|█████▌    | 5281/9500 [18:06:33<14:27:26, 12.34s/it]08/03/2024 16:04:03 - INFO - __main__ -   Step: 5281, LR: 9.155895908864297e-06, Loss: 415.6533203125
2024-08-03T23:04:15.789991492Z 
 56%|█████▌    | 5282/9500 [18:06:45<14:19:09, 12.22s/it]08/03/2024 16:04:15 - INFO - __main__ -   Step: 5282, LR: 9.153725365177019e-06, Loss: 398.68145751953125
2024-08-03T23:04:28.016683704Z 
 56%|█████▌    | 5283/9500 [18:06:57<14:19:03, 12.22s/it]08/03/2024 16:04:28 - INFO - __main__ -   Step: 5283, LR: 9.15155482148974e-06, Loss: 422.1646728515625
2024-08-03T23:04:40.108293711Z 
 56%|█████▌    | 5284/9500 [18:07:10<14:16:05, 12.18s/it]08/03/2024 16:04:40 - INFO - __main__ -   Step: 5284, LR: 9.149384277802462e-06, Loss: 388.70794677734375
2024-08-03T23:04:52.504388708Z 
 56%|█████▌    | 5285/9500 [18:07:22<14:20:22, 12.25s/it]08/03/2024 16:04:52 - INFO - __main__ -   Step: 5285, LR: 9.147213734115182e-06, Loss: 337.8623046875
2024-08-03T23:05:04.635238245Z 
 56%|█████▌    | 5286/9500 [18:07:34<14:17:42, 12.21s/it]08/03/2024 16:05:04 - INFO - __main__ -   Step: 5286, LR: 9.145043190427903e-06, Loss: 484.4024658203125
2024-08-03T23:05:16.728777773Z 
 56%|█████▌    | 5287/9500 [18:07:46<14:15:00, 12.18s/it]08/03/2024 16:05:16 - INFO - __main__ -   Step: 5287, LR: 9.142872646740623e-06, Loss: 435.7292175292969
2024-08-03T23:05:29.188230128Z 
 56%|█████▌    | 5288/9500 [18:07:59<14:20:45, 12.26s/it]08/03/2024 16:05:29 - INFO - __main__ -   Step: 5288, LR: 9.140702103053345e-06, Loss: 405.6297302246094
2024-08-03T23:05:41.313576073Z 
 56%|█████▌    | 5289/9500 [18:08:11<14:17:41, 12.22s/it]08/03/2024 16:05:41 - INFO - __main__ -   Step: 5289, LR: 9.138531559366066e-06, Loss: 395.98016357421875
2024-08-03T23:05:53.449764379Z 
 56%|█████▌    | 5290/9500 [18:08:23<14:15:40, 12.19s/it]08/03/2024 16:05:53 - INFO - __main__ -   Step: 5290, LR: 9.136361015678788e-06, Loss: 530.9641723632812
2024-08-03T23:06:06.077700342Z 
 56%|█████▌    | 5291/9500 [18:08:36<14:24:37, 12.33s/it]08/03/2024 16:06:06 - INFO - __main__ -   Step: 5291, LR: 9.13419047199151e-06, Loss: 494.5531005859375
2024-08-03T23:06:18.188569493Z 
 56%|█████▌    | 5292/9500 [18:08:48<14:19:53, 12.26s/it]08/03/2024 16:06:18 - INFO - __main__ -   Step: 5292, LR: 9.132019928304229e-06, Loss: 434.7635192871094
2024-08-03T23:06:30.225436336Z 
 56%|█████▌    | 5293/9500 [18:09:00<14:14:59, 12.19s/it]08/03/2024 16:06:30 - INFO - __main__ -   Step: 5293, LR: 9.12984938461695e-06, Loss: 426.506591796875
2024-08-03T23:06:42.653507267Z 
 56%|█████▌    | 5294/9500 [18:09:12<14:19:42, 12.26s/it]08/03/2024 16:06:42 - INFO - __main__ -   Step: 5294, LR: 9.12767884092967e-06, Loss: 408.07489013671875
2024-08-03T23:06:54.651978702Z 
 56%|█████▌    | 5295/9500 [18:09:24<14:13:55, 12.18s/it]08/03/2024 16:06:54 - INFO - __main__ -   Step: 5295, LR: 9.125508297242392e-06, Loss: 411.4261474609375
2024-08-03T23:07:06.947646998Z 
 56%|█████▌    | 5296/9500 [18:09:36<14:16:03, 12.22s/it]08/03/2024 16:07:06 - INFO - __main__ -   Step: 5296, LR: 9.123337753555114e-06, Loss: 384.7630920410156
2024-08-03T23:07:19.512895783Z 
 56%|█████▌    | 5297/9500 [18:09:49<14:23:09, 12.32s/it]08/03/2024 16:07:19 - INFO - __main__ -   Step: 5297, LR: 9.121167209867835e-06, Loss: 408.1170959472656
2024-08-03T23:07:31.868988462Z 
 56%|█████▌    | 5298/9500 [18:10:01<14:23:40, 12.33s/it]08/03/2024 16:07:31 - INFO - __main__ -   Step: 5298, LR: 9.118996666180557e-06, Loss: 584.2947998046875
2024-08-03T23:07:43.915450682Z 
 56%|█████▌    | 5299/9500 [18:10:13<14:17:27, 12.25s/it]08/03/2024 16:07:43 - INFO - __main__ -   Step: 5299, LR: 9.116826122493278e-06, Loss: 360.74359130859375
2024-08-03T23:07:56.300817371Z 
 56%|█████▌    | 5300/9500 [18:10:26<14:20:10, 12.29s/it]08/03/2024 16:07:56 - INFO - __main__ -   Step: 5300, LR: 9.114655578805998e-06, Loss: 417.77984619140625
2024-08-03T23:08:08.399344030Z 
 56%|█████▌    | 5301/9500 [18:10:38<14:15:58, 12.23s/it]08/03/2024 16:08:08 - INFO - __main__ -   Step: 5301, LR: 9.112485035118718e-06, Loss: 429.6361083984375
2024-08-03T23:08:20.574842939Z 
 56%|█████▌    | 5302/9500 [18:10:50<14:14:37, 12.21s/it]08/03/2024 16:08:20 - INFO - __main__ -   Step: 5302, LR: 9.11031449143144e-06, Loss: 338.3124084472656
2024-08-03T23:08:32.895354775Z 
 56%|█████▌    | 5303/9500 [18:11:02<14:16:37, 12.25s/it]08/03/2024 16:08:32 - INFO - __main__ -   Step: 5303, LR: 9.108143947744161e-06, Loss: 397.2027282714844
2024-08-03T23:08:44.807362656Z 
 56%|█████▌    | 5304/9500 [18:11:14<14:09:25, 12.15s/it]08/03/2024 16:08:44 - INFO - __main__ -   Step: 5304, LR: 9.105973404056883e-06, Loss: 382.6215515136719
2024-08-03T23:08:57.188823768Z 
 56%|█████▌    | 5305/9500 [18:11:27<14:14:08, 12.22s/it]08/03/2024 16:08:57 - INFO - __main__ -   Step: 5305, LR: 9.103802860369604e-06, Loss: 378.71343994140625
2024-08-03T23:09:09.565980454Z 
 56%|█████▌    | 5306/9500 [18:11:39<14:17:18, 12.26s/it]08/03/2024 16:09:09 - INFO - __main__ -   Step: 5306, LR: 9.101632316682326e-06, Loss: 449.681640625
2024-08-03T23:09:21.771930126Z 
 56%|█████▌    | 5307/9500 [18:11:51<14:15:52, 12.25s/it]08/03/2024 16:09:21 - INFO - __main__ -   Step: 5307, LR: 9.099461772995046e-06, Loss: 466.1750183105469
2024-08-03T23:09:34.176854374Z 
 56%|█████▌    | 5308/9500 [18:12:04<14:18:58, 12.29s/it]08/03/2024 16:09:34 - INFO - __main__ -   Step: 5308, LR: 9.097291229307767e-06, Loss: 462.78997802734375
2024-08-03T23:09:46.594482886Z 
 56%|█████▌    | 5309/9500 [18:12:16<14:21:20, 12.33s/it]08/03/2024 16:09:46 - INFO - __main__ -   Step: 5309, LR: 9.095120685620487e-06, Loss: 327.41790771484375
2024-08-03T23:09:58.634682995Z 
 56%|█████▌    | 5310/9500 [18:12:28<14:15:02, 12.24s/it]08/03/2024 16:09:58 - INFO - __main__ -   Step: 5310, LR: 9.092950141933209e-06, Loss: 456.9465637207031
2024-08-03T23:10:10.958796402Z 
 56%|█████▌    | 5311/9500 [18:12:40<14:16:30, 12.27s/it]08/03/2024 16:10:10 - INFO - __main__ -   Step: 5311, LR: 9.09077959824593e-06, Loss: 384.51580810546875
2024-08-03T23:10:23.581334763Z 
 56%|█████▌    | 5312/9500 [18:12:53<14:23:44, 12.37s/it]08/03/2024 16:10:23 - INFO - __main__ -   Step: 5312, LR: 9.088609054558652e-06, Loss: 446.9656982421875
2024-08-03T23:10:35.769550037Z 
 56%|█████▌    | 5313/9500 [18:13:05<14:19:37, 12.32s/it]08/03/2024 16:10:35 - INFO - __main__ -   Step: 5313, LR: 9.086438510871373e-06, Loss: 326.9004821777344
2024-08-03T23:10:48.324671578Z 
 56%|█████▌    | 5314/9500 [18:13:18<14:24:22, 12.39s/it]08/03/2024 16:10:48 - INFO - __main__ -   Step: 5314, LR: 9.084267967184093e-06, Loss: 429.57965087890625
2024-08-03T23:11:00.863352401Z 
 56%|█████▌    | 5315/9500 [18:13:30<14:27:17, 12.43s/it]08/03/2024 16:11:00 - INFO - __main__ -   Step: 5315, LR: 9.082097423496815e-06, Loss: 375.68865966796875
2024-08-03T23:11:12.910869691Z 
 56%|█████▌    | 5316/9500 [18:13:42<14:18:59, 12.32s/it]08/03/2024 16:11:12 - INFO - __main__ -   Step: 5316, LR: 9.079926879809535e-06, Loss: 419.71484375
2024-08-03T23:11:25.045327908Z 
 56%|█████▌    | 5317/9500 [18:13:54<14:14:56, 12.26s/it]08/03/2024 16:11:25 - INFO - __main__ -   Step: 5317, LR: 9.077756336122256e-06, Loss: 404.80157470703125
2024-08-03T23:11:37.428619779Z 
 56%|█████▌    | 5318/9500 [18:14:07<14:17:15, 12.30s/it]08/03/2024 16:11:37 - INFO - __main__ -   Step: 5318, LR: 9.075585792434978e-06, Loss: 429.55218505859375
2024-08-03T23:11:49.664419521Z 
 56%|█████▌    | 5319/9500 [18:14:19<14:15:43, 12.28s/it]08/03/2024 16:11:49 - INFO - __main__ -   Step: 5319, LR: 9.0734152487477e-06, Loss: 487.06292724609375
2024-08-03T23:12:02.240502680Z 
 56%|█████▌    | 5320/9500 [18:14:32<14:21:41, 12.37s/it]08/03/2024 16:12:02 - INFO - __main__ -   Step: 5320, LR: 9.071244705060421e-06, Loss: 462.1828918457031
2024-08-03T23:12:14.715130173Z 
 56%|█████▌    | 5321/9500 [18:14:44<14:23:42, 12.40s/it]08/03/2024 16:12:14 - INFO - __main__ -   Step: 5321, LR: 9.06907416137314e-06, Loss: 429.2992248535156
2024-08-03T23:12:26.574497274Z 
 56%|█████▌    | 5322/9500 [18:14:56<14:12:11, 12.24s/it]08/03/2024 16:12:26 - INFO - __main__ -   Step: 5322, LR: 9.066903617685862e-06, Loss: 374.28253173828125
2024-08-03T23:12:38.649759606Z 
 56%|█████▌    | 5323/9500 [18:15:08<14:08:34, 12.19s/it]08/03/2024 16:12:38 - INFO - __main__ -   Step: 5323, LR: 9.064733073998582e-06, Loss: 422.8302001953125
2024-08-03T23:12:51.270248487Z 
 56%|█████▌    | 5324/9500 [18:15:21<14:17:23, 12.32s/it]08/03/2024 16:12:51 - INFO - __main__ -   Step: 5324, LR: 9.062562530311304e-06, Loss: 449.9901428222656
2024-08-03T23:13:03.576849235Z 
 56%|█████▌    | 5325/9500 [18:15:33<14:16:55, 12.32s/it]08/03/2024 16:13:03 - INFO - __main__ -   Step: 5325, LR: 9.060391986624025e-06, Loss: 421.1258544921875
2024-08-03T23:13:15.781045917Z 
 56%|█████▌    | 5326/9500 [18:15:45<14:14:24, 12.28s/it]08/03/2024 16:13:15 - INFO - __main__ -   Step: 5326, LR: 9.058221442936747e-06, Loss: 453.6750793457031
2024-08-03T23:13:27.891781348Z 
 56%|█████▌    | 5327/9500 [18:15:57<14:10:37, 12.23s/it]08/03/2024 16:13:27 - INFO - __main__ -   Step: 5327, LR: 9.056050899249468e-06, Loss: 439.2410583496094
2024-08-03T23:13:40.566390836Z 
 56%|█████▌    | 5328/9500 [18:16:10<14:19:41, 12.36s/it]08/03/2024 16:13:40 - INFO - __main__ -   Step: 5328, LR: 9.053880355562188e-06, Loss: 435.45849609375
2024-08-03T23:13:52.708864213Z 
 56%|█████▌    | 5329/9500 [18:16:22<14:14:52, 12.30s/it]08/03/2024 16:13:52 - INFO - __main__ -   Step: 5329, LR: 9.05170981187491e-06, Loss: 450.24560546875
2024-08-03T23:14:05.055399216Z 
 56%|█████▌    | 5330/9500 [18:16:34<14:15:41, 12.31s/it]08/03/2024 16:14:05 - INFO - __main__ -   Step: 5330, LR: 9.04953926818763e-06, Loss: 444.750732421875
2024-08-03T23:14:17.545105625Z 
 56%|█████▌    | 5331/9500 [18:16:47<14:19:10, 12.37s/it]08/03/2024 16:14:17 - INFO - __main__ -   Step: 5331, LR: 9.047368724500351e-06, Loss: 466.3222961425781
2024-08-03T23:14:29.512630039Z 
 56%|█████▌    | 5332/9500 [18:16:59<14:10:41, 12.25s/it]08/03/2024 16:14:29 - INFO - __main__ -   Step: 5332, LR: 9.045198180813073e-06, Loss: 453.14324951171875
2024-08-03T23:14:41.934882842Z 
 56%|█████▌    | 5333/9500 [18:17:11<14:14:09, 12.30s/it]08/03/2024 16:14:41 - INFO - __main__ -   Step: 5333, LR: 9.043027637125794e-06, Loss: 336.9150085449219
2024-08-03T23:14:54.329619693Z 
 56%|█████▌    | 5334/9500 [18:17:24<14:15:56, 12.33s/it]08/03/2024 16:14:54 - INFO - __main__ -   Step: 5334, LR: 9.040857093438516e-06, Loss: 341.6706237792969
2024-08-03T23:15:06.414817012Z 
 56%|█████▌    | 5335/9500 [18:17:36<14:10:41, 12.25s/it]08/03/2024 16:15:06 - INFO - __main__ -   Step: 5335, LR: 9.038686549751236e-06, Loss: 465.6920166015625
2024-08-03T23:15:18.668276371Z 
 56%|█████▌    | 5336/9500 [18:17:48<14:10:27, 12.25s/it]08/03/2024 16:15:18 - INFO - __main__ -   Step: 5336, LR: 9.036516006063957e-06, Loss: 419.0653076171875
2024-08-03T23:15:31.506949721Z 
 56%|█████▌    | 5337/9500 [18:18:01<14:22:24, 12.43s/it]08/03/2024 16:15:31 - INFO - __main__ -   Step: 5337, LR: 9.034345462376677e-06, Loss: 442.524658203125
2024-08-03T23:15:43.736513880Z 
 56%|█████▌    | 5338/9500 [18:18:13<14:18:03, 12.37s/it]08/03/2024 16:15:43 - INFO - __main__ -   Step: 5338, LR: 9.032174918689399e-06, Loss: 506.8271789550781
2024-08-03T23:15:55.986608166Z 
 56%|█████▌    | 5339/9500 [18:18:25<14:15:20, 12.33s/it]08/03/2024 16:15:55 - INFO - __main__ -   Step: 5339, LR: 9.03000437500212e-06, Loss: 385.8206787109375
2024-08-03T23:16:08.416065182Z 
 56%|█████▌    | 5340/9500 [18:18:38<14:17:07, 12.36s/it]08/03/2024 16:16:08 - INFO - __main__ -   Step: 5340, LR: 9.027833831314842e-06, Loss: 304.8666687011719
2024-08-03T23:16:20.390800308Z 
 56%|█████▌    | 5341/9500 [18:18:50<14:08:51, 12.25s/it]08/03/2024 16:16:20 - INFO - __main__ -   Step: 5341, LR: 9.025663287627563e-06, Loss: 372.274169921875
2024-08-03T23:16:32.780445343Z 
 56%|█████▌    | 5342/9500 [18:19:02<14:11:38, 12.29s/it]08/03/2024 16:16:32 - INFO - __main__ -   Step: 5342, LR: 9.023492743940283e-06, Loss: 315.81597900390625
2024-08-03T23:16:45.184486941Z 
 56%|█████▌    | 5343/9500 [18:19:15<14:13:49, 12.32s/it]08/03/2024 16:16:45 - INFO - __main__ -   Step: 5343, LR: 9.021322200253005e-06, Loss: 387.91546630859375
2024-08-03T23:16:57.473975687Z 
 56%|█████▋    | 5344/9500 [18:19:27<14:12:54, 12.31s/it]08/03/2024 16:16:57 - INFO - __main__ -   Step: 5344, LR: 9.019151656565725e-06, Loss: 588.7083740234375
2024-08-03T23:17:09.826802190Z 
 56%|█████▋    | 5345/9500 [18:19:39<14:13:31, 12.33s/it]08/03/2024 16:17:09 - INFO - __main__ -   Step: 5345, LR: 9.016981112878446e-06, Loss: 472.7020263671875
2024-08-03T23:17:22.450673882Z 
 56%|█████▋    | 5346/9500 [18:19:52<14:19:31, 12.41s/it]08/03/2024 16:17:22 - INFO - __main__ -   Step: 5346, LR: 9.014810569191168e-06, Loss: 323.9027404785156
2024-08-03T23:17:34.466591616Z 
 56%|█████▋    | 5347/9500 [18:20:04<14:11:01, 12.30s/it]08/03/2024 16:17:34 - INFO - __main__ -   Step: 5347, LR: 9.01264002550389e-06, Loss: 360.44940185546875
2024-08-03T23:17:46.571936508Z 
 56%|█████▋    | 5348/9500 [18:20:16<14:06:52, 12.24s/it]08/03/2024 16:17:46 - INFO - __main__ -   Step: 5348, LR: 9.010469481816611e-06, Loss: 366.33258056640625
2024-08-03T23:17:58.951705034Z 
 56%|█████▋    | 5349/9500 [18:20:28<14:09:37, 12.28s/it]08/03/2024 16:17:58 - INFO - __main__ -   Step: 5349, LR: 9.008298938129333e-06, Loss: 378.8955993652344
2024-08-03T23:18:10.958277161Z 
 56%|█████▋    | 5350/9500 [18:20:40<14:03:43, 12.20s/it]08/03/2024 16:18:10 - INFO - __main__ -   Step: 5350, LR: 9.006128394442052e-06, Loss: 315.90948486328125
2024-08-03T23:18:23.145959928Z 
 56%|█████▋    | 5351/9500 [18:20:53<14:03:18, 12.20s/it]08/03/2024 16:18:23 - INFO - __main__ -   Step: 5351, LR: 9.003957850754772e-06, Loss: 438.8686218261719
2024-08-03T23:18:36.057563717Z 
 56%|█████▋    | 5352/9500 [18:21:05<14:17:57, 12.41s/it]08/03/2024 16:18:36 - INFO - __main__ -   Step: 5352, LR: 9.001787307067494e-06, Loss: 441.55255126953125
2024-08-03T23:18:48.231922322Z 
 56%|█████▋    | 5353/9500 [18:21:18<14:12:51, 12.34s/it]08/03/2024 16:18:48 - INFO - __main__ -   Step: 5353, LR: 8.999616763380215e-06, Loss: 455.27301025390625
2024-08-03T23:19:00.444934109Z 
 56%|█████▋    | 5354/9500 [18:21:30<14:10:01, 12.30s/it]08/03/2024 16:19:00 - INFO - __main__ -   Step: 5354, LR: 8.997446219692937e-06, Loss: 476.462890625
2024-08-03T23:19:13.146077186Z 
 56%|█████▋    | 5355/9500 [18:21:43<14:18:06, 12.42s/it]08/03/2024 16:19:13 - INFO - __main__ -   Step: 5355, LR: 8.995275676005659e-06, Loss: 451.9798583984375
2024-08-03T23:19:25.233376524Z 
 56%|█████▋    | 5356/9500 [18:21:55<14:10:59, 12.32s/it]08/03/2024 16:19:25 - INFO - __main__ -   Step: 5356, LR: 8.99310513231838e-06, Loss: 407.9408874511719
2024-08-03T23:19:37.320145217Z 
 56%|█████▋    | 5357/9500 [18:22:07<14:05:55, 12.25s/it]08/03/2024 16:19:37 - INFO - __main__ -   Step: 5357, LR: 8.9909345886311e-06, Loss: 495.1876525878906
2024-08-03T23:19:50.131056838Z 
 56%|█████▋    | 5358/9500 [18:22:20<14:17:18, 12.42s/it]08/03/2024 16:19:50 - INFO - __main__ -   Step: 5358, LR: 8.988764044943822e-06, Loss: 464.8426513671875
2024-08-03T23:20:02.424662489Z 
 56%|█████▋    | 5359/9500 [18:22:32<14:14:30, 12.38s/it]08/03/2024 16:20:02 - INFO - __main__ -   Step: 5359, LR: 8.986593501256541e-06, Loss: 484.121826171875
2024-08-03T23:20:14.668762011Z 
 56%|█████▋    | 5360/9500 [18:22:44<14:11:27, 12.34s/it]08/03/2024 16:20:14 - INFO - __main__ -   Step: 5360, LR: 8.984422957569263e-06, Loss: 361.9955139160156
2024-08-03T23:20:27.326046659Z 
 56%|█████▋    | 5361/9500 [18:22:57<14:17:49, 12.44s/it]08/03/2024 16:20:27 - INFO - __main__ -   Step: 5361, LR: 8.982252413881985e-06, Loss: 604.088623046875
2024-08-03T23:20:39.547152359Z 
 56%|█████▋    | 5362/9500 [18:23:09<14:13:11, 12.37s/it]08/03/2024 16:20:39 - INFO - __main__ -   Step: 5362, LR: 8.980081870194706e-06, Loss: 443.2772521972656
2024-08-03T23:20:51.851862384Z 
 56%|█████▋    | 5363/9500 [18:23:21<14:11:36, 12.35s/it]08/03/2024 16:20:51 - INFO - __main__ -   Step: 5363, LR: 8.977911326507428e-06, Loss: 449.94677734375
2024-08-03T23:21:04.581589899Z 
 56%|█████▋    | 5364/9500 [18:23:34<14:19:13, 12.46s/it]08/03/2024 16:21:04 - INFO - __main__ -   Step: 5364, LR: 8.975740782820148e-06, Loss: 359.94427490234375
2024-08-03T23:21:16.786155383Z 
 56%|█████▋    | 5365/9500 [18:23:46<14:13:39, 12.39s/it]08/03/2024 16:21:16 - INFO - __main__ -   Step: 5365, LR: 8.973570239132869e-06, Loss: 512.473388671875
2024-08-03T23:21:29.035615711Z 
 56%|█████▋    | 5366/9500 [18:23:58<14:10:36, 12.35s/it]08/03/2024 16:21:29 - INFO - __main__ -   Step: 5366, LR: 8.971399695445589e-06, Loss: 461.87371826171875
2024-08-03T23:21:41.448427026Z 
 56%|█████▋    | 5367/9500 [18:24:11<14:11:47, 12.37s/it]08/03/2024 16:21:41 - INFO - __main__ -   Step: 5367, LR: 8.96922915175831e-06, Loss: 357.7105712890625
2024-08-03T23:21:53.467073744Z 
 57%|█████▋    | 5368/9500 [18:24:23<14:04:24, 12.26s/it]08/03/2024 16:21:53 - INFO - __main__ -   Step: 5368, LR: 8.967058608071032e-06, Loss: 478.4117126464844
2024-08-03T23:22:05.690558891Z 
 57%|█████▋    | 5369/9500 [18:24:35<14:03:25, 12.25s/it]08/03/2024 16:22:05 - INFO - __main__ -   Step: 5369, LR: 8.964888064383754e-06, Loss: 426.6756591796875
2024-08-03T23:22:18.099958054Z 
 57%|█████▋    | 5370/9500 [18:24:48<14:06:30, 12.30s/it]08/03/2024 16:22:18 - INFO - __main__ -   Step: 5370, LR: 8.962717520696473e-06, Loss: 494.48223876953125
2024-08-03T23:22:30.688875244Z 
 57%|█████▋    | 5371/9500 [18:25:00<14:12:18, 12.39s/it]08/03/2024 16:22:30 - INFO - __main__ -   Step: 5371, LR: 8.960546977009195e-06, Loss: 373.38800048828125
2024-08-03T23:22:42.766657516Z 
 57%|█████▋    | 5372/9500 [18:25:12<14:05:45, 12.29s/it]08/03/2024 16:22:42 - INFO - __main__ -   Step: 5372, LR: 8.958376433321917e-06, Loss: 457.9245300292969
2024-08-03T23:22:54.936695892Z 
 57%|█████▋    | 5373/9500 [18:25:24<14:03:01, 12.26s/it]08/03/2024 16:22:54 - INFO - __main__ -   Step: 5373, LR: 8.956205889634636e-06, Loss: 400.275390625
2024-08-03T23:23:07.458550682Z 
 57%|█████▋    | 5374/9500 [18:25:37<14:08:17, 12.34s/it]08/03/2024 16:23:07 - INFO - __main__ -   Step: 5374, LR: 8.954035345947358e-06, Loss: 462.11004638671875
2024-08-03T23:23:19.719172371Z 
 57%|█████▋    | 5375/9500 [18:25:49<14:06:32, 12.31s/it]08/03/2024 16:23:19 - INFO - __main__ -   Step: 5375, LR: 8.95186480226008e-06, Loss: 366.63836669921875
2024-08-03T23:23:32.186411437Z 
 57%|█████▋    | 5376/9500 [18:26:02<14:09:29, 12.36s/it]08/03/2024 16:23:32 - INFO - __main__ -   Step: 5376, LR: 8.949694258572801e-06, Loss: 516.55615234375
2024-08-03T23:23:44.512661767Z 
 57%|█████▋    | 5377/9500 [18:26:14<14:08:36, 12.35s/it]08/03/2024 16:23:44 - INFO - __main__ -   Step: 5377, LR: 8.947523714885521e-06, Loss: 418.1372375488281
2024-08-03T23:23:56.443791700Z 
 57%|█████▋    | 5378/9500 [18:26:26<13:59:47, 12.22s/it]08/03/2024 16:23:56 - INFO - __main__ -   Step: 5378, LR: 8.945353171198243e-06, Loss: 433.0576477050781
2024-08-03T23:24:08.853091445Z 
 57%|█████▋    | 5379/9500 [18:26:38<14:03:24, 12.28s/it]08/03/2024 16:24:08 - INFO - __main__ -   Step: 5379, LR: 8.943182627510964e-06, Loss: 318.5088195800781
2024-08-03T23:24:21.312359344Z 
 57%|█████▋    | 5380/9500 [18:26:51<14:06:53, 12.33s/it]08/03/2024 16:24:21 - INFO - __main__ -   Step: 5380, LR: 8.941012083823684e-06, Loss: 383.9449157714844
2024-08-03T23:24:33.352299125Z 
 57%|█████▋    | 5381/9500 [18:27:03<14:00:38, 12.25s/it]08/03/2024 16:24:33 - INFO - __main__ -   Step: 5381, LR: 8.938841540136406e-06, Loss: 444.1353454589844
2024-08-03T23:24:45.508288909Z 
 57%|█████▋    | 5382/9500 [18:27:15<13:58:36, 12.22s/it]08/03/2024 16:24:45 - INFO - __main__ -   Step: 5382, LR: 8.936670996449127e-06, Loss: 389.9665832519531
2024-08-03T23:24:57.845493831Z 
 57%|█████▋    | 5383/9500 [18:27:27<14:00:50, 12.25s/it]08/03/2024 16:24:57 - INFO - __main__ -   Step: 5383, LR: 8.934500452761849e-06, Loss: 383.9139099121094
2024-08-03T23:25:10.033267299Z 
 57%|█████▋    | 5384/9500 [18:27:39<13:59:16, 12.23s/it]08/03/2024 16:25:10 - INFO - __main__ -   Step: 5384, LR: 8.932329909074569e-06, Loss: 364.7939758300781
2024-08-03T23:25:22.013313682Z 
 57%|█████▋    | 5385/9500 [18:27:51<13:53:50, 12.16s/it]08/03/2024 16:25:22 - INFO - __main__ -   Step: 5385, LR: 8.93015936538729e-06, Loss: 292.1988525390625
2024-08-03T23:25:34.458268262Z 
 57%|█████▋    | 5386/9500 [18:28:04<13:59:32, 12.24s/it]08/03/2024 16:25:34 - INFO - __main__ -   Step: 5386, LR: 8.927988821700012e-06, Loss: 405.72357177734375
2024-08-03T23:25:46.646685197Z 
 57%|█████▋    | 5387/9500 [18:28:16<13:58:11, 12.23s/it]08/03/2024 16:25:46 - INFO - __main__ -   Step: 5387, LR: 8.925818278012732e-06, Loss: 419.01361083984375
2024-08-03T23:25:59.337548952Z 
 57%|█████▋    | 5388/9500 [18:28:29<14:07:29, 12.37s/it]08/03/2024 16:25:59 - INFO - __main__ -   Step: 5388, LR: 8.923647734325453e-06, Loss: 446.5426940917969
2024-08-03T23:26:11.689044461Z 
 57%|█████▋    | 5389/9500 [18:28:41<14:06:59, 12.36s/it]08/03/2024 16:26:11 - INFO - __main__ -   Step: 5389, LR: 8.921477190638175e-06, Loss: 315.3629455566406
2024-08-03T23:26:24.115980309Z 
 57%|█████▋    | 5390/9500 [18:28:54<14:08:08, 12.38s/it]08/03/2024 16:26:24 - INFO - __main__ -   Step: 5390, LR: 8.919306646950896e-06, Loss: 494.83367919921875
2024-08-03T23:26:36.363263346Z 
 57%|█████▋    | 5391/9500 [18:29:06<14:05:09, 12.34s/it]08/03/2024 16:26:36 - INFO - __main__ -   Step: 5391, LR: 8.917136103263616e-06, Loss: 360.499755859375
2024-08-03T23:26:48.937753449Z 
 57%|█████▋    | 5392/9500 [18:29:18<14:09:45, 12.41s/it]08/03/2024 16:26:48 - INFO - __main__ -   Step: 5392, LR: 8.914965559576338e-06, Loss: 532.4261474609375
2024-08-03T23:27:01.122423586Z 
 57%|█████▋    | 5393/9500 [18:29:31<14:04:53, 12.34s/it]08/03/2024 16:27:01 - INFO - __main__ -   Step: 5393, LR: 8.91279501588906e-06, Loss: 459.39599609375
2024-08-03T23:27:13.288865023Z 
 57%|█████▋    | 5394/9500 [18:29:43<14:01:03, 12.29s/it]08/03/2024 16:27:13 - INFO - __main__ -   Step: 5394, LR: 8.910624472201779e-06, Loss: 519.5792236328125
2024-08-03T23:27:25.983044048Z 
 57%|█████▋    | 5395/9500 [18:29:55<14:09:08, 12.41s/it]08/03/2024 16:27:25 - INFO - __main__ -   Step: 5395, LR: 8.9084539285145e-06, Loss: 351.85589599609375
2024-08-03T23:27:37.995060863Z 
 57%|█████▋    | 5396/9500 [18:30:07<14:00:44, 12.29s/it]08/03/2024 16:27:37 - INFO - __main__ -   Step: 5396, LR: 8.906283384827222e-06, Loss: 362.66851806640625
2024-08-03T23:27:49.966934128Z 
 57%|█████▋    | 5397/9500 [18:30:19<13:53:58, 12.20s/it]08/03/2024 16:27:49 - INFO - __main__ -   Step: 5397, LR: 8.904112841139944e-06, Loss: 399.9565124511719
2024-08-03T23:28:02.913866744Z 
 57%|█████▋    | 5398/9500 [18:30:32<14:09:11, 12.42s/it]08/03/2024 16:28:02 - INFO - __main__ -   Step: 5398, LR: 8.901942297452664e-06, Loss: 469.0550842285156
2024-08-03T23:28:15.169384920Z 
 57%|█████▋    | 5399/9500 [18:30:45<14:05:35, 12.37s/it]08/03/2024 16:28:15 - INFO - __main__ -   Step: 5399, LR: 8.899771753765385e-06, Loss: 421.2030944824219
2024-08-03T23:28:27.286651016Z 
 57%|█████▋    | 5400/9500 [18:30:57<14:00:09, 12.30s/it]08/03/2024 16:28:27 - INFO - __main__ -   Step: 5400, LR: 8.897601210078107e-06, Loss: 455.45849609375
2024-08-03T23:28:39.741723489Z 
 57%|█████▋    | 5401/9500 [18:31:09<14:03:14, 12.34s/it]08/03/2024 16:28:39 - INFO - __main__ -   Step: 5401, LR: 8.895430666390828e-06, Loss: 474.3041687011719
2024-08-03T23:28:52.117156181Z 
 57%|█████▋    | 5402/9500 [18:31:22<14:03:42, 12.35s/it]08/03/2024 16:28:52 - INFO - __main__ -   Step: 5402, LR: 8.893260122703548e-06, Loss: 418.3173828125
2024-08-03T23:29:04.583476987Z 
 57%|█████▋    | 5403/9500 [18:31:34<14:05:49, 12.39s/it]08/03/2024 16:29:04 - INFO - __main__ -   Step: 5403, LR: 8.89108957901627e-06, Loss: 397.15435791015625
2024-08-03T23:29:17.224403295Z 
 57%|█████▋    | 5404/9500 [18:31:47<14:10:48, 12.46s/it]08/03/2024 16:29:17 - INFO - __main__ -   Step: 5404, LR: 8.888919035328991e-06, Loss: 335.93572998046875
2024-08-03T23:29:29.265063734Z 
 57%|█████▋    | 5405/9500 [18:31:59<14:01:57, 12.34s/it]08/03/2024 16:29:29 - INFO - __main__ -   Step: 5405, LR: 8.886748491641711e-06, Loss: 325.50787353515625
2024-08-03T23:29:41.308960968Z 
 57%|█████▋    | 5406/9500 [18:32:11<13:55:45, 12.25s/it]08/03/2024 16:29:41 - INFO - __main__ -   Step: 5406, LR: 8.884577947954433e-06, Loss: 410.1920166015625
2024-08-03T23:29:54.242900835Z 
 57%|█████▋    | 5407/9500 [18:32:24<14:09:35, 12.45s/it]08/03/2024 16:29:54 - INFO - __main__ -   Step: 5407, LR: 8.882407404267154e-06, Loss: 480.2752990722656
2024-08-03T23:30:06.603788655Z 
 57%|█████▋    | 5408/9500 [18:32:36<14:07:26, 12.43s/it]08/03/2024 16:30:06 - INFO - __main__ -   Step: 5408, LR: 8.880236860579876e-06, Loss: 441.8271484375
2024-08-03T23:30:18.796721382Z 
 57%|█████▋    | 5409/9500 [18:32:48<14:02:29, 12.36s/it]08/03/2024 16:30:18 - INFO - __main__ -   Step: 5409, LR: 8.878066316892596e-06, Loss: 432.69903564453125
2024-08-03T23:30:31.667903646Z 
 57%|█████▋    | 5410/9500 [18:33:01<14:12:48, 12.51s/it]08/03/2024 16:30:31 - INFO - __main__ -   Step: 5410, LR: 8.875895773205317e-06, Loss: 384.3227233886719
2024-08-03T23:30:44.089160666Z 
 57%|█████▋    | 5411/9500 [18:33:14<14:10:47, 12.48s/it]08/03/2024 16:30:44 - INFO - __main__ -   Step: 5411, LR: 8.873725229518039e-06, Loss: 371.86602783203125
2024-08-03T23:30:56.014026658Z 
 57%|█████▋    | 5412/9500 [18:33:25<13:59:08, 12.32s/it]08/03/2024 16:30:56 - INFO - __main__ -   Step: 5412, LR: 8.871554685830759e-06, Loss: 338.0588073730469
2024-08-03T23:31:08.139687921Z 
 57%|█████▋    | 5413/9500 [18:33:38<13:55:02, 12.26s/it]08/03/2024 16:31:08 - INFO - __main__ -   Step: 5413, LR: 8.86938414214348e-06, Loss: 384.69708251953125
2024-08-03T23:31:20.893101385Z 
 57%|█████▋    | 5414/9500 [18:33:50<14:04:56, 12.41s/it]08/03/2024 16:31:20 - INFO - __main__ -   Step: 5414, LR: 8.867213598456202e-06, Loss: 357.71282958984375
2024-08-03T23:31:32.894918509Z 
 57%|█████▋    | 5415/9500 [18:34:02<13:56:27, 12.29s/it]08/03/2024 16:31:32 - INFO - __main__ -   Step: 5415, LR: 8.865043054768923e-06, Loss: 428.4306945800781
2024-08-03T23:31:45.223601707Z 
 57%|█████▋    | 5416/9500 [18:34:15<13:57:07, 12.30s/it]08/03/2024 16:31:45 - INFO - __main__ -   Step: 5416, LR: 8.862872511081643e-06, Loss: 423.9830322265625
2024-08-03T23:31:57.931001780Z 
 57%|█████▋    | 5417/9500 [18:34:27<14:05:15, 12.42s/it]08/03/2024 16:31:57 - INFO - __main__ -   Step: 5417, LR: 8.860701967394365e-06, Loss: 393.5810546875
2024-08-03T23:32:10.047567106Z 
 57%|█████▋    | 5418/9500 [18:34:39<13:58:50, 12.33s/it]08/03/2024 16:32:10 - INFO - __main__ -   Step: 5418, LR: 8.858531423707086e-06, Loss: 504.849853515625
2024-08-03T23:32:22.482133338Z 
 57%|█████▋    | 5419/9500 [18:34:52<14:00:45, 12.36s/it]08/03/2024 16:32:22 - INFO - __main__ -   Step: 5419, LR: 8.856360880019806e-06, Loss: 432.4818115234375
2024-08-03T23:32:35.576531106Z 
 57%|█████▋    | 5420/9500 [18:35:05<14:15:31, 12.58s/it]08/03/2024 16:32:35 - INFO - __main__ -   Step: 5420, LR: 8.854190336332528e-06, Loss: 523.6736450195312
2024-08-03T23:32:47.530171403Z 
 57%|█████▋    | 5421/9500 [18:35:17<14:02:30, 12.39s/it]08/03/2024 16:32:47 - INFO - __main__ -   Step: 5421, LR: 8.85201979264525e-06, Loss: 349.21905517578125
2024-08-03T23:32:59.662171869Z 
 57%|█████▋    | 5422/9500 [18:35:29<13:56:59, 12.31s/it]08/03/2024 16:32:59 - INFO - __main__ -   Step: 5422, LR: 8.849849248957971e-06, Loss: 432.83880615234375
2024-08-03T23:33:12.294748321Z 
 57%|█████▋    | 5423/9500 [18:35:42<14:03:15, 12.41s/it]08/03/2024 16:33:12 - INFO - __main__ -   Step: 5423, LR: 8.84767870527069e-06, Loss: 463.7521057128906
2024-08-03T23:33:25.097877634Z 
 57%|█████▋    | 5424/9500 [18:35:55<14:11:04, 12.53s/it]08/03/2024 16:33:25 - INFO - __main__ -   Step: 5424, LR: 8.845508161583412e-06, Loss: 449.97454833984375
2024-08-03T23:33:37.117534203Z 
 57%|█████▋    | 5425/9500 [18:36:07<14:00:29, 12.38s/it]08/03/2024 16:33:37 - INFO - __main__ -   Step: 5425, LR: 8.843337617896134e-06, Loss: 409.7637023925781
2024-08-03T23:33:49.800577594Z 
 57%|█████▋    | 5426/9500 [18:36:19<14:06:33, 12.47s/it]08/03/2024 16:33:49 - INFO - __main__ -   Step: 5426, LR: 8.841167074208854e-06, Loss: 431.48968505859375
2024-08-03T23:34:01.786935976Z 
 57%|█████▋    | 5427/9500 [18:36:31<13:56:32, 12.32s/it]08/03/2024 16:34:01 - INFO - __main__ -   Step: 5427, LR: 8.838996530521575e-06, Loss: 395.599853515625
2024-08-03T23:34:14.142497879Z 
 57%|█████▋    | 5428/9500 [18:36:44<13:57:00, 12.33s/it]08/03/2024 16:34:14 - INFO - __main__ -   Step: 5428, LR: 8.836825986834297e-06, Loss: 477.5511169433594
2024-08-03T23:34:26.730673988Z 
 57%|█████▋    | 5429/9500 [18:36:56<14:01:59, 12.41s/it]08/03/2024 16:34:26 - INFO - __main__ -   Step: 5429, LR: 8.834655443147018e-06, Loss: 346.8569641113281
2024-08-03T23:34:38.764074549Z 
 57%|█████▋    | 5430/9500 [18:37:08<13:54:07, 12.30s/it]08/03/2024 16:34:38 - INFO - __main__ -   Step: 5430, LR: 8.832484899459738e-06, Loss: 409.0252685546875
2024-08-03T23:34:50.940059945Z 
 57%|█████▋    | 5431/9500 [18:37:20<13:51:28, 12.26s/it]08/03/2024 16:34:50 - INFO - __main__ -   Step: 5431, LR: 8.83031435577246e-06, Loss: 401.09344482421875
2024-08-03T23:35:03.815736712Z 
 57%|█████▋    | 5432/9500 [18:37:33<14:03:46, 12.45s/it]08/03/2024 16:35:03 - INFO - __main__ -   Step: 5432, LR: 8.828143812085181e-06, Loss: 366.65606689453125
2024-08-03T23:35:15.823295136Z 
 57%|█████▋    | 5433/9500 [18:37:45<13:54:40, 12.31s/it]08/03/2024 16:35:15 - INFO - __main__ -   Step: 5433, LR: 8.825973268397901e-06, Loss: 280.64459228515625
2024-08-03T23:35:28.386434255Z 
 57%|█████▋    | 5434/9500 [18:37:58<13:59:29, 12.39s/it]08/03/2024 16:35:28 - INFO - __main__ -   Step: 5434, LR: 8.823802724710623e-06, Loss: 454.5618896484375
2024-08-03T23:35:41.145040706Z 
 57%|█████▋    | 5435/9500 [18:38:11<14:06:51, 12.50s/it]08/03/2024 16:35:41 - INFO - __main__ -   Step: 5435, LR: 8.821632181023344e-06, Loss: 420.6059875488281
2024-08-03T23:35:53.145398918Z 
 57%|█████▋    | 5436/9500 [18:38:23<13:56:30, 12.35s/it]08/03/2024 16:35:53 - INFO - __main__ -   Step: 5436, LR: 8.819461637336066e-06, Loss: 418.3572998046875
2024-08-03T23:36:05.253659833Z 
 57%|█████▋    | 5437/9500 [18:38:35<13:51:23, 12.28s/it]08/03/2024 16:36:05 - INFO - __main__ -   Step: 5437, LR: 8.817291093648786e-06, Loss: 414.766357421875
2024-08-03T23:36:18.122793397Z 
 57%|█████▋    | 5438/9500 [18:38:48<14:03:11, 12.45s/it]08/03/2024 16:36:18 - INFO - __main__ -   Step: 5438, LR: 8.815120549961507e-06, Loss: 392.2563171386719
2024-08-03T23:36:30.291719058Z 
 57%|█████▋    | 5439/9500 [18:39:00<13:57:10, 12.37s/it]08/03/2024 16:36:30 - INFO - __main__ -   Step: 5439, LR: 8.812950006274229e-06, Loss: 368.078857421875
2024-08-03T23:36:42.469052434Z 
 57%|█████▋    | 5440/9500 [18:39:12<13:53:04, 12.31s/it]08/03/2024 16:36:42 - INFO - __main__ -   Step: 5440, LR: 8.810779462586949e-06, Loss: 419.19244384765625
2024-08-03T23:36:55.141066546Z 
 57%|█████▋    | 5441/9500 [18:39:25<14:00:11, 12.42s/it]08/03/2024 16:36:55 - INFO - __main__ -   Step: 5441, LR: 8.80860891889967e-06, Loss: 415.8878173828125
2024-08-03T23:37:07.283523604Z 
 57%|█████▋    | 5442/9500 [18:39:37<13:54:21, 12.34s/it]08/03/2024 16:37:07 - INFO - __main__ -   Step: 5442, LR: 8.806438375212392e-06, Loss: 476.11279296875
2024-08-03T23:37:19.314922157Z 
 57%|█████▋    | 5443/9500 [18:39:49<13:47:57, 12.24s/it]08/03/2024 16:37:19 - INFO - __main__ -   Step: 5443, LR: 8.804267831525113e-06, Loss: 293.06982421875
2024-08-03T23:37:32.248204571Z 
 57%|█████▋    | 5444/9500 [18:40:02<14:01:43, 12.45s/it]08/03/2024 16:37:32 - INFO - __main__ -   Step: 5444, LR: 8.802097287837833e-06, Loss: 581.6845703125
2024-08-03T23:37:44.382547446Z 
 57%|█████▋    | 5445/9500 [18:40:14<13:55:04, 12.36s/it]08/03/2024 16:37:44 - INFO - __main__ -   Step: 5445, LR: 8.799926744150555e-06, Loss: 426.18402099609375
2024-08-03T23:37:56.502059369Z 
 57%|█████▋    | 5446/9500 [18:40:26<13:50:04, 12.29s/it]08/03/2024 16:37:56 - INFO - __main__ -   Step: 5446, LR: 8.797756200463276e-06, Loss: 378.51837158203125
2024-08-03T23:38:09.137736107Z 
 57%|█████▋    | 5447/9500 [18:40:39<13:56:58, 12.39s/it]08/03/2024 16:38:09 - INFO - __main__ -   Step: 5447, LR: 8.795585656775996e-06, Loss: 596.821533203125
2024-08-03T23:38:21.296876374Z 
 57%|█████▋    | 5448/9500 [18:40:51<13:52:04, 12.32s/it]08/03/2024 16:38:21 - INFO - __main__ -   Step: 5448, LR: 8.793415113088718e-06, Loss: 438.554931640625
2024-08-03T23:38:33.989412849Z 
 57%|█████▋    | 5449/9500 [18:41:03<13:59:24, 12.43s/it]08/03/2024 16:38:33 - INFO - __main__ -   Step: 5449, LR: 8.79124456940144e-06, Loss: 369.1837463378906
2024-08-03T23:38:46.848788139Z 
 57%|█████▋    | 5450/9500 [18:41:16<14:07:50, 12.56s/it]08/03/2024 16:38:46 - INFO - __main__ -   Step: 5450, LR: 8.789074025714161e-06, Loss: 425.1445007324219
2024-08-03T23:38:59.355702913Z 
 57%|█████▋    | 5451/9500 [18:41:29<14:06:32, 12.54s/it]08/03/2024 16:38:59 - INFO - __main__ -   Step: 5451, LR: 8.786903482026883e-06, Loss: 470.68426513671875
2024-08-03T23:39:11.600669233Z 
 57%|█████▋    | 5452/9500 [18:41:41<14:00:16, 12.45s/it]08/03/2024 16:39:11 - INFO - __main__ -   Step: 5452, LR: 8.784732938339602e-06, Loss: 584.0791015625
2024-08-03T23:39:24.174535159Z 
 57%|█████▋    | 5453/9500 [18:41:54<14:02:28, 12.49s/it]08/03/2024 16:39:24 - INFO - __main__ -   Step: 5453, LR: 8.782562394652324e-06, Loss: 463.89044189453125
2024-08-03T23:39:36.550650537Z 
 57%|█████▋    | 5454/9500 [18:42:06<13:59:57, 12.46s/it]08/03/2024 16:39:36 - INFO - __main__ -   Step: 5454, LR: 8.780391850965044e-06, Loss: 471.18731689453125
2024-08-03T23:39:48.783234005Z 
 57%|█████▋    | 5455/9500 [18:42:18<13:55:13, 12.39s/it]08/03/2024 16:39:48 - INFO - __main__ -   Step: 5455, LR: 8.778221307277765e-06, Loss: 496.3060607910156
2024-08-03T23:40:00.911762499Z 
 57%|█████▋    | 5456/9500 [18:42:30<13:49:45, 12.31s/it]08/03/2024 16:40:00 - INFO - __main__ -   Step: 5456, LR: 8.776050763590487e-06, Loss: 411.323974609375
2024-08-03T23:40:13.380884041Z 
 57%|█████▋    | 5457/9500 [18:42:43<13:52:44, 12.36s/it]08/03/2024 16:40:13 - INFO - __main__ -   Step: 5457, LR: 8.773880219903208e-06, Loss: 424.2333984375
2024-08-03T23:40:25.776611702Z 
 57%|█████▋    | 5458/9500 [18:42:55<13:53:18, 12.37s/it]08/03/2024 16:40:25 - INFO - __main__ -   Step: 5458, LR: 8.77170967621593e-06, Loss: 550.8350830078125
2024-08-03T23:40:38.017401629Z 
 57%|█████▋    | 5459/9500 [18:43:07<13:50:29, 12.33s/it]08/03/2024 16:40:38 - INFO - __main__ -   Step: 5459, LR: 8.76953913252865e-06, Loss: 458.44647216796875
2024-08-03T23:40:50.667707696Z 
 57%|█████▋    | 5460/9500 [18:43:20<13:56:44, 12.43s/it]08/03/2024 16:40:50 - INFO - __main__ -   Step: 5460, LR: 8.767368588841371e-06, Loss: 442.9658508300781
2024-08-03T23:41:02.799492021Z 
 57%|█████▋    | 5461/9500 [18:43:32<13:50:34, 12.34s/it]08/03/2024 16:41:02 - INFO - __main__ -   Step: 5461, LR: 8.765198045154091e-06, Loss: 395.75860595703125
2024-08-03T23:41:15.553168909Z 
 57%|█████▋    | 5462/9500 [18:43:45<13:58:45, 12.46s/it]08/03/2024 16:41:15 - INFO - __main__ -   Step: 5462, LR: 8.763027501466813e-06, Loss: 426.781005859375
2024-08-03T23:41:28.191928835Z 
 58%|█████▊    | 5463/9500 [18:43:58<14:02:05, 12.52s/it]08/03/2024 16:41:28 - INFO - __main__ -   Step: 5463, LR: 8.760856957779534e-06, Loss: 444.3760986328125
2024-08-03T23:41:40.374581717Z 
 58%|█████▊    | 5464/9500 [18:44:10<13:55:09, 12.42s/it]08/03/2024 16:41:40 - INFO - __main__ -   Step: 5464, LR: 8.758686414092256e-06, Loss: 373.8743896484375
2024-08-03T23:41:52.427659665Z 
 58%|█████▊    | 5465/9500 [18:44:22<13:47:38, 12.31s/it]08/03/2024 16:41:52 - INFO - __main__ -   Step: 5465, LR: 8.756515870404978e-06, Loss: 348.5072937011719
2024-08-03T23:42:04.910051879Z 
 58%|█████▊    | 5466/9500 [18:44:34<13:50:58, 12.36s/it]08/03/2024 16:42:04 - INFO - __main__ -   Step: 5466, LR: 8.754345326717697e-06, Loss: 424.9169006347656
2024-08-03T23:42:17.201722248Z 
 58%|█████▊    | 5467/9500 [18:44:47<13:49:23, 12.34s/it]08/03/2024 16:42:17 - INFO - __main__ -   Step: 5467, LR: 8.752174783030419e-06, Loss: 448.783203125
2024-08-03T23:42:29.672930047Z 
 58%|█████▊    | 5468/9500 [18:44:59<13:51:51, 12.38s/it]08/03/2024 16:42:29 - INFO - __main__ -   Step: 5468, LR: 8.750004239343139e-06, Loss: 402.13421630859375
2024-08-03T23:42:42.071751280Z 
 58%|█████▊    | 5469/9500 [18:45:12<13:52:03, 12.38s/it]08/03/2024 16:42:42 - INFO - __main__ -   Step: 5469, LR: 8.74783369565586e-06, Loss: 301.12652587890625
2024-08-03T23:42:54.332930684Z 
 58%|█████▊    | 5470/9500 [18:45:24<13:49:21, 12.35s/it]08/03/2024 16:42:54 - INFO - __main__ -   Step: 5470, LR: 8.745663151968582e-06, Loss: 495.13507080078125
2024-08-03T23:43:06.513005842Z 
 58%|█████▊    | 5471/9500 [18:45:36<13:45:46, 12.30s/it]08/03/2024 16:43:06 - INFO - __main__ -   Step: 5471, LR: 8.743492608281304e-06, Loss: 399.83331298828125
2024-08-03T23:43:18.998626724Z 
 58%|█████▊    | 5472/9500 [18:45:48<13:49:21, 12.35s/it]08/03/2024 16:43:18 - INFO - __main__ -   Step: 5472, LR: 8.741322064594025e-06, Loss: 454.7927551269531
2024-08-03T23:43:31.243347951Z 
 58%|█████▊    | 5473/9500 [18:46:01<13:46:57, 12.32s/it]08/03/2024 16:43:31 - INFO - __main__ -   Step: 5473, LR: 8.739151520906745e-06, Loss: 396.1254577636719
2024-08-03T23:43:43.420060739Z 
 58%|█████▊    | 5474/9500 [18:46:13<13:43:50, 12.28s/it]08/03/2024 16:43:43 - INFO - __main__ -   Step: 5474, LR: 8.736980977219467e-06, Loss: 434.4148254394531
2024-08-03T23:43:55.972626345Z 
 58%|█████▊    | 5475/9500 [18:46:25<13:49:10, 12.36s/it]08/03/2024 16:43:55 - INFO - __main__ -   Step: 5475, LR: 8.734810433532186e-06, Loss: 386.38739013671875
2024-08-03T23:44:08.114658563Z 
 58%|█████▊    | 5476/9500 [18:46:38<13:44:34, 12.29s/it]08/03/2024 16:44:08 - INFO - __main__ -   Step: 5476, LR: 8.732639889844908e-06, Loss: 443.5199279785156
2024-08-03T23:44:20.507763197Z 
 58%|█████▊    | 5477/9500 [18:46:50<13:46:20, 12.32s/it]08/03/2024 16:44:20 - INFO - __main__ -   Step: 5477, LR: 8.73046934615763e-06, Loss: 494.02850341796875
2024-08-03T23:44:33.401811144Z 
 58%|█████▊    | 5478/9500 [18:47:03<13:57:35, 12.50s/it]08/03/2024 16:44:33 - INFO - __main__ -   Step: 5478, LR: 8.728298802470351e-06, Loss: 393.9612121582031
2024-08-03T23:44:45.631901080Z 
 58%|█████▊    | 5479/9500 [18:47:15<13:52:02, 12.42s/it]08/03/2024 16:44:45 - INFO - __main__ -   Step: 5479, LR: 8.726128258783073e-06, Loss: 384.8331298828125
2024-08-03T23:44:57.637299387Z 
 58%|█████▊    | 5480/9500 [18:47:27<13:43:36, 12.29s/it]08/03/2024 16:44:57 - INFO - __main__ -   Step: 5480, LR: 8.723957715095793e-06, Loss: 499.991943359375
2024-08-03T23:45:10.186887239Z 
 58%|█████▊    | 5481/9500 [18:47:40<13:48:33, 12.37s/it]08/03/2024 16:45:10 - INFO - __main__ -   Step: 5481, LR: 8.721787171408514e-06, Loss: 424.9809265136719
2024-08-03T23:45:22.294875533Z 
 58%|█████▊    | 5482/9500 [18:47:52<13:43:05, 12.29s/it]08/03/2024 16:45:22 - INFO - __main__ -   Step: 5482, LR: 8.719616627721234e-06, Loss: 586.3231811523438
2024-08-03T23:45:34.408494489Z 
 58%|█████▊    | 5483/9500 [18:48:04<13:39:20, 12.24s/it]08/03/2024 16:45:34 - INFO - __main__ -   Step: 5483, LR: 8.717446084033955e-06, Loss: 344.5001220703125
2024-08-03T23:45:47.064926019Z 
 58%|█████▊    | 5484/9500 [18:48:17<13:47:31, 12.36s/it]08/03/2024 16:45:47 - INFO - __main__ -   Step: 5484, LR: 8.715275540346677e-06, Loss: 422.28228759765625
2024-08-03T23:45:59.160232238Z 
 58%|█████▊    | 5485/9500 [18:48:29<13:41:56, 12.28s/it]08/03/2024 16:45:59 - INFO - __main__ -   Step: 5485, LR: 8.713104996659399e-06, Loss: 384.48565673828125
2024-08-03T23:46:11.031878272Z 
 58%|█████▊    | 5486/9500 [18:48:40<13:33:28, 12.16s/it]08/03/2024 16:46:11 - INFO - __main__ -   Step: 5486, LR: 8.71093445297212e-06, Loss: 400.803466796875
2024-08-03T23:46:23.623352814Z 
 58%|█████▊    | 5487/9500 [18:48:53<13:41:56, 12.29s/it]08/03/2024 16:46:23 - INFO - __main__ -   Step: 5487, LR: 8.70876390928484e-06, Loss: 397.31634521484375
2024-08-03T23:46:35.721082506Z 
 58%|█████▊    | 5488/9500 [18:49:05<13:37:53, 12.23s/it]08/03/2024 16:46:35 - INFO - __main__ -   Step: 5488, LR: 8.706593365597562e-06, Loss: 366.09521484375
2024-08-03T23:46:48.006324812Z 
 58%|█████▊    | 5489/9500 [18:49:17<13:38:43, 12.25s/it]08/03/2024 16:46:48 - INFO - __main__ -   Step: 5489, LR: 8.704422821910281e-06, Loss: 327.75299072265625
2024-08-03T23:47:00.546770682Z 
 58%|█████▊    | 5490/9500 [18:49:30<13:44:26, 12.34s/it]08/03/2024 16:47:00 - INFO - __main__ -   Step: 5490, LR: 8.702252278223003e-06, Loss: 419.38201904296875
2024-08-03T23:47:12.708276130Z 
 58%|█████▊    | 5491/9500 [18:49:42<13:40:44, 12.28s/it]08/03/2024 16:47:12 - INFO - __main__ -   Step: 5491, LR: 8.700081734535725e-06, Loss: 421.3725891113281
2024-08-03T23:47:25.166904107Z 
 58%|█████▊    | 5492/9500 [18:49:55<13:44:03, 12.34s/it]08/03/2024 16:47:25 - INFO - __main__ -   Step: 5492, LR: 8.697911190848446e-06, Loss: 507.5147399902344
2024-08-03T23:47:37.749339160Z 
 58%|█████▊    | 5493/9500 [18:50:07<13:48:46, 12.41s/it]08/03/2024 16:47:37 - INFO - __main__ -   Step: 5493, LR: 8.695740647161168e-06, Loss: 417.08111572265625
2024-08-03T23:47:49.851011209Z 
 58%|█████▊    | 5494/9500 [18:50:19<13:42:23, 12.32s/it]08/03/2024 16:47:49 - INFO - __main__ -   Step: 5494, LR: 8.69357010347389e-06, Loss: 372.8243408203125
2024-08-03T23:48:01.844142261Z 
 58%|█████▊    | 5495/9500 [18:50:31<13:35:41, 12.22s/it]08/03/2024 16:48:01 - INFO - __main__ -   Step: 5495, LR: 8.691399559786609e-06, Loss: 364.4156494140625
2024-08-03T23:48:14.288384704Z 
 58%|█████▊    | 5496/9500 [18:50:44<13:39:58, 12.29s/it]08/03/2024 16:48:14 - INFO - __main__ -   Step: 5496, LR: 8.689229016099329e-06, Loss: 295.5478515625
2024-08-03T23:48:26.433046154Z 
 58%|█████▊    | 5497/9500 [18:50:56<13:36:54, 12.24s/it]08/03/2024 16:48:26 - INFO - __main__ -   Step: 5497, LR: 8.68705847241205e-06, Loss: 486.6060791015625
2024-08-03T23:48:38.547748902Z 
 58%|█████▊    | 5498/9500 [18:51:08<13:34:06, 12.21s/it]08/03/2024 16:48:38 - INFO - __main__ -   Step: 5498, LR: 8.684887928724772e-06, Loss: 508.2004089355469
2024-08-03T23:48:50.910749753Z 
 58%|█████▊    | 5499/9500 [18:51:20<13:37:03, 12.25s/it]08/03/2024 16:48:50 - INFO - __main__ -   Step: 5499, LR: 8.682717385037494e-06, Loss: 327.70086669921875
2024-08-03T23:49:03.687682231Z 
 58%|█████▊    | 5500/9500 [18:51:33<13:47:20, 12.41s/it]08/03/2024 16:49:03 - INFO - __main__ -   Step: 5500, LR: 8.680546841350215e-06, Loss: 461.57171630859375
2024-08-03T23:49:15.857264847Z 
 58%|█████▊    | 5501/9500 [18:51:45<13:42:19, 12.34s/it]08/03/2024 16:49:15 - INFO - __main__ -   Step: 5501, LR: 8.678376297662937e-06, Loss: 525.100830078125
2024-08-03T23:49:28.370304837Z 
 58%|█████▊    | 5502/9500 [18:51:58<13:45:36, 12.39s/it]08/03/2024 16:49:28 - INFO - __main__ -   Step: 5502, LR: 8.676205753975657e-06, Loss: 405.4576416015625
2024-08-03T23:49:40.481845762Z 
 58%|█████▊    | 5503/9500 [18:52:10<13:39:50, 12.31s/it]08/03/2024 16:49:40 - INFO - __main__ -   Step: 5503, LR: 8.674035210288378e-06, Loss: 384.67901611328125
2024-08-03T23:49:52.761533282Z 
 58%|█████▊    | 5504/9500 [18:52:22<13:39:05, 12.30s/it]08/03/2024 16:49:52 - INFO - __main__ -   Step: 5504, LR: 8.671864666601098e-06, Loss: 359.2225341796875
2024-08-03T23:50:05.218025367Z 
 58%|█████▊    | 5505/9500 [18:52:35<13:42:02, 12.35s/it]08/03/2024 16:50:05 - INFO - __main__ -   Step: 5505, LR: 8.66969412291382e-06, Loss: 559.40966796875
2024-08-03T23:50:17.733417700Z 
 58%|█████▊    | 5506/9500 [18:52:47<13:45:12, 12.40s/it]08/03/2024 16:50:17 - INFO - __main__ -   Step: 5506, LR: 8.667523579226541e-06, Loss: 413.03387451171875
2024-08-03T23:50:29.754676793Z 
 58%|█████▊    | 5507/9500 [18:52:59<13:37:30, 12.28s/it]08/03/2024 16:50:29 - INFO - __main__ -   Step: 5507, LR: 8.665353035539263e-06, Loss: 401.7996520996094
2024-08-03T23:50:41.826042935Z 
 58%|█████▊    | 5508/9500 [18:53:11<13:33:03, 12.22s/it]08/03/2024 16:50:41 - INFO - __main__ -   Step: 5508, LR: 8.663182491851984e-06, Loss: 486.18170166015625
2024-08-03T23:50:54.530540628Z 
 58%|█████▊    | 5509/9500 [18:53:24<13:42:31, 12.37s/it]08/03/2024 16:50:54 - INFO - __main__ -   Step: 5509, LR: 8.661011948164704e-06, Loss: 424.1828308105469
2024-08-03T23:51:06.622227178Z 
 58%|█████▊    | 5510/9500 [18:53:36<13:36:50, 12.28s/it]08/03/2024 16:51:06 - INFO - __main__ -   Step: 5510, LR: 8.658841404477426e-06, Loss: 415.96466064453125
2024-08-03T23:51:18.586243686Z 
 58%|█████▊    | 5511/9500 [18:53:48<13:30:16, 12.19s/it]08/03/2024 16:51:18 - INFO - __main__ -   Step: 5511, LR: 8.656670860790146e-06, Loss: 341.78704833984375
2024-08-03T23:51:30.994610032Z 
 58%|█████▊    | 5512/9500 [18:54:00<13:34:27, 12.25s/it]08/03/2024 16:51:30 - INFO - __main__ -   Step: 5512, LR: 8.654500317102867e-06, Loss: 388.6845397949219
2024-08-03T23:51:43.245662443Z 
 58%|█████▊    | 5513/9500 [18:54:13<13:34:12, 12.25s/it]08/03/2024 16:51:43 - INFO - __main__ -   Step: 5513, LR: 8.652329773415589e-06, Loss: 496.4940490722656
2024-08-03T23:51:55.425185864Z 
 58%|█████▊    | 5514/9500 [18:54:25<13:32:32, 12.23s/it]08/03/2024 16:51:55 - INFO - __main__ -   Step: 5514, LR: 8.65015922972831e-06, Loss: 414.5596923828125
2024-08-03T23:52:08.248122841Z 
 58%|█████▊    | 5515/9500 [18:54:38<13:44:07, 12.41s/it]08/03/2024 16:52:08 - INFO - __main__ -   Step: 5515, LR: 8.647988686041032e-06, Loss: 466.10772705078125
2024-08-03T23:52:20.244274745Z 
 58%|█████▊    | 5516/9500 [18:54:50<13:35:42, 12.28s/it]08/03/2024 16:52:20 - INFO - __main__ -   Step: 5516, LR: 8.645818142353752e-06, Loss: 433.45697021484375
2024-08-03T23:52:31.844112254Z 
 58%|█████▊    | 5517/9500 [18:55:01<13:21:52, 12.08s/it]08/03/2024 16:52:31 - INFO - __main__ -   Step: 5517, LR: 8.643647598666473e-06, Loss: 330.1365966796875
2024-08-03T23:52:44.273419497Z 
 58%|█████▊    | 5518/9500 [18:55:14<13:28:38, 12.18s/it]08/03/2024 16:52:44 - INFO - __main__ -   Step: 5518, LR: 8.641477054979193e-06, Loss: 387.1831970214844
2024-08-03T23:52:56.203481772Z 
 58%|█████▊    | 5519/9500 [18:55:26<13:23:21, 12.11s/it]08/03/2024 16:52:56 - INFO - __main__ -   Step: 5519, LR: 8.639306511291915e-06, Loss: 412.85443115234375
2024-08-03T23:53:08.280976171Z 
 58%|█████▊    | 5520/9500 [18:55:38<13:22:33, 12.10s/it]08/03/2024 16:53:08 - INFO - __main__ -   Step: 5520, LR: 8.637135967604636e-06, Loss: 447.0386962890625
2024-08-03T23:53:20.960590802Z 
 58%|█████▊    | 5521/9500 [18:55:50<13:33:54, 12.27s/it]08/03/2024 16:53:20 - INFO - __main__ -   Step: 5521, LR: 8.634965423917358e-06, Loss: 454.87945556640625
2024-08-03T23:53:33.173352219Z 
 58%|█████▊    | 5522/9500 [18:56:03<13:32:30, 12.25s/it]08/03/2024 16:53:33 - INFO - __main__ -   Step: 5522, LR: 8.63279488023008e-06, Loss: 431.2942810058594
2024-08-03T23:53:45.817921660Z 
 58%|█████▊    | 5523/9500 [18:56:15<13:40:03, 12.37s/it]08/03/2024 16:53:45 - INFO - __main__ -   Step: 5523, LR: 8.6306243365428e-06, Loss: 433.1787109375
2024-08-03T23:53:59.124664223Z 
 58%|█████▊    | 5524/9500 [18:56:29<13:58:25, 12.65s/it]08/03/2024 16:53:59 - INFO - __main__ -   Step: 5524, LR: 8.62845379285552e-06, Loss: 469.0105285644531
2024-08-03T23:54:11.307989715Z 
 58%|█████▊    | 5525/9500 [18:56:41<13:48:53, 12.51s/it]08/03/2024 16:54:11 - INFO - __main__ -   Step: 5525, LR: 8.62628324916824e-06, Loss: 366.4632568359375
2024-08-03T23:54:23.651925136Z 
 58%|█████▊    | 5526/9500 [18:56:53<13:45:21, 12.46s/it]08/03/2024 16:54:23 - INFO - __main__ -   Step: 5526, LR: 8.624112705480962e-06, Loss: 426.4740295410156
2024-08-03T23:54:36.144840227Z 
 58%|█████▊    | 5527/9500 [18:57:06<13:45:46, 12.47s/it]08/03/2024 16:54:36 - INFO - __main__ -   Step: 5527, LR: 8.621942161793684e-06, Loss: 355.6617126464844
2024-08-03T23:54:48.613945468Z 
 58%|█████▊    | 5528/9500 [18:57:18<13:45:31, 12.47s/it]08/03/2024 16:54:48 - INFO - __main__ -   Step: 5528, LR: 8.619771618106405e-06, Loss: 468.795166015625
2024-08-03T23:55:00.798462699Z 
 58%|█████▊    | 5529/9500 [18:57:30<13:39:39, 12.38s/it]08/03/2024 16:55:00 - INFO - __main__ -   Step: 5529, LR: 8.617601074419127e-06, Loss: 464.42242431640625
2024-08-03T23:55:13.367108122Z 
 58%|█████▊    | 5530/9500 [18:57:43<13:43:05, 12.44s/it]08/03/2024 16:55:13 - INFO - __main__ -   Step: 5530, LR: 8.615430530731847e-06, Loss: 419.3162536621094
2024-08-03T23:55:25.458672500Z 
 58%|█████▊    | 5531/9500 [18:57:55<13:35:58, 12.34s/it]08/03/2024 16:55:25 - INFO - __main__ -   Step: 5531, LR: 8.613259987044568e-06, Loss: 410.7763366699219
2024-08-03T23:55:37.785738345Z 
 58%|█████▊    | 5532/9500 [18:58:07<13:35:36, 12.33s/it]08/03/2024 16:55:37 - INFO - __main__ -   Step: 5532, LR: 8.611089443357288e-06, Loss: 393.4149169921875
2024-08-03T23:55:50.297179305Z 
 58%|█████▊    | 5533/9500 [18:58:20<13:38:55, 12.39s/it]08/03/2024 16:55:50 - INFO - __main__ -   Step: 5533, LR: 8.60891889967001e-06, Loss: 346.7171325683594
2024-08-03T23:56:02.718443078Z 
 58%|█████▊    | 5534/9500 [18:58:32<13:39:26, 12.40s/it]08/03/2024 16:56:02 - INFO - __main__ -   Step: 5534, LR: 8.606748355982731e-06, Loss: 467.32513427734375
2024-08-03T23:56:14.993768618Z 
 58%|█████▊    | 5535/9500 [18:58:44<13:36:49, 12.36s/it]08/03/2024 16:56:14 - INFO - __main__ -   Step: 5535, LR: 8.604577812295453e-06, Loss: 384.52978515625
2024-08-03T23:56:27.674012882Z 
 58%|█████▊    | 5536/9500 [18:58:57<13:42:57, 12.46s/it]08/03/2024 16:56:27 - INFO - __main__ -   Step: 5536, LR: 8.602407268608174e-06, Loss: 331.2412109375
2024-08-03T23:56:40.049887690Z 
 58%|█████▊    | 5537/9500 [18:59:09<13:41:09, 12.43s/it]08/03/2024 16:56:40 - INFO - __main__ -   Step: 5537, LR: 8.600236724920894e-06, Loss: 357.82574462890625
2024-08-03T23:56:52.456882929Z 
 58%|█████▊    | 5538/9500 [18:59:22<13:40:26, 12.42s/it]08/03/2024 16:56:52 - INFO - __main__ -   Step: 5538, LR: 8.598066181233616e-06, Loss: 411.43707275390625
2024-08-03T23:57:05.358812444Z 
 58%|█████▊    | 5539/9500 [18:59:35<13:49:40, 12.57s/it]08/03/2024 16:57:05 - INFO - __main__ -   Step: 5539, LR: 8.595895637546336e-06, Loss: 515.0425415039062
2024-08-03T23:57:17.563853678Z 
 58%|█████▊    | 5540/9500 [18:59:47<13:42:17, 12.46s/it]08/03/2024 16:57:17 - INFO - __main__ -   Step: 5540, LR: 8.593725093859057e-06, Loss: 414.75079345703125
2024-08-03T23:57:29.896756794Z 
 58%|█████▊    | 5541/9500 [18:59:59<13:39:35, 12.42s/it]08/03/2024 16:57:29 - INFO - __main__ -   Step: 5541, LR: 8.591554550171779e-06, Loss: 558.4337158203125
2024-08-03T23:57:42.093626531Z 
 58%|█████▊    | 5542/9500 [19:00:12<13:34:55, 12.35s/it]08/03/2024 16:57:42 - INFO - __main__ -   Step: 5542, LR: 8.5893840064845e-06, Loss: 544.629150390625
2024-08-03T23:57:54.590744008Z 
 58%|█████▊    | 5543/9500 [19:00:24<13:37:34, 12.40s/it]08/03/2024 16:57:54 - INFO - __main__ -   Step: 5543, LR: 8.587213462797222e-06, Loss: 481.9617919921875
2024-08-03T23:58:06.820142923Z 
 58%|█████▊    | 5544/9500 [19:00:36<13:34:03, 12.35s/it]08/03/2024 16:58:06 - INFO - __main__ -   Step: 5544, LR: 8.585042919109942e-06, Loss: 402.63519287109375
2024-08-03T23:58:19.011150097Z 
 58%|█████▊    | 5545/9500 [19:00:48<13:30:46, 12.30s/it]08/03/2024 16:58:19 - INFO - __main__ -   Step: 5545, LR: 8.582872375422663e-06, Loss: 350.1161804199219
2024-08-03T23:58:31.493630340Z 
 58%|█████▊    | 5546/9500 [19:01:01<13:34:10, 12.35s/it]08/03/2024 16:58:31 - INFO - __main__ -   Step: 5546, LR: 8.580701831735383e-06, Loss: 465.4200744628906
2024-08-03T23:58:43.673455914Z 
 58%|█████▊    | 5547/9500 [19:01:13<13:30:30, 12.30s/it]08/03/2024 16:58:43 - INFO - __main__ -   Step: 5547, LR: 8.578531288048105e-06, Loss: 468.5096435546875
2024-08-03T23:58:56.114707385Z 
 58%|█████▊    | 5548/9500 [19:01:26<13:33:03, 12.34s/it]08/03/2024 16:58:56 - INFO - __main__ -   Step: 5548, LR: 8.576360744360826e-06, Loss: 367.410400390625
2024-08-03T23:59:08.686508019Z 
 58%|█████▊    | 5549/9500 [19:01:38<13:37:21, 12.41s/it]08/03/2024 16:59:08 - INFO - __main__ -   Step: 5549, LR: 8.574190200673548e-06, Loss: 438.03564453125
2024-08-03T23:59:21.028815045Z 
 58%|█████▊    | 5550/9500 [19:01:50<13:35:45, 12.39s/it]08/03/2024 16:59:21 - INFO - __main__ -   Step: 5550, LR: 8.57201965698627e-06, Loss: 430.06488037109375
2024-08-03T23:59:33.286776055Z 
 58%|█████▊    | 5551/9500 [19:02:03<13:32:55, 12.35s/it]08/03/2024 16:59:33 - INFO - __main__ -   Step: 5551, LR: 8.56984911329899e-06, Loss: 503.419921875
2024-08-03T23:59:45.760957951Z 
 58%|█████▊    | 5552/9500 [19:02:15<13:35:08, 12.39s/it]08/03/2024 16:59:45 - INFO - __main__ -   Step: 5552, LR: 8.567678569611711e-06, Loss: 460.4468078613281
2024-08-03T23:59:57.842606790Z 
 58%|█████▊    | 5553/9500 [19:02:27<13:28:53, 12.30s/it]08/03/2024 16:59:57 - INFO - __main__ -   Step: 5553, LR: 8.565508025924432e-06, Loss: 369.604736328125
2024-08-04T00:00:09.782071167Z 
 58%|█████▊    | 5554/9500 [19:02:39<13:21:38, 12.19s/it]08/03/2024 17:00:09 - INFO - __main__ -   Step: 5554, LR: 8.563337482237152e-06, Loss: 384.7353210449219
2024-08-04T00:00:22.423788188Z 
 58%|█████▊    | 5555/9500 [19:02:52<13:30:21, 12.32s/it]08/03/2024 17:00:22 - INFO - __main__ -   Step: 5555, LR: 8.561166938549874e-06, Loss: 370.08856201171875
2024-08-04T00:00:34.457050560Z 
 58%|█████▊    | 5556/9500 [19:03:04<13:24:24, 12.24s/it]08/03/2024 17:00:34 - INFO - __main__ -   Step: 5556, LR: 8.558996394862595e-06, Loss: 478.1741943359375
2024-08-04T00:00:46.574840896Z 
 58%|█████▊    | 5557/9500 [19:03:16<13:21:50, 12.20s/it]08/03/2024 17:00:46 - INFO - __main__ -   Step: 5557, LR: 8.556825851175317e-06, Loss: 407.7315368652344
2024-08-04T00:00:59.135263551Z 
 59%|█████▊    | 5558/9500 [19:03:29<13:28:43, 12.31s/it]08/03/2024 17:00:59 - INFO - __main__ -   Step: 5558, LR: 8.554655307488037e-06, Loss: 392.6662902832031
2024-08-04T00:01:11.221324666Z 
 59%|█████▊    | 5559/9500 [19:03:41<13:24:06, 12.24s/it]08/03/2024 17:01:11 - INFO - __main__ -   Step: 5559, LR: 8.552484763800758e-06, Loss: 371.27508544921875
2024-08-04T00:01:23.803543607Z 
 59%|█████▊    | 5560/9500 [19:03:53<13:30:36, 12.34s/it]08/03/2024 17:01:23 - INFO - __main__ -   Step: 5560, LR: 8.55031422011348e-06, Loss: 366.99322509765625
2024-08-04T00:01:36.660921494Z 
 59%|█████▊    | 5561/9500 [19:04:06<13:40:30, 12.50s/it]08/03/2024 17:01:36 - INFO - __main__ -   Step: 5561, LR: 8.5481436764262e-06, Loss: 313.22186279296875
2024-08-04T00:01:48.805383338Z 
 59%|█████▊    | 5562/9500 [19:04:18<13:33:20, 12.39s/it]08/03/2024 17:01:48 - INFO - __main__ -   Step: 5562, LR: 8.545973132738921e-06, Loss: 449.66632080078125
2024-08-04T00:02:00.859309889Z 
 59%|█████▊    | 5563/9500 [19:04:30<13:26:27, 12.29s/it]08/03/2024 17:02:00 - INFO - __main__ -   Step: 5563, LR: 8.543802589051643e-06, Loss: 452.03350830078125
2024-08-04T00:02:13.519775672Z 
 59%|█████▊    | 5564/9500 [19:04:43<13:33:32, 12.40s/it]08/03/2024 17:02:13 - INFO - __main__ -   Step: 5564, LR: 8.541632045364365e-06, Loss: 409.3775939941406
2024-08-04T00:02:25.681281364Z 
 59%|█████▊    | 5565/9500 [19:04:55<13:28:36, 12.33s/it]08/03/2024 17:02:25 - INFO - __main__ -   Step: 5565, LR: 8.539461501677084e-06, Loss: 300.8839416503906
2024-08-04T00:02:37.753412488Z 
 59%|█████▊    | 5566/9500 [19:05:07<13:23:20, 12.25s/it]08/03/2024 17:02:37 - INFO - __main__ -   Step: 5566, LR: 8.537290957989806e-06, Loss: 478.7942199707031
2024-08-04T00:02:50.409400288Z 
 59%|█████▊    | 5567/9500 [19:05:20<13:31:04, 12.37s/it]08/03/2024 17:02:50 - INFO - __main__ -   Step: 5567, LR: 8.535120414302528e-06, Loss: 469.11138916015625
2024-08-04T00:03:02.736409670Z 
 59%|█████▊    | 5568/9500 [19:05:32<13:29:57, 12.36s/it]08/03/2024 17:03:02 - INFO - __main__ -   Step: 5568, LR: 8.532949870615247e-06, Loss: 444.2513122558594
2024-08-04T00:03:15.160349309Z 
 59%|█████▊    | 5569/9500 [19:05:45<13:31:01, 12.38s/it]08/03/2024 17:03:15 - INFO - __main__ -   Step: 5569, LR: 8.530779326927969e-06, Loss: 430.1623840332031
2024-08-04T00:03:27.700856459Z 
 59%|█████▊    | 5570/9500 [19:05:57<13:33:59, 12.43s/it]08/03/2024 17:03:27 - INFO - __main__ -   Step: 5570, LR: 8.52860878324069e-06, Loss: 409.775634765625
2024-08-04T00:03:39.736328869Z 
 59%|█████▊    | 5571/9500 [19:06:09<13:26:05, 12.31s/it]08/03/2024 17:03:39 - INFO - __main__ -   Step: 5571, LR: 8.526438239553412e-06, Loss: 467.1358642578125
2024-08-04T00:03:52.213263980Z 
 59%|█████▊    | 5572/9500 [19:06:22<13:29:10, 12.36s/it]08/03/2024 17:03:52 - INFO - __main__ -   Step: 5572, LR: 8.524267695866132e-06, Loss: 570.0032958984375
2024-08-04T00:04:04.827199507Z 
 59%|█████▊    | 5573/9500 [19:06:34<13:33:56, 12.44s/it]08/03/2024 17:04:04 - INFO - __main__ -   Step: 5573, LR: 8.522097152178853e-06, Loss: 409.5259094238281
2024-08-04T00:04:16.915687377Z 
 59%|█████▊    | 5574/9500 [19:06:46<13:26:54, 12.33s/it]08/03/2024 17:04:16 - INFO - __main__ -   Step: 5574, LR: 8.519926608491575e-06, Loss: 362.9232177734375
2024-08-04T00:04:28.790158415Z 
 59%|█████▊    | 5575/9500 [19:06:58<13:17:43, 12.19s/it]08/03/2024 17:04:28 - INFO - __main__ -   Step: 5575, LR: 8.517756064804295e-06, Loss: 393.59906005859375
2024-08-04T00:04:41.100050379Z 
 59%|█████▊    | 5576/9500 [19:07:11<13:19:47, 12.23s/it]08/03/2024 17:04:41 - INFO - __main__ -   Step: 5576, LR: 8.515585521117016e-06, Loss: 430.5760498046875
2024-08-04T00:04:53.346520488Z 
 59%|█████▊    | 5577/9500 [19:07:23<13:19:55, 12.23s/it]08/03/2024 17:04:53 - INFO - __main__ -   Step: 5577, LR: 8.513414977429738e-06, Loss: 376.9015197753906
2024-08-04T00:05:05.709723066Z 
 59%|█████▊    | 5578/9500 [19:07:35<13:22:14, 12.27s/it]08/03/2024 17:05:05 - INFO - __main__ -   Step: 5578, LR: 8.51124443374246e-06, Loss: 406.223876953125
2024-08-04T00:05:18.483851054Z 
 59%|█████▊    | 5579/9500 [19:07:48<13:31:51, 12.42s/it]08/03/2024 17:05:18 - INFO - __main__ -   Step: 5579, LR: 8.50907389005518e-06, Loss: 390.8236999511719
2024-08-04T00:05:30.519551827Z 
 59%|█████▊    | 5580/9500 [19:08:00<13:24:03, 12.31s/it]08/03/2024 17:05:30 - INFO - __main__ -   Step: 5580, LR: 8.506903346367901e-06, Loss: 506.3825378417969
2024-08-04T00:05:42.948076759Z 
 59%|█████▊    | 5581/9500 [19:08:12<13:26:14, 12.34s/it]08/03/2024 17:05:42 - INFO - __main__ -   Step: 5581, LR: 8.504732802680623e-06, Loss: 390.28857421875
2024-08-04T00:05:55.593313976Z 
 59%|█████▉    | 5582/9500 [19:08:25<13:31:56, 12.43s/it]08/03/2024 17:05:55 - INFO - __main__ -   Step: 5582, LR: 8.502562258993342e-06, Loss: 409.8562316894531
2024-08-04T00:06:07.649725106Z 
 59%|█████▉    | 5583/9500 [19:08:37<13:24:20, 12.32s/it]08/03/2024 17:06:07 - INFO - __main__ -   Step: 5583, LR: 8.500391715306064e-06, Loss: 444.99151611328125
2024-08-04T00:06:19.888401984Z 
 59%|█████▉    | 5584/9500 [19:08:49<13:22:31, 12.30s/it]08/03/2024 17:06:19 - INFO - __main__ -   Step: 5584, LR: 8.498221171618786e-06, Loss: 415.7857666015625
2024-08-04T00:06:32.174477565Z 
 59%|█████▉    | 5585/9500 [19:09:02<13:22:07, 12.29s/it]08/03/2024 17:06:32 - INFO - __main__ -   Step: 5585, LR: 8.496050627931507e-06, Loss: 427.893310546875
2024-08-04T00:06:45.037265801Z 
 59%|█████▉    | 5586/9500 [19:09:14<13:33:04, 12.46s/it]08/03/2024 17:06:45 - INFO - __main__ -   Step: 5586, LR: 8.493880084244227e-06, Loss: 416.37774658203125
2024-08-04T00:06:57.083837865Z 
 59%|█████▉    | 5587/9500 [19:09:27<13:24:41, 12.34s/it]08/03/2024 17:06:57 - INFO - __main__ -   Step: 5587, LR: 8.491709540556949e-06, Loss: 498.9590759277344
2024-08-04T00:07:09.367388603Z 
 59%|█████▉    | 5588/9500 [19:09:39<13:23:24, 12.32s/it]08/03/2024 17:07:09 - INFO - __main__ -   Step: 5588, LR: 8.48953899686967e-06, Loss: 399.842529296875
2024-08-04T00:07:21.901235350Z 
 59%|█████▉    | 5589/9500 [19:09:51<13:27:20, 12.39s/it]08/03/2024 17:07:21 - INFO - __main__ -   Step: 5589, LR: 8.48736845318239e-06, Loss: 421.8293151855469
2024-08-04T00:07:34.220543317Z 
 59%|█████▉    | 5590/9500 [19:10:04<13:25:50, 12.37s/it]08/03/2024 17:07:34 - INFO - __main__ -   Step: 5590, LR: 8.485197909495112e-06, Loss: 487.5357666015625
2024-08-04T00:07:46.064721743Z 
 59%|█████▉    | 5591/9500 [19:10:16<13:15:25, 12.21s/it]08/03/2024 17:07:46 - INFO - __main__ -   Step: 5591, LR: 8.483027365807833e-06, Loss: 444.0155334472656
2024-08-04T00:07:58.553867133Z 
 59%|█████▉    | 5592/9500 [19:10:28<13:20:42, 12.29s/it]08/03/2024 17:07:58 - INFO - __main__ -   Step: 5592, LR: 8.480856822120555e-06, Loss: 388.6691589355469
2024-08-04T00:08:11.039136089Z 
 59%|█████▉    | 5593/9500 [19:10:40<13:24:15, 12.35s/it]08/03/2024 17:08:11 - INFO - __main__ -   Step: 5593, LR: 8.478686278433275e-06, Loss: 516.054443359375
2024-08-04T00:08:23.262539946Z 
 59%|█████▉    | 5594/9500 [19:10:53<13:21:33, 12.31s/it]08/03/2024 17:08:23 - INFO - __main__ -   Step: 5594, LR: 8.476515734745996e-06, Loss: 473.07366943359375
2024-08-04T00:08:35.895686225Z 
 59%|█████▉    | 5595/9500 [19:11:05<13:27:36, 12.41s/it]08/03/2024 17:08:35 - INFO - __main__ -   Step: 5595, LR: 8.474345191058718e-06, Loss: 309.7297058105469
2024-08-04T00:08:48.275269774Z 
 59%|█████▉    | 5596/9500 [19:11:18<13:26:49, 12.40s/it]08/03/2024 17:08:48 - INFO - __main__ -   Step: 5596, LR: 8.47217464737144e-06, Loss: 529.0169677734375
2024-08-04T00:09:00.616599124Z 
 59%|█████▉    | 5597/9500 [19:11:30<13:25:28, 12.38s/it]08/03/2024 17:09:00 - INFO - __main__ -   Step: 5597, LR: 8.470004103684159e-06, Loss: 493.15625
2024-08-04T00:09:13.025527766Z 
 59%|█████▉    | 5598/9500 [19:11:42<13:25:47, 12.39s/it]08/03/2024 17:09:13 - INFO - __main__ -   Step: 5598, LR: 8.46783355999688e-06, Loss: 429.545166015625
2024-08-04T00:09:25.096602197Z 
 59%|█████▉    | 5599/9500 [19:11:55<13:19:21, 12.29s/it]08/03/2024 17:09:25 - INFO - __main__ -   Step: 5599, LR: 8.465663016309602e-06, Loss: 389.624267578125
2024-08-04T00:09:37.245145152Z 
 59%|█████▉    | 5600/9500 [19:12:07<13:16:18, 12.25s/it]08/03/2024 17:09:37 - INFO - __main__ -   Step: 5600, LR: 8.463492472622322e-06, Loss: 488.900634765625
2024-08-04T00:09:49.872291380Z 
 59%|█████▉    | 5601/9500 [19:12:19<13:23:25, 12.36s/it]08/03/2024 17:09:49 - INFO - __main__ -   Step: 5601, LR: 8.461321928935044e-06, Loss: 418.71466064453125
2024-08-04T00:10:02.057367811Z 
 59%|█████▉    | 5602/9500 [19:12:31<13:19:44, 12.31s/it]08/03/2024 17:10:02 - INFO - __main__ -   Step: 5602, LR: 8.459151385247765e-06, Loss: 404.6510009765625
2024-08-04T00:10:14.233314525Z 
 59%|█████▉    | 5603/9500 [19:12:44<13:16:55, 12.27s/it]08/03/2024 17:10:14 - INFO - __main__ -   Step: 5603, LR: 8.456980841560487e-06, Loss: 397.14129638671875
2024-08-04T00:10:26.702254462Z 
 59%|█████▉    | 5604/9500 [19:12:56<13:20:36, 12.33s/it]08/03/2024 17:10:26 - INFO - __main__ -   Step: 5604, LR: 8.454810297873207e-06, Loss: 500.4870910644531
2024-08-04T00:10:38.883785810Z 
 59%|█████▉    | 5605/9500 [19:13:08<13:17:30, 12.29s/it]08/03/2024 17:10:38 - INFO - __main__ -   Step: 5605, LR: 8.452639754185928e-06, Loss: 480.943115234375
2024-08-04T00:10:51.029464495Z 
 59%|█████▉    | 5606/9500 [19:13:20<13:14:35, 12.24s/it]08/03/2024 17:10:51 - INFO - __main__ -   Step: 5606, LR: 8.45046921049865e-06, Loss: 460.22552490234375
2024-08-04T00:11:03.572589440Z 
 59%|█████▉    | 5607/9500 [19:13:33<13:20:13, 12.33s/it]08/03/2024 17:11:03 - INFO - __main__ -   Step: 5607, LR: 8.44829866681137e-06, Loss: 429.708740234375
2024-08-04T00:11:15.942161926Z 
 59%|█████▉    | 5608/9500 [19:13:45<13:20:43, 12.34s/it]08/03/2024 17:11:15 - INFO - __main__ -   Step: 5608, LR: 8.446128123124091e-06, Loss: 453.88421630859375
2024-08-04T00:11:28.078190562Z 
 59%|█████▉    | 5609/9500 [19:13:58<13:16:28, 12.28s/it]08/03/2024 17:11:28 - INFO - __main__ -   Step: 5609, LR: 8.443957579436813e-06, Loss: 448.27740478515625
2024-08-04T00:11:40.464671749Z 
 59%|█████▉    | 5610/9500 [19:14:10<13:18:18, 12.31s/it]08/03/2024 17:11:40 - INFO - __main__ -   Step: 5610, LR: 8.441787035749534e-06, Loss: 388.64306640625
2024-08-04T00:11:52.405560922Z 
 59%|█████▉    | 5611/9500 [19:14:22<13:10:51, 12.20s/it]08/03/2024 17:11:52 - INFO - __main__ -   Step: 5611, LR: 8.439616492062254e-06, Loss: 351.534912109375
2024-08-04T00:12:04.586792823Z 
 59%|█████▉    | 5612/9500 [19:14:34<13:10:15, 12.20s/it]08/03/2024 17:12:04 - INFO - __main__ -   Step: 5612, LR: 8.437445948374976e-06, Loss: 331.14752197265625
2024-08-04T00:12:17.038165473Z 
 59%|█████▉    | 5613/9500 [19:14:46<13:15:02, 12.27s/it]08/03/2024 17:12:17 - INFO - __main__ -   Step: 5613, LR: 8.435275404687697e-06, Loss: 448.1040954589844
2024-08-04T00:12:29.339234483Z 
 59%|█████▉    | 5614/9500 [19:14:59<13:15:21, 12.28s/it]08/03/2024 17:12:29 - INFO - __main__ -   Step: 5614, LR: 8.433104861000417e-06, Loss: 473.69439697265625
2024-08-04T00:12:41.370673292Z 
 59%|█████▉    | 5615/9500 [19:15:11<13:10:20, 12.21s/it]08/03/2024 17:12:41 - INFO - __main__ -   Step: 5615, LR: 8.430934317313139e-06, Loss: 338.7805480957031
2024-08-04T00:12:54.338781969Z 
 59%|█████▉    | 5616/9500 [19:15:24<13:24:55, 12.43s/it]08/03/2024 17:12:54 - INFO - __main__ -   Step: 5616, LR: 8.42876377362586e-06, Loss: 381.2288818359375
2024-08-04T00:13:06.395048205Z 
 59%|█████▉    | 5617/9500 [19:15:36<13:17:23, 12.32s/it]08/03/2024 17:13:06 - INFO - __main__ -   Step: 5617, LR: 8.426593229938582e-06, Loss: 445.7930908203125
2024-08-04T00:13:18.774714225Z 
 59%|█████▉    | 5618/9500 [19:15:48<13:18:19, 12.34s/it]08/03/2024 17:13:18 - INFO - __main__ -   Step: 5618, LR: 8.424422686251302e-06, Loss: 450.93890380859375
2024-08-04T00:13:31.242143786Z 
 59%|█████▉    | 5619/9500 [19:16:01<13:20:36, 12.38s/it]08/03/2024 17:13:31 - INFO - __main__ -   Step: 5619, LR: 8.422252142564023e-06, Loss: 476.3589172363281
2024-08-04T00:13:43.449373359Z 
 59%|█████▉    | 5620/9500 [19:16:13<13:17:05, 12.33s/it]08/03/2024 17:13:43 - INFO - __main__ -   Step: 5620, LR: 8.420081598876745e-06, Loss: 367.45050048828125
2024-08-04T00:13:55.651573628Z 
 59%|█████▉    | 5621/9500 [19:16:25<13:14:29, 12.29s/it]08/03/2024 17:13:55 - INFO - __main__ -   Step: 5621, LR: 8.417911055189465e-06, Loss: 424.2343444824219
2024-08-04T00:14:07.961788680Z 
 59%|█████▉    | 5622/9500 [19:16:37<13:14:41, 12.30s/it]08/03/2024 17:14:07 - INFO - __main__ -   Step: 5622, LR: 8.415740511502186e-06, Loss: 433.0836181640625
2024-08-04T00:14:20.198906133Z 
 59%|█████▉    | 5623/9500 [19:16:50<13:13:21, 12.28s/it]08/03/2024 17:14:20 - INFO - __main__ -   Step: 5623, LR: 8.413569967814908e-06, Loss: 426.031982421875
2024-08-04T00:14:32.466397721Z 
 59%|█████▉    | 5624/9500 [19:17:02<13:12:57, 12.27s/it]08/03/2024 17:14:32 - INFO - __main__ -   Step: 5624, LR: 8.41139942412763e-06, Loss: 501.06561279296875
2024-08-04T00:14:45.005577468Z 
 59%|█████▉    | 5625/9500 [19:17:14<13:17:52, 12.35s/it]08/03/2024 17:14:45 - INFO - __main__ -   Step: 5625, LR: 8.40922888044035e-06, Loss: 465.3880615234375
2024-08-04T00:14:57.223470186Z 
 59%|█████▉    | 5626/9500 [19:17:27<13:15:01, 12.31s/it]08/03/2024 17:14:57 - INFO - __main__ -   Step: 5626, LR: 8.40705833675307e-06, Loss: 428.56317138671875
2024-08-04T00:15:09.417133610Z 
 59%|█████▉    | 5627/9500 [19:17:39<13:12:30, 12.28s/it]08/03/2024 17:15:09 - INFO - __main__ -   Step: 5627, LR: 8.404887793065792e-06, Loss: 378.82073974609375
2024-08-04T00:15:21.433424816Z 
 59%|█████▉    | 5628/9500 [19:17:51<13:07:14, 12.20s/it]08/03/2024 17:15:21 - INFO - __main__ -   Step: 5628, LR: 8.402717249378512e-06, Loss: 405.127197265625
2024-08-04T00:15:33.880441945Z 
 59%|█████▉    | 5629/9500 [19:18:03<13:11:50, 12.27s/it]08/03/2024 17:15:33 - INFO - __main__ -   Step: 5629, LR: 8.400546705691234e-06, Loss: 373.68292236328125
2024-08-04T00:15:45.831840753Z 
 59%|█████▉    | 5630/9500 [19:18:15<13:05:24, 12.18s/it]08/03/2024 17:15:45 - INFO - __main__ -   Step: 5630, LR: 8.398376162003955e-06, Loss: 517.2758178710938
2024-08-04T00:15:58.156902932Z 
 59%|█████▉    | 5631/9500 [19:18:28<13:08:04, 12.22s/it]08/03/2024 17:15:58 - INFO - __main__ -   Step: 5631, LR: 8.396205618316677e-06, Loss: 372.31494140625
2024-08-04T00:16:11.012723029Z 
 59%|█████▉    | 5632/9500 [19:18:40<13:20:08, 12.41s/it]08/03/2024 17:16:11 - INFO - __main__ -   Step: 5632, LR: 8.394035074629397e-06, Loss: 448.5195617675781
2024-08-04T00:16:23.247257068Z 
 59%|█████▉    | 5633/9500 [19:18:53<13:16:30, 12.36s/it]08/03/2024 17:16:23 - INFO - __main__ -   Step: 5633, LR: 8.391864530942118e-06, Loss: 335.77069091796875
2024-08-04T00:16:35.455816891Z 
 59%|█████▉    | 5634/9500 [19:19:05<13:13:23, 12.31s/it]08/03/2024 17:16:35 - INFO - __main__ -   Step: 5634, LR: 8.38969398725484e-06, Loss: 288.22845458984375
2024-08-04T00:16:48.120484254Z 
 59%|█████▉    | 5635/9500 [19:19:18<13:19:59, 12.42s/it]08/03/2024 17:16:48 - INFO - __main__ -   Step: 5635, LR: 8.38752344356756e-06, Loss: 438.3914794921875
2024-08-04T00:17:00.270484858Z 
 59%|█████▉    | 5636/9500 [19:19:30<13:14:35, 12.34s/it]08/03/2024 17:17:00 - INFO - __main__ -   Step: 5636, LR: 8.385352899880281e-06, Loss: 440.7050476074219
2024-08-04T00:17:12.500442437Z 
 59%|█████▉    | 5637/9500 [19:19:42<13:12:17, 12.31s/it]08/03/2024 17:17:12 - INFO - __main__ -   Step: 5637, LR: 8.383182356193003e-06, Loss: 411.9071960449219
2024-08-04T00:17:25.341122233Z 
 59%|█████▉    | 5638/9500 [19:19:55<13:22:24, 12.47s/it]08/03/2024 17:17:25 - INFO - __main__ -   Step: 5638, LR: 8.381011812505724e-06, Loss: 348.30755615234375
2024-08-04T00:17:37.222685671Z 
 59%|█████▉    | 5639/9500 [19:20:07<13:10:54, 12.29s/it]08/03/2024 17:17:37 - INFO - __main__ -   Step: 5639, LR: 8.378841268818444e-06, Loss: 257.79290771484375
2024-08-04T00:17:49.404202723Z 
 59%|█████▉    | 5640/9500 [19:20:19<13:08:36, 12.26s/it]08/03/2024 17:17:49 - INFO - __main__ -   Step: 5640, LR: 8.376670725131166e-06, Loss: 348.4498291015625
2024-08-04T00:18:01.935540797Z 
 59%|█████▉    | 5641/9500 [19:20:31<13:13:40, 12.34s/it]08/03/2024 17:18:01 - INFO - __main__ -   Step: 5641, LR: 8.374500181443887e-06, Loss: 398.72662353515625
2024-08-04T00:18:14.051778659Z 
 59%|█████▉    | 5642/9500 [19:20:43<13:09:09, 12.27s/it]08/03/2024 17:18:14 - INFO - __main__ -   Step: 5642, LR: 8.372329637756607e-06, Loss: 415.7862548828125
2024-08-04T00:18:26.060739995Z 
 59%|█████▉    | 5643/9500 [19:20:55<13:03:51, 12.19s/it]08/03/2024 17:18:26 - INFO - __main__ -   Step: 5643, LR: 8.370159094069329e-06, Loss: 432.28521728515625
2024-08-04T00:18:38.418466454Z 
 59%|█████▉    | 5644/9500 [19:21:08<13:06:48, 12.24s/it]08/03/2024 17:18:38 - INFO - __main__ -   Step: 5644, LR: 8.36798855038205e-06, Loss: 370.5797424316406
2024-08-04T00:18:50.759099623Z 
 59%|█████▉    | 5645/9500 [19:21:20<13:08:28, 12.27s/it]08/03/2024 17:18:50 - INFO - __main__ -   Step: 5645, LR: 8.365818006694772e-06, Loss: 489.2077941894531
2024-08-04T00:19:02.952271981Z 
 59%|█████▉    | 5646/9500 [19:21:32<13:06:45, 12.25s/it]08/03/2024 17:19:02 - INFO - __main__ -   Step: 5646, LR: 8.363647463007493e-06, Loss: 478.88385009765625
2024-08-04T00:19:15.490919378Z 
 59%|█████▉    | 5647/9500 [19:21:45<13:12:08, 12.34s/it]08/03/2024 17:19:15 - INFO - __main__ -   Step: 5647, LR: 8.361476919320213e-06, Loss: 414.9418029785156
2024-08-04T00:19:27.545875400Z 
 59%|█████▉    | 5648/9500 [19:21:57<13:06:32, 12.25s/it]08/03/2024 17:19:27 - INFO - __main__ -   Step: 5648, LR: 8.359306375632935e-06, Loss: 357.2388000488281
2024-08-04T00:19:39.664541153Z 
 59%|█████▉    | 5649/9500 [19:22:09<13:03:46, 12.21s/it]08/03/2024 17:19:39 - INFO - __main__ -   Step: 5649, LR: 8.357135831945655e-06, Loss: 480.69183349609375
2024-08-04T00:19:52.512365173Z 
 59%|█████▉    | 5650/9500 [19:22:22<13:15:49, 12.40s/it]08/03/2024 17:19:52 - INFO - __main__ -   Step: 5650, LR: 8.354965288258376e-06, Loss: 591.1942749023438
2024-08-04T00:20:04.644596960Z 
 59%|█████▉    | 5651/9500 [19:22:34<13:10:24, 12.32s/it]08/03/2024 17:20:04 - INFO - __main__ -   Step: 5651, LR: 8.352794744571098e-06, Loss: 427.6207275390625
2024-08-04T00:20:16.668497947Z 
 59%|█████▉    | 5652/9500 [19:22:46<13:04:29, 12.23s/it]08/03/2024 17:20:16 - INFO - __main__ -   Step: 5652, LR: 8.35062420088382e-06, Loss: 461.91021728515625
2024-08-04T00:20:29.223958430Z 
 60%|█████▉    | 5653/9500 [19:22:59<13:10:30, 12.33s/it]08/03/2024 17:20:29 - INFO - __main__ -   Step: 5653, LR: 8.348453657196541e-06, Loss: 408.25665283203125
2024-08-04T00:20:41.645694066Z 
 60%|█████▉    | 5654/9500 [19:23:11<13:12:04, 12.36s/it]08/03/2024 17:20:41 - INFO - __main__ -   Step: 5654, LR: 8.346283113509261e-06, Loss: 342.8050537109375
2024-08-04T00:20:53.855575755Z 
 60%|█████▉    | 5655/9500 [19:23:23<13:09:02, 12.31s/it]08/03/2024 17:20:53 - INFO - __main__ -   Step: 5655, LR: 8.344112569821982e-06, Loss: 412.4586181640625
2024-08-04T00:21:06.489082168Z 
 60%|█████▉    | 5656/9500 [19:23:36<13:15:00, 12.41s/it]08/03/2024 17:21:06 - INFO - __main__ -   Step: 5656, LR: 8.341942026134702e-06, Loss: 411.9503173828125
2024-08-04T00:21:18.942013148Z 
 60%|█████▉    | 5657/9500 [19:23:48<13:15:38, 12.42s/it]08/03/2024 17:21:18 - INFO - __main__ -   Step: 5657, LR: 8.339771482447424e-06, Loss: 417.882080078125
2024-08-04T00:21:31.163185198Z 
 60%|█████▉    | 5658/9500 [19:24:01<13:11:34, 12.36s/it]08/03/2024 17:21:31 - INFO - __main__ -   Step: 5658, LR: 8.337600938760145e-06, Loss: 496.00927734375
2024-08-04T00:21:43.722095341Z 
 60%|█████▉    | 5659/9500 [19:24:13<13:15:09, 12.42s/it]08/03/2024 17:21:43 - INFO - __main__ -   Step: 5659, LR: 8.335430395072867e-06, Loss: 418.73681640625
2024-08-04T00:21:56.389155819Z 
 60%|█████▉    | 5660/9500 [19:24:26<13:19:40, 12.49s/it]08/03/2024 17:21:56 - INFO - __main__ -   Step: 5660, LR: 8.333259851385589e-06, Loss: 495.35992431640625
2024-08-04T00:22:08.863505247Z 
 60%|█████▉    | 5661/9500 [19:24:38<13:19:03, 12.49s/it]08/03/2024 17:22:08 - INFO - __main__ -   Step: 5661, LR: 8.331089307698308e-06, Loss: 525.470703125
2024-08-04T00:22:21.657554850Z 
 60%|█████▉    | 5662/9500 [19:24:51<13:24:43, 12.58s/it]08/03/2024 17:22:21 - INFO - __main__ -   Step: 5662, LR: 8.32891876401103e-06, Loss: 298.2786560058594
2024-08-04T00:22:33.837853411Z 
 60%|█████▉    | 5663/9500 [19:25:03<13:16:50, 12.46s/it]08/03/2024 17:22:33 - INFO - __main__ -   Step: 5663, LR: 8.32674822032375e-06, Loss: 447.094970703125
2024-08-04T00:22:45.839294895Z 
 60%|█████▉    | 5664/9500 [19:25:15<13:07:49, 12.32s/it]08/03/2024 17:22:45 - INFO - __main__ -   Step: 5664, LR: 8.324577676636471e-06, Loss: 372.1195068359375
2024-08-04T00:22:58.212027666Z 
 60%|█████▉    | 5665/9500 [19:25:28<13:08:34, 12.34s/it]08/03/2024 17:22:58 - INFO - __main__ -   Step: 5665, LR: 8.322407132949193e-06, Loss: 437.6895446777344
2024-08-04T00:23:10.421571263Z 
 60%|█████▉    | 5666/9500 [19:25:40<13:05:55, 12.30s/it]08/03/2024 17:23:10 - INFO - __main__ -   Step: 5666, LR: 8.320236589261914e-06, Loss: 439.5021057128906
2024-08-04T00:23:22.275466126Z 
 60%|█████▉    | 5667/9500 [19:25:52<12:57:11, 12.17s/it]08/03/2024 17:23:22 - INFO - __main__ -   Step: 5667, LR: 8.318066045574636e-06, Loss: 395.4499816894531
2024-08-04T00:23:34.775296924Z 
 60%|█████▉    | 5668/9500 [19:26:04<13:03:22, 12.27s/it]08/03/2024 17:23:34 - INFO - __main__ -   Step: 5668, LR: 8.315895501887356e-06, Loss: 398.79193115234375
2024-08-04T00:23:46.954234032Z 
 60%|█████▉    | 5669/9500 [19:26:16<13:01:30, 12.24s/it]08/03/2024 17:23:46 - INFO - __main__ -   Step: 5669, LR: 8.313724958200077e-06, Loss: 451.5574951171875
2024-08-04T00:23:59.436109947Z 
 60%|█████▉    | 5670/9500 [19:26:29<13:05:56, 12.31s/it]08/03/2024 17:23:59 - INFO - __main__ -   Step: 5670, LR: 8.311554414512797e-06, Loss: 326.0177917480469
2024-08-04T00:24:11.710291242Z 
 60%|█████▉    | 5671/9500 [19:26:41<13:05:00, 12.30s/it]08/03/2024 17:24:11 - INFO - __main__ -   Step: 5671, LR: 8.309383870825519e-06, Loss: 515.8382568359375
2024-08-04T00:24:24.220441459Z 
 60%|█████▉    | 5672/9500 [19:26:54<13:08:48, 12.36s/it]08/03/2024 17:24:24 - INFO - __main__ -   Step: 5672, LR: 8.30721332713824e-06, Loss: 433.0143127441406
2024-08-04T00:24:36.399467836Z 
 60%|█████▉    | 5673/9500 [19:27:06<13:05:03, 12.31s/it]08/03/2024 17:24:36 - INFO - __main__ -   Step: 5673, LR: 8.305042783450962e-06, Loss: 496.05413818359375
2024-08-04T00:24:48.410275127Z 
 60%|█████▉    | 5674/9500 [19:27:18<12:59:10, 12.22s/it]08/03/2024 17:24:48 - INFO - __main__ -   Step: 5674, LR: 8.302872239763684e-06, Loss: 330.6068420410156
2024-08-04T00:25:00.965015162Z 
 60%|█████▉    | 5675/9500 [19:27:30<13:05:22, 12.32s/it]08/03/2024 17:25:00 - INFO - __main__ -   Step: 5675, LR: 8.300701696076403e-06, Loss: 560.7742919921875
2024-08-04T00:25:13.084299803Z 
 60%|█████▉    | 5676/9500 [19:27:43<13:01:20, 12.26s/it]08/03/2024 17:25:13 - INFO - __main__ -   Step: 5676, LR: 8.298531152389125e-06, Loss: 335.62908935546875
2024-08-04T00:25:25.166255427Z 
 60%|█████▉    | 5677/9500 [19:27:55<12:57:44, 12.21s/it]08/03/2024 17:25:25 - INFO - __main__ -   Step: 5677, LR: 8.296360608701845e-06, Loss: 309.60870361328125
2024-08-04T00:25:37.978701054Z 
 60%|█████▉    | 5678/9500 [19:28:07<13:09:07, 12.39s/it]08/03/2024 17:25:37 - INFO - __main__ -   Step: 5678, LR: 8.294190065014566e-06, Loss: 426.7862243652344
2024-08-04T00:25:50.029125798Z 
 60%|█████▉    | 5679/9500 [19:28:19<13:02:28, 12.29s/it]08/03/2024 17:25:50 - INFO - __main__ -   Step: 5679, LR: 8.292019521327288e-06, Loss: 415.0323486328125
2024-08-04T00:26:02.016620604Z 
 60%|█████▉    | 5680/9500 [19:28:31<12:56:32, 12.20s/it]08/03/2024 17:26:02 - INFO - __main__ -   Step: 5680, LR: 8.28984897764001e-06, Loss: 415.1933898925781
2024-08-04T00:26:14.734180140Z 
 60%|█████▉    | 5681/9500 [19:28:44<13:06:16, 12.35s/it]08/03/2024 17:26:14 - INFO - __main__ -   Step: 5681, LR: 8.287678433952731e-06, Loss: 382.81494140625
2024-08-04T00:26:26.847264217Z 
 60%|█████▉    | 5682/9500 [19:28:56<13:01:29, 12.28s/it]08/03/2024 17:26:26 - INFO - __main__ -   Step: 5682, LR: 8.285507890265451e-06, Loss: 362.7716064453125
2024-08-04T00:26:38.851454148Z 
 60%|█████▉    | 5683/9500 [19:29:08<12:56:00, 12.20s/it]08/03/2024 17:26:38 - INFO - __main__ -   Step: 5683, LR: 8.283337346578173e-06, Loss: 312.51885986328125
2024-08-04T00:26:51.211285620Z 
 60%|█████▉    | 5684/9500 [19:29:21<12:58:53, 12.25s/it]08/03/2024 17:26:51 - INFO - __main__ -   Step: 5684, LR: 8.281166802890892e-06, Loss: 440.9208068847656
2024-08-04T00:27:03.382161689Z 
 60%|█████▉    | 5685/9500 [19:29:33<12:57:14, 12.22s/it]08/03/2024 17:27:03 - INFO - __main__ -   Step: 5685, LR: 8.278996259203614e-06, Loss: 436.8892822265625
2024-08-04T00:27:15.511742168Z 
 60%|█████▉    | 5686/9500 [19:29:45<12:55:13, 12.20s/it]08/03/2024 17:27:15 - INFO - __main__ -   Step: 5686, LR: 8.276825715516336e-06, Loss: 389.463623046875
2024-08-04T00:27:27.900095138Z 
 60%|█████▉    | 5687/9500 [19:29:57<12:58:42, 12.25s/it]08/03/2024 17:27:27 - INFO - __main__ -   Step: 5687, LR: 8.274655171829057e-06, Loss: 381.7935791015625
2024-08-04T00:27:39.973150952Z 
 60%|█████▉    | 5688/9500 [19:30:09<12:55:03, 12.20s/it]08/03/2024 17:27:39 - INFO - __main__ -   Step: 5688, LR: 8.272484628141779e-06, Loss: 401.86737060546875
2024-08-04T00:27:52.422017791Z 
 60%|█████▉    | 5689/9500 [19:30:22<12:59:36, 12.27s/it]08/03/2024 17:27:52 - INFO - __main__ -   Step: 5689, LR: 8.2703140844545e-06, Loss: 415.4339294433594
2024-08-04T00:28:04.772091252Z 
 60%|█████▉    | 5690/9500 [19:30:34<13:00:51, 12.30s/it]08/03/2024 17:28:04 - INFO - __main__ -   Step: 5690, LR: 8.26814354076722e-06, Loss: 368.8140869140625
2024-08-04T00:28:16.712913362Z 
 60%|█████▉    | 5691/9500 [19:30:46<12:53:52, 12.19s/it]08/03/2024 17:28:16 - INFO - __main__ -   Step: 5691, LR: 8.26597299707994e-06, Loss: 421.6787109375
2024-08-04T00:28:28.666603550Z 
 60%|█████▉    | 5692/9500 [19:30:58<12:49:09, 12.12s/it]08/03/2024 17:28:28 - INFO - __main__ -   Step: 5692, LR: 8.263802453392661e-06, Loss: 354.25787353515625
2024-08-04T00:28:41.364352303Z 
 60%|█████▉    | 5693/9500 [19:31:11<12:59:58, 12.29s/it]08/03/2024 17:28:41 - INFO - __main__ -   Step: 5693, LR: 8.261631909705383e-06, Loss: 351.5428466796875
2024-08-04T00:28:53.433163670Z 
 60%|█████▉    | 5694/9500 [19:31:23<12:55:30, 12.23s/it]08/03/2024 17:28:53 - INFO - __main__ -   Step: 5694, LR: 8.259461366018105e-06, Loss: 348.5221862792969
2024-08-04T00:29:05.689332049Z 
 60%|█████▉    | 5695/9500 [19:31:35<12:55:53, 12.23s/it]08/03/2024 17:29:05 - INFO - __main__ -   Step: 5695, LR: 8.257290822330826e-06, Loss: 488.5498962402344
2024-08-04T00:29:17.979700386Z 
 60%|█████▉    | 5696/9500 [19:31:47<12:56:44, 12.25s/it]08/03/2024 17:29:17 - INFO - __main__ -   Step: 5696, LR: 8.255120278643548e-06, Loss: 333.9776611328125
2024-08-04T00:29:30.365165566Z 
 60%|█████▉    | 5697/9500 [19:32:00<12:59:05, 12.29s/it]08/03/2024 17:29:30 - INFO - __main__ -   Step: 5697, LR: 8.252949734956268e-06, Loss: 449.392578125
2024-08-04T00:29:42.430530006Z 
 60%|█████▉    | 5698/9500 [19:32:12<12:54:34, 12.22s/it]08/03/2024 17:29:42 - INFO - __main__ -   Step: 5698, LR: 8.250779191268989e-06, Loss: 389.7398681640625
2024-08-04T00:29:55.282726174Z 
 60%|█████▉    | 5699/9500 [19:32:25<13:06:19, 12.41s/it]08/03/2024 17:29:55 - INFO - __main__ -   Step: 5699, LR: 8.248608647581709e-06, Loss: 508.61285400390625
2024-08-04T00:30:07.492881159Z 
 60%|██████    | 5700/9500 [19:32:37<13:02:16, 12.35s/it]08/03/2024 17:30:07 - INFO - __main__ -   Step: 5700, LR: 8.24643810389443e-06, Loss: 421.74566650390625
2024-08-04T00:30:19.672816513Z 
 60%|██████    | 5701/9500 [19:32:49<12:58:48, 12.30s/it]08/03/2024 17:30:19 - INFO - __main__ -   Step: 5701, LR: 8.244267560207152e-06, Loss: 374.6099853515625
2024-08-04T00:30:32.269154534Z 
 60%|██████    | 5702/9500 [19:33:02<13:04:13, 12.39s/it]08/03/2024 17:30:32 - INFO - __main__ -   Step: 5702, LR: 8.242097016519874e-06, Loss: 383.32452392578125
2024-08-04T00:30:44.706621398Z 
 60%|██████    | 5703/9500 [19:33:14<13:04:53, 12.40s/it]08/03/2024 17:30:44 - INFO - __main__ -   Step: 5703, LR: 8.239926472832595e-06, Loss: 486.7915344238281
2024-08-04T00:30:57.080885724Z 
 60%|██████    | 5704/9500 [19:33:27<13:04:11, 12.40s/it]08/03/2024 17:30:57 - INFO - __main__ -   Step: 5704, LR: 8.237755929145315e-06, Loss: 545.5496826171875
2024-08-04T00:31:09.603124323Z 
 60%|██████    | 5705/9500 [19:33:39<13:06:23, 12.43s/it]08/03/2024 17:31:09 - INFO - __main__ -   Step: 5705, LR: 8.235585385458037e-06, Loss: 349.63555908203125
2024-08-04T00:31:21.824534182Z 
 60%|██████    | 5706/9500 [19:33:51<13:02:10, 12.37s/it]08/03/2024 17:31:21 - INFO - __main__ -   Step: 5706, LR: 8.233414841770757e-06, Loss: 428.0267639160156
2024-08-04T00:31:33.971443204Z 
 60%|██████    | 5707/9500 [19:34:03<12:57:44, 12.30s/it]08/03/2024 17:31:33 - INFO - __main__ -   Step: 5707, LR: 8.231244298083478e-06, Loss: 325.8636474609375
2024-08-04T00:31:46.622125617Z 
 60%|██████    | 5708/9500 [19:34:16<13:04:08, 12.41s/it]08/03/2024 17:31:46 - INFO - __main__ -   Step: 5708, LR: 8.2290737543962e-06, Loss: 382.1160583496094
2024-08-04T00:31:58.832366212Z 
 60%|██████    | 5709/9500 [19:34:28<13:00:11, 12.35s/it]08/03/2024 17:31:58 - INFO - __main__ -   Step: 5709, LR: 8.226903210708921e-06, Loss: 399.3395690917969
2024-08-04T00:32:11.004761672Z 
 60%|██████    | 5710/9500 [19:34:40<12:56:39, 12.30s/it]08/03/2024 17:32:11 - INFO - __main__ -   Step: 5710, LR: 8.224732667021643e-06, Loss: 375.0200500488281
2024-08-04T00:32:23.664567727Z 
 60%|██████    | 5711/9500 [19:34:53<13:03:20, 12.40s/it]08/03/2024 17:32:23 - INFO - __main__ -   Step: 5711, LR: 8.222562123334363e-06, Loss: 425.7583923339844
2024-08-04T00:32:35.876307972Z 
 60%|██████    | 5712/9500 [19:35:05<12:59:30, 12.35s/it]08/03/2024 17:32:35 - INFO - __main__ -   Step: 5712, LR: 8.220391579647084e-06, Loss: 339.93035888671875
2024-08-04T00:32:48.129644469Z 
 60%|██████    | 5713/9500 [19:35:18<12:57:31, 12.32s/it]08/03/2024 17:32:48 - INFO - __main__ -   Step: 5713, LR: 8.218221035959804e-06, Loss: 541.2501831054688
2024-08-04T00:33:00.238417921Z 
 60%|██████    | 5714/9500 [19:35:30<12:53:20, 12.26s/it]08/03/2024 17:33:00 - INFO - __main__ -   Step: 5714, LR: 8.216050492272526e-06, Loss: 368.42138671875
2024-08-04T00:33:13.241363576Z 
 60%|██████    | 5715/9500 [19:35:43<13:07:16, 12.48s/it]08/03/2024 17:33:13 - INFO - __main__ -   Step: 5715, LR: 8.213879948585247e-06, Loss: 385.71563720703125
2024-08-04T00:33:25.399044008Z 
 60%|██████    | 5716/9500 [19:35:55<13:00:57, 12.38s/it]08/03/2024 17:33:25 - INFO - __main__ -   Step: 5716, LR: 8.211709404897969e-06, Loss: 359.53399658203125
2024-08-04T00:33:37.468530213Z 
 60%|██████    | 5717/9500 [19:36:07<12:54:49, 12.29s/it]08/03/2024 17:33:37 - INFO - __main__ -   Step: 5717, LR: 8.20953886121069e-06, Loss: 387.242431640625
2024-08-04T00:33:50.538086816Z 
 60%|██████    | 5718/9500 [19:36:20<13:09:23, 12.52s/it]08/03/2024 17:33:50 - INFO - __main__ -   Step: 5718, LR: 8.20736831752341e-06, Loss: 426.08660888671875
2024-08-04T00:34:02.841941162Z 
 60%|██████    | 5719/9500 [19:36:32<13:05:01, 12.46s/it]08/03/2024 17:34:02 - INFO - __main__ -   Step: 5719, LR: 8.205197773836132e-06, Loss: 518.4588623046875
2024-08-04T00:34:14.849406158Z 
 60%|██████    | 5720/9500 [19:36:44<12:56:19, 12.32s/it]08/03/2024 17:34:14 - INFO - __main__ -   Step: 5720, LR: 8.203027230148852e-06, Loss: 375.3873291015625
2024-08-04T00:34:27.592367526Z 
 60%|██████    | 5721/9500 [19:36:57<13:04:03, 12.45s/it]08/03/2024 17:34:27 - INFO - __main__ -   Step: 5721, LR: 8.200856686461573e-06, Loss: 407.7235412597656
2024-08-04T00:34:39.641570148Z 
 60%|██████    | 5722/9500 [19:37:09<12:56:18, 12.33s/it]08/03/2024 17:34:39 - INFO - __main__ -   Step: 5722, LR: 8.198686142774295e-06, Loss: 393.9248352050781
2024-08-04T00:34:51.634909692Z 
 60%|██████    | 5723/9500 [19:37:21<12:49:45, 12.23s/it]08/03/2024 17:34:51 - INFO - __main__ -   Step: 5723, LR: 8.196515599087016e-06, Loss: 292.2764892578125
2024-08-04T00:35:04.429361919Z 
 60%|██████    | 5724/9500 [19:37:34<13:00:14, 12.40s/it]08/03/2024 17:35:04 - INFO - __main__ -   Step: 5724, LR: 8.194345055399738e-06, Loss: 542.013427734375
2024-08-04T00:35:16.687050063Z 
 60%|██████    | 5725/9500 [19:37:46<12:57:23, 12.36s/it]08/03/2024 17:35:16 - INFO - __main__ -   Step: 5725, LR: 8.192174511712458e-06, Loss: 495.6697998046875
2024-08-04T00:35:28.782834166Z 
 60%|██████    | 5726/9500 [19:37:58<12:52:16, 12.28s/it]08/03/2024 17:35:28 - INFO - __main__ -   Step: 5726, LR: 8.19000396802518e-06, Loss: 371.67022705078125
2024-08-04T00:35:41.521838531Z 
 60%|██████    | 5727/9500 [19:38:11<13:00:46, 12.42s/it]08/03/2024 17:35:41 - INFO - __main__ -   Step: 5727, LR: 8.187833424337899e-06, Loss: 383.4302978515625
2024-08-04T00:35:53.630010532Z 
 60%|██████    | 5728/9500 [19:38:23<12:54:45, 12.32s/it]08/03/2024 17:35:53 - INFO - __main__ -   Step: 5728, LR: 8.18566288065062e-06, Loss: 368.59039306640625
2024-08-04T00:36:05.628854021Z 
 60%|██████    | 5729/9500 [19:38:35<12:48:25, 12.23s/it]08/03/2024 17:36:05 - INFO - __main__ -   Step: 5729, LR: 8.183492336963342e-06, Loss: 446.041015625
2024-08-04T00:36:18.849429165Z 
 60%|██████    | 5730/9500 [19:38:48<13:06:56, 12.52s/it]08/03/2024 17:36:18 - INFO - __main__ -   Step: 5730, LR: 8.181321793276064e-06, Loss: 387.84283447265625
2024-08-04T00:36:31.120179477Z 
 60%|██████    | 5731/9500 [19:39:01<13:01:58, 12.45s/it]08/03/2024 17:36:31 - INFO - __main__ -   Step: 5731, LR: 8.179151249588785e-06, Loss: 332.56915283203125
2024-08-04T00:36:43.293491306Z 
 60%|██████    | 5732/9500 [19:39:13<12:56:35, 12.37s/it]08/03/2024 17:36:43 - INFO - __main__ -   Step: 5732, LR: 8.176980705901505e-06, Loss: 343.3387451171875
2024-08-04T00:36:55.807248136Z 
 60%|██████    | 5733/9500 [19:39:25<12:59:09, 12.41s/it]08/03/2024 17:36:55 - INFO - __main__ -   Step: 5733, LR: 8.174810162214227e-06, Loss: 365.9706115722656
2024-08-04T00:37:07.898356192Z 
 60%|██████    | 5734/9500 [19:39:37<12:52:56, 12.31s/it]08/03/2024 17:37:07 - INFO - __main__ -   Step: 5734, LR: 8.172639618526947e-06, Loss: 525.5869140625
2024-08-04T00:37:20.191895813Z 
 60%|██████    | 5735/9500 [19:39:50<12:52:20, 12.31s/it]08/03/2024 17:37:20 - INFO - __main__ -   Step: 5735, LR: 8.170469074839668e-06, Loss: 396.16754150390625
2024-08-04T00:37:33.212404516Z 
 60%|██████    | 5736/9500 [19:40:03<13:05:32, 12.52s/it]08/03/2024 17:37:33 - INFO - __main__ -   Step: 5736, LR: 8.16829853115239e-06, Loss: 439.62640380859375
2024-08-04T00:37:45.747814319Z 
 60%|██████    | 5737/9500 [19:40:15<13:05:35, 12.53s/it]08/03/2024 17:37:45 - INFO - __main__ -   Step: 5737, LR: 8.166127987465111e-06, Loss: 417.5445251464844
2024-08-04T00:37:58.372808220Z 
 60%|██████    | 5738/9500 [19:40:28<13:07:14, 12.56s/it]08/03/2024 17:37:58 - INFO - __main__ -   Step: 5738, LR: 8.163957443777833e-06, Loss: 496.17877197265625
2024-08-04T00:38:11.087318565Z 
 60%|██████    | 5739/9500 [19:40:41<13:10:00, 12.60s/it]08/03/2024 17:38:11 - INFO - __main__ -   Step: 5739, LR: 8.161786900090553e-06, Loss: 451.1045837402344
2024-08-04T00:38:23.223702820Z 
 60%|██████    | 5740/9500 [19:40:53<13:01:01, 12.46s/it]08/03/2024 17:38:23 - INFO - __main__ -   Step: 5740, LR: 8.159616356403274e-06, Loss: 419.4509582519531
2024-08-04T00:38:35.538921121Z 
 60%|██████    | 5741/9500 [19:41:05<12:58:02, 12.42s/it]08/03/2024 17:38:35 - INFO - __main__ -   Step: 5741, LR: 8.157445812715994e-06, Loss: 437.7032470703125
2024-08-04T00:38:47.840466925Z 
 60%|██████    | 5742/9500 [19:41:17<12:55:37, 12.38s/it]08/03/2024 17:38:47 - INFO - __main__ -   Step: 5742, LR: 8.155275269028716e-06, Loss: 361.780029296875
2024-08-04T00:39:00.151168898Z 
 60%|██████    | 5743/9500 [19:41:30<12:54:02, 12.36s/it]08/03/2024 17:39:00 - INFO - __main__ -   Step: 5743, LR: 8.153104725341437e-06, Loss: 437.04534912109375
2024-08-04T00:39:12.252605276Z 
 60%|██████    | 5744/9500 [19:41:42<12:48:57, 12.28s/it]08/03/2024 17:39:12 - INFO - __main__ -   Step: 5744, LR: 8.150934181654159e-06, Loss: 413.2245788574219
2024-08-04T00:39:25.219561042Z 
 60%|██████    | 5745/9500 [19:41:55<13:01:34, 12.49s/it]08/03/2024 17:39:25 - INFO - __main__ -   Step: 5745, LR: 8.14876363796688e-06, Loss: 420.7533874511719
2024-08-04T00:39:37.621825016Z 
 60%|██████    | 5746/9500 [19:42:07<12:59:45, 12.46s/it]08/03/2024 17:39:37 - INFO - __main__ -   Step: 5746, LR: 8.1465930942796e-06, Loss: 536.7103881835938
2024-08-04T00:39:49.624620507Z 
 60%|██████    | 5747/9500 [19:42:19<12:50:54, 12.32s/it]08/03/2024 17:39:49 - INFO - __main__ -   Step: 5747, LR: 8.144422550592322e-06, Loss: 343.68157958984375
2024-08-04T00:40:02.041283643Z 
 61%|██████    | 5748/9500 [19:42:31<12:52:25, 12.35s/it]08/03/2024 17:40:02 - INFO - __main__ -   Step: 5748, LR: 8.142252006905043e-06, Loss: 507.5494384765625
2024-08-04T00:40:14.293211828Z 
 61%|██████    | 5749/9500 [19:42:44<12:50:20, 12.32s/it]08/03/2024 17:40:14 - INFO - __main__ -   Step: 5749, LR: 8.140081463217763e-06, Loss: 373.4139709472656
2024-08-04T00:40:26.398907067Z 
 61%|██████    | 5750/9500 [19:42:56<12:46:04, 12.26s/it]08/03/2024 17:40:26 - INFO - __main__ -   Step: 5750, LR: 8.137910919530485e-06, Loss: 429.705810546875
2024-08-04T00:40:39.198728060Z 
 61%|██████    | 5751/9500 [19:43:09<12:56:02, 12.42s/it]08/03/2024 17:40:39 - INFO - __main__ -   Step: 5751, LR: 8.135740375843206e-06, Loss: 499.2579040527344
2024-08-04T00:40:51.820333189Z 
 61%|██████    | 5752/9500 [19:43:21<12:59:37, 12.48s/it]08/03/2024 17:40:51 - INFO - __main__ -   Step: 5752, LR: 8.133569832155928e-06, Loss: 352.45928955078125
2024-08-04T00:41:04.088113879Z 
 61%|██████    | 5753/9500 [19:43:34<12:55:25, 12.42s/it]08/03/2024 17:41:04 - INFO - __main__ -   Step: 5753, LR: 8.131399288468648e-06, Loss: 335.05218505859375
2024-08-04T00:41:16.506797684Z 
 61%|██████    | 5754/9500 [19:43:46<12:55:14, 12.42s/it]08/03/2024 17:41:16 - INFO - __main__ -   Step: 5754, LR: 8.12922874478137e-06, Loss: 433.135986328125
2024-08-04T00:41:28.778579296Z 
 61%|██████    | 5755/9500 [19:43:58<12:52:19, 12.37s/it]08/03/2024 17:41:28 - INFO - __main__ -   Step: 5755, LR: 8.127058201094091e-06, Loss: 425.3791198730469
2024-08-04T00:41:40.838359631Z 
 61%|██████    | 5756/9500 [19:44:10<12:46:14, 12.28s/it]08/03/2024 17:41:40 - INFO - __main__ -   Step: 5756, LR: 8.12488765740681e-06, Loss: 365.60638427734375
2024-08-04T00:41:53.044430866Z 
 61%|██████    | 5757/9500 [19:44:22<12:44:39, 12.26s/it]08/03/2024 17:41:53 - INFO - __main__ -   Step: 5757, LR: 8.122717113719532e-06, Loss: 411.97772216796875
2024-08-04T00:42:05.628693342Z 
 61%|██████    | 5758/9500 [19:44:35<12:50:34, 12.36s/it]08/03/2024 17:42:05 - INFO - __main__ -   Step: 5758, LR: 8.120546570032254e-06, Loss: 320.4585266113281
2024-08-04T00:42:17.905517081Z 
 61%|██████    | 5759/9500 [19:44:47<12:48:53, 12.33s/it]08/03/2024 17:42:17 - INFO - __main__ -   Step: 5759, LR: 8.118376026344975e-06, Loss: 491.85174560546875
2024-08-04T00:42:29.707128545Z 
 61%|██████    | 5760/9500 [19:44:59<12:38:46, 12.17s/it]08/03/2024 17:42:29 - INFO - __main__ -   Step: 5760, LR: 8.116205482657695e-06, Loss: 361.4910888671875
2024-08-04T00:42:42.658529994Z 
 61%|██████    | 5761/9500 [19:45:12<12:53:07, 12.41s/it]08/03/2024 17:42:42 - INFO - __main__ -   Step: 5761, LR: 8.114034938970417e-06, Loss: 501.73223876953125
2024-08-04T00:42:54.653342662Z 
 61%|██████    | 5762/9500 [19:45:24<12:45:13, 12.28s/it]08/03/2024 17:42:54 - INFO - __main__ -   Step: 5762, LR: 8.111864395283138e-06, Loss: 383.9891357421875
2024-08-04T00:43:06.610765941Z 
 61%|██████    | 5763/9500 [19:45:36<12:38:56, 12.19s/it]08/03/2024 17:43:06 - INFO - __main__ -   Step: 5763, LR: 8.109693851595858e-06, Loss: 451.19854736328125
2024-08-04T00:43:19.081319839Z 
 61%|██████    | 5764/9500 [19:45:49<12:44:04, 12.27s/it]08/03/2024 17:43:19 - INFO - __main__ -   Step: 5764, LR: 8.10752330790858e-06, Loss: 343.4778747558594
2024-08-04T00:43:31.289915753Z 
 61%|██████    | 5765/9500 [19:46:01<12:42:42, 12.25s/it]08/03/2024 17:43:31 - INFO - __main__ -   Step: 5765, LR: 8.105352764221301e-06, Loss: 506.73516845703125
2024-08-04T00:43:43.371839259Z 
 61%|██████    | 5766/9500 [19:46:13<12:39:19, 12.20s/it]08/03/2024 17:43:43 - INFO - __main__ -   Step: 5766, LR: 8.103182220534023e-06, Loss: 425.31475830078125
2024-08-04T00:43:56.012145743Z 
 61%|██████    | 5767/9500 [19:46:25<12:47:18, 12.33s/it]08/03/2024 17:43:56 - INFO - __main__ -   Step: 5767, LR: 8.101011676846743e-06, Loss: 456.7967224121094
2024-08-04T00:44:08.427633057Z 
 61%|██████    | 5768/9500 [19:46:38<12:48:39, 12.36s/it]08/03/2024 17:44:08 - INFO - __main__ -   Step: 5768, LR: 8.098841133159464e-06, Loss: 332.21295166015625
2024-08-04T00:44:20.840848164Z 
 61%|██████    | 5769/9500 [19:46:50<12:49:28, 12.37s/it]08/03/2024 17:44:20 - INFO - __main__ -   Step: 5769, LR: 8.096670589472186e-06, Loss: 385.22235107421875
2024-08-04T00:44:33.723076717Z 
 61%|██████    | 5770/9500 [19:47:03<12:58:44, 12.53s/it]08/03/2024 17:44:33 - INFO - __main__ -   Step: 5770, LR: 8.094500045784906e-06, Loss: 497.8658447265625
2024-08-04T00:44:45.595463870Z 
 61%|██████    | 5771/9500 [19:47:15<12:46:19, 12.33s/it]08/03/2024 17:44:45 - INFO - __main__ -   Step: 5771, LR: 8.092329502097627e-06, Loss: 399.2362060546875
2024-08-04T00:44:57.795676577Z 
 61%|██████    | 5772/9500 [19:47:27<12:43:42, 12.29s/it]08/03/2024 17:44:57 - INFO - __main__ -   Step: 5772, LR: 8.090158958410349e-06, Loss: 436.68902587890625
2024-08-04T00:45:10.584052126Z 
 61%|██████    | 5773/9500 [19:47:40<12:52:45, 12.44s/it]08/03/2024 17:45:10 - INFO - __main__ -   Step: 5773, LR: 8.08798841472307e-06, Loss: 406.748046875
2024-08-04T00:45:22.768423491Z 
 61%|██████    | 5774/9500 [19:47:52<12:47:47, 12.36s/it]08/03/2024 17:45:22 - INFO - __main__ -   Step: 5774, LR: 8.08581787103579e-06, Loss: 422.7855529785156
2024-08-04T00:45:35.063523715Z 
 61%|██████    | 5775/9500 [19:48:05<12:46:18, 12.34s/it]08/03/2024 17:45:35 - INFO - __main__ -   Step: 5775, LR: 8.083647327348512e-06, Loss: 448.3900146484375
2024-08-04T00:45:47.814967754Z 
 61%|██████    | 5776/9500 [19:48:17<12:53:41, 12.47s/it]08/03/2024 17:45:47 - INFO - __main__ -   Step: 5776, LR: 8.081476783661234e-06, Loss: 475.6842956542969
2024-08-04T00:45:59.923869420Z 
 61%|██████    | 5777/9500 [19:48:29<12:46:50, 12.36s/it]08/03/2024 17:45:59 - INFO - __main__ -   Step: 5777, LR: 8.079306239973953e-06, Loss: 418.1591491699219
2024-08-04T00:46:12.136276652Z 
 61%|██████    | 5778/9500 [19:48:42<12:43:55, 12.31s/it]08/03/2024 17:46:12 - INFO - __main__ -   Step: 5778, LR: 8.077135696286675e-06, Loss: 350.44403076171875
2024-08-04T00:46:24.709955442Z 
 61%|██████    | 5779/9500 [19:48:54<12:48:32, 12.39s/it]08/03/2024 17:46:24 - INFO - __main__ -   Step: 5779, LR: 8.074965152599396e-06, Loss: 448.9592590332031
2024-08-04T00:46:36.831146701Z 
 61%|██████    | 5780/9500 [19:49:06<12:43:17, 12.31s/it]08/03/2024 17:46:36 - INFO - __main__ -   Step: 5780, LR: 8.072794608912118e-06, Loss: 380.56011962890625
2024-08-04T00:46:48.831356476Z 
 61%|██████    | 5781/9500 [19:49:18<12:37:18, 12.22s/it]08/03/2024 17:46:48 - INFO - __main__ -   Step: 5781, LR: 8.070624065224838e-06, Loss: 447.49786376953125
2024-08-04T00:47:01.338028855Z 
 61%|██████    | 5782/9500 [19:49:31<12:42:27, 12.30s/it]08/03/2024 17:47:01 - INFO - __main__ -   Step: 5782, LR: 8.06845352153756e-06, Loss: 460.88494873046875
2024-08-04T00:47:13.439371154Z 
 61%|██████    | 5783/9500 [19:49:43<12:38:28, 12.24s/it]08/03/2024 17:47:13 - INFO - __main__ -   Step: 5783, LR: 8.066282977850281e-06, Loss: 481.9852600097656
2024-08-04T00:47:25.712140088Z 
 61%|██████    | 5784/9500 [19:49:55<12:38:49, 12.25s/it]08/03/2024 17:47:25 - INFO - __main__ -   Step: 5784, LR: 8.064112434163001e-06, Loss: 448.2283935546875
2024-08-04T00:47:38.475900136Z 
 61%|██████    | 5785/9500 [19:50:08<12:48:06, 12.41s/it]08/03/2024 17:47:38 - INFO - __main__ -   Step: 5785, LR: 8.061941890475722e-06, Loss: 567.6861572265625
2024-08-04T00:47:50.797555997Z 
 61%|██████    | 5786/9500 [19:50:20<12:46:21, 12.38s/it]08/03/2024 17:47:50 - INFO - __main__ -   Step: 5786, LR: 8.059771346788444e-06, Loss: 461.7769775390625
2024-08-04T00:48:03.480806732Z 
 61%|██████    | 5787/9500 [19:50:33<12:51:45, 12.47s/it]08/03/2024 17:48:03 - INFO - __main__ -   Step: 5787, LR: 8.057600803101166e-06, Loss: 477.2464599609375
2024-08-04T00:48:16.093953533Z 
 61%|██████    | 5788/9500 [19:50:46<12:54:11, 12.51s/it]08/03/2024 17:48:16 - INFO - __main__ -   Step: 5788, LR: 8.055430259413885e-06, Loss: 389.9942626953125
2024-08-04T00:48:28.174792979Z 
 61%|██████    | 5789/9500 [19:50:58<12:45:56, 12.38s/it]08/03/2024 17:48:28 - INFO - __main__ -   Step: 5789, LR: 8.053259715726607e-06, Loss: 396.0540771484375
2024-08-04T00:48:40.333483558Z 
 61%|██████    | 5790/9500 [19:51:10<12:41:33, 12.32s/it]08/03/2024 17:48:40 - INFO - __main__ -   Step: 5790, LR: 8.051089172039329e-06, Loss: 389.3399353027344
2024-08-04T00:48:52.745725750Z 
 61%|██████    | 5791/9500 [19:51:22<12:43:08, 12.35s/it]08/03/2024 17:48:52 - INFO - __main__ -   Step: 5791, LR: 8.04891862835205e-06, Loss: 355.2521667480469
2024-08-04T00:49:04.961770675Z 
 61%|██████    | 5792/9500 [19:51:34<12:40:32, 12.31s/it]08/03/2024 17:49:04 - INFO - __main__ -   Step: 5792, LR: 8.04674808466477e-06, Loss: 343.19171142578125
2024-08-04T00:49:17.247446172Z 
 61%|██████    | 5793/9500 [19:51:47<12:39:56, 12.30s/it]08/03/2024 17:49:17 - INFO - __main__ -   Step: 5793, LR: 8.044577540977492e-06, Loss: 473.44573974609375
2024-08-04T00:49:29.524374107Z 
 61%|██████    | 5794/9500 [19:51:59<12:39:18, 12.29s/it]08/03/2024 17:49:29 - INFO - __main__ -   Step: 5794, LR: 8.042406997290213e-06, Loss: 367.4827880859375
2024-08-04T00:49:41.622047855Z 
 61%|██████    | 5795/9500 [19:52:11<12:35:29, 12.23s/it]08/03/2024 17:49:41 - INFO - __main__ -   Step: 5795, LR: 8.040236453602933e-06, Loss: 436.85699462890625
2024-08-04T00:49:53.875624590Z 
 61%|██████    | 5796/9500 [19:52:23<12:35:38, 12.24s/it]08/03/2024 17:49:53 - INFO - __main__ -   Step: 5796, LR: 8.038065909915655e-06, Loss: 439.1332092285156
2024-08-04T00:50:06.966942588Z 
 61%|██████    | 5797/9500 [19:52:36<12:51:11, 12.50s/it]08/03/2024 17:50:06 - INFO - __main__ -   Step: 5797, LR: 8.035895366228376e-06, Loss: 361.18341064453125
2024-08-04T00:50:19.070086526Z 
 61%|██████    | 5798/9500 [19:52:49<12:43:42, 12.38s/it]08/03/2024 17:50:19 - INFO - __main__ -   Step: 5798, LR: 8.033724822541098e-06, Loss: 385.110595703125
2024-08-04T00:50:31.201567782Z 
 61%|██████    | 5799/9500 [19:53:01<12:38:56, 12.30s/it]08/03/2024 17:50:31 - INFO - __main__ -   Step: 5799, LR: 8.031554278853818e-06, Loss: 586.64501953125
2024-08-04T00:50:43.188526549Z 
 61%|██████    | 5800/9500 [19:53:13<12:32:52, 12.21s/it]08/03/2024 17:50:43 - INFO - __main__ -   Step: 5800, LR: 8.029383735166539e-06, Loss: 402.6283264160156
2024-08-04T00:50:55.923176049Z 
 61%|██████    | 5801/9500 [19:53:25<12:42:23, 12.37s/it]08/03/2024 17:50:55 - INFO - __main__ -   Step: 5801, LR: 8.02721319147926e-06, Loss: 444.8374328613281
2024-08-04T00:51:08.204658738Z 
 61%|██████    | 5802/9500 [19:53:38<12:40:37, 12.34s/it]08/03/2024 17:51:08 - INFO - __main__ -   Step: 5802, LR: 8.02504264779198e-06, Loss: 498.57012939453125
2024-08-04T00:51:20.160756149Z 
 61%|██████    | 5803/9500 [19:53:50<12:33:17, 12.23s/it]08/03/2024 17:51:20 - INFO - __main__ -   Step: 5803, LR: 8.022872104104702e-06, Loss: 320.9305725097656
2024-08-04T00:51:33.153641938Z 
 61%|██████    | 5804/9500 [19:54:03<12:47:16, 12.46s/it]08/03/2024 17:51:33 - INFO - __main__ -   Step: 5804, LR: 8.020701560417424e-06, Loss: 461.5538024902344
2024-08-04T00:51:45.460403867Z 
 61%|██████    | 5805/9500 [19:54:15<12:44:18, 12.41s/it]08/03/2024 17:51:45 - INFO - __main__ -   Step: 5805, LR: 8.018531016730145e-06, Loss: 561.8195190429688
2024-08-04T00:51:57.633288145Z 
 61%|██████    | 5806/9500 [19:54:27<12:39:42, 12.34s/it]08/03/2024 17:51:57 - INFO - __main__ -   Step: 5806, LR: 8.016360473042865e-06, Loss: 378.1302185058594
2024-08-04T00:52:09.957047161Z 
 61%|██████    | 5807/9500 [19:54:39<12:39:12, 12.33s/it]08/03/2024 17:52:09 - INFO - __main__ -   Step: 5807, LR: 8.014189929355587e-06, Loss: 411.08441162109375
2024-08-04T00:52:21.771054374Z 
 61%|██████    | 5808/9500 [19:54:51<12:29:23, 12.18s/it]08/03/2024 17:52:21 - INFO - __main__ -   Step: 5808, LR: 8.012019385668308e-06, Loss: 397.7144470214844
2024-08-04T00:52:34.133835780Z 
 61%|██████    | 5809/9500 [19:55:04<12:32:35, 12.23s/it]08/03/2024 17:52:34 - INFO - __main__ -   Step: 5809, LR: 8.009848841981028e-06, Loss: 454.7532653808594
2024-08-04T00:52:46.976874078Z 
 61%|██████    | 5810/9500 [19:55:16<12:43:37, 12.42s/it]08/03/2024 17:52:46 - INFO - __main__ -   Step: 5810, LR: 8.00767829829375e-06, Loss: 440.19097900390625
2024-08-04T00:52:59.450318737Z 
 61%|██████    | 5811/9500 [19:55:29<12:44:27, 12.43s/it]08/03/2024 17:52:59 - INFO - __main__ -   Step: 5811, LR: 8.005507754606471e-06, Loss: 426.92401123046875
2024-08-04T00:53:11.891905284Z 
 61%|██████    | 5812/9500 [19:55:41<12:44:23, 12.44s/it]08/03/2024 17:53:11 - INFO - __main__ -   Step: 5812, LR: 8.003337210919193e-06, Loss: 381.96319580078125
2024-08-04T00:53:24.504648773Z 
 61%|██████    | 5813/9500 [19:55:54<12:47:27, 12.49s/it]08/03/2024 17:53:24 - INFO - __main__ -   Step: 5813, LR: 8.001166667231913e-06, Loss: 383.606201171875
2024-08-04T00:53:36.639076945Z 
 61%|██████    | 5814/9500 [19:56:06<12:40:42, 12.38s/it]08/03/2024 17:53:36 - INFO - __main__ -   Step: 5814, LR: 7.998996123544634e-06, Loss: 346.329833984375
2024-08-04T00:53:48.823274532Z 
 61%|██████    | 5815/9500 [19:56:18<12:36:50, 12.32s/it]08/03/2024 17:53:48 - INFO - __main__ -   Step: 5815, LR: 7.996825579857356e-06, Loss: 359.1744689941406
2024-08-04T00:54:01.476573334Z 
 61%|██████    | 5816/9500 [19:56:31<12:42:43, 12.42s/it]08/03/2024 17:54:01 - INFO - __main__ -   Step: 5816, LR: 7.994655036170076e-06, Loss: 570.3261108398438
2024-08-04T00:54:14.236741797Z 
 61%|██████    | 5817/9500 [19:56:44<12:48:44, 12.52s/it]08/03/2024 17:54:14 - INFO - __main__ -   Step: 5817, LR: 7.992484492482797e-06, Loss: 426.50762939453125
2024-08-04T00:54:26.502705045Z 
 61%|██████    | 5818/9500 [19:56:56<12:43:46, 12.45s/it]08/03/2024 17:54:26 - INFO - __main__ -   Step: 5818, LR: 7.990313948795519e-06, Loss: 414.29449462890625
2024-08-04T00:54:38.946718897Z 
 61%|██████▏   | 5819/9500 [19:57:08<12:43:32, 12.45s/it]08/03/2024 17:54:38 - INFO - __main__ -   Step: 5819, LR: 7.98814340510824e-06, Loss: 353.0852966308594
2024-08-04T00:54:51.459758878Z 
 61%|██████▏   | 5820/9500 [19:57:21<12:44:34, 12.47s/it]08/03/2024 17:54:51 - INFO - __main__ -   Step: 5820, LR: 7.98597286142096e-06, Loss: 394.90411376953125
2024-08-04T00:55:03.741258522Z 
 61%|██████▏   | 5821/9500 [19:57:33<12:40:58, 12.41s/it]08/03/2024 17:55:03 - INFO - __main__ -   Step: 5821, LR: 7.983802317733682e-06, Loss: 432.4022521972656
2024-08-04T00:55:16.228944318Z 
 61%|██████▏   | 5822/9500 [19:57:46<12:42:11, 12.43s/it]08/03/2024 17:55:16 - INFO - __main__ -   Step: 5822, LR: 7.981631774046403e-06, Loss: 355.0867919921875
2024-08-04T00:55:28.613297777Z 
 61%|██████▏   | 5823/9500 [19:57:58<12:41:03, 12.42s/it]08/03/2024 17:55:28 - INFO - __main__ -   Step: 5823, LR: 7.979461230359123e-06, Loss: 526.6663818359375
2024-08-04T00:55:40.713383562Z 
 61%|██████▏   | 5824/9500 [19:58:10<12:35:00, 12.32s/it]08/03/2024 17:55:40 - INFO - __main__ -   Step: 5824, LR: 7.977290686671845e-06, Loss: 494.9862060546875
2024-08-04T00:55:53.151706364Z 
 61%|██████▏   | 5825/9500 [19:58:23<12:36:54, 12.36s/it]08/03/2024 17:55:53 - INFO - __main__ -   Step: 5825, LR: 7.975120142984566e-06, Loss: 457.9163818359375
2024-08-04T00:56:05.267627089Z 
 61%|██████▏   | 5826/9500 [19:58:35<12:32:15, 12.29s/it]08/03/2024 17:56:05 - INFO - __main__ -   Step: 5826, LR: 7.972949599297288e-06, Loss: 316.8043212890625
2024-08-04T00:56:17.221162973Z 
 61%|██████▏   | 5827/9500 [19:58:47<12:25:58, 12.19s/it]08/03/2024 17:56:17 - INFO - __main__ -   Step: 5827, LR: 7.970779055610008e-06, Loss: 428.36199951171875
2024-08-04T00:56:29.713991021Z 
 61%|██████▏   | 5828/9500 [19:58:59<12:31:24, 12.28s/it]08/03/2024 17:56:29 - INFO - __main__ -   Step: 5828, LR: 7.96860851192273e-06, Loss: 373.45501708984375
2024-08-04T00:56:41.764760787Z 
 61%|██████▏   | 5829/9500 [19:59:11<12:27:01, 12.21s/it]08/03/2024 17:56:41 - INFO - __main__ -   Step: 5829, LR: 7.96643796823545e-06, Loss: 440.2974548339844
2024-08-04T00:56:53.712334221Z 
 61%|██████▏   | 5830/9500 [19:59:23<12:22:01, 12.13s/it]08/03/2024 17:56:53 - INFO - __main__ -   Step: 5830, LR: 7.96426742454817e-06, Loss: 429.703369140625
2024-08-04T00:57:06.539638764Z 
 61%|██████▏   | 5831/9500 [19:59:36<12:34:35, 12.34s/it]08/03/2024 17:57:06 - INFO - __main__ -   Step: 5831, LR: 7.962096880860892e-06, Loss: 354.1873474121094
2024-08-04T00:57:18.970221055Z 
 61%|██████▏   | 5832/9500 [19:59:48<12:36:02, 12.37s/it]08/03/2024 17:57:18 - INFO - __main__ -   Step: 5832, LR: 7.959926337173614e-06, Loss: 500.94219970703125
2024-08-04T00:57:31.204447897Z 
 61%|██████▏   | 5833/9500 [20:00:01<12:33:24, 12.33s/it]08/03/2024 17:57:31 - INFO - __main__ -   Step: 5833, LR: 7.957755793486335e-06, Loss: 420.1122131347656
2024-08-04T00:57:43.836936779Z 
 61%|██████▏   | 5834/9500 [20:00:13<12:38:47, 12.42s/it]08/03/2024 17:57:43 - INFO - __main__ -   Step: 5834, LR: 7.955585249799055e-06, Loss: 491.0565490722656
2024-08-04T00:57:55.960926805Z 
 61%|██████▏   | 5835/9500 [20:00:25<12:33:11, 12.33s/it]08/03/2024 17:57:55 - INFO - __main__ -   Step: 5835, LR: 7.953414706111777e-06, Loss: 484.4951171875
2024-08-04T00:58:08.264341790Z 
 61%|██████▏   | 5836/9500 [20:00:38<12:32:29, 12.32s/it]08/03/2024 17:58:08 - INFO - __main__ -   Step: 5836, LR: 7.951244162424498e-06, Loss: 501.3327941894531
2024-08-04T00:58:21.295338086Z 
 61%|██████▏   | 5837/9500 [20:00:51<12:45:15, 12.53s/it]08/03/2024 17:58:21 - INFO - __main__ -   Step: 5837, LR: 7.949073618737218e-06, Loss: 516.8419189453125
2024-08-04T00:58:33.424423212Z 
 61%|██████▏   | 5838/9500 [20:01:03<12:37:36, 12.41s/it]08/03/2024 17:58:33 - INFO - __main__ -   Step: 5838, LR: 7.94690307504994e-06, Loss: 440.3501281738281
2024-08-04T00:58:45.718152720Z 
 61%|██████▏   | 5839/9500 [20:01:15<12:35:13, 12.38s/it]08/03/2024 17:58:45 - INFO - __main__ -   Step: 5839, LR: 7.944732531362661e-06, Loss: 389.7674560546875
2024-08-04T00:58:58.381388617Z 
 61%|██████▏   | 5840/9500 [20:01:28<12:40:15, 12.46s/it]08/03/2024 17:58:58 - INFO - __main__ -   Step: 5840, LR: 7.942561987675383e-06, Loss: 394.76861572265625
2024-08-04T00:59:10.501924269Z 
 61%|██████▏   | 5841/9500 [20:01:40<12:33:46, 12.36s/it]08/03/2024 17:59:10 - INFO - __main__ -   Step: 5841, LR: 7.940391443988104e-06, Loss: 279.75518798828125
2024-08-04T00:59:22.566417119Z 
 61%|██████▏   | 5842/9500 [20:01:52<12:28:09, 12.27s/it]08/03/2024 17:59:22 - INFO - __main__ -   Step: 5842, LR: 7.938220900300824e-06, Loss: 423.1007080078125
2024-08-04T00:59:34.696988149Z 
 62%|██████▏   | 5843/9500 [20:02:04<12:25:22, 12.23s/it]08/03/2024 17:59:34 - INFO - __main__ -   Step: 5843, LR: 7.936050356613546e-06, Loss: 562.11279296875
2024-08-04T00:59:47.422451243Z 
 62%|██████▏   | 5844/9500 [20:02:17<12:34:13, 12.38s/it]08/03/2024 17:59:47 - INFO - __main__ -   Step: 5844, LR: 7.933879812926266e-06, Loss: 365.6903381347656
2024-08-04T00:59:59.414475586Z 
 62%|██████▏   | 5845/9500 [20:02:29<12:26:59, 12.26s/it]08/03/2024 17:59:59 - INFO - __main__ -   Step: 5845, LR: 7.931709269238987e-06, Loss: 272.3867492675781
2024-08-04T01:00:11.531031938Z 
 62%|██████▏   | 5846/9500 [20:02:41<12:24:07, 12.22s/it]08/03/2024 18:00:11 - INFO - __main__ -   Step: 5846, LR: 7.929538725551709e-06, Loss: 564.19482421875
2024-08-04T01:00:24.012482925Z 
 62%|██████▏   | 5847/9500 [20:02:53<12:28:42, 12.30s/it]08/03/2024 18:00:24 - INFO - __main__ -   Step: 5847, LR: 7.92736818186443e-06, Loss: 400.8795166015625
2024-08-04T01:00:36.599236025Z 
 62%|██████▏   | 5848/9500 [20:03:06<12:33:47, 12.38s/it]08/03/2024 18:00:36 - INFO - __main__ -   Step: 5848, LR: 7.925197638177152e-06, Loss: 542.641357421875
2024-08-04T01:00:48.802817713Z 
 62%|██████▏   | 5849/9500 [20:03:18<12:30:17, 12.33s/it]08/03/2024 18:00:48 - INFO - __main__ -   Step: 5849, LR: 7.923027094489872e-06, Loss: 567.8543090820312
2024-08-04T01:01:01.147115218Z 
 62%|██████▏   | 5850/9500 [20:03:31<12:30:20, 12.33s/it]08/03/2024 18:01:01 - INFO - __main__ -   Step: 5850, LR: 7.920856550802593e-06, Loss: 461.2163391113281
2024-08-04T01:01:13.219212852Z 
 62%|██████▏   | 5851/9500 [20:03:43<12:25:20, 12.26s/it]08/03/2024 18:01:13 - INFO - __main__ -   Step: 5851, LR: 7.918686007115313e-06, Loss: 310.95880126953125
2024-08-04T01:01:25.407596694Z 
 62%|██████▏   | 5852/9500 [20:03:55<12:23:54, 12.24s/it]08/03/2024 18:01:25 - INFO - __main__ -   Step: 5852, LR: 7.916515463428035e-06, Loss: 563.3713989257812
2024-08-04T01:01:37.939341453Z 
 62%|██████▏   | 5853/9500 [20:04:07<12:29:07, 12.32s/it]08/03/2024 18:01:37 - INFO - __main__ -   Step: 5853, LR: 7.914344919740756e-06, Loss: 399.05828857421875
2024-08-04T01:01:50.146444702Z 
 62%|██████▏   | 5854/9500 [20:04:20<12:26:46, 12.29s/it]08/03/2024 18:01:50 - INFO - __main__ -   Step: 5854, LR: 7.912174376053478e-06, Loss: 441.8674011230469
2024-08-04T01:02:02.409705780Z 
 62%|██████▏   | 5855/9500 [20:04:32<12:26:05, 12.28s/it]08/03/2024 18:02:02 - INFO - __main__ -   Step: 5855, LR: 7.9100038323662e-06, Loss: 593.2098999023438
2024-08-04T01:02:15.033625124Z 
 62%|██████▏   | 5856/9500 [20:04:44<12:32:07, 12.38s/it]08/03/2024 18:02:15 - INFO - __main__ -   Step: 5856, LR: 7.90783328867892e-06, Loss: 397.2897644042969
2024-08-04T01:02:27.220610980Z 
 62%|██████▏   | 5857/9500 [20:04:57<12:28:20, 12.33s/it]08/03/2024 18:02:27 - INFO - __main__ -   Step: 5857, LR: 7.905662744991641e-06, Loss: 348.2244567871094
2024-08-04T01:02:39.161030865Z 
 62%|██████▏   | 5858/9500 [20:05:09<12:21:07, 12.21s/it]08/03/2024 18:02:39 - INFO - __main__ -   Step: 5858, LR: 7.90349220130436e-06, Loss: 405.8006286621094
2024-08-04T01:02:51.900321794Z 
 62%|██████▏   | 5859/9500 [20:05:21<12:30:33, 12.37s/it]08/03/2024 18:02:51 - INFO - __main__ -   Step: 5859, LR: 7.901321657617082e-06, Loss: 426.15545654296875
2024-08-04T01:03:04.209348125Z 
 62%|██████▏   | 5860/9500 [20:05:34<12:29:16, 12.35s/it]08/03/2024 18:03:04 - INFO - __main__ -   Step: 5860, LR: 7.899151113929804e-06, Loss: 443.39752197265625
2024-08-04T01:03:16.370382757Z 
 62%|██████▏   | 5861/9500 [20:05:46<12:25:36, 12.29s/it]08/03/2024 18:03:16 - INFO - __main__ -   Step: 5861, LR: 7.896980570242525e-06, Loss: 415.90399169921875
2024-08-04T01:03:28.723205025Z 
 62%|██████▏   | 5862/9500 [20:05:58<12:26:29, 12.31s/it]08/03/2024 18:03:28 - INFO - __main__ -   Step: 5862, LR: 7.894810026555247e-06, Loss: 377.3722229003906
2024-08-04T01:03:41.060380524Z 
 62%|██████▏   | 5863/9500 [20:06:10<12:26:44, 12.32s/it]08/03/2024 18:03:41 - INFO - __main__ -   Step: 5863, LR: 7.892639482867967e-06, Loss: 443.24688720703125
2024-08-04T01:03:53.103738129Z 
 62%|██████▏   | 5864/9500 [20:06:23<12:21:31, 12.24s/it]08/03/2024 18:03:53 - INFO - __main__ -   Step: 5864, LR: 7.890468939180688e-06, Loss: 479.8486022949219
2024-08-04T01:04:05.753343615Z 
 62%|██████▏   | 5865/9500 [20:06:35<12:28:49, 12.36s/it]08/03/2024 18:04:05 - INFO - __main__ -   Step: 5865, LR: 7.888298395493408e-06, Loss: 386.59619140625
2024-08-04T01:04:18.225332159Z 
 62%|██████▏   | 5866/9500 [20:06:48<12:30:39, 12.39s/it]08/03/2024 18:04:18 - INFO - __main__ -   Step: 5866, LR: 7.88612785180613e-06, Loss: 374.02398681640625
2024-08-04T01:04:30.477002773Z 
 62%|██████▏   | 5867/9500 [20:07:00<12:27:52, 12.35s/it]08/03/2024 18:04:30 - INFO - __main__ -   Step: 5867, LR: 7.883957308118851e-06, Loss: 376.95452880859375
2024-08-04T01:04:42.840815839Z 
 62%|██████▏   | 5868/9500 [20:07:12<12:27:53, 12.35s/it]08/03/2024 18:04:42 - INFO - __main__ -   Step: 5868, LR: 7.881786764431573e-06, Loss: 476.80938720703125
2024-08-04T01:04:55.036868760Z 
 62%|██████▏   | 5869/9500 [20:07:24<12:24:47, 12.31s/it]08/03/2024 18:04:55 - INFO - __main__ -   Step: 5869, LR: 7.879616220744295e-06, Loss: 521.4454345703125
2024-08-04T01:05:07.065348356Z 
 62%|██████▏   | 5870/9500 [20:07:37<12:19:32, 12.22s/it]08/03/2024 18:05:07 - INFO - __main__ -   Step: 5870, LR: 7.877445677057014e-06, Loss: 334.8441467285156
2024-08-04T01:05:19.777562014Z 
 62%|██████▏   | 5871/9500 [20:07:49<12:28:11, 12.37s/it]08/03/2024 18:05:19 - INFO - __main__ -   Step: 5871, LR: 7.875275133369736e-06, Loss: 494.95489501953125
2024-08-04T01:05:31.821724273Z 
 62%|██████▏   | 5872/9500 [20:08:01<12:22:04, 12.27s/it]08/03/2024 18:05:31 - INFO - __main__ -   Step: 5872, LR: 7.873104589682456e-06, Loss: 407.42352294921875
2024-08-04T01:05:43.750796909Z 
 62%|██████▏   | 5873/9500 [20:08:13<12:15:38, 12.17s/it]08/03/2024 18:05:43 - INFO - __main__ -   Step: 5873, LR: 7.870934045995177e-06, Loss: 314.33380126953125
2024-08-04T01:05:56.215947151Z 
 62%|██████▏   | 5874/9500 [20:08:26<12:20:48, 12.26s/it]08/03/2024 18:05:56 - INFO - __main__ -   Step: 5874, LR: 7.868763502307899e-06, Loss: 470.27197265625
2024-08-04T01:06:08.245482501Z 
 62%|██████▏   | 5875/9500 [20:08:38<12:16:26, 12.19s/it]08/03/2024 18:06:08 - INFO - __main__ -   Step: 5875, LR: 7.86659295862062e-06, Loss: 360.60015869140625
2024-08-04T01:06:20.486514264Z 
 62%|██████▏   | 5876/9500 [20:08:50<12:17:11, 12.21s/it]08/03/2024 18:06:20 - INFO - __main__ -   Step: 5876, LR: 7.864422414933342e-06, Loss: 538.6499633789062
2024-08-04T01:06:32.968175918Z 
 62%|██████▏   | 5877/9500 [20:09:02<12:21:59, 12.29s/it]08/03/2024 18:06:32 - INFO - __main__ -   Step: 5877, LR: 7.862251871246062e-06, Loss: 474.26507568359375
2024-08-04T01:06:45.015624923Z 
 62%|██████▏   | 5878/9500 [20:09:14<12:17:25, 12.22s/it]08/03/2024 18:06:45 - INFO - __main__ -   Step: 5878, LR: 7.860081327558783e-06, Loss: 381.66162109375
2024-08-04T01:06:57.231778497Z 
 62%|██████▏   | 5879/9500 [20:09:27<12:17:14, 12.22s/it]08/03/2024 18:06:57 - INFO - __main__ -   Step: 5879, LR: 7.857910783871503e-06, Loss: 382.2369384765625
2024-08-04T01:07:09.886705994Z 
 62%|██████▏   | 5880/9500 [20:09:39<12:24:58, 12.35s/it]08/03/2024 18:07:09 - INFO - __main__ -   Step: 5880, LR: 7.855740240184225e-06, Loss: 428.68115234375
2024-08-04T01:07:22.212111859Z 
 62%|██████▏   | 5881/9500 [20:09:52<12:24:22, 12.34s/it]08/03/2024 18:07:22 - INFO - __main__ -   Step: 5881, LR: 7.853569696496946e-06, Loss: 338.00445556640625
2024-08-04T01:07:34.495939550Z 
 62%|██████▏   | 5882/9500 [20:10:04<12:23:07, 12.32s/it]08/03/2024 18:07:34 - INFO - __main__ -   Step: 5882, LR: 7.851399152809668e-06, Loss: 349.82977294921875
2024-08-04T01:07:47.164790933Z 
 62%|██████▏   | 5883/9500 [20:10:17<12:29:09, 12.43s/it]08/03/2024 18:07:47 - INFO - __main__ -   Step: 5883, LR: 7.84922860912239e-06, Loss: 386.01226806640625
2024-08-04T01:07:59.629238192Z 
 62%|██████▏   | 5884/9500 [20:10:29<12:29:37, 12.44s/it]08/03/2024 18:07:59 - INFO - __main__ -   Step: 5884, LR: 7.847058065435111e-06, Loss: 421.13739013671875
2024-08-04T01:08:11.696008022Z 
 62%|██████▏   | 5885/9500 [20:10:41<12:22:40, 12.33s/it]08/03/2024 18:08:11 - INFO - __main__ -   Step: 5885, LR: 7.844887521747831e-06, Loss: 442.7828369140625
2024-08-04T01:08:23.535159249Z 
 62%|██████▏   | 5886/9500 [20:10:53<12:13:41, 12.18s/it]08/03/2024 18:08:23 - INFO - __main__ -   Step: 5886, LR: 7.842716978060551e-06, Loss: 315.429443359375
2024-08-04T01:08:35.943863694Z 
 62%|██████▏   | 5887/9500 [20:11:05<12:17:35, 12.25s/it]08/03/2024 18:08:35 - INFO - __main__ -   Step: 5887, LR: 7.840546434373272e-06, Loss: 370.9398193359375
2024-08-04T01:08:48.056716892Z 
 62%|██████▏   | 5888/9500 [20:11:17<12:14:55, 12.21s/it]08/03/2024 18:08:48 - INFO - __main__ -   Step: 5888, LR: 7.838375890685994e-06, Loss: 393.79632568359375
2024-08-04T01:09:00.258521892Z 
 62%|██████▏   | 5889/9500 [20:11:30<12:14:37, 12.21s/it]08/03/2024 18:09:00 - INFO - __main__ -   Step: 5889, LR: 7.836205346998716e-06, Loss: 357.2795715332031
2024-08-04T01:09:13.257791025Z 
 62%|██████▏   | 5890/9500 [20:11:43<12:28:43, 12.44s/it]08/03/2024 18:09:13 - INFO - __main__ -   Step: 5890, LR: 7.834034803311437e-06, Loss: 373.47296142578125
2024-08-04T01:09:25.342825032Z 
 62%|██████▏   | 5891/9500 [20:11:55<12:22:01, 12.34s/it]08/03/2024 18:09:25 - INFO - __main__ -   Step: 5891, LR: 7.831864259624159e-06, Loss: 433.74591064453125
2024-08-04T01:09:37.373068177Z 
 62%|██████▏   | 5892/9500 [20:12:07<12:16:18, 12.24s/it]08/03/2024 18:09:37 - INFO - __main__ -   Step: 5892, LR: 7.829693715936879e-06, Loss: 426.54046630859375
2024-08-04T01:09:50.165961698Z 
 62%|██████▏   | 5893/9500 [20:12:20<12:25:59, 12.41s/it]08/03/2024 18:09:50 - INFO - __main__ -   Step: 5893, LR: 7.8275231722496e-06, Loss: 406.39990234375
2024-08-04T01:10:02.643038165Z 
 62%|██████▏   | 5894/9500 [20:12:32<12:27:00, 12.43s/it]08/03/2024 18:10:02 - INFO - __main__ -   Step: 5894, LR: 7.82535262856232e-06, Loss: 430.19476318359375
2024-08-04T01:10:14.792298004Z 
 62%|██████▏   | 5895/9500 [20:12:44<12:21:45, 12.35s/it]08/03/2024 18:10:14 - INFO - __main__ -   Step: 5895, LR: 7.823182084875042e-06, Loss: 326.7319030761719
2024-08-04T01:10:27.244150988Z 
 62%|██████▏   | 5896/9500 [20:12:57<12:23:27, 12.38s/it]08/03/2024 18:10:27 - INFO - __main__ -   Step: 5896, LR: 7.821011541187763e-06, Loss: 444.75567626953125
2024-08-04T01:10:39.728219952Z 
 62%|██████▏   | 5897/9500 [20:13:09<12:25:11, 12.41s/it]08/03/2024 18:10:39 - INFO - __main__ -   Step: 5897, LR: 7.818840997500485e-06, Loss: 485.5012512207031
2024-08-04T01:10:51.782884058Z 
 62%|██████▏   | 5898/9500 [20:13:21<12:18:35, 12.30s/it]08/03/2024 18:10:51 - INFO - __main__ -   Step: 5898, LR: 7.816670453813206e-06, Loss: 383.8022766113281
2024-08-04T01:11:04.251598991Z 
 62%|██████▏   | 5899/9500 [20:13:34<12:21:21, 12.35s/it]08/03/2024 18:11:04 - INFO - __main__ -   Step: 5899, LR: 7.814499910125926e-06, Loss: 403.2398681640625
2024-08-04T01:11:16.614139320Z 
 62%|██████▏   | 5900/9500 [20:13:46<12:21:20, 12.36s/it]08/03/2024 18:11:16 - INFO - __main__ -   Step: 5900, LR: 7.812329366438648e-06, Loss: 378.0190734863281
2024-08-04T01:11:28.714203570Z 
 62%|██████▏   | 5901/9500 [20:13:58<12:16:32, 12.28s/it]08/03/2024 18:11:28 - INFO - __main__ -   Step: 5901, LR: 7.810158822751367e-06, Loss: 384.7479248046875
2024-08-04T01:11:41.063048664Z 
 62%|██████▏   | 5902/9500 [20:14:10<12:17:35, 12.30s/it]08/03/2024 18:11:41 - INFO - __main__ -   Step: 5902, LR: 7.807988279064089e-06, Loss: 385.55621337890625
2024-08-04T01:11:53.157812682Z 
 62%|██████▏   | 5903/9500 [20:14:23<12:13:41, 12.24s/it]08/03/2024 18:11:53 - INFO - __main__ -   Step: 5903, LR: 7.80581773537681e-06, Loss: 492.84619140625
2024-08-04T01:12:05.119190545Z 
 62%|██████▏   | 5904/9500 [20:14:35<12:08:30, 12.16s/it]08/03/2024 18:12:05 - INFO - __main__ -   Step: 5904, LR: 7.803647191689532e-06, Loss: 435.82965087890625
2024-08-04T01:12:17.687946001Z 
 62%|██████▏   | 5905/9500 [20:14:47<12:15:44, 12.28s/it]08/03/2024 18:12:17 - INFO - __main__ -   Step: 5905, LR: 7.801476648002254e-06, Loss: 393.889404296875
2024-08-04T01:12:29.687823972Z 
 62%|██████▏   | 5906/9500 [20:14:59<12:10:30, 12.20s/it]08/03/2024 18:12:29 - INFO - __main__ -   Step: 5906, LR: 7.799306104314974e-06, Loss: 463.5382385253906
2024-08-04T01:12:41.798631781Z 
 62%|██████▏   | 5907/9500 [20:15:11<12:08:47, 12.17s/it]08/03/2024 18:12:41 - INFO - __main__ -   Step: 5907, LR: 7.797135560627695e-06, Loss: 358.5447998046875
2024-08-04T01:12:54.199032608Z 
 62%|██████▏   | 5908/9500 [20:15:24<12:12:43, 12.24s/it]08/03/2024 18:12:54 - INFO - __main__ -   Step: 5908, LR: 7.794965016940415e-06, Loss: 379.52142333984375
2024-08-04T01:13:06.312671694Z 
 62%|██████▏   | 5909/9500 [20:15:36<12:10:15, 12.20s/it]08/03/2024 18:13:06 - INFO - __main__ -   Step: 5909, LR: 7.792794473253137e-06, Loss: 420.1230163574219
2024-08-04T01:13:18.509438901Z 
 62%|██████▏   | 5910/9500 [20:15:48<12:09:58, 12.20s/it]08/03/2024 18:13:18 - INFO - __main__ -   Step: 5910, LR: 7.790623929565858e-06, Loss: 383.2171630859375
2024-08-04T01:13:30.982647424Z 
 62%|██████▏   | 5911/9500 [20:16:00<12:14:39, 12.28s/it]08/03/2024 18:13:30 - INFO - __main__ -   Step: 5911, LR: 7.78845338587858e-06, Loss: 417.5079650878906
2024-08-04T01:13:42.969039678Z 
 62%|██████▏   | 5912/9500 [20:16:12<12:09:09, 12.19s/it]08/03/2024 18:13:42 - INFO - __main__ -   Step: 5912, LR: 7.786282842191301e-06, Loss: 307.90142822265625
2024-08-04T01:13:55.283720336Z 
 62%|██████▏   | 5913/9500 [20:16:25<12:11:08, 12.23s/it]08/03/2024 18:13:55 - INFO - __main__ -   Step: 5913, LR: 7.784112298504021e-06, Loss: 530.871826171875
2024-08-04T01:14:07.689379712Z 
 62%|██████▏   | 5914/9500 [20:16:37<12:14:05, 12.28s/it]08/03/2024 18:14:07 - INFO - __main__ -   Step: 5914, LR: 7.781941754816743e-06, Loss: 408.03704833984375
2024-08-04T01:14:19.772001622Z 
 62%|██████▏   | 5915/9500 [20:16:49<12:10:18, 12.22s/it]08/03/2024 18:14:19 - INFO - __main__ -   Step: 5915, LR: 7.779771211129463e-06, Loss: 389.98651123046875
2024-08-04T01:14:32.228538017Z 
 62%|██████▏   | 5916/9500 [20:17:02<12:14:17, 12.29s/it]08/03/2024 18:14:32 - INFO - __main__ -   Step: 5916, LR: 7.777600667442184e-06, Loss: 492.77166748046875
2024-08-04T01:14:45.211008554Z 
 62%|██████▏   | 5917/9500 [20:17:15<12:26:26, 12.50s/it]08/03/2024 18:14:45 - INFO - __main__ -   Step: 5917, LR: 7.775430123754906e-06, Loss: 489.03399658203125
2024-08-04T01:14:57.851710448Z 
 62%|██████▏   | 5918/9500 [20:17:27<12:28:45, 12.54s/it]08/03/2024 18:14:57 - INFO - __main__ -   Step: 5918, LR: 7.773259580067627e-06, Loss: 545.4896240234375
2024-08-04T01:15:10.209232352Z 
 62%|██████▏   | 5919/9500 [20:17:40<12:25:14, 12.49s/it]08/03/2024 18:15:10 - INFO - __main__ -   Step: 5919, LR: 7.771089036380349e-06, Loss: 430.98040771484375
2024-08-04T01:15:22.616693796Z 
 62%|██████▏   | 5920/9500 [20:17:52<12:23:37, 12.46s/it]08/03/2024 18:15:22 - INFO - __main__ -   Step: 5920, LR: 7.768918492693069e-06, Loss: 374.03753662109375
2024-08-04T01:15:34.690744125Z 
 62%|██████▏   | 5921/9500 [20:18:04<12:16:27, 12.35s/it]08/03/2024 18:15:34 - INFO - __main__ -   Step: 5921, LR: 7.76674794900579e-06, Loss: 448.93023681640625
2024-08-04T01:15:46.770809014Z 
 62%|██████▏   | 5922/9500 [20:18:16<12:11:29, 12.27s/it]08/03/2024 18:15:46 - INFO - __main__ -   Step: 5922, LR: 7.76457740531851e-06, Loss: 492.1253662109375
2024-08-04T01:15:59.207128717Z 
 62%|██████▏   | 5923/9500 [20:18:29<12:14:19, 12.32s/it]08/03/2024 18:15:59 - INFO - __main__ -   Step: 5923, LR: 7.762406861631232e-06, Loss: 409.73046875
2024-08-04T01:16:11.241208158Z 
 62%|██████▏   | 5924/9500 [20:18:41<12:09:02, 12.23s/it]08/03/2024 18:16:11 - INFO - __main__ -   Step: 5924, LR: 7.760236317943953e-06, Loss: 389.2630615234375
2024-08-04T01:16:23.361240137Z 
 62%|██████▏   | 5925/9500 [20:18:53<12:06:50, 12.20s/it]08/03/2024 18:16:23 - INFO - __main__ -   Step: 5925, LR: 7.758065774256675e-06, Loss: 332.2275390625
2024-08-04T01:16:35.751148459Z 
 62%|██████▏   | 5926/9500 [20:19:05<12:10:03, 12.26s/it]08/03/2024 18:16:35 - INFO - __main__ -   Step: 5926, LR: 7.755895230569396e-06, Loss: 410.081298828125
2024-08-04T01:16:47.921176738Z 
 62%|██████▏   | 5927/9500 [20:19:17<12:08:18, 12.23s/it]08/03/2024 18:16:47 - INFO - __main__ -   Step: 5927, LR: 7.753724686882116e-06, Loss: 389.31390380859375
2024-08-04T01:17:00.076531213Z 
 62%|██████▏   | 5928/9500 [20:19:30<12:06:45, 12.21s/it]08/03/2024 18:17:00 - INFO - __main__ -   Step: 5928, LR: 7.751554143194838e-06, Loss: 347.0471496582031
2024-08-04T01:17:12.229224428Z 
 62%|██████▏   | 5929/9500 [20:19:42<12:05:34, 12.19s/it]08/03/2024 18:17:12 - INFO - __main__ -   Step: 5929, LR: 7.749383599507558e-06, Loss: 378.7187194824219
2024-08-04T01:17:24.666405244Z 
 62%|██████▏   | 5930/9500 [20:19:54<12:09:46, 12.27s/it]08/03/2024 18:17:24 - INFO - __main__ -   Step: 5930, LR: 7.747213055820279e-06, Loss: 303.5077209472656
2024-08-04T01:17:37.295676458Z 
 62%|██████▏   | 5931/9500 [20:20:07<12:16:04, 12.37s/it]08/03/2024 18:17:37 - INFO - __main__ -   Step: 5931, LR: 7.745042512133e-06, Loss: 597.2847900390625
2024-08-04T01:17:49.187915947Z 
 62%|██████▏   | 5932/9500 [20:20:19<12:07:15, 12.23s/it]08/03/2024 18:17:49 - INFO - __main__ -   Step: 5932, LR: 7.742871968445722e-06, Loss: 373.8408508300781
2024-08-04T01:18:01.827887258Z 
 62%|██████▏   | 5933/9500 [20:20:31<12:14:22, 12.35s/it]08/03/2024 18:18:01 - INFO - __main__ -   Step: 5933, LR: 7.740701424758444e-06, Loss: 389.6832275390625
2024-08-04T01:18:13.802474783Z 
 62%|██████▏   | 5934/9500 [20:20:43<12:07:25, 12.24s/it]08/03/2024 18:18:13 - INFO - __main__ -   Step: 5934, LR: 7.738530881071164e-06, Loss: 336.433349609375
2024-08-04T01:18:26.003710184Z 
 62%|██████▏   | 5935/9500 [20:20:55<12:06:32, 12.23s/it]08/03/2024 18:18:26 - INFO - __main__ -   Step: 5935, LR: 7.736360337383885e-06, Loss: 435.256103515625
2024-08-04T01:18:38.201004369Z 
 62%|██████▏   | 5936/9500 [20:21:08<12:05:47, 12.22s/it]08/03/2024 18:18:38 - INFO - __main__ -   Step: 5936, LR: 7.734189793696605e-06, Loss: 341.7591552734375
2024-08-04T01:18:50.560168732Z 
 62%|██████▏   | 5937/9500 [20:21:20<12:08:05, 12.26s/it]08/03/2024 18:18:50 - INFO - __main__ -   Step: 5937, LR: 7.732019250009327e-06, Loss: 402.46435546875
2024-08-04T01:19:02.997972382Z 
 63%|██████▎   | 5938/9500 [20:21:32<12:11:02, 12.31s/it]08/03/2024 18:19:02 - INFO - __main__ -   Step: 5938, LR: 7.729848706322048e-06, Loss: 485.8438415527344
2024-08-04T01:19:15.592772127Z 
 63%|██████▎   | 5939/9500 [20:21:45<12:15:50, 12.40s/it]08/03/2024 18:19:15 - INFO - __main__ -   Step: 5939, LR: 7.72767816263477e-06, Loss: 359.53570556640625
2024-08-04T01:19:27.723087503Z 
 63%|██████▎   | 5940/9500 [20:21:57<12:10:51, 12.32s/it]08/03/2024 18:19:27 - INFO - __main__ -   Step: 5940, LR: 7.725507618947491e-06, Loss: 408.6535949707031
2024-08-04T01:19:39.835243626Z 
 63%|██████▎   | 5941/9500 [20:22:09<12:06:59, 12.26s/it]08/03/2024 18:19:39 - INFO - __main__ -   Step: 5941, LR: 7.723337075260211e-06, Loss: 359.5633239746094
2024-08-04T01:19:52.567516447Z 
 63%|██████▎   | 5942/9500 [20:22:22<12:15:15, 12.40s/it]08/03/2024 18:19:52 - INFO - __main__ -   Step: 5942, LR: 7.721166531572933e-06, Loss: 558.4168701171875
2024-08-04T01:20:04.653692115Z 
 63%|██████▎   | 5943/9500 [20:22:34<12:09:29, 12.31s/it]08/03/2024 18:20:04 - INFO - __main__ -   Step: 5943, LR: 7.718995987885654e-06, Loss: 380.15716552734375
2024-08-04T01:20:16.810282956Z 
 63%|██████▎   | 5944/9500 [20:22:46<12:06:37, 12.26s/it]08/03/2024 18:20:16 - INFO - __main__ -   Step: 5944, LR: 7.716825444198374e-06, Loss: 412.7733154296875
2024-08-04T01:20:29.069126480Z 
 63%|██████▎   | 5945/9500 [20:22:59<12:06:24, 12.26s/it]08/03/2024 18:20:29 - INFO - __main__ -   Step: 5945, LR: 7.714654900511096e-06, Loss: 398.5216064453125
2024-08-04T01:20:41.436500248Z 
 63%|██████▎   | 5946/9500 [20:23:11<12:08:06, 12.29s/it]08/03/2024 18:20:41 - INFO - __main__ -   Step: 5946, LR: 7.712484356823817e-06, Loss: 523.4362182617188
2024-08-04T01:20:53.577537706Z 
 63%|██████▎   | 5947/9500 [20:23:23<12:05:13, 12.25s/it]08/03/2024 18:20:53 - INFO - __main__ -   Step: 5947, LR: 7.710313813136539e-06, Loss: 420.90484619140625
2024-08-04T01:21:06.014910875Z 
 63%|██████▎   | 5948/9500 [20:23:35<12:08:24, 12.30s/it]08/03/2024 18:21:06 - INFO - __main__ -   Step: 5948, LR: 7.708143269449259e-06, Loss: 328.64166259765625
2024-08-04T01:21:18.041769598Z 
 63%|██████▎   | 5949/9500 [20:23:47<12:03:16, 12.22s/it]08/03/2024 18:21:18 - INFO - __main__ -   Step: 5949, LR: 7.70597272576198e-06, Loss: 496.5664367675781
2024-08-04T01:21:30.075360859Z 
 63%|██████▎   | 5950/9500 [20:24:00<11:59:44, 12.16s/it]08/03/2024 18:21:30 - INFO - __main__ -   Step: 5950, LR: 7.703802182074702e-06, Loss: 454.2095642089844
2024-08-04T01:21:42.550922645Z 
 63%|██████▎   | 5951/9500 [20:24:12<12:05:03, 12.26s/it]08/03/2024 18:21:42 - INFO - __main__ -   Step: 5951, LR: 7.701631638387422e-06, Loss: 444.5396728515625
2024-08-04T01:21:55.016782896Z 
 63%|██████▎   | 5952/9500 [20:24:24<12:08:32, 12.32s/it]08/03/2024 18:21:55 - INFO - __main__ -   Step: 5952, LR: 7.699461094700143e-06, Loss: 383.5614013671875
2024-08-04T01:22:07.082626125Z 
 63%|██████▎   | 5953/9500 [20:24:37<12:03:49, 12.24s/it]08/03/2024 18:22:07 - INFO - __main__ -   Step: 5953, LR: 7.697290551012865e-06, Loss: 452.00555419921875
2024-08-04T01:22:19.885365859Z 
 63%|██████▎   | 5954/9500 [20:24:49<12:13:31, 12.41s/it]08/03/2024 18:22:19 - INFO - __main__ -   Step: 5954, LR: 7.695120007325586e-06, Loss: 405.10113525390625
2024-08-04T01:22:32.089292705Z 
 63%|██████▎   | 5955/9500 [20:25:02<12:09:38, 12.35s/it]08/03/2024 18:22:32 - INFO - __main__ -   Step: 5955, LR: 7.692949463638306e-06, Loss: 349.40655517578125
2024-08-04T01:22:44.288675956Z 
 63%|██████▎   | 5956/9500 [20:25:14<12:06:46, 12.30s/it]08/03/2024 18:22:44 - INFO - __main__ -   Step: 5956, LR: 7.690778919951028e-06, Loss: 478.18182373046875
2024-08-04T01:22:56.822444071Z 
 63%|██████▎   | 5957/9500 [20:25:26<12:10:38, 12.37s/it]08/03/2024 18:22:56 - INFO - __main__ -   Step: 5957, LR: 7.68860837626375e-06, Loss: 480.6403503417969
2024-08-04T01:23:09.390897527Z 
 63%|██████▎   | 5958/9500 [20:25:39<12:13:52, 12.43s/it]08/03/2024 18:23:09 - INFO - __main__ -   Step: 5958, LR: 7.68643783257647e-06, Loss: 530.386474609375
2024-08-04T01:23:21.415247072Z 
 63%|██████▎   | 5959/9500 [20:25:51<12:06:27, 12.31s/it]08/03/2024 18:23:21 - INFO - __main__ -   Step: 5959, LR: 7.68426728888919e-06, Loss: 386.02923583984375
2024-08-04T01:23:33.930837296Z 
 63%|██████▎   | 5960/9500 [20:26:03<12:09:54, 12.37s/it]08/03/2024 18:23:33 - INFO - __main__ -   Step: 5960, LR: 7.682096745201912e-06, Loss: 484.9230041503906
2024-08-04T01:23:46.269901852Z 
 63%|██████▎   | 5961/9500 [20:26:16<12:09:07, 12.36s/it]08/03/2024 18:23:46 - INFO - __main__ -   Step: 5961, LR: 7.679926201514634e-06, Loss: 458.60772705078125
2024-08-04T01:23:58.330915588Z 
 63%|██████▎   | 5962/9500 [20:26:28<12:03:36, 12.27s/it]08/03/2024 18:23:58 - INFO - __main__ -   Step: 5962, LR: 7.677755657827354e-06, Loss: 465.3655700683594
2024-08-04T01:24:11.195085443Z 
 63%|██████▎   | 5963/9500 [20:26:41<12:13:52, 12.45s/it]08/03/2024 18:24:11 - INFO - __main__ -   Step: 5963, LR: 7.675585114140075e-06, Loss: 477.88531494140625
2024-08-04T01:24:23.158158963Z 
 63%|██████▎   | 5964/9500 [20:26:53<12:05:04, 12.30s/it]08/03/2024 18:24:23 - INFO - __main__ -   Step: 5964, LR: 7.673414570452797e-06, Loss: 357.2254638671875
2024-08-04T01:24:35.562122762Z 
 63%|██████▎   | 5965/9500 [20:27:05<12:06:39, 12.33s/it]08/03/2024 18:24:35 - INFO - __main__ -   Step: 5965, LR: 7.671244026765517e-06, Loss: 401.7815246582031
2024-08-04T01:24:48.265637586Z 
 63%|██████▎   | 5966/9500 [20:27:18<12:12:59, 12.44s/it]08/03/2024 18:24:48 - INFO - __main__ -   Step: 5966, LR: 7.669073483078238e-06, Loss: 447.95068359375
2024-08-04T01:25:00.265123381Z 
 63%|██████▎   | 5967/9500 [20:27:30<12:04:54, 12.31s/it]08/03/2024 18:25:00 - INFO - __main__ -   Step: 5967, LR: 7.66690293939096e-06, Loss: 486.3749694824219
2024-08-04T01:25:12.374837170Z 
 63%|██████▎   | 5968/9500 [20:27:42<12:01:09, 12.25s/it]08/03/2024 18:25:12 - INFO - __main__ -   Step: 5968, LR: 7.664732395703681e-06, Loss: 375.56982421875
2024-08-04T01:25:24.831537436Z 
 63%|██████▎   | 5969/9500 [20:27:54<12:04:35, 12.31s/it]08/03/2024 18:25:24 - INFO - __main__ -   Step: 5969, LR: 7.662561852016401e-06, Loss: 437.18218994140625
2024-08-04T01:25:36.961923944Z 
 63%|██████▎   | 5970/9500 [20:28:06<12:01:10, 12.26s/it]08/03/2024 18:25:36 - INFO - __main__ -   Step: 5970, LR: 7.660391308329123e-06, Loss: 386.6053771972656
2024-08-04T01:25:49.367167634Z 
 63%|██████▎   | 5971/9500 [20:28:19<12:03:33, 12.30s/it]08/03/2024 18:25:49 - INFO - __main__ -   Step: 5971, LR: 7.658220764641844e-06, Loss: 395.0730285644531
2024-08-04T01:26:01.493928608Z 
 63%|██████▎   | 5972/9500 [20:28:31<12:00:15, 12.25s/it]08/03/2024 18:26:01 - INFO - __main__ -   Step: 5972, LR: 7.656050220954564e-06, Loss: 372.799072265625
2024-08-04T01:26:13.815472866Z 
 63%|██████▎   | 5973/9500 [20:28:43<12:01:20, 12.27s/it]08/03/2024 18:26:13 - INFO - __main__ -   Step: 5973, LR: 7.653879677267286e-06, Loss: 468.8139953613281
2024-08-04T01:26:25.833777535Z 
 63%|██████▎   | 5974/9500 [20:28:55<11:56:40, 12.20s/it]08/03/2024 18:26:25 - INFO - __main__ -   Step: 5974, LR: 7.651709133580007e-06, Loss: 412.9026184082031
2024-08-04T01:26:37.978623565Z 
 63%|██████▎   | 5975/9500 [20:29:07<11:55:34, 12.18s/it]08/03/2024 18:26:37 - INFO - __main__ -   Step: 5975, LR: 7.649538589892729e-06, Loss: 390.0399475097656
2024-08-04T01:26:50.650229273Z 
 63%|██████▎   | 5976/9500 [20:29:20<12:04:02, 12.33s/it]08/03/2024 18:26:50 - INFO - __main__ -   Step: 5976, LR: 7.647368046205449e-06, Loss: 444.78436279296875
2024-08-04T01:27:02.644204669Z 
 63%|██████▎   | 5977/9500 [20:29:32<11:57:57, 12.23s/it]08/03/2024 18:27:02 - INFO - __main__ -   Step: 5977, LR: 7.64519750251817e-06, Loss: 398.1676330566406
2024-08-04T01:27:14.796886106Z 
 63%|██████▎   | 5978/9500 [20:29:44<11:56:26, 12.21s/it]08/03/2024 18:27:14 - INFO - __main__ -   Step: 5978, LR: 7.643026958830892e-06, Loss: 397.2857666015625
2024-08-04T01:27:27.689834655Z 
 63%|██████▎   | 5979/9500 [20:29:57<12:08:20, 12.41s/it]08/03/2024 18:27:27 - INFO - __main__ -   Step: 5979, LR: 7.640856415143612e-06, Loss: 365.220947265625
2024-08-04T01:27:40.295129980Z 
 63%|██████▎   | 5980/9500 [20:30:10<12:11:33, 12.47s/it]08/03/2024 18:27:40 - INFO - __main__ -   Step: 5980, LR: 7.638685871456333e-06, Loss: 379.39794921875
2024-08-04T01:27:52.196947490Z 
 63%|██████▎   | 5981/9500 [20:30:22<12:01:20, 12.30s/it]08/03/2024 18:27:52 - INFO - __main__ -   Step: 5981, LR: 7.636515327769055e-06, Loss: 393.4062194824219
2024-08-04T01:28:04.856889895Z 
 63%|██████▎   | 5982/9500 [20:30:34<12:07:29, 12.41s/it]08/03/2024 18:28:04 - INFO - __main__ -   Step: 5982, LR: 7.634344784081777e-06, Loss: 489.4472961425781
2024-08-04T01:28:17.211865535Z 
 63%|██████▎   | 5983/9500 [20:30:47<12:06:21, 12.39s/it]08/03/2024 18:28:17 - INFO - __main__ -   Step: 5983, LR: 7.632174240394496e-06, Loss: 404.2247314453125
2024-08-04T01:28:29.298555128Z 
 63%|██████▎   | 5984/9500 [20:30:59<12:00:47, 12.30s/it]08/03/2024 18:28:29 - INFO - __main__ -   Step: 5984, LR: 7.630003696707218e-06, Loss: 418.15362548828125
2024-08-04T01:28:41.800744668Z 
 63%|██████▎   | 5985/9500 [20:31:11<12:04:08, 12.36s/it]08/03/2024 18:28:41 - INFO - __main__ -   Step: 5985, LR: 7.627833153019939e-06, Loss: 476.181396484375
2024-08-04T01:28:53.962191144Z 
 63%|██████▎   | 5986/9500 [20:31:23<12:00:25, 12.30s/it]08/03/2024 18:28:53 - INFO - __main__ -   Step: 5986, LR: 7.62566260933266e-06, Loss: 351.0279541015625
2024-08-04T01:29:06.293089994Z 
 63%|██████▎   | 5987/9500 [20:31:36<12:00:44, 12.31s/it]08/03/2024 18:29:06 - INFO - __main__ -   Step: 5987, LR: 7.623492065645382e-06, Loss: 356.79449462890625
2024-08-04T01:29:18.669994305Z 
 63%|██████▎   | 5988/9500 [20:31:48<12:01:43, 12.33s/it]08/03/2024 18:29:18 - INFO - __main__ -   Step: 5988, LR: 7.6213215219581025e-06, Loss: 334.5606994628906
2024-08-04T01:29:30.763128830Z 
 63%|██████▎   | 5989/9500 [20:32:00<11:57:21, 12.26s/it]08/03/2024 18:29:30 - INFO - __main__ -   Step: 5989, LR: 7.619150978270824e-06, Loss: 428.91180419921875
2024-08-04T01:29:42.704455416Z 
 63%|██████▎   | 5990/9500 [20:32:12<11:51:34, 12.16s/it]08/03/2024 18:29:42 - INFO - __main__ -   Step: 5990, LR: 7.616980434583544e-06, Loss: 412.211669921875
2024-08-04T01:29:55.117869654Z 
 63%|██████▎   | 5991/9500 [20:32:25<11:55:45, 12.24s/it]08/03/2024 18:29:55 - INFO - __main__ -   Step: 5991, LR: 7.6148098908962655e-06, Loss: 432.5187072753906
2024-08-04T01:30:07.201067557Z 
 63%|██████▎   | 5992/9500 [20:32:37<11:52:49, 12.19s/it]08/03/2024 18:30:07 - INFO - __main__ -   Step: 5992, LR: 7.612639347208986e-06, Loss: 333.62811279296875
2024-08-04T01:30:19.593786059Z 
 63%|██████▎   | 5993/9500 [20:32:49<11:56:08, 12.25s/it]08/03/2024 18:30:19 - INFO - __main__ -   Step: 5993, LR: 7.610468803521708e-06, Loss: 411.9888000488281
2024-08-04T01:30:32.075064120Z 
 63%|██████▎   | 5994/9500 [20:33:02<11:59:57, 12.32s/it]08/03/2024 18:30:32 - INFO - __main__ -   Step: 5994, LR: 7.608298259834429e-06, Loss: 302.86712646484375
2024-08-04T01:30:44.152685122Z 
 63%|██████▎   | 5995/9500 [20:33:14<11:55:29, 12.25s/it]08/03/2024 18:30:44 - INFO - __main__ -   Step: 5995, LR: 7.60612771614715e-06, Loss: 490.9316101074219
2024-08-04T01:30:56.377206010Z 
 63%|██████▎   | 5996/9500 [20:33:26<11:54:52, 12.24s/it]08/03/2024 18:30:56 - INFO - __main__ -   Step: 5996, LR: 7.603957172459872e-06, Loss: 447.301513671875
2024-08-04T01:31:08.691285369Z 
 63%|██████▎   | 5997/9500 [20:33:38<11:55:56, 12.26s/it]08/03/2024 18:31:08 - INFO - __main__ -   Step: 5997, LR: 7.6017866287725914e-06, Loss: 391.0126037597656
2024-08-04T01:31:20.665370324Z 
 63%|██████▎   | 5998/9500 [20:33:50<11:50:41, 12.18s/it]08/03/2024 18:31:20 - INFO - __main__ -   Step: 5998, LR: 7.599616085085313e-06, Loss: 301.6783447265625
2024-08-04T01:31:32.844687154Z 
 63%|██████▎   | 5999/9500 [20:34:02<11:50:32, 12.18s/it]08/03/2024 18:31:32 - INFO - __main__ -   Step: 5999, LR: 7.597445541398034e-06, Loss: 416.55328369140625
2024-08-04T01:31:45.184721092Z 
 63%|██████▎   | 6000/9500 [20:34:15<11:53:10, 12.23s/it]08/03/2024 18:31:45 - INFO - __main__ -   Step: 6000, LR: 7.595274997710755e-06, Loss: 291.666015625
2024-08-04T01:31:57.331541586Z 
 63%|██████▎   | 6001/9500 [20:34:27<11:51:35, 12.20s/it]08/03/2024 18:31:57 - INFO - __main__ -   Step: 6001, LR: 7.593104454023477e-06, Loss: 459.0496826171875
2024-08-04T01:32:09.915786662Z 
 63%|██████▎   | 6002/9500 [20:34:39<11:58:04, 12.32s/it]08/03/2024 18:32:09 - INFO - __main__ -   Step: 6002, LR: 7.5909339103361975e-06, Loss: 357.8918151855469
2024-08-04T01:32:22.466397639Z 
 63%|██████▎   | 6003/9500 [20:34:52<12:01:57, 12.39s/it]08/03/2024 18:32:22 - INFO - __main__ -   Step: 6003, LR: 7.588763366648919e-06, Loss: 344.60516357421875
2024-08-04T01:32:34.938967690Z 
 63%|██████▎   | 6004/9500 [20:35:04<12:03:14, 12.41s/it]08/03/2024 18:32:34 - INFO - __main__ -   Step: 6004, LR: 7.586592822961639e-06, Loss: 450.5914306640625
2024-08-04T01:32:47.158377554Z 
 63%|██████▎   | 6005/9500 [20:35:17<11:59:39, 12.35s/it]08/03/2024 18:32:47 - INFO - __main__ -   Step: 6005, LR: 7.5844222792743605e-06, Loss: 456.67236328125
2024-08-04T01:32:59.780740130Z 
 63%|██████▎   | 6006/9500 [20:35:29<12:04:07, 12.43s/it]08/03/2024 18:32:59 - INFO - __main__ -   Step: 6006, LR: 7.582251735587082e-06, Loss: 413.0331726074219
2024-08-04T01:33:12.246722169Z 
 63%|██████▎   | 6007/9500 [20:35:42<12:04:27, 12.44s/it]08/03/2024 18:33:12 - INFO - __main__ -   Step: 6007, LR: 7.580081191899803e-06, Loss: 374.17498779296875
2024-08-04T01:33:24.396855013Z 
 63%|██████▎   | 6008/9500 [20:35:54<11:59:07, 12.36s/it]08/03/2024 18:33:24 - INFO - __main__ -   Step: 6008, LR: 7.577910648212524e-06, Loss: 363.41888427734375
2024-08-04T01:33:37.028482026Z 
 63%|██████▎   | 6009/9500 [20:36:06<12:03:43, 12.44s/it]08/03/2024 18:33:37 - INFO - __main__ -   Step: 6009, LR: 7.575740104525245e-06, Loss: 323.137939453125
2024-08-04T01:33:49.421719206Z 
 63%|██████▎   | 6010/9500 [20:36:19<12:02:43, 12.43s/it]08/03/2024 18:33:49 - INFO - __main__ -   Step: 6010, LR: 7.573569560837967e-06, Loss: 371.0533447265625
2024-08-04T01:34:01.763187284Z 
 63%|██████▎   | 6011/9500 [20:36:31<12:01:03, 12.40s/it]08/03/2024 18:34:01 - INFO - __main__ -   Step: 6011, LR: 7.5713990171506865e-06, Loss: 394.0562744140625
2024-08-04T01:34:14.444566476Z 
 63%|██████▎   | 6012/9500 [20:36:44<12:05:45, 12.48s/it]08/03/2024 18:34:14 - INFO - __main__ -   Step: 6012, LR: 7.569228473463408e-06, Loss: 411.24090576171875
2024-08-04T01:34:26.302285634Z 
 63%|██████▎   | 6013/9500 [20:36:56<11:54:37, 12.30s/it]08/03/2024 18:34:26 - INFO - __main__ -   Step: 6013, LR: 7.56705792977613e-06, Loss: 366.61346435546875
2024-08-04T01:34:38.648901018Z 
 63%|██████▎   | 6014/9500 [20:37:08<11:55:17, 12.31s/it]08/03/2024 18:34:38 - INFO - __main__ -   Step: 6014, LR: 7.56488738608885e-06, Loss: 451.42791748046875
2024-08-04T01:34:50.862178809Z 
 63%|██████▎   | 6015/9500 [20:37:20<11:53:22, 12.28s/it]08/03/2024 18:34:50 - INFO - __main__ -   Step: 6015, LR: 7.562716842401572e-06, Loss: 527.90771484375
2024-08-04T01:35:03.496760795Z 
 63%|██████▎   | 6016/9500 [20:37:33<11:59:17, 12.39s/it]08/03/2024 18:35:03 - INFO - __main__ -   Step: 6016, LR: 7.560546298714293e-06, Loss: 448.5581970214844
2024-08-04T01:35:15.572583552Z 
 63%|██████▎   | 6017/9500 [20:37:45<11:53:41, 12.29s/it]08/03/2024 18:35:15 - INFO - __main__ -   Step: 6017, LR: 7.558375755027014e-06, Loss: 412.4677734375
2024-08-04T01:35:27.561279179Z 
 63%|██████▎   | 6018/9500 [20:37:57<11:48:09, 12.20s/it]08/03/2024 18:35:27 - INFO - __main__ -   Step: 6018, LR: 7.556205211339734e-06, Loss: 400.7447204589844
2024-08-04T01:35:40.336959490Z 
 63%|██████▎   | 6019/9500 [20:38:10<11:57:55, 12.37s/it]08/03/2024 18:35:40 - INFO - __main__ -   Step: 6019, LR: 7.554034667652456e-06, Loss: 482.89385986328125
2024-08-04T01:35:52.673069426Z 
 63%|██████▎   | 6020/9500 [20:38:22<11:57:03, 12.36s/it]08/03/2024 18:35:52 - INFO - __main__ -   Step: 6020, LR: 7.551864123965177e-06, Loss: 383.0963134765625
2024-08-04T01:36:05.078948152Z 
 63%|██████▎   | 6021/9500 [20:38:35<11:57:35, 12.38s/it]08/03/2024 18:36:05 - INFO - __main__ -   Step: 6021, LR: 7.549693580277898e-06, Loss: 561.0933227539062
2024-08-04T01:36:17.836822458Z 
 63%|██████▎   | 6022/9500 [20:38:47<12:04:01, 12.49s/it]08/03/2024 18:36:17 - INFO - __main__ -   Step: 6022, LR: 7.547523036590619e-06, Loss: 523.4022216796875
2024-08-04T01:36:30.259806046Z 
 63%|██████▎   | 6023/9500 [20:39:00<12:02:38, 12.47s/it]08/03/2024 18:36:30 - INFO - __main__ -   Step: 6023, LR: 7.545352492903341e-06, Loss: 580.7067260742188
2024-08-04T01:36:42.643875644Z 
 63%|██████▎   | 6024/9500 [20:39:12<12:00:56, 12.44s/it]08/03/2024 18:36:42 - INFO - __main__ -   Step: 6024, LR: 7.543181949216062e-06, Loss: 491.5008239746094
2024-08-04T01:36:55.358725686Z 
 63%|██████▎   | 6025/9500 [20:39:25<12:05:25, 12.53s/it]08/03/2024 18:36:55 - INFO - __main__ -   Step: 6025, LR: 7.5410114055287816e-06, Loss: 389.1131896972656
2024-08-04T01:37:07.721354544Z 
 63%|██████▎   | 6026/9500 [20:39:37<12:02:23, 12.48s/it]08/03/2024 18:37:07 - INFO - __main__ -   Step: 6026, LR: 7.538840861841503e-06, Loss: 548.4757080078125
2024-08-04T01:37:19.866576024Z 
 63%|██████▎   | 6027/9500 [20:39:49<11:56:26, 12.38s/it]08/03/2024 18:37:19 - INFO - __main__ -   Step: 6027, LR: 7.536670318154225e-06, Loss: 341.3518371582031
2024-08-04T01:37:32.597040367Z 
 63%|██████▎   | 6028/9500 [20:40:02<12:02:21, 12.48s/it]08/03/2024 18:37:32 - INFO - __main__ -   Step: 6028, LR: 7.534499774466945e-06, Loss: 420.12689208984375
2024-08-04T01:37:44.814859060Z 
 63%|██████▎   | 6029/9500 [20:40:14<11:57:32, 12.40s/it]08/03/2024 18:37:44 - INFO - __main__ -   Step: 6029, LR: 7.532329230779667e-06, Loss: 393.8819580078125
2024-08-04T01:37:57.111571530Z 
 63%|██████▎   | 6030/9500 [20:40:27<11:55:29, 12.37s/it]08/03/2024 18:37:57 - INFO - __main__ -   Step: 6030, LR: 7.5301586870923885e-06, Loss: 367.46966552734375
2024-08-04T01:38:09.438605674Z 
 63%|██████▎   | 6031/9500 [20:40:39<11:54:30, 12.36s/it]08/03/2024 18:38:09 - INFO - __main__ -   Step: 6031, LR: 7.527988143405108e-06, Loss: 326.0179443359375
2024-08-04T01:38:21.562363179Z 
 63%|██████▎   | 6032/9500 [20:40:51<11:50:14, 12.29s/it]08/03/2024 18:38:21 - INFO - __main__ -   Step: 6032, LR: 7.52581759971783e-06, Loss: 340.2039489746094
2024-08-04T01:38:33.518807163Z 
 64%|██████▎   | 6033/9500 [20:41:03<11:44:17, 12.19s/it]08/03/2024 18:38:33 - INFO - __main__ -   Step: 6033, LR: 7.523647056030551e-06, Loss: 370.696044921875
2024-08-04T01:38:45.730408391Z 
 64%|██████▎   | 6034/9500 [20:41:15<11:44:29, 12.20s/it]08/03/2024 18:38:45 - INFO - __main__ -   Step: 6034, LR: 7.521476512343272e-06, Loss: 410.14532470703125
2024-08-04T01:38:58.106630693Z 
 64%|██████▎   | 6035/9500 [20:41:28<11:47:25, 12.25s/it]08/03/2024 18:38:58 - INFO - __main__ -   Step: 6035, LR: 7.519305968655993e-06, Loss: 524.143798828125
2024-08-04T01:39:10.157998393Z 
 64%|██████▎   | 6036/9500 [20:41:40<11:43:46, 12.19s/it]08/03/2024 18:39:10 - INFO - __main__ -   Step: 6036, LR: 7.5171354249687145e-06, Loss: 320.26739501953125
2024-08-04T01:39:22.882160822Z 
 64%|██████▎   | 6037/9500 [20:41:52<11:52:49, 12.35s/it]08/03/2024 18:39:22 - INFO - __main__ -   Step: 6037, LR: 7.514964881281436e-06, Loss: 464.0531921386719
2024-08-04T01:39:34.825546969Z 
 64%|██████▎   | 6038/9500 [20:42:04<11:45:34, 12.23s/it]08/03/2024 18:39:34 - INFO - __main__ -   Step: 6038, LR: 7.512794337594156e-06, Loss: 410.1348876953125
2024-08-04T01:39:47.132800171Z 
 64%|██████▎   | 6039/9500 [20:42:17<11:46:44, 12.25s/it]08/03/2024 18:39:47 - INFO - __main__ -   Step: 6039, LR: 7.5106237939068775e-06, Loss: 408.4149169921875
2024-08-04T01:39:59.821121141Z 
 64%|██████▎   | 6040/9500 [20:42:29<11:54:04, 12.38s/it]08/03/2024 18:39:59 - INFO - __main__ -   Step: 6040, LR: 7.508453250219598e-06, Loss: 417.69854736328125
2024-08-04T01:40:12.047633146Z 
 64%|██████▎   | 6041/9500 [20:42:41<11:51:09, 12.34s/it]08/03/2024 18:40:12 - INFO - __main__ -   Step: 6041, LR: 7.50628270653232e-06, Loss: 424.97723388671875
2024-08-04T01:40:24.052519454Z 
 64%|██████▎   | 6042/9500 [20:42:53<11:45:14, 12.24s/it]08/03/2024 18:40:24 - INFO - __main__ -   Step: 6042, LR: 7.5041121628450404e-06, Loss: 559.8131103515625
2024-08-04T01:40:36.339321676Z 
 64%|██████▎   | 6043/9500 [20:43:06<11:45:54, 12.25s/it]08/03/2024 18:40:36 - INFO - __main__ -   Step: 6043, LR: 7.501941619157762e-06, Loss: 393.95428466796875
2024-08-04T01:40:48.607281170Z 
 64%|██████▎   | 6044/9500 [20:43:18<11:45:58, 12.26s/it]08/03/2024 18:40:48 - INFO - __main__ -   Step: 6044, LR: 7.4997710754704836e-06, Loss: 381.6893310546875
2024-08-04T01:41:00.755907244Z 
 64%|██████▎   | 6045/9500 [20:43:30<11:43:54, 12.22s/it]08/03/2024 18:41:00 - INFO - __main__ -   Step: 6045, LR: 7.4976005317832034e-06, Loss: 342.58831787109375
2024-08-04T01:41:13.223974017Z 
 64%|██████▎   | 6046/9500 [20:43:43<11:47:55, 12.30s/it]08/03/2024 18:41:13 - INFO - __main__ -   Step: 6046, LR: 7.495429988095925e-06, Loss: 458.5692443847656
2024-08-04T01:41:25.118373710Z 
 64%|██████▎   | 6047/9500 [20:43:55<11:40:45, 12.18s/it]08/03/2024 18:41:25 - INFO - __main__ -   Step: 6047, LR: 7.493259444408646e-06, Loss: 409.19317626953125
2024-08-04T01:41:37.578705350Z 
 64%|██████▎   | 6048/9500 [20:44:07<11:45:27, 12.26s/it]08/03/2024 18:41:37 - INFO - __main__ -   Step: 6048, LR: 7.491088900721367e-06, Loss: 362.4783020019531
2024-08-04T01:41:50.048052259Z 
 64%|██████▎   | 6049/9500 [20:44:19<11:48:49, 12.32s/it]08/03/2024 18:41:50 - INFO - __main__ -   Step: 6049, LR: 7.488918357034089e-06, Loss: 505.3017883300781
2024-08-04T01:42:02.388494909Z 
 64%|██████▎   | 6050/9500 [20:44:32<11:48:54, 12.33s/it]08/03/2024 18:42:02 - INFO - __main__ -   Step: 6050, LR: 7.4867478133468095e-06, Loss: 400.54644775390625
2024-08-04T01:42:14.670215212Z 
 64%|██████▎   | 6051/9500 [20:44:44<11:47:53, 12.31s/it]08/03/2024 18:42:14 - INFO - __main__ -   Step: 6051, LR: 7.484577269659531e-06, Loss: 521.962646484375
2024-08-04T01:42:27.434456754Z 
 64%|██████▎   | 6052/9500 [20:44:57<11:55:26, 12.45s/it]08/03/2024 18:42:27 - INFO - __main__ -   Step: 6052, LR: 7.482406725972251e-06, Loss: 483.59771728515625
2024-08-04T01:42:40.024906728Z 
 64%|██████▎   | 6053/9500 [20:45:09<11:57:38, 12.49s/it]08/03/2024 18:42:40 - INFO - __main__ -   Step: 6053, LR: 7.4802361822849725e-06, Loss: 434.36236572265625
2024-08-04T01:42:52.531585383Z 
 64%|██████▎   | 6054/9500 [20:45:22<11:57:42, 12.50s/it]08/03/2024 18:42:52 - INFO - __main__ -   Step: 6054, LR: 7.478065638597693e-06, Loss: 503.4929504394531
2024-08-04T01:43:04.977200937Z 
 64%|██████▎   | 6055/9500 [20:45:34<11:56:37, 12.48s/it]08/03/2024 18:43:04 - INFO - __main__ -   Step: 6055, LR: 7.475895094910415e-06, Loss: 481.1619567871094
2024-08-04T01:43:17.077320025Z 
 64%|██████▎   | 6056/9500 [20:45:47<11:49:51, 12.37s/it]08/03/2024 18:43:17 - INFO - __main__ -   Step: 6056, LR: 7.473724551223136e-06, Loss: 472.5464172363281
2024-08-04T01:43:29.152429738Z 
 64%|██████▍   | 6057/9500 [20:45:59<11:44:37, 12.28s/it]08/03/2024 18:43:29 - INFO - __main__ -   Step: 6057, LR: 7.471554007535857e-06, Loss: 445.6466064453125
2024-08-04T01:43:41.223727172Z 
 64%|██████▍   | 6058/9500 [20:46:11<11:40:50, 12.22s/it]08/03/2024 18:43:41 - INFO - __main__ -   Step: 6058, LR: 7.469383463848579e-06, Loss: 505.0521240234375
2024-08-04T01:43:54.120340715Z 
 64%|██████▍   | 6059/9500 [20:46:24<11:52:20, 12.42s/it]08/03/2024 18:43:54 - INFO - __main__ -   Step: 6059, LR: 7.4672129201612985e-06, Loss: 474.92724609375
2024-08-04T01:44:06.386334570Z 
 64%|██████▍   | 6060/9500 [20:46:36<11:49:27, 12.37s/it]08/03/2024 18:44:06 - INFO - __main__ -   Step: 6060, LR: 7.46504237647402e-06, Loss: 542.1708374023438
2024-08-04T01:44:18.515014440Z 
 64%|██████▍   | 6061/9500 [20:46:48<11:45:01, 12.30s/it]08/03/2024 18:44:18 - INFO - __main__ -   Step: 6061, LR: 7.462871832786741e-06, Loss: 448.5941162109375
2024-08-04T01:44:31.345053086Z 
 64%|██████▍   | 6062/9500 [20:47:01<11:53:55, 12.46s/it]08/03/2024 18:44:31 - INFO - __main__ -   Step: 6062, LR: 7.460701289099462e-06, Loss: 489.1658630371094
2024-08-04T01:44:43.642512133Z 
 64%|██████▍   | 6063/9500 [20:47:13<11:50:56, 12.41s/it]08/03/2024 18:44:43 - INFO - __main__ -   Step: 6063, LR: 7.458530745412184e-06, Loss: 465.2423095703125
2024-08-04T01:44:55.965605094Z 
 64%|██████▍   | 6064/9500 [20:47:25<11:49:13, 12.38s/it]08/03/2024 18:44:55 - INFO - __main__ -   Step: 6064, LR: 7.456360201724905e-06, Loss: 507.35968017578125
2024-08-04T01:45:08.609867631Z 
 64%|██████▍   | 6065/9500 [20:47:38<11:53:28, 12.46s/it]08/03/2024 18:45:08 - INFO - __main__ -   Step: 6065, LR: 7.454189658037626e-06, Loss: 373.16046142578125
2024-08-04T01:45:20.786200272Z 
 64%|██████▍   | 6066/9500 [20:47:50<11:48:21, 12.38s/it]08/03/2024 18:45:20 - INFO - __main__ -   Step: 6066, LR: 7.452019114350346e-06, Loss: 463.78289794921875
2024-08-04T01:45:33.353788050Z 
 64%|██████▍   | 6067/9500 [20:48:03<11:51:25, 12.43s/it]08/03/2024 18:45:33 - INFO - __main__ -   Step: 6067, LR: 7.449848570663068e-06, Loss: 351.5579833984375
2024-08-04T01:45:45.770005075Z 
 64%|██████▍   | 6068/9500 [20:48:15<11:50:54, 12.43s/it]08/03/2024 18:45:45 - INFO - __main__ -   Step: 6068, LR: 7.447678026975788e-06, Loss: 456.025146484375
2024-08-04T01:45:58.225517426Z 
 64%|██████▍   | 6069/9500 [20:48:28<11:51:10, 12.44s/it]08/03/2024 18:45:58 - INFO - __main__ -   Step: 6069, LR: 7.44550748328851e-06, Loss: 498.5552978515625
2024-08-04T01:46:10.949274454Z 
 64%|██████▍   | 6070/9500 [20:48:40<11:55:53, 12.52s/it]08/03/2024 18:46:10 - INFO - __main__ -   Step: 6070, LR: 7.443336939601231e-06, Loss: 492.0281982421875
2024-08-04T01:46:23.610901022Z 
 64%|██████▍   | 6071/9500 [20:48:53<11:58:03, 12.56s/it]08/03/2024 18:46:23 - INFO - __main__ -   Step: 6071, LR: 7.441166395913952e-06, Loss: 375.3349304199219
2024-08-04T01:46:35.976796115Z 
 64%|██████▍   | 6072/9500 [20:49:05<11:54:26, 12.50s/it]08/03/2024 18:46:35 - INFO - __main__ -   Step: 6072, LR: 7.438995852226674e-06, Loss: 450.5060729980469
2024-08-04T01:46:48.095282277Z 
 64%|██████▍   | 6073/9500 [20:49:18<11:47:37, 12.39s/it]08/03/2024 18:46:48 - INFO - __main__ -   Step: 6073, LR: 7.4368253085393935e-06, Loss: 452.1326599121094
2024-08-04T01:47:00.703889740Z 
 64%|██████▍   | 6074/9500 [20:49:30<11:51:10, 12.45s/it]08/03/2024 18:47:00 - INFO - __main__ -   Step: 6074, LR: 7.434654764852115e-06, Loss: 440.26190185546875
2024-08-04T01:47:12.872461227Z 
 64%|██████▍   | 6075/9500 [20:49:42<11:46:04, 12.37s/it]08/03/2024 18:47:12 - INFO - __main__ -   Step: 6075, LR: 7.432484221164837e-06, Loss: 415.6088562011719
2024-08-04T01:47:24.878157452Z 
 64%|██████▍   | 6076/9500 [20:49:54<11:39:38, 12.26s/it]08/03/2024 18:47:24 - INFO - __main__ -   Step: 6076, LR: 7.430313677477557e-06, Loss: 375.2657470703125
2024-08-04T01:47:37.705121685Z 
 64%|██████▍   | 6077/9500 [20:50:07<11:49:08, 12.43s/it]08/03/2024 18:47:37 - INFO - __main__ -   Step: 6077, LR: 7.428143133790279e-06, Loss: 482.10748291015625
2024-08-04T01:47:50.132385914Z 
 64%|██████▍   | 6078/9500 [20:50:20<11:48:52, 12.43s/it]08/03/2024 18:47:50 - INFO - __main__ -   Step: 6078, LR: 7.425972590103e-06, Loss: 418.9187927246094
2024-08-04T01:48:02.480529430Z 
 64%|██████▍   | 6079/9500 [20:50:32<11:47:17, 12.40s/it]08/03/2024 18:48:02 - INFO - __main__ -   Step: 6079, LR: 7.423802046415721e-06, Loss: 339.8607177734375
2024-08-04T01:48:15.431916723Z 
 64%|██████▍   | 6080/9500 [20:50:45<11:56:25, 12.57s/it]08/03/2024 18:48:15 - INFO - __main__ -   Step: 6080, LR: 7.421631502728441e-06, Loss: 363.5045471191406
2024-08-04T01:48:27.790447429Z 
 64%|██████▍   | 6081/9500 [20:50:57<11:52:37, 12.51s/it]08/03/2024 18:48:27 - INFO - __main__ -   Step: 6081, LR: 7.419460959041163e-06, Loss: 464.26483154296875
2024-08-04T01:48:40.056956757Z 
 64%|██████▍   | 6082/9500 [20:51:09<11:48:19, 12.43s/it]08/03/2024 18:48:40 - INFO - __main__ -   Step: 6082, LR: 7.417290415353884e-06, Loss: 427.9054870605469
2024-08-04T01:48:52.771432609Z 
 64%|██████▍   | 6083/9500 [20:51:22<11:52:54, 12.52s/it]08/03/2024 18:48:52 - INFO - __main__ -   Step: 6083, LR: 7.415119871666605e-06, Loss: 391.266845703125
2024-08-04T01:49:05.150705290Z 
 64%|██████▍   | 6084/9500 [20:51:35<11:50:19, 12.48s/it]08/03/2024 18:49:05 - INFO - __main__ -   Step: 6084, LR: 7.4129493279793265e-06, Loss: 430.83563232421875
2024-08-04T01:49:17.462733854Z 
 64%|██████▍   | 6085/9500 [20:51:47<11:47:18, 12.43s/it]08/03/2024 18:49:17 - INFO - __main__ -   Step: 6085, LR: 7.410778784292047e-06, Loss: 464.61468505859375
2024-08-04T01:49:30.109248408Z 
 64%|██████▍   | 6086/9500 [20:52:00<11:50:51, 12.49s/it]08/03/2024 18:49:30 - INFO - __main__ -   Step: 6086, LR: 7.408608240604769e-06, Loss: 385.39739990234375
2024-08-04T01:49:42.720238275Z 
 64%|██████▍   | 6087/9500 [20:52:12<11:52:39, 12.53s/it]08/03/2024 18:49:42 - INFO - __main__ -   Step: 6087, LR: 7.406437696917489e-06, Loss: 443.14990234375
2024-08-04T01:49:54.919064027Z 
 64%|██████▍   | 6088/9500 [20:52:24<11:46:49, 12.43s/it]08/03/2024 18:49:54 - INFO - __main__ -   Step: 6088, LR: 7.40426715323021e-06, Loss: 416.10015869140625
2024-08-04T01:50:07.431140505Z 
 64%|██████▍   | 6089/9500 [20:52:37<11:48:01, 12.45s/it]08/03/2024 18:50:07 - INFO - __main__ -   Step: 6089, LR: 7.402096609542932e-06, Loss: 353.04656982421875
2024-08-04T01:50:19.490330724Z 
 64%|██████▍   | 6090/9500 [20:52:49<11:41:05, 12.34s/it]08/03/2024 18:50:19 - INFO - __main__ -   Step: 6090, LR: 7.3999260658556524e-06, Loss: 387.57708740234375
2024-08-04T01:50:31.781821592Z 
 64%|██████▍   | 6091/9500 [20:53:01<11:40:07, 12.32s/it]08/03/2024 18:50:31 - INFO - __main__ -   Step: 6091, LR: 7.397755522168374e-06, Loss: 373.6593933105469
2024-08-04T01:50:44.183585556Z 
 64%|██████▍   | 6092/9500 [20:53:14<11:41:16, 12.35s/it]08/03/2024 18:50:44 - INFO - __main__ -   Step: 6092, LR: 7.3955849784810956e-06, Loss: 403.87164306640625
2024-08-04T01:50:56.278177140Z 
 64%|██████▍   | 6093/9500 [20:53:26<11:36:46, 12.27s/it]08/03/2024 18:50:56 - INFO - __main__ -   Step: 6093, LR: 7.393414434793816e-06, Loss: 323.58978271484375
2024-08-04T01:51:08.691430155Z 
 64%|██████▍   | 6094/9500 [20:53:38<11:38:59, 12.31s/it]08/03/2024 18:51:08 - INFO - __main__ -   Step: 6094, LR: 7.391243891106536e-06, Loss: 448.18157958984375
2024-08-04T01:51:21.168701534Z 
 64%|██████▍   | 6095/9500 [20:53:51<11:41:34, 12.36s/it]08/03/2024 18:51:21 - INFO - __main__ -   Step: 6095, LR: 7.389073347419258e-06, Loss: 413.3146057128906
2024-08-04T01:51:33.403690143Z 
 64%|██████▍   | 6096/9500 [20:54:03<11:39:11, 12.32s/it]08/03/2024 18:51:33 - INFO - __main__ -   Step: 6096, LR: 7.386902803731979e-06, Loss: 428.45526123046875
2024-08-04T01:51:45.629720644Z 
 64%|██████▍   | 6097/9500 [20:54:15<11:37:19, 12.29s/it]08/03/2024 18:51:45 - INFO - __main__ -   Step: 6097, LR: 7.3847322600447e-06, Loss: 316.956787109375
2024-08-04T01:51:58.192694489Z 
 64%|██████▍   | 6098/9500 [20:54:28<11:41:40, 12.38s/it]08/03/2024 18:51:58 - INFO - __main__ -   Step: 6098, LR: 7.3825617163574215e-06, Loss: 413.4056091308594
2024-08-04T01:52:10.395440934Z 
 64%|██████▍   | 6099/9500 [20:54:40<11:38:32, 12.32s/it]08/03/2024 18:52:10 - INFO - __main__ -   Step: 6099, LR: 7.380391172670143e-06, Loss: 394.2930908203125
2024-08-04T01:52:22.459862036Z 
 64%|██████▍   | 6100/9500 [20:54:52<11:33:55, 12.25s/it]08/03/2024 18:52:22 - INFO - __main__ -   Step: 6100, LR: 7.378220628982864e-06, Loss: 390.5029296875
2024-08-04T01:52:34.434295562Z 
 64%|██████▍   | 6101/9500 [20:55:04<11:29:06, 12.16s/it]08/03/2024 18:52:34 - INFO - __main__ -   Step: 6101, LR: 7.376050085295584e-06, Loss: 414.7580261230469
2024-08-04T01:52:47.341972238Z 
 64%|██████▍   | 6102/9500 [20:55:17<11:41:32, 12.39s/it]08/03/2024 18:52:47 - INFO - __main__ -   Step: 6102, LR: 7.373879541608305e-06, Loss: 446.5442199707031
2024-08-04T01:52:59.946590518Z 
 64%|██████▍   | 6103/9500 [20:55:29<11:45:01, 12.45s/it]08/03/2024 18:52:59 - INFO - __main__ -   Step: 6103, LR: 7.371708997921027e-06, Loss: 481.9211120605469
2024-08-04T01:53:12.467416144Z 
 64%|██████▍   | 6104/9500 [20:55:42<11:45:58, 12.47s/it]08/03/2024 18:53:12 - INFO - __main__ -   Step: 6104, LR: 7.3695384542337475e-06, Loss: 490.3672790527344
2024-08-04T01:53:25.183424413Z 
 64%|██████▍   | 6105/9500 [20:55:55<11:49:53, 12.55s/it]08/03/2024 18:53:25 - INFO - __main__ -   Step: 6105, LR: 7.367367910546469e-06, Loss: 429.54046630859375
2024-08-04T01:53:37.381162831Z 
 64%|██████▍   | 6106/9500 [20:56:07<11:43:45, 12.44s/it]08/03/2024 18:53:37 - INFO - __main__ -   Step: 6106, LR: 7.365197366859191e-06, Loss: 475.9364013671875
2024-08-04T01:53:49.651241721Z 
 64%|██████▍   | 6107/9500 [20:56:19<11:40:39, 12.39s/it]08/03/2024 18:53:49 - INFO - __main__ -   Step: 6107, LR: 7.363026823171911e-06, Loss: 352.7655334472656
2024-08-04T01:54:02.529247722Z 
 64%|██████▍   | 6108/9500 [20:56:32<11:48:43, 12.54s/it]08/03/2024 18:54:02 - INFO - __main__ -   Step: 6108, LR: 7.360856279484632e-06, Loss: 563.4115600585938
2024-08-04T01:54:14.632169464Z 
 64%|██████▍   | 6109/9500 [20:56:44<11:41:10, 12.41s/it]08/03/2024 18:54:14 - INFO - __main__ -   Step: 6109, LR: 7.358685735797353e-06, Loss: 383.7126770019531
2024-08-04T01:54:27.020624518Z 
 64%|██████▍   | 6110/9500 [20:56:56<11:40:39, 12.40s/it]08/03/2024 18:54:27 - INFO - __main__ -   Step: 6110, LR: 7.356515192110074e-06, Loss: 394.5005187988281
2024-08-04T01:54:39.310313852Z 
 64%|██████▍   | 6111/9500 [20:57:09<11:38:33, 12.37s/it]08/03/2024 18:54:39 - INFO - __main__ -   Step: 6111, LR: 7.354344648422795e-06, Loss: 332.29669189453125
2024-08-04T01:54:51.375606259Z 
 64%|██████▍   | 6112/9500 [20:57:21<11:33:14, 12.28s/it]08/03/2024 18:54:51 - INFO - __main__ -   Step: 6112, LR: 7.352174104735517e-06, Loss: 388.549560546875
2024-08-04T01:55:03.449869922Z 
 64%|██████▍   | 6113/9500 [20:57:33<11:29:35, 12.22s/it]08/03/2024 18:55:03 - INFO - __main__ -   Step: 6113, LR: 7.350003561048238e-06, Loss: 444.9895935058594
2024-08-04T01:55:15.852227435Z 
 64%|██████▍   | 6114/9500 [20:57:45<11:32:32, 12.27s/it]08/03/2024 18:55:15 - INFO - __main__ -   Step: 6114, LR: 7.347833017360959e-06, Loss: 386.1558837890625
2024-08-04T01:55:27.962590421Z 
 64%|██████▍   | 6115/9500 [20:57:57<11:29:36, 12.22s/it]08/03/2024 18:55:27 - INFO - __main__ -   Step: 6115, LR: 7.3456624736736796e-06, Loss: 391.44287109375
2024-08-04T01:55:40.325832764Z 
 64%|██████▍   | 6116/9500 [20:58:10<11:31:46, 12.27s/it]08/03/2024 18:55:40 - INFO - __main__ -   Step: 6116, LR: 7.3434919299864e-06, Loss: 418.8632507324219
2024-08-04T01:55:52.811079347Z 
 64%|██████▍   | 6117/9500 [20:58:22<11:35:16, 12.33s/it]08/03/2024 18:55:52 - INFO - __main__ -   Step: 6117, LR: 7.341321386299122e-06, Loss: 417.88336181640625
2024-08-04T01:56:05.358492905Z 
 64%|██████▍   | 6118/9500 [20:58:35<11:38:43, 12.40s/it]08/03/2024 18:56:05 - INFO - __main__ -   Step: 6118, LR: 7.3391508426118426e-06, Loss: 352.8353576660156
2024-08-04T01:56:17.606228387Z 
 64%|██████▍   | 6119/9500 [20:58:47<11:36:00, 12.35s/it]08/03/2024 18:56:17 - INFO - __main__ -   Step: 6119, LR: 7.336980298924564e-06, Loss: 477.68280029296875
2024-08-04T01:56:30.347996203Z 
 64%|██████▍   | 6120/9500 [20:59:00<11:42:24, 12.47s/it]08/03/2024 18:56:30 - INFO - __main__ -   Step: 6120, LR: 7.334809755237286e-06, Loss: 525.85302734375
2024-08-04T01:56:42.468182876Z 
 64%|██████▍   | 6121/9500 [20:59:12<11:36:18, 12.36s/it]08/03/2024 18:56:42 - INFO - __main__ -   Step: 6121, LR: 7.332639211550006e-06, Loss: 462.37384033203125
2024-08-04T01:56:54.761242283Z 
 64%|██████▍   | 6122/9500 [20:59:24<11:34:54, 12.34s/it]08/03/2024 18:56:54 - INFO - __main__ -   Step: 6122, LR: 7.330468667862727e-06, Loss: 383.8392333984375
2024-08-04T01:57:07.180378015Z 
 64%|██████▍   | 6123/9500 [20:59:37<11:35:59, 12.37s/it]08/03/2024 18:57:07 - INFO - __main__ -   Step: 6123, LR: 7.328298124175448e-06, Loss: 484.89483642578125
2024-08-04T01:57:19.371602287Z 
 64%|██████▍   | 6124/9500 [20:59:49<11:32:48, 12.31s/it]08/03/2024 18:57:19 - INFO - __main__ -   Step: 6124, LR: 7.326127580488169e-06, Loss: 358.1275634765625
2024-08-04T01:57:31.629458952Z 
 64%|██████▍   | 6125/9500 [21:00:01<11:31:42, 12.30s/it]08/03/2024 18:57:31 - INFO - __main__ -   Step: 6125, LR: 7.323957036800891e-06, Loss: 449.3772277832031
2024-08-04T01:57:44.039244747Z 
 64%|██████▍   | 6126/9500 [21:00:13<11:33:23, 12.33s/it]08/03/2024 18:57:44 - INFO - __main__ -   Step: 6126, LR: 7.321786493113612e-06, Loss: 370.712158203125
2024-08-04T01:57:56.220864616Z 
 64%|██████▍   | 6127/9500 [21:00:26<11:30:40, 12.29s/it]08/03/2024 18:57:56 - INFO - __main__ -   Step: 6127, LR: 7.319615949426333e-06, Loss: 398.32135009765625
2024-08-04T01:58:08.374418812Z 
 65%|██████▍   | 6128/9500 [21:00:38<11:28:14, 12.25s/it]08/03/2024 18:58:08 - INFO - __main__ -   Step: 6128, LR: 7.317445405739054e-06, Loss: 329.9873962402344
2024-08-04T01:58:21.022898363Z 
 65%|██████▍   | 6129/9500 [21:00:50<11:34:48, 12.37s/it]08/03/2024 18:58:21 - INFO - __main__ -   Step: 6129, LR: 7.315274862051775e-06, Loss: 376.61981201171875
2024-08-04T01:58:33.053583279Z 
 65%|██████▍   | 6130/9500 [21:01:02<11:28:56, 12.27s/it]08/03/2024 18:58:33 - INFO - __main__ -   Step: 6130, LR: 7.313104318364495e-06, Loss: 415.15283203125
2024-08-04T01:58:45.185178147Z 
 65%|██████▍   | 6131/9500 [21:01:15<11:26:28, 12.23s/it]08/03/2024 18:58:45 - INFO - __main__ -   Step: 6131, LR: 7.310933774677217e-06, Loss: 548.170654296875
2024-08-04T01:58:57.755084386Z 
 65%|██████▍   | 6132/9500 [21:01:27<11:32:03, 12.33s/it]08/03/2024 18:58:57 - INFO - __main__ -   Step: 6132, LR: 7.3087632309899385e-06, Loss: 332.6796875
2024-08-04T01:59:09.769892152Z 
 65%|██████▍   | 6133/9500 [21:01:39<11:26:34, 12.23s/it]08/03/2024 18:59:09 - INFO - __main__ -   Step: 6133, LR: 7.306592687302659e-06, Loss: 403.63922119140625
2024-08-04T01:59:22.105162639Z 
 65%|██████▍   | 6134/9500 [21:01:52<11:28:03, 12.26s/it]08/03/2024 18:59:22 - INFO - __main__ -   Step: 6134, LR: 7.304422143615381e-06, Loss: 434.78173828125
2024-08-04T01:59:34.603242564Z 
 65%|██████▍   | 6135/9500 [21:02:04<11:31:46, 12.33s/it]08/03/2024 18:59:34 - INFO - __main__ -   Step: 6135, LR: 7.3022515999281014e-06, Loss: 480.79901123046875
2024-08-04T01:59:46.728330500Z 
 65%|██████▍   | 6136/9500 [21:02:16<11:28:02, 12.27s/it]08/03/2024 18:59:46 - INFO - __main__ -   Step: 6136, LR: 7.300081056240822e-06, Loss: 370.43646240234375
2024-08-04T01:59:59.413988450Z 
 65%|██████▍   | 6137/9500 [21:02:29<11:34:48, 12.40s/it]08/03/2024 18:59:59 - INFO - __main__ -   Step: 6137, LR: 7.297910512553543e-06, Loss: 556.8491821289062
2024-08-04T02:00:12.104569619Z 
 65%|██████▍   | 6138/9500 [21:02:42<11:39:32, 12.48s/it]08/03/2024 19:00:12 - INFO - __main__ -   Step: 6138, LR: 7.2957399688662644e-06, Loss: 444.2883605957031
2024-08-04T02:00:24.185260344Z 
 65%|██████▍   | 6139/9500 [21:02:54<11:32:32, 12.36s/it]08/03/2024 19:00:24 - INFO - __main__ -   Step: 6139, LR: 7.293569425178986e-06, Loss: 473.7283935546875
2024-08-04T02:00:36.597292276Z 
 65%|██████▍   | 6140/9500 [21:03:06<11:33:10, 12.38s/it]08/03/2024 19:00:36 - INFO - __main__ -   Step: 6140, LR: 7.291398881491707e-06, Loss: 407.76611328125
2024-08-04T02:00:49.447137521Z 
 65%|██████▍   | 6141/9500 [21:03:19<11:40:52, 12.52s/it]08/03/2024 19:00:49 - INFO - __main__ -   Step: 6141, LR: 7.289228337804428e-06, Loss: 418.8995361328125
2024-08-04T02:01:01.641704212Z 
 65%|██████▍   | 6142/9500 [21:03:31<11:35:13, 12.42s/it]08/03/2024 19:01:01 - INFO - __main__ -   Step: 6142, LR: 7.28705779411715e-06, Loss: 536.4613037109375
2024-08-04T02:01:14.008204610Z 
 65%|██████▍   | 6143/9500 [21:03:43<11:34:05, 12.41s/it]08/03/2024 19:01:14 - INFO - __main__ -   Step: 6143, LR: 7.28488725042987e-06, Loss: 403.1146545410156
2024-08-04T02:01:26.244847020Z 
 65%|██████▍   | 6144/9500 [21:03:56<11:31:02, 12.35s/it]08/03/2024 19:01:26 - INFO - __main__ -   Step: 6144, LR: 7.28271670674259e-06, Loss: 405.65521240234375
2024-08-04T02:01:38.788830623Z 
 65%|██████▍   | 6145/9500 [21:04:08<11:34:00, 12.41s/it]08/03/2024 19:01:38 - INFO - __main__ -   Step: 6145, LR: 7.280546163055312e-06, Loss: 368.1929626464844
2024-08-04T02:01:50.796539806Z 
 65%|██████▍   | 6146/9500 [21:04:20<11:27:02, 12.29s/it]08/03/2024 19:01:50 - INFO - __main__ -   Step: 6146, LR: 7.2783756193680335e-06, Loss: 328.346923828125
2024-08-04T02:02:02.970896895Z 
 65%|██████▍   | 6147/9500 [21:04:32<11:24:52, 12.26s/it]08/03/2024 19:02:02 - INFO - __main__ -   Step: 6147, LR: 7.276205075680754e-06, Loss: 448.0447998046875
2024-08-04T02:02:15.677619071Z 
 65%|██████▍   | 6148/9500 [21:04:45<11:32:14, 12.39s/it]08/03/2024 19:02:15 - INFO - __main__ -   Step: 6148, LR: 7.274034531993476e-06, Loss: 484.7184753417969
2024-08-04T02:02:27.894614898Z 
 65%|██████▍   | 6149/9500 [21:04:57<11:29:07, 12.34s/it]08/03/2024 19:02:27 - INFO - __main__ -   Step: 6149, LR: 7.271863988306197e-06, Loss: 422.3374938964844
2024-08-04T02:02:40.047048100Z 
 65%|██████▍   | 6150/9500 [21:05:09<11:25:47, 12.28s/it]08/03/2024 19:02:40 - INFO - __main__ -   Step: 6150, LR: 7.269693444618917e-06, Loss: 377.71624755859375
2024-08-04T02:02:52.458079572Z 
 65%|██████▍   | 6151/9500 [21:05:22<11:27:44, 12.32s/it]08/03/2024 19:02:52 - INFO - __main__ -   Step: 6151, LR: 7.267522900931639e-06, Loss: 398.81121826171875
2024-08-04T02:03:04.599682147Z 
 65%|██████▍   | 6152/9500 [21:05:34<11:24:31, 12.27s/it]08/03/2024 19:03:04 - INFO - __main__ -   Step: 6152, LR: 7.2653523572443595e-06, Loss: 492.26727294921875
2024-08-04T02:03:16.760061463Z 
 65%|██████▍   | 6153/9500 [21:05:46<11:22:31, 12.24s/it]08/03/2024 19:03:16 - INFO - __main__ -   Step: 6153, LR: 7.263181813557081e-06, Loss: 460.21673583984375
2024-08-04T02:03:29.275837962Z 
 65%|██████▍   | 6154/9500 [21:05:59<11:27:01, 12.32s/it]08/03/2024 19:03:29 - INFO - __main__ -   Step: 6154, LR: 7.261011269869802e-06, Loss: 317.9801025390625
2024-08-04T02:03:41.399260487Z 
 65%|██████▍   | 6155/9500 [21:06:11<11:23:31, 12.26s/it]08/03/2024 19:03:41 - INFO - __main__ -   Step: 6155, LR: 7.258840726182523e-06, Loss: 359.8241271972656
2024-08-04T02:03:53.742091473Z 
 65%|██████▍   | 6156/9500 [21:06:23<11:24:42, 12.29s/it]08/03/2024 19:03:53 - INFO - __main__ -   Step: 6156, LR: 7.256670182495245e-06, Loss: 374.4930419921875
2024-08-04T02:04:06.524079217Z 
 65%|██████▍   | 6157/9500 [21:06:36<11:32:47, 12.43s/it]08/03/2024 19:04:06 - INFO - __main__ -   Step: 6157, LR: 7.254499638807965e-06, Loss: 540.4622802734375
2024-08-04T02:04:18.737226891Z 
 65%|██████▍   | 6158/9500 [21:06:48<11:28:53, 12.37s/it]08/03/2024 19:04:18 - INFO - __main__ -   Step: 6158, LR: 7.252329095120686e-06, Loss: 437.0711364746094
2024-08-04T02:04:31.138051258Z 
 65%|██████▍   | 6159/9500 [21:07:01<11:29:13, 12.38s/it]08/03/2024 19:04:31 - INFO - __main__ -   Step: 6159, LR: 7.250158551433407e-06, Loss: 401.84088134765625
2024-08-04T02:04:43.870041251Z 
 65%|██████▍   | 6160/9500 [21:07:13<11:34:56, 12.48s/it]08/03/2024 19:04:43 - INFO - __main__ -   Step: 6160, LR: 7.2479880077461286e-06, Loss: 480.34222412109375
2024-08-04T02:04:55.849144512Z 
 65%|██████▍   | 6161/9500 [21:07:25<11:26:18, 12.33s/it]08/03/2024 19:04:55 - INFO - __main__ -   Step: 6161, LR: 7.245817464058849e-06, Loss: 400.77789306640625
2024-08-04T02:05:08.519161159Z 
 65%|██████▍   | 6162/9500 [21:07:38<11:31:43, 12.43s/it]08/03/2024 19:05:08 - INFO - __main__ -   Step: 6162, LR: 7.243646920371571e-06, Loss: 385.8732604980469
2024-08-04T02:05:21.258276958Z 
 65%|██████▍   | 6163/9500 [21:07:51<11:36:37, 12.53s/it]08/03/2024 19:05:21 - INFO - __main__ -   Step: 6163, LR: 7.241476376684292e-06, Loss: 344.1019592285156
2024-08-04T02:05:33.626715028Z 
 65%|██████▍   | 6164/9500 [21:08:03<11:33:47, 12.48s/it]08/03/2024 19:05:33 - INFO - __main__ -   Step: 6164, LR: 7.239305832997012e-06, Loss: 408.5054931640625
2024-08-04T02:05:45.706512612Z 
 65%|██████▍   | 6165/9500 [21:08:15<11:26:56, 12.36s/it]08/03/2024 19:05:45 - INFO - __main__ -   Step: 6165, LR: 7.237135289309734e-06, Loss: 334.4552001953125
2024-08-04T02:05:58.322333906Z 
 65%|██████▍   | 6166/9500 [21:08:28<11:31:00, 12.44s/it]08/03/2024 19:05:58 - INFO - __main__ -   Step: 6166, LR: 7.2349647456224545e-06, Loss: 601.290283203125
2024-08-04T02:06:10.645353406Z 
 65%|██████▍   | 6167/9500 [21:08:40<11:28:56, 12.40s/it]08/03/2024 19:06:10 - INFO - __main__ -   Step: 6167, LR: 7.232794201935176e-06, Loss: 432.1067810058594
2024-08-04T02:06:22.874997651Z 
 65%|██████▍   | 6168/9500 [21:08:52<11:25:51, 12.35s/it]08/03/2024 19:06:22 - INFO - __main__ -   Step: 6168, LR: 7.230623658247898e-06, Loss: 461.2663269042969
2024-08-04T02:06:35.459246264Z 
 65%|██████▍   | 6169/9500 [21:09:05<11:29:32, 12.42s/it]08/03/2024 19:06:35 - INFO - __main__ -   Step: 6169, LR: 7.228453114560618e-06, Loss: 336.32366943359375
2024-08-04T02:06:47.427319111Z 
 65%|██████▍   | 6170/9500 [21:09:17<11:21:48, 12.28s/it]08/03/2024 19:06:47 - INFO - __main__ -   Step: 6170, LR: 7.22628257087334e-06, Loss: 400.57159423828125
2024-08-04T02:06:59.790239951Z 
 65%|██████▍   | 6171/9500 [21:09:29<11:22:54, 12.31s/it]08/03/2024 19:06:59 - INFO - __main__ -   Step: 6171, LR: 7.22411202718606e-06, Loss: 513.5029296875
2024-08-04T02:07:12.377201424Z 
 65%|██████▍   | 6172/9500 [21:09:42<11:27:20, 12.39s/it]08/03/2024 19:07:12 - INFO - __main__ -   Step: 6172, LR: 7.221941483498781e-06, Loss: 405.292236328125
2024-08-04T02:07:24.450159139Z 
 65%|██████▍   | 6173/9500 [21:09:54<11:21:49, 12.30s/it]08/03/2024 19:07:24 - INFO - __main__ -   Step: 6173, LR: 7.219770939811502e-06, Loss: 394.6151123046875
2024-08-04T02:07:36.700520549Z 
 65%|██████▍   | 6174/9500 [21:10:06<11:20:51, 12.28s/it]08/03/2024 19:07:36 - INFO - __main__ -   Step: 6174, LR: 7.217600396124224e-06, Loss: 437.883544921875
2024-08-04T02:07:49.083049900Z 
 65%|██████▌   | 6175/9500 [21:10:19<11:22:19, 12.31s/it]08/03/2024 19:07:49 - INFO - __main__ -   Step: 6175, LR: 7.215429852436945e-06, Loss: 406.87408447265625
2024-08-04T02:08:01.389780741Z 
 65%|██████▌   | 6176/9500 [21:10:31<11:22:00, 12.31s/it]08/03/2024 19:08:01 - INFO - __main__ -   Step: 6176, LR: 7.213259308749666e-06, Loss: 401.301025390625
2024-08-04T02:08:13.517888158Z 
 65%|██████▌   | 6177/9500 [21:10:43<11:18:46, 12.26s/it]08/03/2024 19:08:13 - INFO - __main__ -   Step: 6177, LR: 7.2110887650623875e-06, Loss: 318.38665771484375
2024-08-04T02:08:25.988362766Z 
 65%|██████▌   | 6178/9500 [21:10:55<11:22:08, 12.32s/it]08/03/2024 19:08:25 - INFO - __main__ -   Step: 6178, LR: 7.208918221375107e-06, Loss: 410.55731201171875
2024-08-04T02:08:38.027642097Z 
 65%|██████▌   | 6179/9500 [21:11:07<11:17:15, 12.24s/it]08/03/2024 19:08:38 - INFO - __main__ -   Step: 6179, LR: 7.206747677687829e-06, Loss: 387.40936279296875
2024-08-04T02:08:50.104132030Z 
 65%|██████▌   | 6180/9500 [21:11:20<11:14:24, 12.19s/it]08/03/2024 19:08:50 - INFO - __main__ -   Step: 6180, LR: 7.20457713400055e-06, Loss: 345.74578857421875
2024-08-04T02:09:02.420416262Z 
 65%|██████▌   | 6181/9500 [21:11:32<11:16:20, 12.23s/it]08/03/2024 19:09:02 - INFO - __main__ -   Step: 6181, LR: 7.202406590313271e-06, Loss: 347.5933532714844
2024-08-04T02:09:14.358702163Z 
 65%|██████▌   | 6182/9500 [21:11:44<11:11:20, 12.14s/it]08/03/2024 19:09:14 - INFO - __main__ -   Step: 6182, LR: 7.200236046625993e-06, Loss: 437.35772705078125
2024-08-04T02:09:26.422648486Z 
 65%|██████▌   | 6183/9500 [21:11:56<11:09:52, 12.12s/it]08/03/2024 19:09:26 - INFO - __main__ -   Step: 6183, LR: 7.1980655029387134e-06, Loss: 329.4658203125
2024-08-04T02:09:38.974645318Z 
 65%|██████▌   | 6184/9500 [21:12:08<11:16:53, 12.25s/it]08/03/2024 19:09:38 - INFO - __main__ -   Step: 6184, LR: 7.195894959251435e-06, Loss: 438.9165954589844
2024-08-04T02:09:51.007094236Z 
 65%|██████▌   | 6185/9500 [21:12:20<11:13:06, 12.18s/it]08/03/2024 19:09:51 - INFO - __main__ -   Step: 6185, LR: 7.193724415564155e-06, Loss: 433.01409912109375
2024-08-04T02:10:03.445178751Z 
 65%|██████▌   | 6186/9500 [21:12:33<11:17:07, 12.26s/it]08/03/2024 19:10:03 - INFO - __main__ -   Step: 6186, LR: 7.191553871876876e-06, Loss: 452.4115295410156
2024-08-04T02:10:15.418854225Z 
 65%|██████▌   | 6187/9500 [21:12:45<11:12:10, 12.17s/it]08/03/2024 19:10:15 - INFO - __main__ -   Step: 6187, LR: 7.189383328189597e-06, Loss: 352.5362548828125
2024-08-04T02:10:27.908034670Z 
 65%|██████▌   | 6188/9500 [21:12:57<11:17:13, 12.27s/it]08/03/2024 19:10:27 - INFO - __main__ -   Step: 6188, LR: 7.187212784502319e-06, Loss: 296.06646728515625
2024-08-04T02:10:40.122344121Z 
 65%|██████▌   | 6189/9500 [21:13:10<11:16:07, 12.25s/it]08/03/2024 19:10:40 - INFO - __main__ -   Step: 6189, LR: 7.18504224081504e-06, Loss: 434.3138427734375
2024-08-04T02:10:52.673872641Z 
 65%|██████▌   | 6190/9500 [21:13:22<11:20:52, 12.34s/it]08/03/2024 19:10:52 - INFO - __main__ -   Step: 6190, LR: 7.182871697127761e-06, Loss: 371.494384765625
2024-08-04T02:11:05.039175016Z 
 65%|██████▌   | 6191/9500 [21:13:34<11:21:02, 12.35s/it]08/03/2024 19:11:05 - INFO - __main__ -   Step: 6191, LR: 7.180701153440482e-06, Loss: 359.5788269042969
2024-08-04T02:11:17.164066984Z 
 65%|██████▌   | 6192/9500 [21:13:47<11:17:08, 12.28s/it]08/03/2024 19:11:17 - INFO - __main__ -   Step: 6192, LR: 7.178530609753202e-06, Loss: 390.74652099609375
2024-08-04T02:11:29.526821768Z 
 65%|██████▌   | 6193/9500 [21:13:59<11:18:16, 12.31s/it]08/03/2024 19:11:29 - INFO - __main__ -   Step: 6193, LR: 7.176360066065924e-06, Loss: 497.50689697265625
2024-08-04T02:11:42.001490908Z 
 65%|██████▌   | 6194/9500 [21:14:11<11:20:51, 12.36s/it]08/03/2024 19:11:42 - INFO - __main__ -   Step: 6194, LR: 7.1741895223786455e-06, Loss: 473.5346984863281
2024-08-04T02:11:54.126071413Z 
 65%|██████▌   | 6195/9500 [21:14:24<11:16:48, 12.29s/it]08/03/2024 19:11:54 - INFO - __main__ -   Step: 6195, LR: 7.172018978691366e-06, Loss: 411.9178466796875
2024-08-04T02:12:05.952531538Z 
 65%|██████▌   | 6196/9500 [21:14:35<11:08:59, 12.15s/it]08/03/2024 19:12:05 - INFO - __main__ -   Step: 6196, LR: 7.169848435004088e-06, Loss: 317.2685852050781
2024-08-04T02:12:18.398323144Z 
 65%|██████▌   | 6197/9500 [21:14:48<11:13:41, 12.24s/it]08/03/2024 19:12:18 - INFO - __main__ -   Step: 6197, LR: 7.1676778913168085e-06, Loss: 401.5474853515625
2024-08-04T02:12:30.663732169Z 
 65%|██████▌   | 6198/9500 [21:15:00<11:13:56, 12.25s/it]08/03/2024 19:12:30 - INFO - __main__ -   Step: 6198, LR: 7.165507347629529e-06, Loss: 403.87017822265625
2024-08-04T02:12:42.965319392Z 
 65%|██████▌   | 6199/9500 [21:15:12<11:14:39, 12.26s/it]08/03/2024 19:12:42 - INFO - __main__ -   Step: 6199, LR: 7.16333680394225e-06, Loss: 439.10113525390625
2024-08-04T02:12:55.464731668Z 
 65%|██████▌   | 6200/9500 [21:15:25<11:18:21, 12.33s/it]08/03/2024 19:12:55 - INFO - __main__ -   Step: 6200, LR: 7.1611662602549715e-06, Loss: 412.14996337890625
2024-08-04T02:13:07.867400072Z 
 65%|██████▌   | 6201/9500 [21:15:37<11:19:17, 12.35s/it]08/03/2024 19:13:07 - INFO - __main__ -   Step: 6201, LR: 7.158995716567693e-06, Loss: 392.3404846191406
2024-08-04T02:13:20.238956353Z 
 65%|██████▌   | 6202/9500 [21:15:50<11:19:22, 12.36s/it]08/03/2024 19:13:20 - INFO - __main__ -   Step: 6202, LR: 7.156825172880414e-06, Loss: 420.0202331542969
2024-08-04T02:13:32.822365123Z 
 65%|██████▌   | 6203/9500 [21:16:02<11:22:51, 12.43s/it]08/03/2024 19:13:32 - INFO - __main__ -   Step: 6203, LR: 7.154654629193135e-06, Loss: 400.7501525878906
2024-08-04T02:13:44.933451122Z 
 65%|██████▌   | 6204/9500 [21:16:14<11:17:26, 12.33s/it]08/03/2024 19:13:44 - INFO - __main__ -   Step: 6204, LR: 7.152484085505856e-06, Loss: 470.11151123046875
2024-08-04T02:13:57.399343460Z 
 65%|██████▌   | 6205/9500 [21:16:27<11:19:26, 12.37s/it]08/03/2024 19:13:57 - INFO - __main__ -   Step: 6205, LR: 7.150313541818577e-06, Loss: 415.74273681640625
2024-08-04T02:14:10.327156011Z 
 65%|██████▌   | 6206/9500 [21:16:40<11:28:23, 12.54s/it]08/03/2024 19:14:10 - INFO - __main__ -   Step: 6206, LR: 7.1481429981312974e-06, Loss: 419.873779296875
2024-08-04T02:14:22.499750796Z 
 65%|██████▌   | 6207/9500 [21:16:52<11:22:08, 12.43s/it]08/03/2024 19:14:22 - INFO - __main__ -   Step: 6207, LR: 7.145972454444019e-06, Loss: 354.325439453125
2024-08-04T02:14:34.865437567Z 
 65%|██████▌   | 6208/9500 [21:17:04<11:20:53, 12.41s/it]08/03/2024 19:14:34 - INFO - __main__ -   Step: 6208, LR: 7.1438019107567406e-06, Loss: 427.86676025390625
2024-08-04T02:14:47.607747910Z 
 65%|██████▌   | 6209/9500 [21:17:17<11:26:09, 12.51s/it]08/03/2024 19:14:47 - INFO - __main__ -   Step: 6209, LR: 7.141631367069461e-06, Loss: 474.40875244140625
2024-08-04T02:14:59.877135317Z 
 65%|██████▌   | 6210/9500 [21:17:29<11:21:59, 12.44s/it]08/03/2024 19:14:59 - INFO - __main__ -   Step: 6210, LR: 7.139460823382183e-06, Loss: 423.0879821777344
2024-08-04T02:15:12.443159624Z 
 65%|██████▌   | 6211/9500 [21:17:42<11:23:53, 12.48s/it]08/03/2024 19:15:12 - INFO - __main__ -   Step: 6211, LR: 7.1372902796949035e-06, Loss: 330.3330078125
2024-08-04T02:15:24.939743589Z 
 65%|██████▌   | 6212/9500 [21:17:54<11:24:01, 12.48s/it]08/03/2024 19:15:24 - INFO - __main__ -   Step: 6212, LR: 7.135119736007624e-06, Loss: 434.13592529296875
2024-08-04T02:15:37.122278462Z 
 65%|██████▌   | 6213/9500 [21:18:07<11:18:53, 12.39s/it]08/03/2024 19:15:37 - INFO - __main__ -   Step: 6213, LR: 7.132949192320345e-06, Loss: 390.2585144042969
2024-08-04T02:15:49.222340665Z 
 65%|██████▌   | 6214/9500 [21:18:19<11:13:53, 12.30s/it]08/03/2024 19:15:49 - INFO - __main__ -   Step: 6214, LR: 7.1307786486330665e-06, Loss: 404.558349609375
2024-08-04T02:16:01.727168430Z 
 65%|██████▌   | 6215/9500 [21:18:31<11:16:58, 12.36s/it]08/03/2024 19:16:01 - INFO - __main__ -   Step: 6215, LR: 7.128608104945788e-06, Loss: 419.9385986328125
2024-08-04T02:16:14.186596248Z 
 65%|██████▌   | 6216/9500 [21:18:44<11:18:19, 12.39s/it]08/03/2024 19:16:14 - INFO - __main__ -   Step: 6216, LR: 7.126437561258509e-06, Loss: 529.0408325195312
2024-08-04T02:16:26.435112698Z 
 65%|██████▌   | 6217/9500 [21:18:56<11:15:44, 12.35s/it]08/03/2024 19:16:26 - INFO - __main__ -   Step: 6217, LR: 7.12426701757123e-06, Loss: 471.12445068359375
2024-08-04T02:16:38.885498975Z 
 65%|██████▌   | 6218/9500 [21:19:08<11:17:10, 12.38s/it]08/03/2024 19:16:38 - INFO - __main__ -   Step: 6218, LR: 7.122096473883952e-06, Loss: 383.4219970703125
2024-08-04T02:16:50.945708202Z 
 65%|██████▌   | 6219/9500 [21:19:20<11:11:43, 12.28s/it]08/03/2024 19:16:50 - INFO - __main__ -   Step: 6219, LR: 7.119925930196672e-06, Loss: 399.12518310546875
2024-08-04T02:17:03.502586650Z 
 65%|██████▌   | 6220/9500 [21:19:33<11:15:59, 12.37s/it]08/03/2024 19:17:03 - INFO - __main__ -   Step: 6220, LR: 7.1177553865093925e-06, Loss: 357.15802001953125
2024-08-04T02:17:15.969530194Z 
 65%|██████▌   | 6221/9500 [21:19:45<11:17:27, 12.40s/it]08/03/2024 19:17:15 - INFO - __main__ -   Step: 6221, LR: 7.115584842822114e-06, Loss: 366.7158203125
2024-08-04T02:17:27.959347064Z 
 65%|██████▌   | 6222/9500 [21:19:57<11:10:35, 12.27s/it]08/03/2024 19:17:27 - INFO - __main__ -   Step: 6222, LR: 7.113414299134836e-06, Loss: 434.0229187011719
2024-08-04T02:17:40.194334298Z 
 66%|██████▌   | 6223/9500 [21:20:10<11:09:44, 12.26s/it]08/03/2024 19:17:40 - INFO - __main__ -   Step: 6223, LR: 7.111243755447556e-06, Loss: 486.2261962890625
2024-08-04T02:17:52.825118315Z 
 66%|██████▌   | 6224/9500 [21:20:22<11:15:33, 12.37s/it]08/03/2024 19:17:52 - INFO - __main__ -   Step: 6224, LR: 7.109073211760278e-06, Loss: 374.06005859375
2024-08-04T02:18:04.932746495Z 
 66%|██████▌   | 6225/9500 [21:20:34<11:11:00, 12.29s/it]08/03/2024 19:18:04 - INFO - __main__ -   Step: 6225, LR: 7.1069026680729994e-06, Loss: 448.29693603515625
2024-08-04T02:18:17.312656809Z 
 66%|██████▌   | 6226/9500 [21:20:47<11:12:13, 12.32s/it]08/03/2024 19:18:17 - INFO - __main__ -   Step: 6226, LR: 7.104732124385719e-06, Loss: 409.8231506347656
2024-08-04T02:18:30.060963796Z 
 66%|██████▌   | 6227/9500 [21:20:59<11:19:02, 12.45s/it]08/03/2024 19:18:30 - INFO - __main__ -   Step: 6227, LR: 7.102561580698441e-06, Loss: 578.5107421875
2024-08-04T02:18:42.321757411Z 
 66%|██████▌   | 6228/9500 [21:21:12<11:15:46, 12.39s/it]08/03/2024 19:18:42 - INFO - __main__ -   Step: 6228, LR: 7.100391037011162e-06, Loss: 391.29815673828125
2024-08-04T02:18:54.492079939Z 
 66%|██████▌   | 6229/9500 [21:21:24<11:11:56, 12.33s/it]08/03/2024 19:18:54 - INFO - __main__ -   Step: 6229, LR: 7.098220493323883e-06, Loss: 455.9041748046875
2024-08-04T02:19:06.637556716Z 
 66%|██████▌   | 6230/9500 [21:21:36<11:08:47, 12.27s/it]08/03/2024 19:19:06 - INFO - __main__ -   Step: 6230, LR: 7.096049949636604e-06, Loss: 447.671875
2024-08-04T02:19:19.322649735Z 
 66%|██████▌   | 6231/9500 [21:21:49<11:15:20, 12.40s/it]08/03/2024 19:19:19 - INFO - __main__ -   Step: 6231, LR: 7.093879405949325e-06, Loss: 442.0304260253906
2024-08-04T02:19:31.944686028Z 
 66%|██████▌   | 6232/9500 [21:22:01<11:18:50, 12.46s/it]08/03/2024 19:19:31 - INFO - __main__ -   Step: 6232, LR: 7.091708862262047e-06, Loss: 411.03076171875
2024-08-04T02:19:44.331050157Z 
 66%|██████▌   | 6233/9500 [21:22:14<11:17:22, 12.44s/it]08/03/2024 19:19:44 - INFO - __main__ -   Step: 6233, LR: 7.089538318574767e-06, Loss: 442.6177062988281
2024-08-04T02:19:56.812443008Z 
 66%|██████▌   | 6234/9500 [21:22:26<11:17:50, 12.45s/it]08/03/2024 19:19:56 - INFO - __main__ -   Step: 6234, LR: 7.087367774887488e-06, Loss: 470.5638732910156
2024-08-04T02:20:09.201636444Z 
 66%|██████▌   | 6235/9500 [21:22:39<11:16:35, 12.43s/it]08/03/2024 19:20:09 - INFO - __main__ -   Step: 6235, LR: 7.085197231200209e-06, Loss: 383.456787109375
2024-08-04T02:20:21.384196349Z 
 66%|██████▌   | 6236/9500 [21:22:51<11:12:17, 12.36s/it]08/03/2024 19:20:21 - INFO - __main__ -   Step: 6236, LR: 7.083026687512931e-06, Loss: 403.3634033203125
2024-08-04T02:20:34.177562466Z 
 66%|██████▌   | 6237/9500 [21:23:04<11:19:11, 12.49s/it]08/03/2024 19:20:34 - INFO - __main__ -   Step: 6237, LR: 7.080856143825651e-06, Loss: 469.2742004394531
2024-08-04T02:20:46.395377798Z 
 66%|██████▌   | 6238/9500 [21:23:16<11:14:33, 12.41s/it]08/03/2024 19:20:46 - INFO - __main__ -   Step: 6238, LR: 7.078685600138373e-06, Loss: 338.2311706542969
2024-08-04T02:20:58.588627114Z 
 66%|██████▌   | 6239/9500 [21:23:28<11:10:51, 12.34s/it]08/03/2024 19:20:58 - INFO - __main__ -   Step: 6239, LR: 7.0765150564510945e-06, Loss: 404.8543701171875
2024-08-04T02:21:11.006038301Z 
 66%|██████▌   | 6240/9500 [21:23:40<11:11:51, 12.37s/it]08/03/2024 19:21:11 - INFO - __main__ -   Step: 6240, LR: 7.074344512763814e-06, Loss: 379.6253662109375
2024-08-04T02:21:23.138836039Z 
 66%|██████▌   | 6241/9500 [21:23:53<11:07:51, 12.30s/it]08/03/2024 19:21:23 - INFO - __main__ -   Step: 6241, LR: 7.072173969076536e-06, Loss: 378.2014465332031
2024-08-04T02:21:35.581067257Z 
 66%|██████▌   | 6242/9500 [21:24:05<11:10:02, 12.34s/it]08/03/2024 19:21:35 - INFO - __main__ -   Step: 6242, LR: 7.070003425389257e-06, Loss: 544.2503662109375
2024-08-04T02:21:48.171236855Z 
 66%|██████▌   | 6243/9500 [21:24:18<11:13:55, 12.41s/it]08/03/2024 19:21:48 - INFO - __main__ -   Step: 6243, LR: 7.067832881701978e-06, Loss: 452.1322021484375
2024-08-04T02:22:00.363094902Z 
 66%|██████▌   | 6244/9500 [21:24:30<11:10:04, 12.35s/it]08/03/2024 19:22:00 - INFO - __main__ -   Step: 6244, LR: 7.0656623380147e-06, Loss: 329.3090515136719
2024-08-04T02:22:12.777159811Z 
 66%|██████▌   | 6245/9500 [21:24:42<11:10:57, 12.37s/it]08/03/2024 19:22:12 - INFO - __main__ -   Step: 6245, LR: 7.0634917943274205e-06, Loss: 467.2233581542969
2024-08-04T02:22:25.419428024Z 
 66%|██████▌   | 6246/9500 [21:24:55<11:15:12, 12.45s/it]08/03/2024 19:22:25 - INFO - __main__ -   Step: 6246, LR: 7.061321250640142e-06, Loss: 443.13958740234375
2024-08-04T02:22:37.872124185Z 
 66%|██████▌   | 6247/9500 [21:25:07<11:15:03, 12.45s/it]08/03/2024 19:22:37 - INFO - __main__ -   Step: 6247, LR: 7.059150706952862e-06, Loss: 536.9614868164062
2024-08-04T02:22:50.207067121Z 
 66%|██████▌   | 6248/9500 [21:25:20<11:12:57, 12.42s/it]08/03/2024 19:22:50 - INFO - __main__ -   Step: 6248, LR: 7.0569801632655835e-06, Loss: 424.1904296875
2024-08-04T02:23:02.479680690Z 
 66%|██████▌   | 6249/9500 [21:25:32<11:10:24, 12.37s/it]08/03/2024 19:23:02 - INFO - __main__ -   Step: 6249, LR: 7.054809619578304e-06, Loss: 371.0771179199219
2024-08-04T02:23:14.504924388Z 
 66%|██████▌   | 6250/9500 [21:25:44<11:04:33, 12.27s/it]08/03/2024 19:23:14 - INFO - __main__ -   Step: 6250, LR: 7.052639075891026e-06, Loss: 354.9038391113281
2024-08-04T02:23:26.615350039Z 
 66%|██████▌   | 6251/9500 [21:25:56<11:01:46, 12.22s/it]08/03/2024 19:23:26 - INFO - __main__ -   Step: 6251, LR: 7.050468532203747e-06, Loss: 366.96527099609375
2024-08-04T02:23:39.100121459Z 
 66%|██████▌   | 6252/9500 [21:26:09<11:05:51, 12.30s/it]08/03/2024 19:23:39 - INFO - __main__ -   Step: 6252, LR: 7.048297988516468e-06, Loss: 408.6520080566406
2024-08-04T02:23:51.212522313Z 
 66%|██████▌   | 6253/9500 [21:26:21<11:02:35, 12.24s/it]08/03/2024 19:23:51 - INFO - __main__ -   Step: 6253, LR: 7.0461274448291896e-06, Loss: 367.9033508300781
2024-08-04T02:24:03.397070470Z 
 66%|██████▌   | 6254/9500 [21:26:33<11:01:25, 12.23s/it]08/03/2024 19:24:03 - INFO - __main__ -   Step: 6254, LR: 7.0439569011419094e-06, Loss: 348.07110595703125
2024-08-04T02:24:15.875460549Z 
 66%|██████▌   | 6255/9500 [21:26:45<11:05:19, 12.30s/it]08/03/2024 19:24:15 - INFO - __main__ -   Step: 6255, LR: 7.041786357454631e-06, Loss: 388.2066955566406
2024-08-04T02:24:27.994444069Z 
 66%|██████▌   | 6256/9500 [21:26:57<11:02:08, 12.25s/it]08/03/2024 19:24:27 - INFO - __main__ -   Step: 6256, LR: 7.039615813767352e-06, Loss: 416.84405517578125
2024-08-04T02:24:40.042513576Z 
 66%|██████▌   | 6257/9500 [21:27:09<10:58:43, 12.19s/it]08/03/2024 19:24:40 - INFO - __main__ -   Step: 6257, LR: 7.037445270080073e-06, Loss: 486.87689208984375
2024-08-04T02:24:52.244829206Z 
 66%|██████▌   | 6258/9500 [21:27:22<10:58:45, 12.19s/it]08/03/2024 19:24:52 - INFO - __main__ -   Step: 6258, LR: 7.035274726392795e-06, Loss: 347.2486267089844
2024-08-04T02:25:04.442517897Z 
 66%|██████▌   | 6259/9500 [21:27:34<10:58:39, 12.19s/it]08/03/2024 19:25:04 - INFO - __main__ -   Step: 6259, LR: 7.0331041827055155e-06, Loss: 384.37493896484375
2024-08-04T02:25:16.337534187Z 
 66%|██████▌   | 6260/9500 [21:27:46<10:53:36, 12.10s/it]08/03/2024 19:25:16 - INFO - __main__ -   Step: 6260, LR: 7.030933639018237e-06, Loss: 456.72845458984375
2024-08-04T02:25:28.757135643Z 
 66%|██████▌   | 6261/9500 [21:27:58<10:58:31, 12.20s/it]08/03/2024 19:25:28 - INFO - __main__ -   Step: 6261, LR: 7.028763095330957e-06, Loss: 349.6310729980469
2024-08-04T02:25:41.024214306Z 
 66%|██████▌   | 6262/9500 [21:28:10<10:59:25, 12.22s/it]08/03/2024 19:25:41 - INFO - __main__ -   Step: 6262, LR: 7.0265925516436785e-06, Loss: 474.673583984375
2024-08-04T02:25:53.047745687Z 
 66%|██████▌   | 6263/9500 [21:28:22<10:56:03, 12.16s/it]08/03/2024 19:25:53 - INFO - __main__ -   Step: 6263, LR: 7.024422007956399e-06, Loss: 393.9736633300781
2024-08-04T02:26:05.580476940Z 
 66%|██████▌   | 6264/9500 [21:28:35<11:01:52, 12.27s/it]08/03/2024 19:26:05 - INFO - __main__ -   Step: 6264, LR: 7.022251464269121e-06, Loss: 459.8243713378906
2024-08-04T02:26:17.885535486Z 
 66%|██████▌   | 6265/9500 [21:28:47<11:02:12, 12.28s/it]08/03/2024 19:26:17 - INFO - __main__ -   Step: 6265, LR: 7.020080920581842e-06, Loss: 489.5457763671875
2024-08-04T02:26:30.283239469Z 
 66%|██████▌   | 6266/9500 [21:29:00<11:03:52, 12.32s/it]08/03/2024 19:26:30 - INFO - __main__ -   Step: 6266, LR: 7.017910376894563e-06, Loss: 380.7023010253906
2024-08-04T02:26:42.853441619Z 
 66%|██████▌   | 6267/9500 [21:29:12<11:07:45, 12.39s/it]08/03/2024 19:26:42 - INFO - __main__ -   Step: 6267, LR: 7.015739833207285e-06, Loss: 435.7107849121094
2024-08-04T02:26:55.155381927Z 
 66%|██████▌   | 6268/9500 [21:29:25<11:06:05, 12.37s/it]08/03/2024 19:26:55 - INFO - __main__ -   Step: 6268, LR: 7.0135692895200045e-06, Loss: 514.1482543945312
2024-08-04T02:27:07.423084482Z 
 66%|██████▌   | 6269/9500 [21:29:37<11:04:18, 12.34s/it]08/03/2024 19:27:07 - INFO - __main__ -   Step: 6269, LR: 7.011398745832726e-06, Loss: 432.7045593261719
2024-08-04T02:27:20.136310783Z 
 66%|██████▌   | 6270/9500 [21:29:50<11:10:11, 12.45s/it]08/03/2024 19:27:20 - INFO - __main__ -   Step: 6270, LR: 7.009228202145448e-06, Loss: 503.7403564453125
2024-08-04T02:27:32.049594993Z 
 66%|██████▌   | 6271/9500 [21:30:01<11:01:19, 12.29s/it]08/03/2024 19:27:32 - INFO - __main__ -   Step: 6271, LR: 7.007057658458168e-06, Loss: 345.74072265625
2024-08-04T02:27:44.132375032Z 
 66%|██████▌   | 6272/9500 [21:30:14<10:57:48, 12.23s/it]08/03/2024 19:27:44 - INFO - __main__ -   Step: 6272, LR: 7.00488711477089e-06, Loss: 482.2508239746094
2024-08-04T02:27:56.306155306Z 
 66%|██████▌   | 6273/9500 [21:30:26<10:56:44, 12.21s/it]08/03/2024 19:27:56 - INFO - __main__ -   Step: 6273, LR: 7.002716571083611e-06, Loss: 486.5166015625
2024-08-04T02:28:09.104010510Z 
 66%|██████▌   | 6274/9500 [21:30:39<11:06:00, 12.39s/it]08/03/2024 19:28:09 - INFO - __main__ -   Step: 6274, LR: 7.000546027396332e-06, Loss: 405.47021484375
2024-08-04T02:28:21.156144199Z 
 66%|██████▌   | 6275/9500 [21:30:51<11:00:23, 12.29s/it]08/03/2024 19:28:21 - INFO - __main__ -   Step: 6275, LR: 6.998375483709052e-06, Loss: 500.96209716796875
2024-08-04T02:28:33.197690653Z 
 66%|██████▌   | 6276/9500 [21:31:03<10:56:14, 12.21s/it]08/03/2024 19:28:33 - INFO - __main__ -   Step: 6276, LR: 6.9962049400217736e-06, Loss: 469.07720947265625
2024-08-04T02:28:45.825929163Z 
 66%|██████▌   | 6277/9500 [21:31:15<11:02:44, 12.34s/it]08/03/2024 19:28:45 - INFO - __main__ -   Step: 6277, LR: 6.994034396334495e-06, Loss: 496.18365478515625
2024-08-04T02:28:57.865108251Z 
 66%|██████▌   | 6278/9500 [21:31:27<10:57:43, 12.25s/it]08/03/2024 19:28:57 - INFO - __main__ -   Step: 6278, LR: 6.991863852647216e-06, Loss: 322.5811767578125
2024-08-04T02:29:10.055215668Z 
 66%|██████▌   | 6279/9500 [21:31:39<10:56:35, 12.23s/it]08/03/2024 19:29:10 - INFO - __main__ -   Step: 6279, LR: 6.989693308959937e-06, Loss: 422.1819763183594
2024-08-04T02:29:22.674993417Z 
 66%|██████▌   | 6280/9500 [21:31:52<11:02:38, 12.35s/it]08/03/2024 19:29:22 - INFO - __main__ -   Step: 6280, LR: 6.987522765272658e-06, Loss: 395.75433349609375
2024-08-04T02:29:34.656711850Z 
 66%|██████▌   | 6281/9500 [21:32:04<10:56:33, 12.24s/it]08/03/2024 19:29:34 - INFO - __main__ -   Step: 6281, LR: 6.98535222158538e-06, Loss: 319.015869140625
2024-08-04T02:29:46.742136560Z 
 66%|██████▌   | 6282/9500 [21:32:16<10:53:53, 12.19s/it]08/03/2024 19:29:46 - INFO - __main__ -   Step: 6282, LR: 6.9831816778980995e-06, Loss: 395.1802978515625
2024-08-04T02:29:59.392987590Z 
 66%|██████▌   | 6283/9500 [21:32:29<11:01:04, 12.33s/it]08/03/2024 19:29:59 - INFO - __main__ -   Step: 6283, LR: 6.981011134210821e-06, Loss: 324.8644714355469
2024-08-04T02:30:11.583470895Z 
 66%|██████▌   | 6284/9500 [21:32:41<10:58:38, 12.29s/it]08/03/2024 19:30:11 - INFO - __main__ -   Step: 6284, LR: 6.978840590523543e-06, Loss: 499.4776916503906
2024-08-04T02:30:23.831727936Z 
 66%|██████▌   | 6285/9500 [21:32:53<10:57:47, 12.28s/it]08/03/2024 19:30:23 - INFO - __main__ -   Step: 6285, LR: 6.976670046836263e-06, Loss: 467.9228210449219
2024-08-04T02:30:36.207434300Z 
 66%|██████▌   | 6286/9500 [21:33:06<10:59:11, 12.31s/it]08/03/2024 19:30:36 - INFO - __main__ -   Step: 6286, LR: 6.974499503148985e-06, Loss: 439.1261901855469
2024-08-04T02:30:48.632628164Z 
 66%|██████▌   | 6287/9500 [21:33:18<11:00:53, 12.34s/it]08/03/2024 19:30:48 - INFO - __main__ -   Step: 6287, LR: 6.9723289594617065e-06, Loss: 328.6084289550781
2024-08-04T02:31:00.843878236Z 
 66%|██████▌   | 6288/9500 [21:33:30<10:58:35, 12.30s/it]08/03/2024 19:31:00 - INFO - __main__ -   Step: 6288, LR: 6.970158415774427e-06, Loss: 321.15765380859375
2024-08-04T02:31:13.361054378Z 
 66%|██████▌   | 6289/9500 [21:33:43<11:01:50, 12.37s/it]08/03/2024 19:31:13 - INFO - __main__ -   Step: 6289, LR: 6.967987872087147e-06, Loss: 536.792724609375
2024-08-04T02:31:25.392476571Z 
 66%|██████▌   | 6290/9500 [21:33:55<10:56:14, 12.27s/it]08/03/2024 19:31:25 - INFO - __main__ -   Step: 6290, LR: 6.965817328399869e-06, Loss: 363.6383056640625
2024-08-04T02:31:37.565288173Z 
 66%|██████▌   | 6291/9500 [21:34:07<10:54:32, 12.24s/it]08/03/2024 19:31:37 - INFO - __main__ -   Step: 6291, LR: 6.96364678471259e-06, Loss: 433.4986267089844
2024-08-04T02:31:50.082437728Z 
 66%|██████▌   | 6292/9500 [21:34:20<10:58:48, 12.32s/it]08/03/2024 19:31:50 - INFO - __main__ -   Step: 6292, LR: 6.961476241025311e-06, Loss: 439.74200439453125
2024-08-04T02:32:02.134405136Z 
 66%|██████▌   | 6293/9500 [21:34:32<10:54:16, 12.24s/it]08/03/2024 19:32:02 - INFO - __main__ -   Step: 6293, LR: 6.9593056973380325e-06, Loss: 409.9621276855469
2024-08-04T02:32:14.670694007Z 
 66%|██████▋   | 6294/9500 [21:34:44<10:58:48, 12.33s/it]08/03/2024 19:32:14 - INFO - __main__ -   Step: 6294, LR: 6.957135153650754e-06, Loss: 414.1142578125
2024-08-04T02:32:27.291941951Z 
 66%|██████▋   | 6295/9500 [21:34:57<11:03:16, 12.42s/it]08/03/2024 19:32:27 - INFO - __main__ -   Step: 6295, LR: 6.954964609963475e-06, Loss: 452.1226501464844
2024-08-04T02:32:39.508768494Z 
 66%|██████▋   | 6296/9500 [21:35:09<10:59:52, 12.36s/it]08/03/2024 19:32:39 - INFO - __main__ -   Step: 6296, LR: 6.952794066276195e-06, Loss: 463.0260314941406
2024-08-04T02:32:51.513868401Z 
 66%|██████▋   | 6297/9500 [21:35:21<10:54:01, 12.25s/it]08/03/2024 19:32:51 - INFO - __main__ -   Step: 6297, LR: 6.950623522588916e-06, Loss: 315.73486328125
2024-08-04T02:33:03.924992417Z 
 66%|██████▋   | 6298/9500 [21:35:33<10:56:22, 12.30s/it]08/03/2024 19:33:03 - INFO - __main__ -   Step: 6298, LR: 6.948452978901638e-06, Loss: 404.9937438964844
2024-08-04T02:33:15.918403946Z 
 66%|██████▋   | 6299/9500 [21:35:45<10:51:16, 12.21s/it]08/03/2024 19:33:15 - INFO - __main__ -   Step: 6299, LR: 6.9462824352143584e-06, Loss: 528.955322265625
2024-08-04T02:33:28.479871758Z 
 66%|██████▋   | 6300/9500 [21:35:58<10:56:43, 12.31s/it]08/03/2024 19:33:28 - INFO - __main__ -   Step: 6300, LR: 6.94411189152708e-06, Loss: 439.18499755859375
2024-08-04T02:33:41.039453723Z 
 66%|██████▋   | 6301/9500 [21:36:10<11:00:27, 12.39s/it]08/03/2024 19:33:41 - INFO - __main__ -   Step: 6301, LR: 6.9419413478398015e-06, Loss: 447.8559265136719
2024-08-04T02:33:53.308013726Z 
 66%|██████▋   | 6302/9500 [21:36:23<10:58:21, 12.35s/it]08/03/2024 19:33:53 - INFO - __main__ -   Step: 6302, LR: 6.939770804152522e-06, Loss: 420.0948791503906
2024-08-04T02:34:05.904218571Z 
 66%|██████▋   | 6303/9500 [21:36:35<11:02:03, 12.43s/it]08/03/2024 19:34:05 - INFO - __main__ -   Step: 6303, LR: 6.937600260465243e-06, Loss: 406.2633056640625
2024-08-04T02:34:18.807467594Z 
 66%|██████▋   | 6304/9500 [21:36:48<11:09:29, 12.57s/it]08/03/2024 19:34:18 - INFO - __main__ -   Step: 6304, LR: 6.935429716777964e-06, Loss: 427.2716064453125
2024-08-04T02:34:31.188723995Z 
 66%|██████▋   | 6305/9500 [21:37:01<11:06:16, 12.51s/it]08/03/2024 19:34:31 - INFO - __main__ -   Step: 6305, LR: 6.933259173090685e-06, Loss: 376.4423522949219
2024-08-04T02:34:43.510139172Z 
 66%|██████▋   | 6306/9500 [21:37:13<11:03:01, 12.46s/it]08/03/2024 19:34:43 - INFO - __main__ -   Step: 6306, LR: 6.931088629403406e-06, Loss: 495.85638427734375
2024-08-04T02:34:56.531459775Z 
 66%|██████▋   | 6307/9500 [21:37:26<11:11:51, 12.63s/it]08/03/2024 19:34:56 - INFO - __main__ -   Step: 6307, LR: 6.9289180857161275e-06, Loss: 437.61334228515625
2024-08-04T02:35:08.600768181Z 
 66%|██████▋   | 6308/9500 [21:37:38<11:02:46, 12.46s/it]08/03/2024 19:35:08 - INFO - __main__ -   Step: 6308, LR: 6.926747542028849e-06, Loss: 333.84222412109375
2024-08-04T02:35:20.890625705Z 
 66%|██████▋   | 6309/9500 [21:37:50<10:59:53, 12.41s/it]08/03/2024 19:35:20 - INFO - __main__ -   Step: 6309, LR: 6.92457699834157e-06, Loss: 469.1125793457031
2024-08-04T02:35:33.730080928Z 
 66%|██████▋   | 6310/9500 [21:38:03<11:06:33, 12.54s/it]08/03/2024 19:35:33 - INFO - __main__ -   Step: 6310, LR: 6.9224064546542905e-06, Loss: 453.5606384277344
2024-08-04T02:35:46.122916206Z 
 66%|██████▋   | 6311/9500 [21:38:16<11:04:02, 12.49s/it]08/03/2024 19:35:46 - INFO - __main__ -   Step: 6311, LR: 6.920235910967011e-06, Loss: 413.199951171875
2024-08-04T02:35:58.199914795Z 
 66%|██████▋   | 6312/9500 [21:38:28<10:57:11, 12.37s/it]08/03/2024 19:35:58 - INFO - __main__ -   Step: 6312, LR: 6.918065367279733e-06, Loss: 359.331298828125
2024-08-04T02:36:10.970610071Z 
 66%|██████▋   | 6313/9500 [21:38:40<11:03:24, 12.49s/it]08/03/2024 19:36:10 - INFO - __main__ -   Step: 6313, LR: 6.9158948235924535e-06, Loss: 406.2707824707031
2024-08-04T02:36:23.198620452Z 
 66%|██████▋   | 6314/9500 [21:38:53<10:59:01, 12.41s/it]08/03/2024 19:36:23 - INFO - __main__ -   Step: 6314, LR: 6.913724279905175e-06, Loss: 419.8891296386719
2024-08-04T02:36:35.202928243Z 
 66%|██████▋   | 6315/9500 [21:39:05<10:52:20, 12.29s/it]08/03/2024 19:36:35 - INFO - __main__ -   Step: 6315, LR: 6.911553736217897e-06, Loss: 349.7886657714844
2024-08-04T02:36:47.367639945Z 
 66%|██████▋   | 6316/9500 [21:39:17<10:50:09, 12.25s/it]08/03/2024 19:36:47 - INFO - __main__ -   Step: 6316, LR: 6.909383192530617e-06, Loss: 460.6046142578125
2024-08-04T02:36:59.717554643Z 
 66%|██████▋   | 6317/9500 [21:39:29<10:51:30, 12.28s/it]08/03/2024 19:36:59 - INFO - __main__ -   Step: 6317, LR: 6.907212648843338e-06, Loss: 382.31170654296875
2024-08-04T02:37:11.808506746Z 
 67%|██████▋   | 6318/9500 [21:39:41<10:48:17, 12.22s/it]08/03/2024 19:37:11 - INFO - __main__ -   Step: 6318, LR: 6.905042105156059e-06, Loss: 486.76007080078125
2024-08-04T02:37:24.052671608Z 
 67%|██████▋   | 6319/9500 [21:39:53<10:48:23, 12.23s/it]08/03/2024 19:37:24 - INFO - __main__ -   Step: 6319, LR: 6.90287156146878e-06, Loss: 404.80889892578125
2024-08-04T02:37:36.573661920Z 
 67%|██████▋   | 6320/9500 [21:40:06<10:52:49, 12.32s/it]08/03/2024 19:37:36 - INFO - __main__ -   Step: 6320, LR: 6.900701017781502e-06, Loss: 392.0984802246094
2024-08-04T02:37:48.720806338Z 
 67%|██████▋   | 6321/9500 [21:40:18<10:49:54, 12.27s/it]08/03/2024 19:37:48 - INFO - __main__ -   Step: 6321, LR: 6.8985304740942226e-06, Loss: 450.0526428222656
2024-08-04T02:38:01.229456117Z 
 67%|██████▋   | 6322/9500 [21:40:31<10:53:33, 12.34s/it]08/03/2024 19:38:01 - INFO - __main__ -   Step: 6322, LR: 6.896359930406944e-06, Loss: 551.2578125
2024-08-04T02:38:13.123570344Z 
 67%|██████▋   | 6323/9500 [21:40:43<10:46:16, 12.21s/it]08/03/2024 19:38:13 - INFO - __main__ -   Step: 6323, LR: 6.894189386719665e-06, Loss: 432.4617004394531
2024-08-04T02:38:25.018756396Z 
 67%|██████▋   | 6324/9500 [21:40:54<10:41:09, 12.11s/it]08/03/2024 19:38:25 - INFO - __main__ -   Step: 6324, LR: 6.8920188430323856e-06, Loss: 325.85699462890625
2024-08-04T02:38:37.361018882Z 
 67%|██████▋   | 6325/9500 [21:41:07<10:44:35, 12.18s/it]08/03/2024 19:38:37 - INFO - __main__ -   Step: 6325, LR: 6.889848299345106e-06, Loss: 382.15032958984375
2024-08-04T02:38:49.833316520Z 
 67%|██████▋   | 6326/9500 [21:41:19<10:49:00, 12.27s/it]08/03/2024 19:38:49 - INFO - __main__ -   Step: 6326, LR: 6.887677755657828e-06, Loss: 370.19659423828125
2024-08-04T02:39:02.198588384Z 
 67%|██████▋   | 6327/9500 [21:41:32<10:50:20, 12.30s/it]08/03/2024 19:39:02 - INFO - __main__ -   Step: 6327, LR: 6.885507211970549e-06, Loss: 433.7975158691406
2024-08-04T02:39:14.118051316Z 
 67%|██████▋   | 6328/9500 [21:41:44<10:44:08, 12.18s/it]08/03/2024 19:39:14 - INFO - __main__ -   Step: 6328, LR: 6.88333666828327e-06, Loss: 381.78448486328125
2024-08-04T02:39:26.521344237Z 
 67%|██████▋   | 6329/9500 [21:41:56<10:47:24, 12.25s/it]08/03/2024 19:39:26 - INFO - __main__ -   Step: 6329, LR: 6.881166124595992e-06, Loss: 490.5269775390625
2024-08-04T02:39:38.822651020Z 
 67%|██████▋   | 6330/9500 [21:42:08<10:48:00, 12.27s/it]08/03/2024 19:39:38 - INFO - __main__ -   Step: 6330, LR: 6.878995580908712e-06, Loss: 450.3262939453125
2024-08-04T02:39:50.769826074Z 
 67%|██████▋   | 6331/9500 [21:42:20<10:42:46, 12.17s/it]08/03/2024 19:39:50 - INFO - __main__ -   Step: 6331, LR: 6.876825037221433e-06, Loss: 365.8652648925781
2024-08-04T02:40:03.134483705Z 
 67%|██████▋   | 6332/9500 [21:42:33<10:45:39, 12.23s/it]08/03/2024 19:40:03 - INFO - __main__ -   Step: 6332, LR: 6.874654493534154e-06, Loss: 376.97064208984375
2024-08-04T02:40:15.241162078Z 
 67%|██████▋   | 6333/9500 [21:42:45<10:43:31, 12.19s/it]08/03/2024 19:40:15 - INFO - __main__ -   Step: 6333, LR: 6.872483949846875e-06, Loss: 308.3215637207031
2024-08-04T02:40:27.605825495Z 
 67%|██████▋   | 6334/9500 [21:42:57<10:46:03, 12.24s/it]08/03/2024 19:40:27 - INFO - __main__ -   Step: 6334, LR: 6.870313406159597e-06, Loss: 380.3026428222656
2024-08-04T02:40:40.096870452Z 
 67%|██████▋   | 6335/9500 [21:43:10<10:49:45, 12.32s/it]08/03/2024 19:40:40 - INFO - __main__ -   Step: 6335, LR: 6.868142862472318e-06, Loss: 370.07293701171875
2024-08-04T02:40:52.085839434Z 
 67%|██████▋   | 6336/9500 [21:43:22<10:44:21, 12.22s/it]08/03/2024 19:40:52 - INFO - __main__ -   Step: 6336, LR: 6.865972318785039e-06, Loss: 344.3317565917969
2024-08-04T02:41:04.465545364Z 
 67%|██████▋   | 6337/9500 [21:43:34<10:46:41, 12.27s/it]08/03/2024 19:41:04 - INFO - __main__ -   Step: 6337, LR: 6.863801775097761e-06, Loss: 351.6814880371094
2024-08-04T02:41:16.966792236Z 
 67%|██████▋   | 6338/9500 [21:43:46<10:50:11, 12.34s/it]08/03/2024 19:41:16 - INFO - __main__ -   Step: 6338, LR: 6.861631231410481e-06, Loss: 372.95849609375
2024-08-04T02:41:28.940160507Z 
 67%|██████▋   | 6339/9500 [21:43:58<10:44:13, 12.23s/it]08/03/2024 19:41:28 - INFO - __main__ -   Step: 6339, LR: 6.859460687723201e-06, Loss: 420.7105712890625
2024-08-04T02:41:41.207741470Z 
 67%|██████▋   | 6340/9500 [21:44:11<10:44:38, 12.24s/it]08/03/2024 19:41:41 - INFO - __main__ -   Step: 6340, LR: 6.857290144035923e-06, Loss: 321.0282287597656
2024-08-04T02:41:53.991136670Z 
 67%|██████▋   | 6341/9500 [21:44:23<10:53:01, 12.40s/it]08/03/2024 19:41:53 - INFO - __main__ -   Step: 6341, LR: 6.8551196003486444e-06, Loss: 512.5523071289062
2024-08-04T02:42:06.108761895Z 
 67%|██████▋   | 6342/9500 [21:44:36<10:48:18, 12.32s/it]08/03/2024 19:42:06 - INFO - __main__ -   Step: 6342, LR: 6.852949056661365e-06, Loss: 384.31976318359375
2024-08-04T02:42:18.009406075Z 
 67%|██████▋   | 6343/9500 [21:44:47<10:41:31, 12.19s/it]08/03/2024 19:42:18 - INFO - __main__ -   Step: 6343, LR: 6.850778512974087e-06, Loss: 321.3770751953125
2024-08-04T02:42:30.513668278Z 
 67%|██████▋   | 6344/9500 [21:45:00<10:46:14, 12.29s/it]08/03/2024 19:42:30 - INFO - __main__ -   Step: 6344, LR: 6.848607969286808e-06, Loss: 392.4569091796875
2024-08-04T02:42:42.439970863Z 
 67%|██████▋   | 6345/9500 [21:45:12<10:40:21, 12.18s/it]08/03/2024 19:42:42 - INFO - __main__ -   Step: 6345, LR: 6.846437425599528e-06, Loss: 336.7799072265625
2024-08-04T02:42:54.840747303Z 
 67%|██████▋   | 6346/9500 [21:45:24<10:43:40, 12.24s/it]08/03/2024 19:42:54 - INFO - __main__ -   Step: 6346, LR: 6.84426688191225e-06, Loss: 331.47564697265625
2024-08-04T02:43:07.121908279Z 
 67%|██████▋   | 6347/9500 [21:45:37<10:44:01, 12.26s/it]08/03/2024 19:43:07 - INFO - __main__ -   Step: 6347, LR: 6.84209633822497e-06, Loss: 394.0708312988281
2024-08-04T02:43:19.268526387Z 
 67%|██████▋   | 6348/9500 [21:45:49<10:42:07, 12.22s/it]08/03/2024 19:43:19 - INFO - __main__ -   Step: 6348, LR: 6.839925794537692e-06, Loss: 312.0631408691406
2024-08-04T02:43:31.385048214Z 
 67%|██████▋   | 6349/9500 [21:46:01<10:40:13, 12.19s/it]08/03/2024 19:43:31 - INFO - __main__ -   Step: 6349, LR: 6.837755250850413e-06, Loss: 352.89642333984375
2024-08-04T02:43:44.457825230Z 
 67%|██████▋   | 6350/9500 [21:46:14<10:53:55, 12.46s/it]08/03/2024 19:43:44 - INFO - __main__ -   Step: 6350, LR: 6.835584707163134e-06, Loss: 391.5735778808594
2024-08-04T02:43:56.603573412Z 
 67%|██████▋   | 6351/9500 [21:46:26<10:48:50, 12.36s/it]08/03/2024 19:43:56 - INFO - __main__ -   Step: 6351, LR: 6.833414163475856e-06, Loss: 387.8119812011719
2024-08-04T02:44:08.618588418Z 
 67%|██████▋   | 6352/9500 [21:46:38<10:43:09, 12.26s/it]08/03/2024 19:44:08 - INFO - __main__ -   Step: 6352, LR: 6.831243619788576e-06, Loss: 319.0826416015625
2024-08-04T02:44:21.360824552Z 
 67%|██████▋   | 6353/9500 [21:46:51<10:50:33, 12.40s/it]08/03/2024 19:44:21 - INFO - __main__ -   Step: 6353, LR: 6.829073076101297e-06, Loss: 391.5648498535156
2024-08-04T02:44:33.565521750Z 
 67%|██████▋   | 6354/9500 [21:47:03<10:47:13, 12.34s/it]08/03/2024 19:44:33 - INFO - __main__ -   Step: 6354, LR: 6.826902532414018e-06, Loss: 419.996337890625
2024-08-04T02:44:45.669680521Z 
 67%|██████▋   | 6355/9500 [21:47:15<10:43:15, 12.27s/it]08/03/2024 19:44:45 - INFO - __main__ -   Step: 6355, LR: 6.8247319887267395e-06, Loss: 351.51715087890625
2024-08-04T02:44:58.164133124Z 
 67%|██████▋   | 6356/9500 [21:47:28<10:46:32, 12.34s/it]08/03/2024 19:44:58 - INFO - __main__ -   Step: 6356, LR: 6.82256144503946e-06, Loss: 376.32818603515625
2024-08-04T02:45:10.413795590Z 
 67%|██████▋   | 6357/9500 [21:47:40<10:44:56, 12.31s/it]08/03/2024 19:45:10 - INFO - __main__ -   Step: 6357, LR: 6.820390901352182e-06, Loss: 387.17913818359375
2024-08-04T02:45:23.086794894Z 
 67%|██████▋   | 6358/9500 [21:47:53<10:50:24, 12.42s/it]08/03/2024 19:45:23 - INFO - __main__ -   Step: 6358, LR: 6.818220357664902e-06, Loss: 453.49481201171875
2024-08-04T02:45:35.313196653Z 
 67%|██████▋   | 6359/9500 [21:48:05<10:47:09, 12.36s/it]08/03/2024 19:45:35 - INFO - __main__ -   Step: 6359, LR: 6.816049813977623e-06, Loss: 447.82232666015625
2024-08-04T02:45:48.168274301Z 
 67%|██████▋   | 6360/9500 [21:48:18<10:54:41, 12.51s/it]08/03/2024 19:45:48 - INFO - __main__ -   Step: 6360, LR: 6.813879270290345e-06, Loss: 370.3207702636719
2024-08-04T02:46:00.490144147Z 
 67%|██████▋   | 6361/9500 [21:48:30<10:51:31, 12.45s/it]08/03/2024 19:46:00 - INFO - __main__ -   Step: 6361, LR: 6.8117087266030655e-06, Loss: 461.8572692871094
2024-08-04T02:46:12.774767918Z 
 67%|██████▋   | 6362/9500 [21:48:42<10:48:38, 12.40s/it]08/03/2024 19:46:12 - INFO - __main__ -   Step: 6362, LR: 6.809538182915787e-06, Loss: 480.61279296875
2024-08-04T02:46:25.408549934Z 
 67%|██████▋   | 6363/9500 [21:48:55<10:52:05, 12.47s/it]08/03/2024 19:46:25 - INFO - __main__ -   Step: 6363, LR: 6.807367639228509e-06, Loss: 348.43939208984375
2024-08-04T02:46:37.383866503Z 
 67%|██████▋   | 6364/9500 [21:49:07<10:44:05, 12.32s/it]08/03/2024 19:46:37 - INFO - __main__ -   Step: 6364, LR: 6.805197095541229e-06, Loss: 455.59771728515625
2024-08-04T02:46:49.538037825Z 
 67%|██████▋   | 6365/9500 [21:49:19<10:41:14, 12.27s/it]08/03/2024 19:46:49 - INFO - __main__ -   Step: 6365, LR: 6.803026551853949e-06, Loss: 506.7210998535156
2024-08-04T02:47:02.096965923Z 
 67%|██████▋   | 6366/9500 [21:49:32<10:45:30, 12.36s/it]08/03/2024 19:47:02 - INFO - __main__ -   Step: 6366, LR: 6.800856008166671e-06, Loss: 400.2471008300781
2024-08-04T02:47:14.338329042Z 
 67%|██████▋   | 6367/9500 [21:49:44<10:43:29, 12.32s/it]08/03/2024 19:47:14 - INFO - __main__ -   Step: 6367, LR: 6.798685464479392e-06, Loss: 375.723876953125
2024-08-04T02:47:26.471822265Z 
 67%|██████▋   | 6368/9500 [21:49:56<10:40:18, 12.27s/it]08/03/2024 19:47:26 - INFO - __main__ -   Step: 6368, LR: 6.796514920792113e-06, Loss: 475.2808837890625
2024-08-04T02:47:39.081604289Z 
 67%|██████▋   | 6369/9500 [21:50:09<10:45:28, 12.37s/it]08/03/2024 19:47:39 - INFO - __main__ -   Step: 6369, LR: 6.7943443771048346e-06, Loss: 407.1048583984375
2024-08-04T02:47:51.157445492Z 
 67%|██████▋   | 6370/9500 [21:50:21<10:40:40, 12.28s/it]08/03/2024 19:47:51 - INFO - __main__ -   Step: 6370, LR: 6.792173833417556e-06, Loss: 356.2783508300781
2024-08-04T02:48:03.662879771Z 
 67%|██████▋   | 6371/9500 [21:50:33<10:43:58, 12.35s/it]08/03/2024 19:48:03 - INFO - __main__ -   Step: 6371, LR: 6.790003289730277e-06, Loss: 392.5820617675781
2024-08-04T02:48:16.186919700Z 
 67%|██████▋   | 6372/9500 [21:50:46<10:46:31, 12.40s/it]08/03/2024 19:48:16 - INFO - __main__ -   Step: 6372, LR: 6.7878327460429975e-06, Loss: 384.9041442871094
2024-08-04T02:48:28.212673991Z 
 67%|██████▋   | 6373/9500 [21:50:58<10:40:26, 12.29s/it]08/03/2024 19:48:28 - INFO - __main__ -   Step: 6373, LR: 6.785662202355718e-06, Loss: 376.08880615234375
2024-08-04T02:48:40.285707545Z 
 67%|██████▋   | 6374/9500 [21:51:10<10:36:51, 12.22s/it]08/03/2024 19:48:40 - INFO - __main__ -   Step: 6374, LR: 6.78349165866844e-06, Loss: 405.85235595703125
2024-08-04T02:48:52.527106491Z 
 67%|██████▋   | 6375/9500 [21:51:22<10:36:56, 12.23s/it]08/03/2024 19:48:52 - INFO - __main__ -   Step: 6375, LR: 6.7813211149811605e-06, Loss: 382.01678466796875
2024-08-04T02:49:04.860801446Z 
 67%|██████▋   | 6376/9500 [21:51:34<10:38:21, 12.26s/it]08/03/2024 19:49:04 - INFO - __main__ -   Step: 6376, LR: 6.779150571293882e-06, Loss: 372.2511291503906
2024-08-04T02:49:17.201356963Z 
 67%|██████▋   | 6377/9500 [21:51:47<10:39:24, 12.28s/it]08/03/2024 19:49:17 - INFO - __main__ -   Step: 6377, LR: 6.776980027606604e-06, Loss: 498.52392578125
2024-08-04T02:49:29.449368356Z 
 67%|██████▋   | 6378/9500 [21:51:59<10:38:37, 12.27s/it]08/03/2024 19:49:29 - INFO - __main__ -   Step: 6378, LR: 6.774809483919324e-06, Loss: 409.21868896484375
2024-08-04T02:49:41.795423769Z 
 67%|██████▋   | 6379/9500 [21:52:11<10:39:33, 12.30s/it]08/03/2024 19:49:41 - INFO - __main__ -   Step: 6379, LR: 6.772638940232045e-06, Loss: 340.52105712890625
2024-08-04T02:49:53.966247067Z 
 67%|██████▋   | 6380/9500 [21:52:23<10:37:24, 12.26s/it]08/03/2024 19:49:53 - INFO - __main__ -   Step: 6380, LR: 6.770468396544766e-06, Loss: 343.44085693359375
2024-08-04T02:50:06.437875459Z 
 67%|██████▋   | 6381/9500 [21:52:36<10:40:32, 12.32s/it]08/03/2024 19:50:06 - INFO - __main__ -   Step: 6381, LR: 6.768297852857487e-06, Loss: 522.7342529296875
2024-08-04T02:50:18.720782139Z 
 67%|██████▋   | 6382/9500 [21:52:48<10:39:43, 12.31s/it]08/03/2024 19:50:18 - INFO - __main__ -   Step: 6382, LR: 6.766127309170208e-06, Loss: 399.62188720703125
2024-08-04T02:50:31.498695640Z 
 67%|██████▋   | 6383/9500 [21:53:01<10:46:48, 12.45s/it]08/03/2024 19:50:31 - INFO - __main__ -   Step: 6383, LR: 6.76395676548293e-06, Loss: 489.3793029785156
2024-08-04T02:50:43.965398347Z 
 67%|██████▋   | 6384/9500 [21:53:13<10:46:51, 12.46s/it]08/03/2024 19:50:43 - INFO - __main__ -   Step: 6384, LR: 6.761786221795651e-06, Loss: 371.989501953125
2024-08-04T02:50:56.294883475Z 
 67%|██████▋   | 6385/9500 [21:53:26<10:44:40, 12.42s/it]08/03/2024 19:50:56 - INFO - __main__ -   Step: 6385, LR: 6.759615678108372e-06, Loss: 343.276611328125
2024-08-04T02:51:08.237784887Z 
 67%|██████▋   | 6386/9500 [21:53:38<10:37:05, 12.28s/it]08/03/2024 19:51:08 - INFO - __main__ -   Step: 6386, LR: 6.757445134421093e-06, Loss: 310.4498291015625
2024-08-04T02:51:20.961237366Z 
 67%|██████▋   | 6387/9500 [21:53:50<10:43:51, 12.41s/it]08/03/2024 19:51:20 - INFO - __main__ -   Step: 6387, LR: 6.755274590733813e-06, Loss: 540.181884765625
2024-08-04T02:51:33.002949192Z 
 67%|██████▋   | 6388/9500 [21:54:02<10:37:55, 12.30s/it]08/03/2024 19:51:33 - INFO - __main__ -   Step: 6388, LR: 6.753104047046535e-06, Loss: 357.35345458984375
2024-08-04T02:51:45.258558755Z 
 67%|██████▋   | 6389/9500 [21:54:15<10:37:02, 12.29s/it]08/03/2024 19:51:45 - INFO - __main__ -   Step: 6389, LR: 6.7509335033592564e-06, Loss: 509.1252136230469
2024-08-04T02:51:58.263703856Z 
 67%|██████▋   | 6390/9500 [21:54:28<10:48:00, 12.50s/it]08/03/2024 19:51:58 - INFO - __main__ -   Step: 6390, LR: 6.748762959671977e-06, Loss: 465.9765930175781
2024-08-04T02:52:10.571274561Z 
 67%|██████▋   | 6391/9500 [21:54:40<10:44:47, 12.44s/it]08/03/2024 19:52:10 - INFO - __main__ -   Step: 6391, LR: 6.746592415984699e-06, Loss: 416.3393249511719
2024-08-04T02:52:22.520813278Z 
 67%|██████▋   | 6392/9500 [21:54:52<10:36:53, 12.30s/it]08/03/2024 19:52:22 - INFO - __main__ -   Step: 6392, LR: 6.744421872297419e-06, Loss: 392.26873779296875
2024-08-04T02:52:35.101008975Z 
 67%|██████▋   | 6393/9500 [21:55:05<10:41:07, 12.38s/it]08/03/2024 19:52:35 - INFO - __main__ -   Step: 6393, LR: 6.74225132861014e-06, Loss: 374.508056640625
2024-08-04T02:52:47.224308143Z 
 67%|██████▋   | 6394/9500 [21:55:17<10:36:55, 12.30s/it]08/03/2024 19:52:47 - INFO - __main__ -   Step: 6394, LR: 6.740080784922861e-06, Loss: 372.6963195800781
2024-08-04T02:52:59.603078121Z 
 67%|██████▋   | 6395/9500 [21:55:29<10:37:52, 12.33s/it]08/03/2024 19:52:59 - INFO - __main__ -   Step: 6395, LR: 6.737910241235582e-06, Loss: 611.3052978515625
2024-08-04T02:53:12.466137871Z 
 67%|██████▋   | 6396/9500 [21:55:42<10:45:59, 12.49s/it]08/03/2024 19:53:12 - INFO - __main__ -   Step: 6396, LR: 6.735739697548304e-06, Loss: 481.39166259765625
2024-08-04T02:53:24.555819547Z 
 67%|██████▋   | 6397/9500 [21:55:54<10:39:38, 12.37s/it]08/03/2024 19:53:24 - INFO - __main__ -   Step: 6397, LR: 6.733569153861025e-06, Loss: 414.6630859375
2024-08-04T02:53:36.601914105Z 
 67%|██████▋   | 6398/9500 [21:56:06<10:34:26, 12.27s/it]08/03/2024 19:53:36 - INFO - __main__ -   Step: 6398, LR: 6.731398610173746e-06, Loss: 427.358154296875
2024-08-04T02:53:49.718480039Z 
 67%|██████▋   | 6399/9500 [21:56:19<10:47:19, 12.52s/it]08/03/2024 19:53:49 - INFO - __main__ -   Step: 6399, LR: 6.729228066486467e-06, Loss: 529.8492431640625
2024-08-04T02:54:01.616582991Z 
 67%|██████▋   | 6400/9500 [21:56:31<10:37:24, 12.34s/it]08/03/2024 19:54:01 - INFO - __main__ -   Step: 6400, LR: 6.727057522799188e-06, Loss: 363.7093200683594
2024-08-04T02:54:13.885586775Z 
 67%|██████▋   | 6401/9500 [21:56:43<10:36:08, 12.32s/it]08/03/2024 19:54:13 - INFO - __main__ -   Step: 6401, LR: 6.724886979111908e-06, Loss: 385.57379150390625
2024-08-04T02:54:26.388455217Z 
 67%|██████▋   | 6402/9500 [21:56:56<10:38:49, 12.37s/it]08/03/2024 19:54:26 - INFO - __main__ -   Step: 6402, LR: 6.72271643542463e-06, Loss: 445.0163269042969
2024-08-04T02:54:39.055976150Z 
 67%|██████▋   | 6403/9500 [21:57:08<10:43:11, 12.46s/it]08/03/2024 19:54:39 - INFO - __main__ -   Step: 6403, LR: 6.7205458917373515e-06, Loss: 318.7109375
2024-08-04T02:54:50.993850028Z 
 67%|██████▋   | 6404/9500 [21:57:20<10:34:53, 12.30s/it]08/03/2024 19:54:50 - INFO - __main__ -   Step: 6404, LR: 6.718375348050072e-06, Loss: 387.3608093261719
2024-08-04T02:55:03.459492825Z 
 67%|██████▋   | 6405/9500 [21:57:33<10:37:11, 12.35s/it]08/03/2024 19:55:03 - INFO - __main__ -   Step: 6405, LR: 6.716204804362794e-06, Loss: 461.0718994140625
2024-08-04T02:55:16.055568676Z 
 67%|██████▋   | 6406/9500 [21:57:45<10:40:44, 12.43s/it]08/03/2024 19:55:16 - INFO - __main__ -   Step: 6406, LR: 6.7140342606755145e-06, Loss: 398.0928649902344
2024-08-04T02:55:27.866931423Z 
 67%|██████▋   | 6407/9500 [21:57:57<10:31:02, 12.24s/it]08/03/2024 19:55:27 - INFO - __main__ -   Step: 6407, LR: 6.711863716988235e-06, Loss: 387.43951416015625
2024-08-04T02:55:40.194055154Z 
 67%|██████▋   | 6408/9500 [21:58:10<10:32:09, 12.27s/it]08/03/2024 19:55:40 - INFO - __main__ -   Step: 6408, LR: 6.709693173300956e-06, Loss: 417.7304992675781
2024-08-04T02:55:52.690678536Z 
 67%|██████▋   | 6409/9500 [21:58:22<10:35:30, 12.34s/it]08/03/2024 19:55:52 - INFO - __main__ -   Step: 6409, LR: 6.7075226296136775e-06, Loss: 454.40435791015625
2024-08-04T02:56:04.828031978Z 
 67%|██████▋   | 6410/9500 [21:58:34<10:32:14, 12.28s/it]08/03/2024 19:56:04 - INFO - __main__ -   Step: 6410, LR: 6.705352085926399e-06, Loss: 369.45977783203125
2024-08-04T02:56:17.230418571Z 
 67%|██████▋   | 6411/9500 [21:58:47<10:33:58, 12.31s/it]08/03/2024 19:56:17 - INFO - __main__ -   Step: 6411, LR: 6.70318154223912e-06, Loss: 426.75689697265625
2024-08-04T02:56:29.674750508Z 
 67%|██████▋   | 6412/9500 [21:58:59<10:35:46, 12.35s/it]08/03/2024 19:56:29 - INFO - __main__ -   Step: 6412, LR: 6.701010998551841e-06, Loss: 350.73187255859375
2024-08-04T02:56:42.114985363Z 
 68%|██████▊   | 6413/9500 [21:59:12<10:36:55, 12.38s/it]08/03/2024 19:56:42 - INFO - __main__ -   Step: 6413, LR: 6.698840454864563e-06, Loss: 502.1864929199219
2024-08-04T02:56:54.466682027Z 
 68%|██████▊   | 6414/9500 [21:59:24<10:36:15, 12.37s/it]08/03/2024 19:56:54 - INFO - __main__ -   Step: 6414, LR: 6.696669911177283e-06, Loss: 385.7157897949219
2024-08-04T02:57:07.078558175Z 
 68%|██████▊   | 6415/9500 [21:59:37<10:39:47, 12.44s/it]08/03/2024 19:57:07 - INFO - __main__ -   Step: 6415, LR: 6.6944993674900034e-06, Loss: 461.7058410644531
2024-08-04T02:57:19.335391172Z 
 68%|██████▊   | 6416/9500 [21:59:49<10:36:42, 12.39s/it]08/03/2024 19:57:19 - INFO - __main__ -   Step: 6416, LR: 6.692328823802725e-06, Loss: 461.19732666015625
2024-08-04T02:57:31.410624087Z 
 68%|██████▊   | 6417/9500 [22:00:01<10:31:41, 12.29s/it]08/03/2024 19:57:31 - INFO - __main__ -   Step: 6417, LR: 6.6901582801154465e-06, Loss: 345.8064270019531
2024-08-04T02:57:44.012035710Z 
 68%|██████▊   | 6418/9500 [22:00:13<10:36:13, 12.39s/it]08/03/2024 19:57:44 - INFO - __main__ -   Step: 6418, LR: 6.687987736428167e-06, Loss: 434.945068359375
2024-08-04T02:57:55.951743224Z 
 68%|██████▊   | 6419/9500 [22:00:25<10:29:08, 12.25s/it]08/03/2024 19:57:55 - INFO - __main__ -   Step: 6419, LR: 6.685817192740889e-06, Loss: 411.5079345703125
2024-08-04T02:58:08.273475361Z 
 68%|██████▊   | 6420/9500 [22:00:38<10:30:00, 12.27s/it]08/03/2024 19:58:08 - INFO - __main__ -   Step: 6420, LR: 6.68364664905361e-06, Loss: 413.46392822265625
2024-08-04T02:58:20.726020616Z 
 68%|██████▊   | 6421/9500 [22:00:50<10:32:34, 12.33s/it]08/03/2024 19:58:20 - INFO - __main__ -   Step: 6421, LR: 6.68147610536633e-06, Loss: 380.001708984375
2024-08-04T02:58:32.844956755Z 
 68%|██████▊   | 6422/9500 [22:01:02<10:29:10, 12.26s/it]08/03/2024 19:58:32 - INFO - __main__ -   Step: 6422, LR: 6.679305561679052e-06, Loss: 441.19012451171875
2024-08-04T02:58:45.203443540Z 
 68%|██████▊   | 6423/9500 [22:01:15<10:30:24, 12.29s/it]08/03/2024 19:58:45 - INFO - __main__ -   Step: 6423, LR: 6.6771350179917725e-06, Loss: 520.041748046875
2024-08-04T02:58:57.856563277Z 
 68%|██████▊   | 6424/9500 [22:01:27<10:35:45, 12.40s/it]08/03/2024 19:58:57 - INFO - __main__ -   Step: 6424, LR: 6.674964474304494e-06, Loss: 417.7813415527344
2024-08-04T02:59:10.106324591Z 
 68%|██████▊   | 6425/9500 [22:01:40<10:33:13, 12.36s/it]08/03/2024 19:59:10 - INFO - __main__ -   Step: 6425, LR: 6.672793930617215e-06, Loss: 486.68609619140625
2024-08-04T02:59:22.069646571Z 
 68%|██████▊   | 6426/9500 [22:01:52<10:26:59, 12.24s/it]08/03/2024 19:59:22 - INFO - __main__ -   Step: 6426, LR: 6.670623386929936e-06, Loss: 349.94488525390625
2024-08-04T02:59:34.790362871Z 
 68%|██████▊   | 6427/9500 [22:02:04<10:34:12, 12.38s/it]08/03/2024 19:59:34 - INFO - __main__ -   Step: 6427, LR: 6.668452843242658e-06, Loss: 400.5044860839844
2024-08-04T02:59:47.033941215Z 
 68%|██████▊   | 6428/9500 [22:02:16<10:31:51, 12.34s/it]08/03/2024 19:59:47 - INFO - __main__ -   Step: 6428, LR: 6.666282299555378e-06, Loss: 413.33074951171875
2024-08-04T02:59:59.098808327Z 
 68%|██████▊   | 6429/9500 [22:02:29<10:27:24, 12.26s/it]08/03/2024 19:59:59 - INFO - __main__ -   Step: 6429, LR: 6.664111755868099e-06, Loss: 413.5335693359375
2024-08-04T03:00:11.938397927Z 
 68%|██████▊   | 6430/9500 [22:02:41<10:36:08, 12.43s/it]08/03/2024 20:00:11 - INFO - __main__ -   Step: 6430, LR: 6.66194121218082e-06, Loss: 394.344482421875
2024-08-04T03:00:24.060601635Z 
 68%|██████▊   | 6431/9500 [22:02:53<10:31:09, 12.34s/it]08/03/2024 20:00:24 - INFO - __main__ -   Step: 6431, LR: 6.659770668493542e-06, Loss: 484.59674072265625
2024-08-04T03:00:36.007697711Z 
 68%|██████▊   | 6432/9500 [22:03:05<10:24:56, 12.22s/it]08/03/2024 20:00:36 - INFO - __main__ -   Step: 6432, LR: 6.657600124806262e-06, Loss: 344.62890625
2024-08-04T03:00:48.769958628Z 
 68%|██████▊   | 6433/9500 [22:03:18<10:33:01, 12.38s/it]08/03/2024 20:00:48 - INFO - __main__ -   Step: 6433, LR: 6.655429581118984e-06, Loss: 438.69842529296875
2024-08-04T03:01:00.847302383Z 
 68%|██████▊   | 6434/9500 [22:03:30<10:28:07, 12.29s/it]08/03/2024 20:01:00 - INFO - __main__ -   Step: 6434, LR: 6.6532590374317054e-06, Loss: 517.0
2024-08-04T03:01:12.973388651Z 
 68%|██████▊   | 6435/9500 [22:03:42<10:25:22, 12.24s/it]08/03/2024 20:01:12 - INFO - __main__ -   Step: 6435, LR: 6.651088493744425e-06, Loss: 479.6346435546875
2024-08-04T03:01:25.665986271Z 
 68%|██████▊   | 6436/9500 [22:03:55<10:32:04, 12.38s/it]08/03/2024 20:01:25 - INFO - __main__ -   Step: 6436, LR: 6.648917950057147e-06, Loss: 322.23822021484375
2024-08-04T03:01:37.955189098Z 
 68%|██████▊   | 6437/9500 [22:04:07<10:30:30, 12.35s/it]08/03/2024 20:01:37 - INFO - __main__ -   Step: 6437, LR: 6.6467474063698676e-06, Loss: 454.92376708984375
2024-08-04T03:01:50.310818935Z 
 68%|██████▊   | 6438/9500 [22:04:20<10:30:22, 12.35s/it]08/03/2024 20:01:50 - INFO - __main__ -   Step: 6438, LR: 6.644576862682589e-06, Loss: 495.80291748046875
2024-08-04T03:02:03.226415915Z 
 68%|██████▊   | 6439/9500 [22:04:33<10:38:47, 12.52s/it]08/03/2024 20:02:03 - INFO - __main__ -   Step: 6439, LR: 6.642406318995311e-06, Loss: 321.6700744628906
2024-08-04T03:02:15.382815443Z 
 68%|██████▊   | 6440/9500 [22:04:45<10:33:00, 12.41s/it]08/03/2024 20:02:15 - INFO - __main__ -   Step: 6440, LR: 6.640235775308031e-06, Loss: 487.64520263671875
2024-08-04T03:02:27.292489958Z 
 68%|██████▊   | 6441/9500 [22:04:57<10:25:06, 12.26s/it]08/03/2024 20:02:27 - INFO - __main__ -   Step: 6441, LR: 6.638065231620753e-06, Loss: 373.9996337890625
2024-08-04T03:02:39.716406204Z 
 68%|██████▊   | 6442/9500 [22:05:09<10:27:23, 12.31s/it]08/03/2024 20:02:39 - INFO - __main__ -   Step: 6442, LR: 6.635894687933473e-06, Loss: 376.1737060546875
2024-08-04T03:02:51.841879933Z 
 68%|██████▊   | 6443/9500 [22:05:21<10:24:22, 12.25s/it]08/03/2024 20:02:51 - INFO - __main__ -   Step: 6443, LR: 6.633724144246194e-06, Loss: 442.6592102050781
2024-08-04T03:03:03.903182304Z 
 68%|██████▊   | 6444/9500 [22:05:33<10:21:13, 12.20s/it]08/03/2024 20:03:03 - INFO - __main__ -   Step: 6444, LR: 6.631553600558915e-06, Loss: 417.77996826171875
2024-08-04T03:03:16.373688421Z 
 68%|██████▊   | 6445/9500 [22:05:46<10:25:11, 12.28s/it]08/03/2024 20:03:16 - INFO - __main__ -   Step: 6445, LR: 6.629383056871637e-06, Loss: 517.0030517578125
2024-08-04T03:03:29.122079925Z 
 68%|██████▊   | 6446/9500 [22:05:59<10:32:09, 12.42s/it]08/03/2024 20:03:29 - INFO - __main__ -   Step: 6446, LR: 6.627212513184358e-06, Loss: 466.3528137207031
2024-08-04T03:03:41.123668566Z 
 68%|██████▊   | 6447/9500 [22:06:11<10:25:34, 12.29s/it]08/03/2024 20:03:41 - INFO - __main__ -   Step: 6447, LR: 6.625041969497079e-06, Loss: 472.7657165527344
2024-08-04T03:03:53.145398100Z 
 68%|██████▊   | 6448/9500 [22:06:23<10:21:12, 12.21s/it]08/03/2024 20:03:53 - INFO - __main__ -   Step: 6448, LR: 6.6228714258098005e-06, Loss: 429.123779296875
2024-08-04T03:04:05.795424819Z 
 68%|██████▊   | 6449/9500 [22:06:35<10:27:40, 12.34s/it]08/03/2024 20:04:05 - INFO - __main__ -   Step: 6449, LR: 6.62070088212252e-06, Loss: 338.82073974609375
2024-08-04T03:04:18.109370855Z 
 68%|██████▊   | 6450/9500 [22:06:48<10:27:00, 12.33s/it]08/03/2024 20:04:18 - INFO - __main__ -   Step: 6450, LR: 6.618530338435242e-06, Loss: 479.0382385253906
2024-08-04T03:04:30.720525237Z 
 68%|██████▊   | 6451/9500 [22:07:00<10:31:01, 12.42s/it]08/03/2024 20:04:30 - INFO - __main__ -   Step: 6451, LR: 6.616359794747963e-06, Loss: 391.41680908203125
2024-08-04T03:04:43.420214115Z 
 68%|██████▊   | 6452/9500 [22:07:13<10:35:07, 12.50s/it]08/03/2024 20:04:43 - INFO - __main__ -   Step: 6452, LR: 6.614189251060684e-06, Loss: 488.15484619140625
2024-08-04T03:04:55.752548700Z 
 68%|██████▊   | 6453/9500 [22:07:25<10:32:19, 12.45s/it]08/03/2024 20:04:55 - INFO - __main__ -   Step: 6453, LR: 6.612018707373406e-06, Loss: 449.8326110839844
2024-08-04T03:05:07.690847552Z 
 68%|██████▊   | 6454/9500 [22:07:37<10:24:17, 12.30s/it]08/03/2024 20:05:07 - INFO - __main__ -   Step: 6454, LR: 6.6098481636861265e-06, Loss: 263.64398193359375
2024-08-04T03:05:20.270200568Z 
 68%|██████▊   | 6455/9500 [22:07:50<10:28:23, 12.38s/it]08/03/2024 20:05:20 - INFO - __main__ -   Step: 6455, LR: 6.607677619998848e-06, Loss: 483.15960693359375
2024-08-04T03:05:32.349824610Z 
 68%|██████▊   | 6456/9500 [22:08:02<10:23:34, 12.29s/it]08/03/2024 20:05:32 - INFO - __main__ -   Step: 6456, LR: 6.605507076311568e-06, Loss: 341.47412109375
2024-08-04T03:05:44.329532285Z 
 68%|██████▊   | 6457/9500 [22:08:14<10:18:37, 12.20s/it]08/03/2024 20:05:44 - INFO - __main__ -   Step: 6457, LR: 6.6033365326242894e-06, Loss: 397.539306640625
2024-08-04T03:05:56.948500320Z 
 68%|██████▊   | 6458/9500 [22:08:26<10:24:50, 12.32s/it]08/03/2024 20:05:56 - INFO - __main__ -   Step: 6458, LR: 6.60116598893701e-06, Loss: 331.21246337890625
2024-08-04T03:06:09.046672816Z 
 68%|██████▊   | 6459/9500 [22:08:38<10:21:11, 12.26s/it]08/03/2024 20:06:09 - INFO - __main__ -   Step: 6459, LR: 6.598995445249732e-06, Loss: 377.5115661621094
2024-08-04T03:06:21.547845501Z 
 68%|██████▊   | 6460/9500 [22:08:51<10:24:42, 12.33s/it]08/03/2024 20:06:21 - INFO - __main__ -   Step: 6460, LR: 6.596824901562453e-06, Loss: 482.6927795410156
2024-08-04T03:06:34.408436014Z 
 68%|██████▊   | 6461/9500 [22:09:04<10:32:34, 12.49s/it]08/03/2024 20:06:34 - INFO - __main__ -   Step: 6461, LR: 6.594654357875174e-06, Loss: 362.2828369140625
2024-08-04T03:06:46.591302679Z 
 68%|██████▊   | 6462/9500 [22:09:16<10:27:42, 12.40s/it]08/03/2024 20:06:46 - INFO - __main__ -   Step: 6462, LR: 6.5924838141878956e-06, Loss: 397.8397216796875
2024-08-04T03:06:58.641322911Z 
 68%|██████▊   | 6463/9500 [22:09:28<10:22:13, 12.29s/it]08/03/2024 20:06:58 - INFO - __main__ -   Step: 6463, LR: 6.590313270500615e-06, Loss: 340.8326110839844
2024-08-04T03:07:11.212662813Z 
 68%|██████▊   | 6464/9500 [22:09:41<10:26:15, 12.38s/it]08/03/2024 20:07:11 - INFO - __main__ -   Step: 6464, LR: 6.588142726813337e-06, Loss: 433.1847229003906
2024-08-04T03:07:23.251979526Z 
 68%|██████▊   | 6465/9500 [22:09:53<10:20:55, 12.28s/it]08/03/2024 20:07:23 - INFO - __main__ -   Step: 6465, LR: 6.5859721831260585e-06, Loss: 368.97784423828125
2024-08-04T03:07:35.907608754Z 
 68%|██████▊   | 6466/9500 [22:10:05<10:26:29, 12.39s/it]08/03/2024 20:07:35 - INFO - __main__ -   Step: 6466, LR: 6.583801639438779e-06, Loss: 503.63885498046875
2024-08-04T03:07:48.560961080Z 
 68%|██████▊   | 6467/9500 [22:10:18<10:30:17, 12.47s/it]08/03/2024 20:07:48 - INFO - __main__ -   Step: 6467, LR: 6.581631095751501e-06, Loss: 373.48504638671875
2024-08-04T03:08:00.613774799Z 
 68%|██████▊   | 6468/9500 [22:10:30<10:23:46, 12.34s/it]08/03/2024 20:08:00 - INFO - __main__ -   Step: 6468, LR: 6.5794605520642215e-06, Loss: 417.930419921875
2024-08-04T03:08:12.598107890Z 
 68%|██████▊   | 6469/9500 [22:10:42<10:18:07, 12.24s/it]08/03/2024 20:08:12 - INFO - __main__ -   Step: 6469, LR: 6.577290008376943e-06, Loss: 453.94805908203125
2024-08-04T03:08:25.378470933Z 
 68%|██████▊   | 6470/9500 [22:10:55<10:26:09, 12.40s/it]08/03/2024 20:08:25 - INFO - __main__ -   Step: 6470, LR: 6.575119464689663e-06, Loss: 496.0740966796875
2024-08-04T03:08:37.464555019Z 
 68%|██████▊   | 6471/9500 [22:11:07<10:21:12, 12.31s/it]08/03/2024 20:08:37 - INFO - __main__ -   Step: 6471, LR: 6.5729489210023845e-06, Loss: 435.19293212890625
2024-08-04T03:08:49.448174856Z 
 68%|██████▊   | 6472/9500 [22:11:19<10:16:08, 12.21s/it]08/03/2024 20:08:49 - INFO - __main__ -   Step: 6472, LR: 6.570778377315106e-06, Loss: 360.4384765625
2024-08-04T03:09:02.157861780Z 
 68%|██████▊   | 6473/9500 [22:11:32<10:23:30, 12.36s/it]08/03/2024 20:09:02 - INFO - __main__ -   Step: 6473, LR: 6.568607833627827e-06, Loss: 470.6068115234375
2024-08-04T03:09:14.596337775Z 
 68%|██████▊   | 6474/9500 [22:11:44<10:24:30, 12.38s/it]08/03/2024 20:09:14 - INFO - __main__ -   Step: 6474, LR: 6.566437289940548e-06, Loss: 432.8372802734375
2024-08-04T03:09:27.091412830Z 
 68%|██████▊   | 6475/9500 [22:11:57<10:25:59, 12.42s/it]08/03/2024 20:09:27 - INFO - __main__ -   Step: 6475, LR: 6.564266746253269e-06, Loss: 349.8072204589844
2024-08-04T03:09:39.753218187Z 
 68%|██████▊   | 6476/9500 [22:12:09<10:29:30, 12.49s/it]08/03/2024 20:09:39 - INFO - __main__ -   Step: 6476, LR: 6.562096202565991e-06, Loss: 380.42779541015625
2024-08-04T03:09:51.680652701Z 
 68%|██████▊   | 6477/9500 [22:12:21<10:20:47, 12.32s/it]08/03/2024 20:09:51 - INFO - __main__ -   Step: 6477, LR: 6.5599256588787105e-06, Loss: 300.97955322265625
2024-08-04T03:10:03.896388023Z 
 68%|██████▊   | 6478/9500 [22:12:33<10:18:59, 12.29s/it]08/03/2024 20:10:03 - INFO - __main__ -   Step: 6478, LR: 6.557755115191432e-06, Loss: 444.0574951171875
2024-08-04T03:10:16.505216194Z 
 68%|██████▊   | 6479/9500 [22:12:46<10:23:36, 12.39s/it]08/03/2024 20:10:16 - INFO - __main__ -   Step: 6479, LR: 6.555584571504154e-06, Loss: 353.8166809082031
2024-08-04T03:10:28.605796672Z 
 68%|██████▊   | 6480/9500 [22:12:58<10:19:05, 12.30s/it]08/03/2024 20:10:28 - INFO - __main__ -   Step: 6480, LR: 6.553414027816874e-06, Loss: 340.7652282714844
2024-08-04T03:10:40.729659800Z 
 68%|██████▊   | 6481/9500 [22:13:10<10:16:13, 12.25s/it]08/03/2024 20:10:40 - INFO - __main__ -   Step: 6481, LR: 6.551243484129596e-06, Loss: 343.69635009765625
2024-08-04T03:10:53.783402870Z 
 68%|██████▊   | 6482/9500 [22:13:23<10:28:12, 12.49s/it]08/03/2024 20:10:53 - INFO - __main__ -   Step: 6482, LR: 6.5490729404423174e-06, Loss: 412.30145263671875
2024-08-04T03:11:05.902507644Z 
 68%|██████▊   | 6483/9500 [22:13:35<10:22:24, 12.38s/it]08/03/2024 20:11:05 - INFO - __main__ -   Step: 6483, LR: 6.546902396755038e-06, Loss: 388.1880798339844
2024-08-04T03:11:18.020215315Z 
 68%|██████▊   | 6484/9500 [22:13:47<10:18:16, 12.30s/it]08/03/2024 20:11:18 - INFO - __main__ -   Step: 6484, LR: 6.544731853067758e-06, Loss: 454.2308044433594
2024-08-04T03:11:30.670988726Z 
 68%|██████▊   | 6485/9500 [22:14:00<10:23:21, 12.41s/it]08/03/2024 20:11:30 - INFO - __main__ -   Step: 6485, LR: 6.5425613093804796e-06, Loss: 480.49505615234375
2024-08-04T03:11:42.714301680Z 
 68%|██████▊   | 6486/9500 [22:14:12<10:17:42, 12.30s/it]08/03/2024 20:11:42 - INFO - __main__ -   Step: 6486, LR: 6.540390765693201e-06, Loss: 450.949462890625
2024-08-04T03:11:54.766094910Z 
 68%|██████▊   | 6487/9500 [22:14:24<10:13:48, 12.22s/it]08/03/2024 20:11:54 - INFO - __main__ -   Step: 6487, LR: 6.538220222005922e-06, Loss: 538.086181640625
2024-08-04T03:12:06.739553404Z 
 68%|██████▊   | 6488/9500 [22:14:36<10:09:50, 12.15s/it]08/03/2024 20:12:06 - INFO - __main__ -   Step: 6488, LR: 6.536049678318643e-06, Loss: 405.822021484375
2024-08-04T03:12:19.208090842Z 
 68%|██████▊   | 6489/9500 [22:14:49<10:14:27, 12.24s/it]08/03/2024 20:12:19 - INFO - __main__ -   Step: 6489, LR: 6.533879134631365e-06, Loss: 299.7955322265625
2024-08-04T03:12:31.036846045Z 
 68%|██████▊   | 6490/9500 [22:15:00<10:08:00, 12.12s/it]08/03/2024 20:12:31 - INFO - __main__ -   Step: 6490, LR: 6.531708590944086e-06, Loss: 291.911376953125
2024-08-04T03:12:43.275714744Z 
 68%|██████▊   | 6491/9500 [22:15:13<10:09:35, 12.16s/it]08/03/2024 20:12:43 - INFO - __main__ -   Step: 6491, LR: 6.529538047256806e-06, Loss: 390.84783935546875
2024-08-04T03:12:55.913679021Z 
 68%|██████▊   | 6492/9500 [22:15:25<10:16:38, 12.30s/it]08/03/2024 20:12:55 - INFO - __main__ -   Step: 6492, LR: 6.527367503569527e-06, Loss: 364.12322998046875
2024-08-04T03:13:07.899633348Z 
 68%|██████▊   | 6493/9500 [22:15:37<10:11:43, 12.21s/it]08/03/2024 20:13:07 - INFO - __main__ -   Step: 6493, LR: 6.525196959882249e-06, Loss: 341.10430908203125
2024-08-04T03:13:20.121971895Z 
 68%|██████▊   | 6494/9500 [22:15:50<10:11:45, 12.21s/it]08/03/2024 20:13:20 - INFO - __main__ -   Step: 6494, LR: 6.523026416194969e-06, Loss: 297.1942443847656
2024-08-04T03:13:32.512501397Z 
 68%|██████▊   | 6495/9500 [22:16:02<10:14:15, 12.26s/it]08/03/2024 20:13:32 - INFO - __main__ -   Step: 6495, LR: 6.520855872507691e-06, Loss: 394.672607421875
2024-08-04T03:13:44.544242775Z 
 68%|██████▊   | 6496/9500 [22:16:14<10:10:33, 12.19s/it]08/03/2024 20:13:44 - INFO - __main__ -   Step: 6496, LR: 6.5186853288204125e-06, Loss: 362.096435546875
2024-08-04T03:13:56.806534073Z 
 68%|██████▊   | 6497/9500 [22:16:26<10:11:22, 12.22s/it]08/03/2024 20:13:56 - INFO - __main__ -   Step: 6497, LR: 6.516514785133133e-06, Loss: 357.2099914550781
2024-08-04T03:14:09.733997545Z 
 68%|██████▊   | 6498/9500 [22:16:39<10:21:51, 12.43s/it]08/03/2024 20:14:09 - INFO - __main__ -   Step: 6498, LR: 6.514344241445854e-06, Loss: 488.4126892089844
2024-08-04T03:14:22.095808211Z 
 68%|██████▊   | 6499/9500 [22:16:52<10:20:38, 12.41s/it]08/03/2024 20:14:22 - INFO - __main__ -   Step: 6499, LR: 6.512173697758575e-06, Loss: 430.04730224609375
2024-08-04T03:14:34.347740644Z 
 68%|██████▊   | 6500/9500 [22:17:04<10:18:05, 12.36s/it]08/03/2024 20:14:34 - INFO - __main__ -   Step: 6500, LR: 6.510003154071296e-06, Loss: 444.7600402832031
2024-08-04T03:14:46.863604204Z 
 68%|██████▊   | 6501/9500 [22:17:16<10:20:11, 12.41s/it]08/03/2024 20:14:46 - INFO - __main__ -   Step: 6501, LR: 6.507832610384017e-06, Loss: 390.18218994140625
2024-08-04T03:14:58.910001076Z 
 68%|██████▊   | 6502/9500 [22:17:28<10:14:33, 12.30s/it]08/03/2024 20:14:58 - INFO - __main__ -   Step: 6502, LR: 6.5056620666967385e-06, Loss: 502.80377197265625
2024-08-04T03:15:10.971353046Z 
 68%|██████▊   | 6503/9500 [22:17:40<10:10:47, 12.23s/it]08/03/2024 20:15:10 - INFO - __main__ -   Step: 6503, LR: 6.50349152300946e-06, Loss: 397.0972595214844
2024-08-04T03:15:23.619366657Z 
 68%|██████▊   | 6504/9500 [22:17:53<10:16:52, 12.35s/it]08/03/2024 20:15:23 - INFO - __main__ -   Step: 6504, LR: 6.501320979322181e-06, Loss: 381.9853515625
2024-08-04T03:15:35.814898607Z 
 68%|██████▊   | 6505/9500 [22:18:05<10:14:17, 12.31s/it]08/03/2024 20:15:35 - INFO - __main__ -   Step: 6505, LR: 6.4991504356349014e-06, Loss: 482.0108947753906
2024-08-04T03:15:47.904047217Z 
 68%|██████▊   | 6506/9500 [22:18:17<10:10:50, 12.24s/it]08/03/2024 20:15:47 - INFO - __main__ -   Step: 6506, LR: 6.496979891947622e-06, Loss: 436.61102294921875
2024-08-04T03:16:00.745294161Z 
 68%|██████▊   | 6507/9500 [22:18:30<10:19:36, 12.42s/it]08/03/2024 20:16:00 - INFO - __main__ -   Step: 6507, LR: 6.494809348260344e-06, Loss: 410.0407409667969
2024-08-04T03:16:12.907063454Z 
 69%|██████▊   | 6508/9500 [22:18:42<10:15:31, 12.34s/it]08/03/2024 20:16:12 - INFO - __main__ -   Step: 6508, LR: 6.492638804573064e-06, Loss: 384.2265930175781
2024-08-04T03:16:25.478190675Z 
 69%|██████▊   | 6509/9500 [22:18:55<10:18:43, 12.41s/it]08/03/2024 20:16:25 - INFO - __main__ -   Step: 6509, LR: 6.490468260885786e-06, Loss: 616.169189453125
2024-08-04T03:16:38.181548796Z 
 69%|██████▊   | 6510/9500 [22:19:08<10:22:52, 12.50s/it]08/03/2024 20:16:38 - INFO - __main__ -   Step: 6510, LR: 6.4882977171985075e-06, Loss: 424.050537109375
2024-08-04T03:16:50.376728346Z 
 69%|██████▊   | 6511/9500 [22:19:20<10:18:07, 12.41s/it]08/03/2024 20:16:50 - INFO - __main__ -   Step: 6511, LR: 6.486127173511228e-06, Loss: 418.396728515625
2024-08-04T03:17:02.665943341Z 
 69%|██████▊   | 6512/9500 [22:19:32<10:16:08, 12.37s/it]08/03/2024 20:17:02 - INFO - __main__ -   Step: 6512, LR: 6.483956629823949e-06, Loss: 416.08074951171875
2024-08-04T03:17:15.296533980Z 
 69%|██████▊   | 6513/9500 [22:19:45<10:19:47, 12.45s/it]08/03/2024 20:17:15 - INFO - __main__ -   Step: 6513, LR: 6.48178608613667e-06, Loss: 389.4754638671875
2024-08-04T03:17:27.348731404Z 
 69%|██████▊   | 6514/9500 [22:19:57<10:13:39, 12.33s/it]08/03/2024 20:17:27 - INFO - __main__ -   Step: 6514, LR: 6.479615542449391e-06, Loss: 503.5170593261719
2024-08-04T03:17:39.621361107Z 
 69%|██████▊   | 6515/9500 [22:20:09<10:12:34, 12.31s/it]08/03/2024 20:17:39 - INFO - __main__ -   Step: 6515, LR: 6.477444998762113e-06, Loss: 463.279052734375
2024-08-04T03:17:51.969775288Z 
 69%|██████▊   | 6516/9500 [22:20:21<10:12:54, 12.32s/it]08/03/2024 20:17:51 - INFO - __main__ -   Step: 6516, LR: 6.4752744550748335e-06, Loss: 410.8752746582031
2024-08-04T03:18:04.238517874Z 
 69%|██████▊   | 6517/9500 [22:20:34<10:11:52, 12.31s/it]08/03/2024 20:18:04 - INFO - __main__ -   Step: 6517, LR: 6.473103911387555e-06, Loss: 361.208740234375
2024-08-04T03:18:16.390686339Z 
 69%|██████▊   | 6518/9500 [22:20:46<10:09:21, 12.26s/it]08/03/2024 20:18:16 - INFO - __main__ -   Step: 6518, LR: 6.470933367700275e-06, Loss: 458.8299560546875
2024-08-04T03:18:28.810317135Z 
 69%|██████▊   | 6519/9500 [22:20:58<10:11:31, 12.31s/it]08/03/2024 20:18:28 - INFO - __main__ -   Step: 6519, LR: 6.4687628240129965e-06, Loss: 372.4388122558594
2024-08-04T03:18:41.076558612Z 
 69%|██████▊   | 6520/9500 [22:21:11<10:10:41, 12.30s/it]08/03/2024 20:18:41 - INFO - __main__ -   Step: 6520, LR: 6.466592280325717e-06, Loss: 468.62261962890625
2024-08-04T03:18:53.071108852Z 
 69%|██████▊   | 6521/9500 [22:21:23<10:05:59, 12.21s/it]08/03/2024 20:18:53 - INFO - __main__ -   Step: 6521, LR: 6.464421736638439e-06, Loss: 295.54541015625
2024-08-04T03:19:05.684183349Z 
 69%|██████▊   | 6522/9500 [22:21:35<10:11:51, 12.33s/it]08/03/2024 20:19:05 - INFO - __main__ -   Step: 6522, LR: 6.46225119295116e-06, Loss: 492.714111328125
2024-08-04T03:19:17.673647151Z 
 69%|██████▊   | 6523/9500 [22:21:47<10:06:37, 12.23s/it]08/03/2024 20:19:17 - INFO - __main__ -   Step: 6523, LR: 6.460080649263881e-06, Loss: 396.9290466308594
2024-08-04T03:19:29.609825811Z 
 69%|██████▊   | 6524/9500 [22:21:59<10:02:06, 12.14s/it]08/03/2024 20:19:29 - INFO - __main__ -   Step: 6524, LR: 6.457910105576603e-06, Loss: 501.4263916015625
2024-08-04T03:19:42.188869050Z 
 69%|██████▊   | 6525/9500 [22:22:12<10:08:26, 12.27s/it]08/03/2024 20:19:42 - INFO - __main__ -   Step: 6525, LR: 6.4557395618893225e-06, Loss: 504.1907958984375
2024-08-04T03:19:54.423365321Z 
 69%|██████▊   | 6526/9500 [22:22:24<10:07:41, 12.26s/it]08/03/2024 20:19:54 - INFO - __main__ -   Step: 6526, LR: 6.453569018202044e-06, Loss: 384.0087890625
2024-08-04T03:20:06.581988579Z 
 69%|██████▊   | 6527/9500 [22:22:36<10:05:58, 12.23s/it]08/03/2024 20:20:06 - INFO - __main__ -   Step: 6527, LR: 6.451398474514765e-06, Loss: 378.9049072265625
2024-08-04T03:20:19.278085207Z 
 69%|██████▊   | 6528/9500 [22:22:49<10:12:41, 12.37s/it]08/03/2024 20:20:19 - INFO - __main__ -   Step: 6528, LR: 6.449227930827486e-06, Loss: 437.4654541015625
2024-08-04T03:20:31.267684867Z 
 69%|██████▊   | 6529/9500 [22:23:01<10:06:51, 12.26s/it]08/03/2024 20:20:31 - INFO - __main__ -   Step: 6529, LR: 6.447057387140208e-06, Loss: 504.8552551269531
2024-08-04T03:20:43.265086717Z 
 69%|██████▊   | 6530/9500 [22:23:13<10:02:49, 12.18s/it]08/03/2024 20:20:43 - INFO - __main__ -   Step: 6530, LR: 6.4448868434529286e-06, Loss: 340.34576416015625
2024-08-04T03:20:55.234712471Z 
 69%|██████▊   | 6531/9500 [22:23:25<9:59:31, 12.12s/it] 08/03/2024 20:20:55 - INFO - __main__ -   Step: 6531, LR: 6.44271629976565e-06, Loss: 439.72705078125
2024-08-04T03:21:07.889571677Z 
 69%|██████▉   | 6532/9500 [22:23:37<10:07:19, 12.28s/it]08/03/2024 20:21:07 - INFO - __main__ -   Step: 6532, LR: 6.44054575607837e-06, Loss: 372.91497802734375
2024-08-04T03:21:20.066212088Z 
 69%|██████▉   | 6533/9500 [22:23:50<10:05:37, 12.25s/it]08/03/2024 20:21:20 - INFO - __main__ -   Step: 6533, LR: 6.4383752123910916e-06, Loss: 406.70843505859375
2024-08-04T03:21:32.226587506Z 
 69%|██████▉   | 6534/9500 [22:24:02<10:04:07, 12.22s/it]08/03/2024 20:21:32 - INFO - __main__ -   Step: 6534, LR: 6.436204668703812e-06, Loss: 341.91705322265625
2024-08-04T03:21:44.631725766Z 
 69%|██████▉   | 6535/9500 [22:24:14<10:06:39, 12.28s/it]08/03/2024 20:21:44 - INFO - __main__ -   Step: 6535, LR: 6.434034125016534e-06, Loss: 378.70086669921875
2024-08-04T03:21:56.741142758Z 
 69%|██████▉   | 6536/9500 [22:24:26<10:03:58, 12.23s/it]08/03/2024 20:21:56 - INFO - __main__ -   Step: 6536, LR: 6.431863581329255e-06, Loss: 479.04071044921875
2024-08-04T03:22:08.759653974Z 
 69%|██████▉   | 6537/9500 [22:24:38<10:00:42, 12.16s/it]08/03/2024 20:22:08 - INFO - __main__ -   Step: 6537, LR: 6.429693037641976e-06, Loss: 371.7742919921875
2024-08-04T03:22:21.368621419Z 
 69%|██████▉   | 6538/9500 [22:24:51<10:07:04, 12.30s/it]08/03/2024 20:22:21 - INFO - __main__ -   Step: 6538, LR: 6.427522493954698e-06, Loss: 412.0572509765625
2024-08-04T03:22:33.438427947Z 
 69%|██████▉   | 6539/9500 [22:25:03<10:03:30, 12.23s/it]08/03/2024 20:22:33 - INFO - __main__ -   Step: 6539, LR: 6.4253519502674175e-06, Loss: 384.417236328125
2024-08-04T03:22:45.818847320Z 
 69%|██████▉   | 6540/9500 [22:25:15<10:05:32, 12.27s/it]08/03/2024 20:22:45 - INFO - __main__ -   Step: 6540, LR: 6.423181406580139e-06, Loss: 382.97052001953125
2024-08-04T03:22:58.634083349Z 
 69%|██████▉   | 6541/9500 [22:25:28<10:13:20, 12.44s/it]08/03/2024 20:22:58 - INFO - __main__ -   Step: 6541, LR: 6.421010862892861e-06, Loss: 518.8167724609375
2024-08-04T03:23:10.847497406Z 
 69%|██████▉   | 6542/9500 [22:25:40<10:09:49, 12.37s/it]08/03/2024 20:23:10 - INFO - __main__ -   Step: 6542, LR: 6.418840319205581e-06, Loss: 383.3498229980469
2024-08-04T03:23:23.207925056Z 
 69%|██████▉   | 6543/9500 [22:25:53<10:09:29, 12.37s/it]08/03/2024 20:23:23 - INFO - __main__ -   Step: 6543, LR: 6.416669775518303e-06, Loss: 438.0327453613281
2024-08-04T03:23:35.510976033Z 
 69%|██████▉   | 6544/9500 [22:26:05<10:08:20, 12.35s/it]08/03/2024 20:23:35 - INFO - __main__ -   Step: 6544, LR: 6.414499231831024e-06, Loss: 371.07708740234375
2024-08-04T03:23:47.412431167Z 
 69%|██████▉   | 6545/9500 [22:26:17<10:01:32, 12.21s/it]08/03/2024 20:23:47 - INFO - __main__ -   Step: 6545, LR: 6.412328688143745e-06, Loss: 433.8583068847656
2024-08-04T03:23:59.546967615Z 
 69%|██████▉   | 6546/9500 [22:26:29<10:00:09, 12.19s/it]08/03/2024 20:23:59 - INFO - __main__ -   Step: 6546, LR: 6.410158144456465e-06, Loss: 465.8955078125
2024-08-04T03:24:11.948561688Z 
 69%|██████▉   | 6547/9500 [22:26:41<10:03:04, 12.25s/it]08/03/2024 20:24:11 - INFO - __main__ -   Step: 6547, LR: 6.407987600769187e-06, Loss: 371.42694091796875
2024-08-04T03:24:23.778771911Z 
 69%|██████▉   | 6548/9500 [22:26:53<9:56:37, 12.13s/it] 08/03/2024 20:24:23 - INFO - __main__ -   Step: 6548, LR: 6.405817057081908e-06, Loss: 300.62579345703125
2024-08-04T03:24:36.018820176Z 
 69%|██████▉   | 6549/9500 [22:27:05<9:58:05, 12.16s/it]08/03/2024 20:24:36 - INFO - __main__ -   Step: 6549, LR: 6.403646513394629e-06, Loss: 514.6953125
2024-08-04T03:24:49.060419792Z 
 69%|██████▉   | 6550/9500 [22:27:18<10:10:53, 12.42s/it]08/03/2024 20:24:49 - INFO - __main__ -   Step: 6550, LR: 6.4014759697073504e-06, Loss: 339.12542724609375
2024-08-04T03:25:01.348174343Z 
 69%|██████▉   | 6551/9500 [22:27:31<10:08:39, 12.38s/it]08/03/2024 20:25:01 - INFO - __main__ -   Step: 6551, LR: 6.399305426020071e-06, Loss: 481.91119384765625
2024-08-04T03:25:13.359802266Z 
 69%|██████▉   | 6552/9500 [22:27:43<10:02:58, 12.27s/it]08/03/2024 20:25:13 - INFO - __main__ -   Step: 6552, LR: 6.397134882332793e-06, Loss: 488.78900146484375
2024-08-04T03:25:25.963673945Z 
 69%|██████▉   | 6553/9500 [22:27:55<10:07:39, 12.37s/it]08/03/2024 20:25:25 - INFO - __main__ -   Step: 6553, LR: 6.394964338645513e-06, Loss: 380.4735107421875
2024-08-04T03:25:37.937557458Z 
 69%|██████▉   | 6554/9500 [22:28:07<10:01:35, 12.25s/it]08/03/2024 20:25:37 - INFO - __main__ -   Step: 6554, LR: 6.392793794958234e-06, Loss: 355.91668701171875
2024-08-04T03:25:50.050770108Z 
 69%|██████▉   | 6555/9500 [22:28:19<9:59:19, 12.21s/it] 08/03/2024 20:25:50 - INFO - __main__ -   Step: 6555, LR: 6.390623251270956e-06, Loss: 323.13739013671875
2024-08-04T03:26:02.395429909Z 
 69%|██████▉   | 6556/9500 [22:28:32<10:01:06, 12.25s/it]08/03/2024 20:26:02 - INFO - __main__ -   Step: 6556, LR: 6.388452707583676e-06, Loss: 405.051025390625
2024-08-04T03:26:14.556126322Z 
 69%|██████▉   | 6557/9500 [22:28:44<9:59:34, 12.22s/it] 08/03/2024 20:26:14 - INFO - __main__ -   Step: 6557, LR: 6.386282163896398e-06, Loss: 409.38226318359375
2024-08-04T03:26:26.691787976Z 
 69%|██████▉   | 6558/9500 [22:28:56<9:58:04, 12.20s/it]08/03/2024 20:26:26 - INFO - __main__ -   Step: 6558, LR: 6.3841116202091195e-06, Loss: 439.98876953125
2024-08-04T03:26:39.687126396Z 
 69%|██████▉   | 6559/9500 [22:29:09<10:09:36, 12.44s/it]08/03/2024 20:26:39 - INFO - __main__ -   Step: 6559, LR: 6.38194107652184e-06, Loss: 519.9808959960938
2024-08-04T03:26:51.732902116Z 
 69%|██████▉   | 6560/9500 [22:29:21<10:03:38, 12.32s/it]08/03/2024 20:26:51 - INFO - __main__ -   Step: 6560, LR: 6.37977053283456e-06, Loss: 357.43963623046875
2024-08-04T03:27:03.797824694Z 
 69%|██████▉   | 6561/9500 [22:29:33<9:59:42, 12.24s/it] 08/03/2024 20:27:03 - INFO - __main__ -   Step: 6561, LR: 6.377599989147282e-06, Loss: 435.8341064453125
2024-08-04T03:27:16.504872536Z 
 69%|██████▉   | 6562/9500 [22:29:46<10:06:19, 12.38s/it]08/03/2024 20:27:16 - INFO - __main__ -   Step: 6562, LR: 6.375429445460003e-06, Loss: 406.4881591796875
2024-08-04T03:27:28.597053132Z 
 69%|██████▉   | 6563/9500 [22:29:58<10:01:50, 12.30s/it]08/03/2024 20:27:28 - INFO - __main__ -   Step: 6563, LR: 6.373258901772724e-06, Loss: 434.57666015625
2024-08-04T03:27:40.547967972Z 
 69%|██████▉   | 6564/9500 [22:30:10<9:56:35, 12.19s/it] 08/03/2024 20:27:40 - INFO - __main__ -   Step: 6564, LR: 6.3710883580854455e-06, Loss: 356.0278015136719
2024-08-04T03:27:52.739987750Z 
 69%|██████▉   | 6565/9500 [22:30:22<9:56:23, 12.19s/it]08/03/2024 20:27:52 - INFO - __main__ -   Step: 6565, LR: 6.368917814398167e-06, Loss: 352.5345458984375
2024-08-04T03:28:04.815942318Z 
 69%|██████▉   | 6566/9500 [22:30:34<9:54:28, 12.16s/it]08/03/2024 20:28:04 - INFO - __main__ -   Step: 6566, LR: 6.366747270710888e-06, Loss: 417.3818054199219
2024-08-04T03:28:16.991048404Z 
 69%|██████▉   | 6567/9500 [22:30:46<9:54:32, 12.16s/it]08/03/2024 20:28:16 - INFO - __main__ -   Step: 6567, LR: 6.3645767270236085e-06, Loss: 409.4355773925781
2024-08-04T03:28:29.591821639Z 
 69%|██████▉   | 6568/9500 [22:30:59<10:00:45, 12.29s/it]08/03/2024 20:28:29 - INFO - __main__ -   Step: 6568, LR: 6.362406183336329e-06, Loss: 372.84161376953125
2024-08-04T03:28:41.674629212Z 
 69%|██████▉   | 6569/9500 [22:31:11<9:57:27, 12.23s/it] 08/03/2024 20:28:41 - INFO - __main__ -   Step: 6569, LR: 6.360235639649051e-06, Loss: 456.94903564453125
2024-08-04T03:28:53.601984672Z 
 69%|██████▉   | 6570/9500 [22:31:23<9:52:49, 12.14s/it]08/03/2024 20:28:53 - INFO - __main__ -   Step: 6570, LR: 6.3580650959617715e-06, Loss: 434.97442626953125
2024-08-04T03:29:06.159400533Z 
 69%|██████▉   | 6571/9500 [22:31:36<9:58:44, 12.27s/it]08/03/2024 20:29:06 - INFO - __main__ -   Step: 6571, LR: 6.355894552274493e-06, Loss: 417.05963134765625
2024-08-04T03:29:18.589972107Z 
 69%|██████▉   | 6572/9500 [22:31:48<10:00:57, 12.31s/it]08/03/2024 20:29:18 - INFO - __main__ -   Step: 6572, LR: 6.353724008587215e-06, Loss: 381.6543273925781
2024-08-04T03:29:30.786210604Z 
 69%|██████▉   | 6573/9500 [22:32:00<9:59:00, 12.28s/it] 08/03/2024 20:29:30 - INFO - __main__ -   Step: 6573, LR: 6.351553464899935e-06, Loss: 510.2397155761719
2024-08-04T03:29:43.233173801Z 
 69%|██████▉   | 6574/9500 [22:32:13<10:01:15, 12.33s/it]08/03/2024 20:29:43 - INFO - __main__ -   Step: 6574, LR: 6.349382921212656e-06, Loss: 379.931884765625
2024-08-04T03:29:55.688938450Z 
 69%|██████▉   | 6575/9500 [22:32:25<10:02:54, 12.37s/it]08/03/2024 20:29:55 - INFO - __main__ -   Step: 6575, LR: 6.347212377525377e-06, Loss: 421.98199462890625
2024-08-04T03:30:07.883215614Z 
 69%|██████▉   | 6576/9500 [22:32:37<10:00:10, 12.32s/it]08/03/2024 20:30:07 - INFO - __main__ -   Step: 6576, LR: 6.345041833838098e-06, Loss: 477.5478820800781
2024-08-04T03:30:20.010249845Z 
 69%|██████▉   | 6577/9500 [22:32:49<9:57:13, 12.26s/it] 08/03/2024 20:30:20 - INFO - __main__ -   Step: 6577, LR: 6.342871290150819e-06, Loss: 381.8478698730469
2024-08-04T03:30:32.401219803Z 
 69%|██████▉   | 6578/9500 [22:33:02<9:58:56, 12.30s/it]08/03/2024 20:30:32 - INFO - __main__ -   Step: 6578, LR: 6.3407007464635406e-06, Loss: 339.44073486328125
2024-08-04T03:30:44.384284276Z 
 69%|██████▉   | 6579/9500 [22:33:14<9:54:07, 12.20s/it]08/03/2024 20:30:44 - INFO - __main__ -   Step: 6579, LR: 6.338530202776262e-06, Loss: 328.603271484375
2024-08-04T03:30:56.844520901Z 
 69%|██████▉   | 6580/9500 [22:33:26<9:57:40, 12.28s/it]08/03/2024 20:30:56 - INFO - __main__ -   Step: 6580, LR: 6.336359659088983e-06, Loss: 428.2022705078125
2024-08-04T03:31:09.582033430Z 
 69%|██████▉   | 6581/9500 [22:33:39<10:04:07, 12.42s/it]08/03/2024 20:31:09 - INFO - __main__ -   Step: 6581, LR: 6.3341891154017035e-06, Loss: 424.95037841796875
2024-08-04T03:31:21.753650640Z 
 69%|██████▉   | 6582/9500 [22:33:51<10:00:19, 12.34s/it]08/03/2024 20:31:21 - INFO - __main__ -   Step: 6582, LR: 6.332018571714424e-06, Loss: 482.44073486328125
2024-08-04T03:31:33.870191183Z 
 69%|██████▉   | 6583/9500 [22:34:03<9:56:47, 12.28s/it] 08/03/2024 20:31:33 - INFO - __main__ -   Step: 6583, LR: 6.329848028027146e-06, Loss: 519.1015625
2024-08-04T03:31:46.458310221Z 
 69%|██████▉   | 6584/9500 [22:34:16<10:01:09, 12.37s/it]08/03/2024 20:31:46 - INFO - __main__ -   Step: 6584, LR: 6.327677484339867e-06, Loss: 353.698974609375
2024-08-04T03:31:58.177366248Z 
 69%|██████▉   | 6585/9500 [22:34:28<9:51:28, 12.17s/it] 08/03/2024 20:31:58 - INFO - __main__ -   Step: 6585, LR: 6.325506940652588e-06, Loss: 369.0147705078125
2024-08-04T03:32:10.813165147Z 
 69%|██████▉   | 6586/9500 [22:34:40<9:57:59, 12.31s/it]08/03/2024 20:32:10 - INFO - __main__ -   Step: 6586, LR: 6.32333639696531e-06, Loss: 545.634033203125
2024-08-04T03:32:23.488560572Z 
 69%|██████▉   | 6587/9500 [22:34:53<10:03:03, 12.42s/it]08/03/2024 20:32:23 - INFO - __main__ -   Step: 6587, LR: 6.32116585327803e-06, Loss: 375.33404541015625
2024-08-04T03:32:35.480685866Z 
 69%|██████▉   | 6588/9500 [22:35:05<9:56:36, 12.29s/it] 08/03/2024 20:32:35 - INFO - __main__ -   Step: 6588, LR: 6.318995309590751e-06, Loss: 423.7149658203125
2024-08-04T03:32:47.551285444Z 
 69%|██████▉   | 6589/9500 [22:35:17<9:53:10, 12.23s/it]08/03/2024 20:32:47 - INFO - __main__ -   Step: 6589, LR: 6.316824765903472e-06, Loss: 391.26544189453125
2024-08-04T03:33:00.429738073Z 
 69%|██████▉   | 6590/9500 [22:35:30<10:02:27, 12.42s/it]08/03/2024 20:33:00 - INFO - __main__ -   Step: 6590, LR: 6.314654222216193e-06, Loss: 529.0916137695312
2024-08-04T03:33:12.430671832Z 
 69%|██████▉   | 6591/9500 [22:35:42<9:56:07, 12.30s/it] 08/03/2024 20:33:12 - INFO - __main__ -   Step: 6591, LR: 6.312483678528915e-06, Loss: 348.0770568847656
2024-08-04T03:33:24.406110667Z 
 69%|██████▉   | 6592/9500 [22:35:54<9:51:16, 12.20s/it]08/03/2024 20:33:24 - INFO - __main__ -   Step: 6592, LR: 6.310313134841636e-06, Loss: 429.169921875
2024-08-04T03:33:37.295595831Z 
 69%|██████▉   | 6593/9500 [22:36:07<10:01:05, 12.41s/it]08/03/2024 20:33:37 - INFO - __main__ -   Step: 6593, LR: 6.308142591154357e-06, Loss: 404.740966796875
2024-08-04T03:33:49.516731027Z 
 69%|██████▉   | 6594/9500 [22:36:19<9:58:11, 12.35s/it] 08/03/2024 20:33:49 - INFO - __main__ -   Step: 6594, LR: 6.305972047467078e-06, Loss: 388.47528076171875
2024-08-04T03:34:01.715058709Z 
 69%|██████▉   | 6595/9500 [22:36:31<9:55:46, 12.31s/it]08/03/2024 20:34:01 - INFO - __main__ -   Step: 6595, LR: 6.303801503779799e-06, Loss: 454.2901611328125
2024-08-04T03:34:14.314212621Z 
 69%|██████▉   | 6596/9500 [22:36:44<9:59:50, 12.39s/it]08/03/2024 20:34:14 - INFO - __main__ -   Step: 6596, LR: 6.301630960092519e-06, Loss: 427.4853210449219
2024-08-04T03:34:26.540866624Z 
 69%|██████▉   | 6597/9500 [22:36:56<9:57:12, 12.34s/it]08/03/2024 20:34:26 - INFO - __main__ -   Step: 6597, LR: 6.299460416405241e-06, Loss: 401.3222961425781
2024-08-04T03:34:38.614607068Z 
 69%|██████▉   | 6598/9500 [22:37:08<9:53:05, 12.26s/it]08/03/2024 20:34:38 - INFO - __main__ -   Step: 6598, LR: 6.2972898727179624e-06, Loss: 305.77734375
2024-08-04T03:34:51.005734781Z 
 69%|██████▉   | 6599/9500 [22:37:20<9:54:45, 12.30s/it]08/03/2024 20:34:51 - INFO - __main__ -   Step: 6599, LR: 6.295119329030683e-06, Loss: 380.69244384765625
2024-08-04T03:35:03.074462261Z 
 69%|██████▉   | 6600/9500 [22:37:33<9:51:11, 12.23s/it]08/03/2024 20:35:03 - INFO - __main__ -   Step: 6600, LR: 6.292948785343405e-06, Loss: 510.5057373046875
2024-08-04T03:35:15.097228824Z 
 69%|██████▉   | 6601/9500 [22:37:45<9:47:57, 12.17s/it]08/03/2024 20:35:15 - INFO - __main__ -   Step: 6601, LR: 6.290778241656125e-06, Loss: 406.9298400878906
2024-08-04T03:35:27.497820915Z 
 69%|██████▉   | 6602/9500 [22:37:57<9:51:06, 12.24s/it]08/03/2024 20:35:27 - INFO - __main__ -   Step: 6602, LR: 6.288607697968846e-06, Loss: 394.68231201171875
2024-08-04T03:35:39.586276257Z 
 70%|██████▉   | 6603/9500 [22:38:09<9:48:44, 12.19s/it]08/03/2024 20:35:39 - INFO - __main__ -   Step: 6603, LR: 6.286437154281567e-06, Loss: 342.045654296875
2024-08-04T03:35:52.039952110Z 
 70%|██████▉   | 6604/9500 [22:38:21<9:52:18, 12.27s/it]08/03/2024 20:35:52 - INFO - __main__ -   Step: 6604, LR: 6.284266610594288e-06, Loss: 440.96527099609375
2024-08-04T03:36:04.711693833Z 
 70%|██████▉   | 6605/9500 [22:38:34<9:57:53, 12.39s/it]08/03/2024 20:36:04 - INFO - __main__ -   Step: 6605, LR: 6.28209606690701e-06, Loss: 470.078125
2024-08-04T03:36:16.871493418Z 
 70%|██████▉   | 6606/9500 [22:38:46<9:54:19, 12.32s/it]08/03/2024 20:36:16 - INFO - __main__ -   Step: 6606, LR: 6.279925523219731e-06, Loss: 453.56072998046875
2024-08-04T03:36:28.818767448Z 
 70%|██████▉   | 6607/9500 [22:38:58<9:48:42, 12.21s/it]08/03/2024 20:36:28 - INFO - __main__ -   Step: 6607, LR: 6.277754979532452e-06, Loss: 361.77276611328125
2024-08-04T03:36:41.547926948Z 
 70%|██████▉   | 6608/9500 [22:39:11<9:56:01, 12.37s/it]08/03/2024 20:36:41 - INFO - __main__ -   Step: 6608, LR: 6.275584435845174e-06, Loss: 420.16943359375
2024-08-04T03:36:53.623717331Z 
 70%|██████▉   | 6609/9500 [22:39:23<9:51:37, 12.28s/it]08/03/2024 20:36:53 - INFO - __main__ -   Step: 6609, LR: 6.273413892157894e-06, Loss: 472.0829772949219
2024-08-04T03:37:05.740202998Z 
 70%|██████▉   | 6610/9500 [22:39:35<9:49:04, 12.23s/it]08/03/2024 20:37:05 - INFO - __main__ -   Step: 6610, LR: 6.271243348470614e-06, Loss: 465.6752624511719
2024-08-04T03:37:18.319294301Z 
 70%|██████▉   | 6611/9500 [22:39:48<9:53:54, 12.33s/it]08/03/2024 20:37:18 - INFO - __main__ -   Step: 6611, LR: 6.269072804783336e-06, Loss: 473.8374938964844
2024-08-04T03:37:30.452775977Z 
 70%|██████▉   | 6612/9500 [22:40:00<9:50:48, 12.27s/it]08/03/2024 20:37:30 - INFO - __main__ -   Step: 6612, LR: 6.2669022610960575e-06, Loss: 443.5209655761719
2024-08-04T03:37:42.823539960Z 
 70%|██████▉   | 6613/9500 [22:40:12<9:51:59, 12.30s/it]08/03/2024 20:37:42 - INFO - __main__ -   Step: 6613, LR: 6.264731717408778e-06, Loss: 425.9276428222656
2024-08-04T03:37:55.447707612Z 
 70%|██████▉   | 6614/9500 [22:40:25<9:56:25, 12.40s/it]08/03/2024 20:37:55 - INFO - __main__ -   Step: 6614, LR: 6.2625611737215e-06, Loss: 470.2662658691406
2024-08-04T03:38:07.640392408Z 
 70%|██████▉   | 6615/9500 [22:40:37<9:53:13, 12.34s/it]08/03/2024 20:38:07 - INFO - __main__ -   Step: 6615, LR: 6.260390630034221e-06, Loss: 415.6666259765625
2024-08-04T03:38:19.887120668Z 
 70%|██████▉   | 6616/9500 [22:40:49<9:51:42, 12.31s/it]08/03/2024 20:38:19 - INFO - __main__ -   Step: 6616, LR: 6.258220086346941e-06, Loss: 450.4935302734375
2024-08-04T03:38:31.950441854Z 
 70%|██████▉   | 6617/9500 [22:41:01<9:47:56, 12.24s/it]08/03/2024 20:38:31 - INFO - __main__ -   Step: 6617, LR: 6.256049542659663e-06, Loss: 457.1192321777344
2024-08-04T03:38:44.786688726Z 
 70%|██████▉   | 6618/9500 [22:41:14<9:56:23, 12.42s/it]08/03/2024 20:38:44 - INFO - __main__ -   Step: 6618, LR: 6.2538789989723835e-06, Loss: 406.48779296875
2024-08-04T03:38:56.860462894Z 
 70%|██████▉   | 6619/9500 [22:41:26<9:51:15, 12.31s/it]08/03/2024 20:38:56 - INFO - __main__ -   Step: 6619, LR: 6.251708455285105e-06, Loss: 379.3056335449219
2024-08-04T03:39:08.839878924Z 
 70%|██████▉   | 6620/9500 [22:41:38<9:46:14, 12.21s/it]08/03/2024 20:39:08 - INFO - __main__ -   Step: 6620, LR: 6.249537911597826e-06, Loss: 459.807861328125
2024-08-04T03:39:21.395814015Z 
 70%|██████▉   | 6621/9500 [22:41:51<9:50:57, 12.32s/it]08/03/2024 20:39:21 - INFO - __main__ -   Step: 6621, LR: 6.247367367910547e-06, Loss: 399.5286865234375
2024-08-04T03:39:33.801412326Z 
 70%|██████▉   | 6622/9500 [22:42:03<9:52:02, 12.34s/it]08/03/2024 20:39:33 - INFO - __main__ -   Step: 6622, LR: 6.245196824223269e-06, Loss: 414.60406494140625
2024-08-04T03:39:45.546984041Z 
 70%|██████▉   | 6623/9500 [22:42:15<9:43:14, 12.16s/it]08/03/2024 20:39:45 - INFO - __main__ -   Step: 6623, LR: 6.243026280535989e-06, Loss: 346.4413757324219
2024-08-04T03:39:58.343720716Z 
 70%|██████▉   | 6624/9500 [22:42:28<9:52:08, 12.35s/it]08/03/2024 20:39:58 - INFO - __main__ -   Step: 6624, LR: 6.24085573684871e-06, Loss: 386.6803283691406
2024-08-04T03:40:10.308911593Z 
 70%|██████▉   | 6625/9500 [22:42:40<9:46:21, 12.24s/it]08/03/2024 20:40:10 - INFO - __main__ -   Step: 6625, LR: 6.238685193161431e-06, Loss: 407.01043701171875
2024-08-04T03:40:22.613136463Z 
 70%|██████▉   | 6626/9500 [22:42:52<9:47:06, 12.26s/it]08/03/2024 20:40:22 - INFO - __main__ -   Step: 6626, LR: 6.2365146494741525e-06, Loss: 357.614013671875
2024-08-04T03:40:34.967634678Z 
 70%|██████▉   | 6627/9500 [22:43:04<9:48:19, 12.29s/it]08/03/2024 20:40:34 - INFO - __main__ -   Step: 6627, LR: 6.234344105786873e-06, Loss: 440.80908203125
2024-08-04T03:40:47.043461675Z 
 70%|██████▉   | 6628/9500 [22:43:16<9:45:05, 12.22s/it]08/03/2024 20:40:47 - INFO - __main__ -   Step: 6628, LR: 6.232173562099595e-06, Loss: 377.05523681640625
2024-08-04T03:40:59.132740237Z 
 70%|██████▉   | 6629/9500 [22:43:29<9:42:57, 12.18s/it]08/03/2024 20:40:59 - INFO - __main__ -   Step: 6629, LR: 6.230003018412316e-06, Loss: 367.21917724609375
2024-08-04T03:41:11.529675957Z 
 70%|██████▉   | 6630/9500 [22:43:41<9:45:49, 12.25s/it]08/03/2024 20:41:11 - INFO - __main__ -   Step: 6630, LR: 6.227832474725036e-06, Loss: 315.2593078613281
2024-08-04T03:41:23.700998203Z 
 70%|██████▉   | 6631/9500 [22:43:53<9:44:31, 12.22s/it]08/03/2024 20:41:23 - INFO - __main__ -   Step: 6631, LR: 6.225661931037758e-06, Loss: 379.36572265625
2024-08-04T03:41:36.048914054Z 
 70%|██████▉   | 6632/9500 [22:44:05<9:46:05, 12.26s/it]08/03/2024 20:41:36 - INFO - __main__ -   Step: 6632, LR: 6.2234913873504785e-06, Loss: 441.75152587890625
2024-08-04T03:41:48.897426272Z 
 70%|██████▉   | 6633/9500 [22:44:18<9:54:18, 12.44s/it]08/03/2024 20:41:48 - INFO - __main__ -   Step: 6633, LR: 6.2213208436632e-06, Loss: 464.1153259277344
2024-08-04T03:42:00.906478480Z 
 70%|██████▉   | 6634/9500 [22:44:30<9:47:57, 12.31s/it]08/03/2024 20:42:00 - INFO - __main__ -   Step: 6634, LR: 6.219150299975922e-06, Loss: 447.9211120605469
2024-08-04T03:42:13.389924608Z 
 70%|██████▉   | 6635/9500 [22:44:43<9:50:15, 12.36s/it]08/03/2024 20:42:13 - INFO - __main__ -   Step: 6635, LR: 6.216979756288642e-06, Loss: 375.4562683105469
2024-08-04T03:42:25.758463273Z 
 70%|██████▉   | 6636/9500 [22:44:55<9:50:08, 12.36s/it]08/03/2024 20:42:25 - INFO - __main__ -   Step: 6636, LR: 6.214809212601364e-06, Loss: 360.73760986328125
2024-08-04T03:42:38.102256077Z 
 70%|██████▉   | 6637/9500 [22:45:08<9:49:39, 12.36s/it]08/03/2024 20:42:38 - INFO - __main__ -   Step: 6637, LR: 6.212638668914084e-06, Loss: 423.4705810546875
2024-08-04T03:42:50.112704342Z 
 70%|██████▉   | 6638/9500 [22:45:20<9:44:29, 12.25s/it]08/03/2024 20:42:50 - INFO - __main__ -   Step: 6638, LR: 6.210468125226805e-06, Loss: 463.0871887207031
2024-08-04T03:43:02.647952538Z 
 70%|██████▉   | 6639/9500 [22:45:32<9:48:19, 12.34s/it]08/03/2024 20:43:02 - INFO - __main__ -   Step: 6639, LR: 6.208297581539526e-06, Loss: 457.8568115234375
2024-08-04T03:43:14.807936461Z 
 70%|██████▉   | 6640/9500 [22:45:44<9:45:33, 12.28s/it]08/03/2024 20:43:14 - INFO - __main__ -   Step: 6640, LR: 6.206127037852248e-06, Loss: 396.2371826171875
2024-08-04T03:43:26.965820579Z 
 70%|██████▉   | 6641/9500 [22:45:56<9:43:33, 12.25s/it]08/03/2024 20:43:26 - INFO - __main__ -   Step: 6641, LR: 6.203956494164969e-06, Loss: 419.47320556640625
2024-08-04T03:43:40.194348968Z 
 70%|██████▉   | 6642/9500 [22:46:10<9:57:21, 12.54s/it]08/03/2024 20:43:40 - INFO - __main__ -   Step: 6642, LR: 6.20178595047769e-06, Loss: 393.7914733886719
2024-08-04T03:43:52.332084321Z 
 70%|██████▉   | 6643/9500 [22:46:22<9:51:24, 12.42s/it]08/03/2024 20:43:52 - INFO - __main__ -   Step: 6643, LR: 6.1996154067904114e-06, Loss: 387.79547119140625
2024-08-04T03:44:04.374920011Z 
 70%|██████▉   | 6644/9500 [22:46:34<9:45:48, 12.31s/it]08/03/2024 20:44:04 - INFO - __main__ -   Step: 6644, LR: 6.197444863103131e-06, Loss: 442.11236572265625
2024-08-04T03:44:16.774523320Z 
 70%|██████▉   | 6645/9500 [22:46:46<9:46:55, 12.33s/it]08/03/2024 20:44:16 - INFO - __main__ -   Step: 6645, LR: 6.195274319415853e-06, Loss: 359.8634033203125
2024-08-04T03:44:28.765403223Z 
 70%|██████▉   | 6646/9500 [22:46:58<9:41:49, 12.23s/it]08/03/2024 20:44:28 - INFO - __main__ -   Step: 6646, LR: 6.1931037757285736e-06, Loss: 317.77020263671875
2024-08-04T03:44:40.692215332Z 
 70%|██████▉   | 6647/9500 [22:47:10<9:37:15, 12.14s/it]08/03/2024 20:44:40 - INFO - __main__ -   Step: 6647, LR: 6.190933232041295e-06, Loss: 343.43682861328125
2024-08-04T03:44:53.163595985Z 
 70%|██████▉   | 6648/9500 [22:47:23<9:41:47, 12.24s/it]08/03/2024 20:44:53 - INFO - __main__ -   Step: 6648, LR: 6.188762688354017e-06, Loss: 400.21875
2024-08-04T03:45:05.623759486Z 
 70%|██████▉   | 6649/9500 [22:47:35<9:44:43, 12.31s/it]08/03/2024 20:45:05 - INFO - __main__ -   Step: 6649, LR: 6.186592144666737e-06, Loss: 415.233154296875
2024-08-04T03:45:17.909885026Z 
 70%|███████   | 6650/9500 [22:47:47<9:44:14, 12.30s/it]08/03/2024 20:45:17 - INFO - __main__ -   Step: 6650, LR: 6.184421600979459e-06, Loss: 414.819091796875
2024-08-04T03:45:30.321003620Z 
 70%|███████   | 6651/9500 [22:48:00<9:45:37, 12.33s/it]08/03/2024 20:45:30 - INFO - __main__ -   Step: 6651, LR: 6.182251057292179e-06, Loss: 409.7379150390625
2024-08-04T03:45:42.889734662Z 
 70%|███████   | 6652/9500 [22:48:12<9:48:46, 12.40s/it]08/03/2024 20:45:42 - INFO - __main__ -   Step: 6652, LR: 6.1800805136049e-06, Loss: 479.2071533203125
2024-08-04T03:45:55.292313801Z 
 70%|███████   | 6653/9500 [22:48:25<9:48:32, 12.40s/it]08/03/2024 20:45:55 - INFO - __main__ -   Step: 6653, LR: 6.177909969917621e-06, Loss: 363.56787109375
2024-08-04T03:46:07.700826742Z 
 70%|███████   | 6654/9500 [22:48:37<9:48:24, 12.40s/it]08/03/2024 20:46:07 - INFO - __main__ -   Step: 6654, LR: 6.175739426230343e-06, Loss: 260.83770751953125
2024-08-04T03:46:20.163261644Z 
 70%|███████   | 6655/9500 [22:48:50<9:49:01, 12.42s/it]08/03/2024 20:46:20 - INFO - __main__ -   Step: 6655, LR: 6.173568882543064e-06, Loss: 526.62939453125
2024-08-04T03:46:32.295334202Z 
 70%|███████   | 6656/9500 [22:49:02<9:44:41, 12.34s/it]08/03/2024 20:46:32 - INFO - __main__ -   Step: 6656, LR: 6.171398338855785e-06, Loss: 399.471435546875
2024-08-04T03:46:45.001006816Z 
 70%|███████   | 6657/9500 [22:49:14<9:49:44, 12.45s/it]08/03/2024 20:46:45 - INFO - __main__ -   Step: 6657, LR: 6.1692277951685065e-06, Loss: 555.0300903320312
2024-08-04T03:46:57.411840608Z 
 70%|███████   | 6658/9500 [22:49:27<9:49:02, 12.44s/it]08/03/2024 20:46:57 - INFO - __main__ -   Step: 6658, LR: 6.167057251481226e-06, Loss: 466.11328125
2024-08-04T03:47:09.404775363Z 
 70%|███████   | 6659/9500 [22:49:39<9:42:32, 12.30s/it]08/03/2024 20:47:09 - INFO - __main__ -   Step: 6659, LR: 6.164886707793948e-06, Loss: 381.8614501953125
2024-08-04T03:47:21.515944288Z 
 70%|███████   | 6660/9500 [22:49:51<9:39:36, 12.25s/it]08/03/2024 20:47:21 - INFO - __main__ -   Step: 6660, LR: 6.1627161641066695e-06, Loss: 346.4710693359375
2024-08-04T03:47:34.056269841Z 
 70%|███████   | 6661/9500 [22:50:03<9:43:35, 12.33s/it]08/03/2024 20:47:34 - INFO - __main__ -   Step: 6661, LR: 6.16054562041939e-06, Loss: 395.1564025878906
2024-08-04T03:47:46.151036331Z 
 70%|███████   | 6662/9500 [22:50:16<9:39:59, 12.26s/it]08/03/2024 20:47:46 - INFO - __main__ -   Step: 6662, LR: 6.158375076732112e-06, Loss: 362.53814697265625
2024-08-04T03:47:58.278573289Z 
 70%|███████   | 6663/9500 [22:50:28<9:37:52, 12.22s/it]08/03/2024 20:47:58 - INFO - __main__ -   Step: 6663, LR: 6.1562045330448325e-06, Loss: 350.95965576171875
2024-08-04T03:48:11.621140213Z 
 70%|███████   | 6664/9500 [22:50:41<9:53:34, 12.56s/it]08/03/2024 20:48:11 - INFO - __main__ -   Step: 6664, LR: 6.154033989357554e-06, Loss: 417.1788330078125
2024-08-04T03:48:23.643529088Z 
 70%|███████   | 6665/9500 [22:50:53<9:45:46, 12.40s/it]08/03/2024 20:48:23 - INFO - __main__ -   Step: 6665, LR: 6.151863445670274e-06, Loss: 375.0128173828125
2024-08-04T03:48:35.533434425Z 
 70%|███████   | 6666/9500 [22:51:05<9:38:22, 12.25s/it]08/03/2024 20:48:35 - INFO - __main__ -   Step: 6666, LR: 6.1496929019829954e-06, Loss: 331.3896789550781
2024-08-04T03:48:48.055810035Z 
 70%|███████   | 6667/9500 [22:51:17<9:42:06, 12.33s/it]08/03/2024 20:48:48 - INFO - __main__ -   Step: 6667, LR: 6.147522358295717e-06, Loss: 407.4125671386719
2024-08-04T03:49:00.190017987Z 
 70%|███████   | 6668/9500 [22:51:30<9:39:08, 12.27s/it]08/03/2024 20:49:00 - INFO - __main__ -   Step: 6668, LR: 6.145351814608438e-06, Loss: 367.0242919921875
2024-08-04T03:49:12.205618278Z 
 70%|███████   | 6669/9500 [22:51:42<9:35:20, 12.19s/it]08/03/2024 20:49:12 - INFO - __main__ -   Step: 6669, LR: 6.143181270921159e-06, Loss: 332.30029296875
2024-08-04T03:49:25.039347460Z 
 70%|███████   | 6670/9500 [22:51:54<9:44:11, 12.39s/it]08/03/2024 20:49:25 - INFO - __main__ -   Step: 6670, LR: 6.14101072723388e-06, Loss: 472.0539245605469
2024-08-04T03:49:37.135780775Z 
 70%|███████   | 6671/9500 [22:52:07<9:39:53, 12.30s/it]08/03/2024 20:49:37 - INFO - __main__ -   Step: 6671, LR: 6.1388401835466015e-06, Loss: 391.85504150390625
2024-08-04T03:49:49.315819140Z 
 70%|███████   | 6672/9500 [22:52:19<9:38:00, 12.26s/it]08/03/2024 20:49:49 - INFO - __main__ -   Step: 6672, LR: 6.136669639859321e-06, Loss: 375.4276123046875
2024-08-04T03:50:02.001351693Z 
 70%|███████   | 6673/9500 [22:52:31<9:43:46, 12.39s/it]08/03/2024 20:50:02 - INFO - __main__ -   Step: 6673, LR: 6.134499096172043e-06, Loss: 416.51226806640625
2024-08-04T03:50:13.753559991Z 
 70%|███████   | 6674/9500 [22:52:43<9:34:33, 12.20s/it]08/03/2024 20:50:13 - INFO - __main__ -   Step: 6674, LR: 6.1323285524847645e-06, Loss: 313.83892822265625
2024-08-04T03:50:25.840534580Z 
 70%|███████   | 6675/9500 [22:52:55<9:32:46, 12.17s/it]08/03/2024 20:50:25 - INFO - __main__ -   Step: 6675, LR: 6.130158008797485e-06, Loss: 561.1168212890625
2024-08-04T03:50:38.542783410Z 
 70%|███████   | 6676/9500 [22:53:08<9:40:09, 12.33s/it]08/03/2024 20:50:38 - INFO - __main__ -   Step: 6676, LR: 6.127987465110207e-06, Loss: 436.5672912597656
2024-08-04T03:50:50.748915202Z 
 70%|███████   | 6677/9500 [22:53:20<9:38:15, 12.29s/it]08/03/2024 20:50:50 - INFO - __main__ -   Step: 6677, LR: 6.125816921422928e-06, Loss: 434.74462890625
2024-08-04T03:51:03.069193361Z 
 70%|███████   | 6678/9500 [22:53:33<9:38:28, 12.30s/it]08/03/2024 20:51:03 - INFO - __main__ -   Step: 6678, LR: 6.123646377735649e-06, Loss: 484.3975524902344
2024-08-04T03:51:15.764689132Z 
 70%|███████   | 6679/9500 [22:53:45<9:43:50, 12.42s/it]08/03/2024 20:51:15 - INFO - __main__ -   Step: 6679, LR: 6.121475834048369e-06, Loss: 404.6046447753906
2024-08-04T03:51:27.927390379Z 
 70%|███████   | 6680/9500 [22:53:57<9:40:03, 12.34s/it]08/03/2024 20:51:27 - INFO - __main__ -   Step: 6680, LR: 6.1193052903610905e-06, Loss: 408.7333984375
2024-08-04T03:51:40.006169759Z 
 70%|███████   | 6681/9500 [22:54:09<9:36:08, 12.26s/it]08/03/2024 20:51:40 - INFO - __main__ -   Step: 6681, LR: 6.117134746673812e-06, Loss: 456.3787841796875
2024-08-04T03:51:52.909687488Z 
 70%|███████   | 6682/9500 [22:54:22<9:44:57, 12.45s/it]08/03/2024 20:51:52 - INFO - __main__ -   Step: 6682, LR: 6.114964202986533e-06, Loss: 373.09661865234375
2024-08-04T03:52:05.286663387Z 
 70%|███████   | 6683/9500 [22:54:35<9:43:39, 12.43s/it]08/03/2024 20:52:05 - INFO - __main__ -   Step: 6683, LR: 6.112793659299254e-06, Loss: 445.9096984863281
2024-08-04T03:52:17.741017630Z 
 70%|███████   | 6684/9500 [22:54:47<9:43:46, 12.44s/it]08/03/2024 20:52:17 - INFO - __main__ -   Step: 6684, LR: 6.110623115611976e-06, Loss: 411.6312255859375
2024-08-04T03:52:30.428038522Z 
 70%|███████   | 6685/9500 [22:55:00<9:47:04, 12.51s/it]08/03/2024 20:52:30 - INFO - __main__ -   Step: 6685, LR: 6.108452571924696e-06, Loss: 409.79058837890625
2024-08-04T03:52:42.669470340Z 
 70%|███████   | 6686/9500 [22:55:12<9:43:02, 12.43s/it]08/03/2024 20:52:42 - INFO - __main__ -   Step: 6686, LR: 6.106282028237417e-06, Loss: 352.54937744140625
2024-08-04T03:52:54.694075412Z 
 70%|███████   | 6687/9500 [22:55:24<9:37:06, 12.31s/it]08/03/2024 20:52:54 - INFO - __main__ -   Step: 6687, LR: 6.104111484550138e-06, Loss: 514.3458251953125
2024-08-04T03:53:07.668156155Z 
 70%|███████   | 6688/9500 [22:55:37<9:46:14, 12.51s/it]08/03/2024 20:53:07 - INFO - __main__ -   Step: 6688, LR: 6.10194094086286e-06, Loss: 412.647705078125
2024-08-04T03:53:19.726327570Z 
 70%|███████   | 6689/9500 [22:55:49<9:39:40, 12.37s/it]08/03/2024 20:53:19 - INFO - __main__ -   Step: 6689, LR: 6.09977039717558e-06, Loss: 378.426025390625
2024-08-04T03:53:31.856779659Z 
 70%|███████   | 6690/9500 [22:56:01<9:36:05, 12.30s/it]08/03/2024 20:53:31 - INFO - __main__ -   Step: 6690, LR: 6.097599853488302e-06, Loss: 313.71142578125
2024-08-04T03:53:44.279251293Z 
 70%|███████   | 6691/9500 [22:56:14<9:37:35, 12.34s/it]08/03/2024 20:53:44 - INFO - __main__ -   Step: 6691, LR: 6.095429309801023e-06, Loss: 343.24481201171875
2024-08-04T03:53:56.312365999Z 
 70%|███████   | 6692/9500 [22:56:26<9:33:06, 12.25s/it]08/03/2024 20:53:56 - INFO - __main__ -   Step: 6692, LR: 6.093258766113743e-06, Loss: 351.6783447265625
2024-08-04T03:54:08.100070761Z 
 70%|███████   | 6693/9500 [22:56:38<9:26:28, 12.11s/it]08/03/2024 20:54:08 - INFO - __main__ -   Step: 6693, LR: 6.091088222426465e-06, Loss: 356.783935546875
2024-08-04T03:54:20.977863154Z 
 70%|███████   | 6694/9500 [22:56:50<9:37:04, 12.34s/it]08/03/2024 20:54:20 - INFO - __main__ -   Step: 6694, LR: 6.0889176787391856e-06, Loss: 415.595947265625
2024-08-04T03:54:33.249813595Z 
 70%|███████   | 6695/9500 [22:57:03<9:35:55, 12.32s/it]08/03/2024 20:54:33 - INFO - __main__ -   Step: 6695, LR: 6.086747135051907e-06, Loss: 300.1838684082031
2024-08-04T03:54:45.373034264Z 
 70%|███████   | 6696/9500 [22:57:15<9:32:57, 12.26s/it]08/03/2024 20:54:45 - INFO - __main__ -   Step: 6696, LR: 6.084576591364628e-06, Loss: 470.72845458984375
2024-08-04T03:54:58.045173415Z 
 70%|███████   | 6697/9500 [22:57:27<9:38:31, 12.38s/it]08/03/2024 20:54:58 - INFO - __main__ -   Step: 6697, LR: 6.082406047677349e-06, Loss: 578.5474853515625
2024-08-04T03:55:10.224661208Z 
 71%|███████   | 6698/9500 [22:57:40<9:35:28, 12.32s/it]08/03/2024 20:55:10 - INFO - __main__ -   Step: 6698, LR: 6.080235503990071e-06, Loss: 434.41217041015625
2024-08-04T03:55:22.135651038Z 
 71%|███████   | 6699/9500 [22:57:52<9:29:29, 12.20s/it]08/03/2024 20:55:22 - INFO - __main__ -   Step: 6699, LR: 6.078064960302791e-06, Loss: 416.02142333984375
2024-08-04T03:55:34.488430423Z 
 71%|███████   | 6700/9500 [22:58:04<9:31:26, 12.25s/it]08/03/2024 20:55:34 - INFO - __main__ -   Step: 6700, LR: 6.075894416615512e-06, Loss: 297.8349304199219
2024-08-04T03:55:46.754533948Z 
 71%|███████   | 6701/9500 [22:58:16<9:31:31, 12.25s/it]08/03/2024 20:55:46 - INFO - __main__ -   Step: 6701, LR: 6.073723872928233e-06, Loss: 443.32293701171875
2024-08-04T03:55:58.865819881Z 
 71%|███████   | 6702/9500 [22:58:28<9:29:22, 12.21s/it]08/03/2024 20:55:58 - INFO - __main__ -   Step: 6702, LR: 6.071553329240955e-06, Loss: 428.5233154296875
2024-08-04T03:56:10.654623963Z 
 71%|███████   | 6703/9500 [22:58:40<9:23:16, 12.08s/it]08/03/2024 20:56:10 - INFO - __main__ -   Step: 6703, LR: 6.069382785553675e-06, Loss: 357.00640869140625
2024-08-04T03:56:23.097303116Z 
 71%|███████   | 6704/9500 [22:58:53<9:28:06, 12.19s/it]08/03/2024 20:56:23 - INFO - __main__ -   Step: 6704, LR: 6.067212241866397e-06, Loss: 316.6673583984375
2024-08-04T03:56:35.581992997Z 
 71%|███████   | 6705/9500 [22:59:05<9:32:00, 12.28s/it]08/03/2024 20:56:35 - INFO - __main__ -   Step: 6705, LR: 6.0650416981791185e-06, Loss: 434.39825439453125
2024-08-04T03:56:47.757516449Z 
 71%|███████   | 6706/9500 [22:59:17<9:30:20, 12.25s/it]08/03/2024 20:56:47 - INFO - __main__ -   Step: 6706, LR: 6.062871154491838e-06, Loss: 401.6846618652344
2024-08-04T03:57:00.376631236Z 
 71%|███████   | 6707/9500 [22:59:30<9:35:19, 12.36s/it]08/03/2024 20:57:00 - INFO - __main__ -   Step: 6707, LR: 6.06070061080456e-06, Loss: 400.9495849609375
2024-08-04T03:57:12.444834009Z 
 71%|███████   | 6708/9500 [22:59:42<9:31:03, 12.27s/it]08/03/2024 20:57:12 - INFO - __main__ -   Step: 6708, LR: 6.058530067117281e-06, Loss: 370.7693176269531
2024-08-04T03:57:24.500967284Z 
 71%|███████   | 6709/9500 [22:59:54<9:27:50, 12.21s/it]08/03/2024 20:57:24 - INFO - __main__ -   Step: 6709, LR: 6.056359523430002e-06, Loss: 452.59173583984375
2024-08-04T03:57:37.119929620Z 
 71%|███████   | 6710/9500 [23:00:07<9:33:22, 12.33s/it]08/03/2024 20:57:37 - INFO - __main__ -   Step: 6710, LR: 6.054188979742724e-06, Loss: 418.842041015625
2024-08-04T03:57:49.665660954Z 
 71%|███████   | 6711/9500 [23:00:19<9:36:10, 12.40s/it]08/03/2024 20:57:49 - INFO - __main__ -   Step: 6711, LR: 6.0520184360554444e-06, Loss: 497.216552734375
2024-08-04T03:58:01.635729475Z 
 71%|███████   | 6712/9500 [23:00:31<9:30:02, 12.27s/it]08/03/2024 20:58:01 - INFO - __main__ -   Step: 6712, LR: 6.049847892368166e-06, Loss: 263.0110168457031
2024-08-04T03:58:14.151308069Z 
 71%|███████   | 6713/9500 [23:00:44<9:33:17, 12.34s/it]08/03/2024 20:58:14 - INFO - __main__ -   Step: 6713, LR: 6.047677348680886e-06, Loss: 411.4559631347656
2024-08-04T03:58:26.519677770Z 
 71%|███████   | 6714/9500 [23:00:56<9:33:27, 12.35s/it]08/03/2024 20:58:26 - INFO - __main__ -   Step: 6714, LR: 6.0455068049936074e-06, Loss: 423.5865173339844
2024-08-04T03:58:38.705528255Z 
 71%|███████   | 6715/9500 [23:01:08<9:30:57, 12.30s/it]08/03/2024 20:58:38 - INFO - __main__ -   Step: 6715, LR: 6.043336261306328e-06, Loss: 434.3792419433594
2024-08-04T03:58:51.191079161Z 
 71%|███████   | 6716/9500 [23:01:21<9:33:19, 12.36s/it]08/03/2024 20:58:51 - INFO - __main__ -   Step: 6716, LR: 6.04116571761905e-06, Loss: 372.1002502441406
2024-08-04T03:59:03.374364206Z 
 71%|███████   | 6717/9500 [23:01:33<9:30:42, 12.30s/it]08/03/2024 20:59:03 - INFO - __main__ -   Step: 6717, LR: 6.038995173931771e-06, Loss: 458.95452880859375
2024-08-04T03:59:15.432862380Z 
 71%|███████   | 6718/9500 [23:01:45<9:27:05, 12.23s/it]08/03/2024 20:59:15 - INFO - __main__ -   Step: 6718, LR: 6.036824630244492e-06, Loss: 370.79150390625
2024-08-04T03:59:28.321639699Z 
 71%|███████   | 6719/9500 [23:01:58<9:36:02, 12.43s/it]08/03/2024 20:59:28 - INFO - __main__ -   Step: 6719, LR: 6.0346540865572135e-06, Loss: 454.1020812988281
2024-08-04T03:59:40.683658079Z 
 71%|███████   | 6720/9500 [23:02:10<9:34:54, 12.41s/it]08/03/2024 20:59:40 - INFO - __main__ -   Step: 6720, LR: 6.032483542869933e-06, Loss: 423.2744140625
2024-08-04T03:59:52.937330224Z 
 71%|███████   | 6721/9500 [23:02:22<9:32:33, 12.36s/it]08/03/2024 20:59:52 - INFO - __main__ -   Step: 6721, LR: 6.030312999182655e-06, Loss: 447.50518798828125
2024-08-04T04:00:05.661082556Z 
 71%|███████   | 6722/9500 [23:02:35<9:37:23, 12.47s/it]08/03/2024 21:00:05 - INFO - __main__ -   Step: 6722, LR: 6.028142455495376e-06, Loss: 478.5371398925781
2024-08-04T04:00:17.948060208Z 
 71%|███████   | 6723/9500 [23:02:47<9:34:37, 12.42s/it]08/03/2024 21:00:17 - INFO - __main__ -   Step: 6723, LR: 6.025971911808097e-06, Loss: 338.65301513671875
2024-08-04T04:00:30.331236131Z 
 71%|███████   | 6724/9500 [23:03:00<9:33:58, 12.41s/it]08/03/2024 21:00:30 - INFO - __main__ -   Step: 6724, LR: 6.023801368120819e-06, Loss: 399.3683776855469
2024-08-04T04:00:42.919603453Z 
 71%|███████   | 6725/9500 [23:03:12<9:36:18, 12.46s/it]08/03/2024 21:00:42 - INFO - __main__ -   Step: 6725, LR: 6.0216308244335395e-06, Loss: 313.9085998535156
2024-08-04T04:00:54.999916819Z 
 71%|███████   | 6726/9500 [23:03:24<9:30:49, 12.35s/it]08/03/2024 21:00:54 - INFO - __main__ -   Step: 6726, LR: 6.019460280746261e-06, Loss: 413.03302001953125
2024-08-04T04:01:07.195177015Z 
 71%|███████   | 6727/9500 [23:03:37<9:28:30, 12.30s/it]08/03/2024 21:01:07 - INFO - __main__ -   Step: 6727, LR: 6.017289737058981e-06, Loss: 483.6246337890625
2024-08-04T04:01:19.772669016Z 
 71%|███████   | 6728/9500 [23:03:49<9:32:08, 12.38s/it]08/03/2024 21:01:19 - INFO - __main__ -   Step: 6728, LR: 6.0151191933717025e-06, Loss: 406.2720642089844
2024-08-04T04:01:31.938913893Z 
 71%|███████   | 6729/9500 [23:04:01<9:28:55, 12.32s/it]08/03/2024 21:01:31 - INFO - __main__ -   Step: 6729, LR: 6.012948649684423e-06, Loss: 459.3349609375
2024-08-04T04:01:44.190272866Z 
 71%|███████   | 6730/9500 [23:04:14<9:27:46, 12.30s/it]08/03/2024 21:01:44 - INFO - __main__ -   Step: 6730, LR: 6.010778105997145e-06, Loss: 486.36285400390625
2024-08-04T04:01:56.674315144Z 
 71%|███████   | 6731/9500 [23:04:26<9:30:08, 12.35s/it]08/03/2024 21:01:56 - INFO - __main__ -   Step: 6731, LR: 6.008607562309866e-06, Loss: 386.34344482421875
2024-08-04T04:02:08.878828845Z 
 71%|███████   | 6732/9500 [23:04:38<9:27:52, 12.31s/it]08/03/2024 21:02:08 - INFO - __main__ -   Step: 6732, LR: 6.006437018622587e-06, Loss: 295.67962646484375
2024-08-04T04:02:20.965033255Z 
 71%|███████   | 6733/9500 [23:04:50<9:24:34, 12.24s/it]08/03/2024 21:02:20 - INFO - __main__ -   Step: 6733, LR: 6.004266474935309e-06, Loss: 509.96435546875
2024-08-04T04:02:33.724168500Z 
 71%|███████   | 6734/9500 [23:05:03<9:31:31, 12.40s/it]08/03/2024 21:02:33 - INFO - __main__ -   Step: 6734, LR: 6.0020959312480285e-06, Loss: 365.63323974609375
2024-08-04T04:02:45.734380338Z 
 71%|███████   | 6735/9500 [23:05:15<9:25:57, 12.28s/it]08/03/2024 21:02:45 - INFO - __main__ -   Step: 6735, LR: 5.99992538756075e-06, Loss: 446.2011413574219
2024-08-04T04:02:57.789906467Z 
 71%|███████   | 6736/9500 [23:05:27<9:22:38, 12.21s/it]08/03/2024 21:02:57 - INFO - __main__ -   Step: 6736, LR: 5.9977548438734716e-06, Loss: 399.5618896484375
2024-08-04T04:03:10.415733597Z 
 71%|███████   | 6737/9500 [23:05:40<9:28:07, 12.34s/it]08/03/2024 21:03:10 - INFO - __main__ -   Step: 6737, LR: 5.995584300186192e-06, Loss: 370.7882080078125
2024-08-04T04:03:22.777991473Z 
 71%|███████   | 6738/9500 [23:05:52<9:28:16, 12.34s/it]08/03/2024 21:03:22 - INFO - __main__ -   Step: 6738, LR: 5.993413756498914e-06, Loss: 343.5615234375
2024-08-04T04:03:34.733227761Z 
 71%|███████   | 6739/9500 [23:06:04<9:22:41, 12.23s/it]08/03/2024 21:03:34 - INFO - __main__ -   Step: 6739, LR: 5.9912432128116346e-06, Loss: 399.61102294921875
2024-08-04T04:03:47.463798991Z 
 71%|███████   | 6740/9500 [23:06:17<9:29:25, 12.38s/it]08/03/2024 21:03:47 - INFO - __main__ -   Step: 6740, LR: 5.989072669124356e-06, Loss: 359.1351318359375
2024-08-04T04:03:59.714877171Z 
 71%|███████   | 6741/9500 [23:06:29<9:27:27, 12.34s/it]08/03/2024 21:03:59 - INFO - __main__ -   Step: 6741, LR: 5.986902125437076e-06, Loss: 434.85186767578125
2024-08-04T04:04:11.810678266Z 
 71%|███████   | 6742/9500 [23:06:41<9:23:52, 12.27s/it]08/03/2024 21:04:11 - INFO - __main__ -   Step: 6742, LR: 5.9847315817497975e-06, Loss: 412.9837646484375
2024-08-04T04:04:24.756684851Z 
 71%|███████   | 6743/9500 [23:06:54<9:33:01, 12.47s/it]08/03/2024 21:04:24 - INFO - __main__ -   Step: 6743, LR: 5.982561038062519e-06, Loss: 398.6186218261719
2024-08-04T04:04:37.449117195Z 
 71%|███████   | 6744/9500 [23:07:07<9:35:52, 12.54s/it]08/03/2024 21:04:37 - INFO - __main__ -   Step: 6744, LR: 5.98039049437524e-06, Loss: 374.26416015625
2024-08-04T04:04:49.598427929Z 
 71%|███████   | 6745/9500 [23:07:19<9:30:19, 12.42s/it]08/03/2024 21:04:49 - INFO - __main__ -   Step: 6745, LR: 5.978219950687961e-06, Loss: 384.070068359375
2024-08-04T04:05:01.697037641Z 
 71%|███████   | 6746/9500 [23:07:31<9:25:40, 12.32s/it]08/03/2024 21:05:01 - INFO - __main__ -   Step: 6746, LR: 5.976049407000682e-06, Loss: 342.39190673828125
2024-08-04T04:05:14.568430969Z 
 71%|███████   | 6747/9500 [23:07:44<9:33:00, 12.49s/it]08/03/2024 21:05:14 - INFO - __main__ -   Step: 6747, LR: 5.973878863313404e-06, Loss: 513.1279296875
2024-08-04T04:05:26.418695975Z 
 71%|███████   | 6748/9500 [23:07:56<9:24:01, 12.30s/it]08/03/2024 21:05:26 - INFO - __main__ -   Step: 6748, LR: 5.9717083196261235e-06, Loss: 299.9595947265625
2024-08-04T04:05:38.469902391Z 
 71%|███████   | 6749/9500 [23:08:08<9:20:26, 12.22s/it]08/03/2024 21:05:38 - INFO - __main__ -   Step: 6749, LR: 5.969537775938845e-06, Loss: 446.2589416503906
2024-08-04T04:05:51.105742342Z 
 71%|███████   | 6750/9500 [23:08:21<9:25:54, 12.35s/it]08/03/2024 21:05:51 - INFO - __main__ -   Step: 6750, LR: 5.967367232251567e-06, Loss: 464.9647216796875
2024-08-04T04:06:03.143702579Z 
 71%|███████   | 6751/9500 [23:08:33<9:21:27, 12.25s/it]08/03/2024 21:06:03 - INFO - __main__ -   Step: 6751, LR: 5.965196688564287e-06, Loss: 444.58984375
2024-08-04T04:06:15.228873144Z 
 71%|███████   | 6752/9500 [23:08:45<9:18:55, 12.20s/it]08/03/2024 21:06:15 - INFO - __main__ -   Step: 6752, LR: 5.963026144877009e-06, Loss: 342.6391296386719
2024-08-04T04:06:28.065571644Z 
 71%|███████   | 6753/9500 [23:08:58<9:27:24, 12.39s/it]08/03/2024 21:06:28 - INFO - __main__ -   Step: 6753, LR: 5.9608556011897305e-06, Loss: 343.0809326171875
2024-08-04T04:06:40.583812115Z 
 71%|███████   | 6754/9500 [23:09:10<9:28:55, 12.43s/it]08/03/2024 21:06:40 - INFO - __main__ -   Step: 6754, LR: 5.958685057502451e-06, Loss: 366.03167724609375
2024-08-04T04:06:52.858890859Z 
 71%|███████   | 6755/9500 [23:09:22<9:26:34, 12.38s/it]08/03/2024 21:06:52 - INFO - __main__ -   Step: 6755, LR: 5.956514513815171e-06, Loss: 511.696044921875
2024-08-04T04:07:05.659182557Z 
 71%|███████   | 6756/9500 [23:09:35<9:32:04, 12.51s/it]08/03/2024 21:07:05 - INFO - __main__ -   Step: 6756, LR: 5.954343970127893e-06, Loss: 397.7024230957031
2024-08-04T04:07:17.719304225Z 
 71%|███████   | 6757/9500 [23:09:47<9:25:43, 12.37s/it]08/03/2024 21:07:17 - INFO - __main__ -   Step: 6757, LR: 5.952173426440614e-06, Loss: 355.41156005859375
2024-08-04T04:07:30.105270866Z 
 71%|███████   | 6758/9500 [23:10:00<9:25:40, 12.38s/it]08/03/2024 21:07:30 - INFO - __main__ -   Step: 6758, LR: 5.950002882753335e-06, Loss: 444.1188049316406
2024-08-04T04:07:42.847124236Z 
 71%|███████   | 6759/9500 [23:10:12<9:30:26, 12.49s/it]08/03/2024 21:07:42 - INFO - __main__ -   Step: 6759, LR: 5.9478323390660564e-06, Loss: 390.0365295410156
2024-08-04T04:07:54.822488642Z 
 71%|███████   | 6760/9500 [23:10:24<9:23:13, 12.33s/it]08/03/2024 21:07:54 - INFO - __main__ -   Step: 6760, LR: 5.945661795378778e-06, Loss: 374.5010986328125
2024-08-04T04:08:07.094013496Z 
 71%|███████   | 6761/9500 [23:10:37<9:22:10, 12.31s/it]08/03/2024 21:08:07 - INFO - __main__ -   Step: 6761, LR: 5.943491251691499e-06, Loss: 429.8953857421875
2024-08-04T04:08:19.904887065Z 
 71%|███████   | 6762/9500 [23:10:49<9:28:45, 12.46s/it]08/03/2024 21:08:19 - INFO - __main__ -   Step: 6762, LR: 5.941320708004219e-06, Loss: 324.9837646484375
2024-08-04T04:08:32.082390937Z 
 71%|███████   | 6763/9500 [23:11:02<9:24:38, 12.38s/it]08/03/2024 21:08:32 - INFO - __main__ -   Step: 6763, LR: 5.93915016431694e-06, Loss: 387.8055419921875
2024-08-04T04:08:44.072971339Z 
 71%|███████   | 6764/9500 [23:11:14<9:19:07, 12.26s/it]08/03/2024 21:08:44 - INFO - __main__ -   Step: 6764, LR: 5.936979620629662e-06, Loss: 448.18804931640625
2024-08-04T04:08:56.798144047Z 
 71%|███████   | 6765/9500 [23:11:26<9:25:15, 12.40s/it]08/03/2024 21:08:56 - INFO - __main__ -   Step: 6765, LR: 5.934809076942382e-06, Loss: 454.4841003417969
2024-08-04T04:09:09.115909204Z 
 71%|███████   | 6766/9500 [23:11:39<9:23:55, 12.38s/it]08/03/2024 21:09:09 - INFO - __main__ -   Step: 6766, LR: 5.932638533255104e-06, Loss: 472.97528076171875
2024-08-04T04:09:21.161998615Z 
 71%|███████   | 6767/9500 [23:11:51<9:19:13, 12.28s/it]08/03/2024 21:09:21 - INFO - __main__ -   Step: 6767, LR: 5.9304679895678255e-06, Loss: 355.705810546875
2024-08-04T04:09:33.811489643Z 
 71%|███████   | 6768/9500 [23:12:03<9:24:06, 12.39s/it]08/03/2024 21:09:33 - INFO - __main__ -   Step: 6768, LR: 5.928297445880546e-06, Loss: 379.43682861328125
2024-08-04T04:09:46.120060149Z 
 71%|███████▏  | 6769/9500 [23:12:16<9:22:47, 12.36s/it]08/03/2024 21:09:46 - INFO - __main__ -   Step: 6769, LR: 5.926126902193267e-06, Loss: 387.26092529296875
2024-08-04T04:09:58.337400714Z 
 71%|███████▏  | 6770/9500 [23:12:28<9:20:34, 12.32s/it]08/03/2024 21:09:58 - INFO - __main__ -   Step: 6770, LR: 5.923956358505988e-06, Loss: 412.654052734375
2024-08-04T04:10:11.030142289Z 
 71%|███████▏  | 6771/9500 [23:12:40<9:25:27, 12.43s/it]08/03/2024 21:10:11 - INFO - __main__ -   Step: 6771, LR: 5.921785814818709e-06, Loss: 461.9959411621094
2024-08-04T04:10:23.197166918Z 
 71%|███████▏  | 6772/9500 [23:12:53<9:21:37, 12.35s/it]08/03/2024 21:10:23 - INFO - __main__ -   Step: 6772, LR: 5.91961527113143e-06, Loss: 465.01934814453125
2024-08-04T04:10:35.356331093Z 
 71%|███████▏  | 6773/9500 [23:13:05<9:18:47, 12.29s/it]08/03/2024 21:10:35 - INFO - __main__ -   Step: 6773, LR: 5.9174447274441515e-06, Loss: 331.1292724609375
2024-08-04T04:10:47.804685509Z 
 71%|███████▏  | 6774/9500 [23:13:17<9:20:40, 12.34s/it]08/03/2024 21:10:47 - INFO - __main__ -   Step: 6774, LR: 5.915274183756873e-06, Loss: 411.3567810058594
2024-08-04T04:11:00.054949677Z 
 71%|███████▏  | 6775/9500 [23:13:29<9:19:14, 12.31s/it]08/03/2024 21:11:00 - INFO - __main__ -   Step: 6775, LR: 5.913103640069594e-06, Loss: 418.4869384765625
2024-08-04T04:11:12.117643445Z 
 71%|███████▏  | 6776/9500 [23:13:42<9:15:37, 12.24s/it]08/03/2024 21:11:12 - INFO - __main__ -   Step: 6776, LR: 5.9109330963823145e-06, Loss: 397.8760986328125
2024-08-04T04:11:24.443113950Z 
 71%|███████▏  | 6777/9500 [23:13:54<9:16:36, 12.26s/it]08/03/2024 21:11:24 - INFO - __main__ -   Step: 6777, LR: 5.908762552695035e-06, Loss: 344.9403381347656
2024-08-04T04:11:36.777778431Z 
 71%|███████▏  | 6778/9500 [23:14:06<9:17:21, 12.29s/it]08/03/2024 21:11:36 - INFO - __main__ -   Step: 6778, LR: 5.906592009007757e-06, Loss: 329.76361083984375
2024-08-04T04:11:48.849451316Z 
 71%|███████▏  | 6779/9500 [23:14:18<9:14:14, 12.22s/it]08/03/2024 21:11:48 - INFO - __main__ -   Step: 6779, LR: 5.904421465320478e-06, Loss: 410.7320251464844
2024-08-04T04:12:01.615691986Z 
 71%|███████▏  | 6780/9500 [23:14:31<9:21:26, 12.38s/it]08/03/2024 21:12:01 - INFO - __main__ -   Step: 6780, LR: 5.902250921633199e-06, Loss: 455.4959411621094
2024-08-04T04:12:14.070898288Z 
 71%|███████▏  | 6781/9500 [23:14:44<9:22:11, 12.41s/it]08/03/2024 21:12:14 - INFO - __main__ -   Step: 6781, LR: 5.9000803779459206e-06, Loss: 345.9416198730469
2024-08-04T04:12:26.161942769Z 
 71%|███████▏  | 6782/9500 [23:14:56<9:17:42, 12.31s/it]08/03/2024 21:12:26 - INFO - __main__ -   Step: 6782, LR: 5.897909834258641e-06, Loss: 296.36334228515625
2024-08-04T04:12:38.669838067Z 
 71%|███████▏  | 6783/9500 [23:15:08<9:20:10, 12.37s/it]08/03/2024 21:12:38 - INFO - __main__ -   Step: 6783, LR: 5.895739290571362e-06, Loss: 391.0848388671875
2024-08-04T04:12:50.907351109Z 
 71%|███████▏  | 6784/9500 [23:15:20<9:18:09, 12.33s/it]08/03/2024 21:12:50 - INFO - __main__ -   Step: 6784, LR: 5.893568746884083e-06, Loss: 404.6806945800781
2024-08-04T04:13:03.111071861Z 
 71%|███████▏  | 6785/9500 [23:15:33<9:16:13, 12.29s/it]08/03/2024 21:13:03 - INFO - __main__ -   Step: 6785, LR: 5.891398203196804e-06, Loss: 353.02093505859375
2024-08-04T04:13:15.574372915Z 
 71%|███████▏  | 6786/9500 [23:15:45<9:18:21, 12.34s/it]08/03/2024 21:13:15 - INFO - __main__ -   Step: 6786, LR: 5.889227659509526e-06, Loss: 463.056640625
2024-08-04T04:13:27.998050686Z 
 71%|███████▏  | 6787/9500 [23:15:57<9:19:13, 12.37s/it]08/03/2024 21:13:27 - INFO - __main__ -   Step: 6787, LR: 5.8870571158222465e-06, Loss: 374.2633056640625
2024-08-04T04:13:40.103377798Z 
 71%|███████▏  | 6788/9500 [23:16:10<9:15:27, 12.29s/it]08/03/2024 21:13:40 - INFO - __main__ -   Step: 6788, LR: 5.884886572134968e-06, Loss: 340.878662109375
2024-08-04T04:13:52.182632378Z 
 71%|███████▏  | 6789/9500 [23:16:22<9:12:25, 12.23s/it]08/03/2024 21:13:52 - INFO - __main__ -   Step: 6789, LR: 5.882716028447689e-06, Loss: 451.2012939453125
2024-08-04T04:14:04.703396702Z 
 71%|███████▏  | 6790/9500 [23:16:34<9:16:12, 12.31s/it]08/03/2024 21:14:04 - INFO - __main__ -   Step: 6790, LR: 5.8805454847604095e-06, Loss: 419.411376953125
2024-08-04T04:14:16.841262347Z 
 71%|███████▏  | 6791/9500 [23:16:46<9:13:36, 12.26s/it]08/03/2024 21:14:16 - INFO - __main__ -   Step: 6791, LR: 5.87837494107313e-06, Loss: 317.75311279296875
2024-08-04T04:14:28.974250847Z 
 71%|███████▏  | 6792/9500 [23:16:58<9:11:39, 12.22s/it]08/03/2024 21:14:28 - INFO - __main__ -   Step: 6792, LR: 5.876204397385852e-06, Loss: 519.8763427734375
2024-08-04T04:14:41.588929022Z 
 72%|███████▏  | 6793/9500 [23:17:11<9:16:45, 12.34s/it]08/03/2024 21:14:41 - INFO - __main__ -   Step: 6793, LR: 5.874033853698573e-06, Loss: 354.0008544921875
2024-08-04T04:14:53.980836317Z 
 72%|███████▏  | 6794/9500 [23:17:23<9:17:15, 12.36s/it]08/03/2024 21:14:53 - INFO - __main__ -   Step: 6794, LR: 5.871863310011294e-06, Loss: 533.233154296875
2024-08-04T04:15:06.181441597Z 
 72%|███████▏  | 6795/9500 [23:17:36<9:14:56, 12.31s/it]08/03/2024 21:15:06 - INFO - __main__ -   Step: 6795, LR: 5.869692766324016e-06, Loss: 369.3080139160156
2024-08-04T04:15:18.736323666Z 
 72%|███████▏  | 6796/9500 [23:17:48<9:18:03, 12.38s/it]08/03/2024 21:15:18 - INFO - __main__ -   Step: 6796, LR: 5.867522222636736e-06, Loss: 440.3748779296875
2024-08-04T04:15:31.049937214Z 
 72%|███████▏  | 6797/9500 [23:18:00<9:16:54, 12.36s/it]08/03/2024 21:15:31 - INFO - __main__ -   Step: 6797, LR: 5.865351678949457e-06, Loss: 423.2958068847656
2024-08-04T04:15:43.333237177Z 
 72%|███████▏  | 6798/9500 [23:18:13<9:15:38, 12.34s/it]08/03/2024 21:15:43 - INFO - __main__ -   Step: 6798, LR: 5.863181135262178e-06, Loss: 360.4894714355469
2024-08-04T04:15:56.129763493Z 
 72%|███████▏  | 6799/9500 [23:18:26<9:21:37, 12.48s/it]08/03/2024 21:15:56 - INFO - __main__ -   Step: 6799, LR: 5.861010591574899e-06, Loss: 448.13555908203125
2024-08-04T04:16:08.269738198Z 
 72%|███████▏  | 6800/9500 [23:18:38<9:16:52, 12.38s/it]08/03/2024 21:16:08 - INFO - __main__ -   Step: 6800, LR: 5.858840047887621e-06, Loss: 418.0189208984375
2024-08-04T04:16:20.392792016Z 
 72%|███████▏  | 6801/9500 [23:18:50<9:13:16, 12.30s/it]08/03/2024 21:16:20 - INFO - __main__ -   Step: 6801, LR: 5.856669504200342e-06, Loss: 431.0609436035156
2024-08-04T04:16:33.215600389Z 
 72%|███████▏  | 6802/9500 [23:19:03<9:20:07, 12.46s/it]08/03/2024 21:16:33 - INFO - __main__ -   Step: 6802, LR: 5.854498960513063e-06, Loss: 353.88189697265625
2024-08-04T04:16:45.412283330Z 
 72%|███████▏  | 6803/9500 [23:19:15<9:16:24, 12.38s/it]08/03/2024 21:16:45 - INFO - __main__ -   Step: 6803, LR: 5.852328416825785e-06, Loss: 385.193603515625
2024-08-04T04:16:57.609937636Z 
 72%|███████▏  | 6804/9500 [23:19:27<9:13:46, 12.32s/it]08/03/2024 21:16:57 - INFO - __main__ -   Step: 6804, LR: 5.850157873138505e-06, Loss: 420.05218505859375
2024-08-04T04:17:10.242491397Z 
 72%|███████▏  | 6805/9500 [23:19:40<9:17:43, 12.42s/it]08/03/2024 21:17:10 - INFO - __main__ -   Step: 6805, LR: 5.847987329451225e-06, Loss: 328.84100341796875
2024-08-04T04:17:22.629548795Z 
 72%|███████▏  | 6806/9500 [23:19:52<9:17:06, 12.41s/it]08/03/2024 21:17:22 - INFO - __main__ -   Step: 6806, LR: 5.845816785763947e-06, Loss: 410.185302734375
2024-08-04T04:17:34.848392244Z 
 72%|███████▏  | 6807/9500 [23:20:04<9:14:21, 12.35s/it]08/03/2024 21:17:34 - INFO - __main__ -   Step: 6807, LR: 5.843646242076668e-06, Loss: 397.1384582519531
2024-08-04T04:17:47.186567544Z 
 72%|███████▏  | 6808/9500 [23:20:17<9:13:58, 12.35s/it]08/03/2024 21:17:47 - INFO - __main__ -   Step: 6808, LR: 5.841475698389389e-06, Loss: 358.23553466796875
2024-08-04T04:17:59.668419814Z 
 72%|███████▏  | 6809/9500 [23:20:29<9:15:34, 12.39s/it]08/03/2024 21:17:59 - INFO - __main__ -   Step: 6809, LR: 5.839305154702111e-06, Loss: 503.174560546875
2024-08-04T04:18:11.938242161Z 
 72%|███████▏  | 6810/9500 [23:20:41<9:13:47, 12.35s/it]08/03/2024 21:18:11 - INFO - __main__ -   Step: 6810, LR: 5.837134611014832e-06, Loss: 417.14581298828125
2024-08-04T04:18:24.576908691Z 
 72%|███████▏  | 6811/9500 [23:20:54<9:17:26, 12.44s/it]08/03/2024 21:18:24 - INFO - __main__ -   Step: 6811, LR: 5.834964067327552e-06, Loss: 430.912841796875
2024-08-04T04:18:36.643400303Z 
 72%|███████▏  | 6812/9500 [23:21:06<9:12:13, 12.33s/it]08/03/2024 21:18:36 - INFO - __main__ -   Step: 6812, LR: 5.832793523640274e-06, Loss: 416.52532958984375
2024-08-04T04:18:48.743557798Z 
 72%|███████▏  | 6813/9500 [23:21:18<9:08:59, 12.26s/it]08/03/2024 21:18:48 - INFO - __main__ -   Step: 6813, LR: 5.830622979952994e-06, Loss: 346.42486572265625
2024-08-04T04:19:01.408812739Z 
 72%|███████▏  | 6814/9500 [23:21:31<9:14:14, 12.38s/it]08/03/2024 21:19:01 - INFO - __main__ -   Step: 6814, LR: 5.828452436265716e-06, Loss: 359.6649475097656
2024-08-04T04:19:13.590058781Z 
 72%|███████▏  | 6815/9500 [23:21:43<9:11:21, 12.32s/it]08/03/2024 21:19:13 - INFO - __main__ -   Step: 6815, LR: 5.826281892578437e-06, Loss: 327.14678955078125
2024-08-04T04:19:25.680912854Z 
 72%|███████▏  | 6816/9500 [23:21:55<9:08:04, 12.25s/it]08/03/2024 21:19:25 - INFO - __main__ -   Step: 6816, LR: 5.824111348891158e-06, Loss: 360.6929931640625
2024-08-04T04:19:38.089218553Z 
 72%|███████▏  | 6817/9500 [23:22:08<9:09:57, 12.30s/it]08/03/2024 21:19:38 - INFO - __main__ -   Step: 6817, LR: 5.82194080520388e-06, Loss: 316.40802001953125
2024-08-04T04:19:50.440273298Z 
 72%|███████▏  | 6818/9500 [23:22:20<9:10:27, 12.31s/it]08/03/2024 21:19:50 - INFO - __main__ -   Step: 6818, LR: 5.8197702615166e-06, Loss: 555.9171142578125
2024-08-04T04:20:02.327989849Z 
 72%|███████▏  | 6819/9500 [23:22:32<9:04:31, 12.19s/it]08/03/2024 21:20:02 - INFO - __main__ -   Step: 6819, LR: 5.817599717829321e-06, Loss: 304.7904052734375
2024-08-04T04:20:14.785809112Z 
 72%|███████▏  | 6820/9500 [23:22:44<9:07:57, 12.27s/it]08/03/2024 21:20:14 - INFO - __main__ -   Step: 6820, LR: 5.815429174142042e-06, Loss: 402.572265625
2024-08-04T04:20:27.091254920Z 
 72%|███████▏  | 6821/9500 [23:22:57<9:08:16, 12.28s/it]08/03/2024 21:20:27 - INFO - __main__ -   Step: 6821, LR: 5.8132586304547635e-06, Loss: 468.38128662109375
2024-08-04T04:20:39.231109210Z 
 72%|███████▏  | 6822/9500 [23:23:09<9:06:11, 12.24s/it]08/03/2024 21:20:39 - INFO - __main__ -   Step: 6822, LR: 5.811088086767484e-06, Loss: 312.5871887207031
2024-08-04T04:20:51.873344322Z 
 72%|███████▏  | 6823/9500 [23:23:21<9:11:24, 12.36s/it]08/03/2024 21:20:51 - INFO - __main__ -   Step: 6823, LR: 5.808917543080206e-06, Loss: 405.2752685546875
2024-08-04T04:21:04.262156664Z 
 72%|███████▏  | 6824/9500 [23:23:34<9:11:36, 12.37s/it]08/03/2024 21:21:04 - INFO - __main__ -   Step: 6824, LR: 5.806746999392927e-06, Loss: 478.9156494140625
2024-08-04T04:21:16.478417002Z 
 72%|███████▏  | 6825/9500 [23:23:46<9:09:22, 12.32s/it]08/03/2024 21:21:16 - INFO - __main__ -   Step: 6825, LR: 5.804576455705647e-06, Loss: 408.08721923828125
2024-08-04T04:21:29.129946530Z 
 72%|███████▏  | 6826/9500 [23:23:59<9:13:34, 12.42s/it]08/03/2024 21:21:29 - INFO - __main__ -   Step: 6826, LR: 5.802405912018369e-06, Loss: 337.2887878417969
2024-08-04T04:21:41.466407987Z 
 72%|███████▏  | 6827/9500 [23:24:11<9:12:13, 12.40s/it]08/03/2024 21:21:41 - INFO - __main__ -   Step: 6827, LR: 5.8002353683310894e-06, Loss: 419.83709716796875
2024-08-04T04:21:53.726781542Z 
 72%|███████▏  | 6828/9500 [23:24:23<9:10:12, 12.36s/it]08/03/2024 21:21:53 - INFO - __main__ -   Step: 6828, LR: 5.798064824643811e-06, Loss: 325.13336181640625
2024-08-04T04:22:06.320361088Z 
 72%|███████▏  | 6829/9500 [23:24:36<9:13:11, 12.43s/it]08/03/2024 21:22:06 - INFO - __main__ -   Step: 6829, LR: 5.7958942809565326e-06, Loss: 425.35772705078125
2024-08-04T04:22:18.626998075Z 
 72%|███████▏  | 6830/9500 [23:24:48<9:11:22, 12.39s/it]08/03/2024 21:22:18 - INFO - __main__ -   Step: 6830, LR: 5.793723737269253e-06, Loss: 374.1255798339844
2024-08-04T04:22:30.624188520Z 
 72%|███████▏  | 6831/9500 [23:25:00<9:05:55, 12.27s/it]08/03/2024 21:22:30 - INFO - __main__ -   Step: 6831, LR: 5.791553193581975e-06, Loss: 374.58782958984375
2024-08-04T04:22:42.909265193Z 
 72%|███████▏  | 6832/9500 [23:25:12<9:05:53, 12.28s/it]08/03/2024 21:22:42 - INFO - __main__ -   Step: 6832, LR: 5.789382649894695e-06, Loss: 397.4419860839844
2024-08-04T04:22:55.855791819Z 
 72%|███████▏  | 6833/9500 [23:25:25<9:14:37, 12.48s/it]08/03/2024 21:22:55 - INFO - __main__ -   Step: 6833, LR: 5.787212106207416e-06, Loss: 343.3987731933594
2024-08-04T04:23:08.047831602Z 
 72%|███████▏  | 6834/9500 [23:25:37<9:10:36, 12.39s/it]08/03/2024 21:23:08 - INFO - __main__ -   Step: 6834, LR: 5.785041562520137e-06, Loss: 434.1807556152344
2024-08-04T04:23:20.326954652Z 
 72%|███████▏  | 6835/9500 [23:25:50<9:08:53, 12.36s/it]08/03/2024 21:23:20 - INFO - __main__ -   Step: 6835, LR: 5.7828710188328585e-06, Loss: 408.7003173828125
2024-08-04T04:23:33.181769353Z 
 72%|███████▏  | 6836/9500 [23:26:03<9:15:18, 12.51s/it]08/03/2024 21:23:33 - INFO - __main__ -   Step: 6836, LR: 5.78070047514558e-06, Loss: 403.311767578125
2024-08-04T04:23:45.591671890Z 
 72%|███████▏  | 6837/9500 [23:26:15<9:13:48, 12.48s/it]08/03/2024 21:23:45 - INFO - __main__ -   Step: 6837, LR: 5.778529931458301e-06, Loss: 406.7710876464844
2024-08-04T04:23:57.599414124Z 
 72%|███████▏  | 6838/9500 [23:26:27<9:07:20, 12.34s/it]08/03/2024 21:23:57 - INFO - __main__ -   Step: 6838, LR: 5.776359387771022e-06, Loss: 285.10675048828125
2024-08-04T04:24:09.943342447Z 
 72%|███████▏  | 6839/9500 [23:26:39<9:07:13, 12.34s/it]08/03/2024 21:24:09 - INFO - __main__ -   Step: 6839, LR: 5.774188844083742e-06, Loss: 379.2659912109375
2024-08-04T04:24:22.079211177Z 
 72%|███████▏  | 6840/9500 [23:26:52<9:04:19, 12.28s/it]08/03/2024 21:24:22 - INFO - __main__ -   Step: 6840, LR: 5.772018300396464e-06, Loss: 439.7677001953125
2024-08-04T04:24:34.204552608Z 
 72%|███████▏  | 6841/9500 [23:27:04<9:02:05, 12.23s/it]08/03/2024 21:24:34 - INFO - __main__ -   Step: 6841, LR: 5.7698477567091845e-06, Loss: 352.770751953125
2024-08-04T04:24:46.740572050Z 
 72%|███████▏  | 6842/9500 [23:27:16<9:05:55, 12.32s/it]08/03/2024 21:24:46 - INFO - __main__ -   Step: 6842, LR: 5.767677213021906e-06, Loss: 382.62872314453125
2024-08-04T04:24:59.008710020Z 
 72%|███████▏  | 6843/9500 [23:27:28<9:04:59, 12.31s/it]08/03/2024 21:24:59 - INFO - __main__ -   Step: 6843, LR: 5.765506669334628e-06, Loss: 351.9854736328125
2024-08-04T04:25:11.488863881Z 
 72%|███████▏  | 6844/9500 [23:27:41<9:07:04, 12.36s/it]08/03/2024 21:25:11 - INFO - __main__ -   Step: 6844, LR: 5.763336125647348e-06, Loss: 503.2797546386719
2024-08-04T04:25:24.529150827Z 
 72%|███████▏  | 6845/9500 [23:27:54<9:15:55, 12.56s/it]08/03/2024 21:25:24 - INFO - __main__ -   Step: 6845, LR: 5.761165581960069e-06, Loss: 350.06683349609375
2024-08-04T04:25:36.719441104Z 
 72%|███████▏  | 6846/9500 [23:28:06<9:10:45, 12.45s/it]08/03/2024 21:25:36 - INFO - __main__ -   Step: 6846, LR: 5.75899503827279e-06, Loss: 392.62298583984375
2024-08-04T04:25:48.757374098Z 
 72%|███████▏  | 6847/9500 [23:28:18<9:05:04, 12.33s/it]08/03/2024 21:25:48 - INFO - __main__ -   Step: 6847, LR: 5.756824494585511e-06, Loss: 449.994384765625
2024-08-04T04:26:01.179379882Z 
 72%|███████▏  | 6848/9500 [23:28:31<9:06:07, 12.36s/it]08/03/2024 21:26:01 - INFO - __main__ -   Step: 6848, LR: 5.754653950898232e-06, Loss: 387.78765869140625
2024-08-04T04:26:13.631395772Z 
 72%|███████▏  | 6849/9500 [23:28:43<9:07:11, 12.38s/it]08/03/2024 21:26:13 - INFO - __main__ -   Step: 6849, LR: 5.752483407210954e-06, Loss: 456.8247375488281
2024-08-04T04:26:26.001683020Z 
 72%|███████▏  | 6850/9500 [23:28:55<9:06:46, 12.38s/it]08/03/2024 21:26:26 - INFO - __main__ -   Step: 6850, LR: 5.750312863523675e-06, Loss: 446.8885803222656
2024-08-04T04:26:38.508001612Z 
 72%|███████▏  | 6851/9500 [23:29:08<9:08:16, 12.42s/it]08/03/2024 21:26:38 - INFO - __main__ -   Step: 6851, LR: 5.748142319836396e-06, Loss: 379.8127136230469
2024-08-04T04:26:50.494570017Z 
 72%|███████▏  | 6852/9500 [23:29:20<9:02:20, 12.29s/it]08/03/2024 21:26:50 - INFO - __main__ -   Step: 6852, LR: 5.7459717761491166e-06, Loss: 375.0476379394531
2024-08-04T04:27:02.702221270Z 
 72%|███████▏  | 6853/9500 [23:29:32<9:01:04, 12.26s/it]08/03/2024 21:27:02 - INFO - __main__ -   Step: 6853, LR: 5.743801232461837e-06, Loss: 503.12548828125
2024-08-04T04:27:15.371207492Z 
 72%|███████▏  | 6854/9500 [23:29:45<9:06:12, 12.39s/it]08/03/2024 21:27:15 - INFO - __main__ -   Step: 6854, LR: 5.741630688774559e-06, Loss: 450.2010498046875
2024-08-04T04:27:27.443523489Z 
 72%|███████▏  | 6855/9500 [23:29:57<9:01:51, 12.29s/it]08/03/2024 21:27:27 - INFO - __main__ -   Step: 6855, LR: 5.73946014508728e-06, Loss: 406.4337158203125
2024-08-04T04:27:39.446434737Z 
 72%|███████▏  | 6856/9500 [23:30:09<8:57:50, 12.21s/it]08/03/2024 21:27:39 - INFO - __main__ -   Step: 6856, LR: 5.737289601400001e-06, Loss: 451.2539367675781
2024-08-04T04:27:51.944477123Z 
 72%|███████▏  | 6857/9500 [23:30:21<9:01:30, 12.29s/it]08/03/2024 21:27:51 - INFO - __main__ -   Step: 6857, LR: 5.735119057712723e-06, Loss: 394.0935363769531
2024-08-04T04:28:04.226919772Z 
 72%|███████▏  | 6858/9500 [23:30:34<9:01:09, 12.29s/it]08/03/2024 21:28:04 - INFO - __main__ -   Step: 6858, LR: 5.732948514025443e-06, Loss: 403.5050964355469
2024-08-04T04:28:16.432278596Z 
 72%|███████▏  | 6859/9500 [23:30:46<8:59:50, 12.26s/it]08/03/2024 21:28:16 - INFO - __main__ -   Step: 6859, LR: 5.730777970338164e-06, Loss: 418.05755615234375
2024-08-04T04:28:28.980703106Z 
 72%|███████▏  | 6860/9500 [23:30:58<9:03:23, 12.35s/it]08/03/2024 21:28:28 - INFO - __main__ -   Step: 6860, LR: 5.728607426650885e-06, Loss: 439.7872314453125
2024-08-04T04:28:41.093003598Z 
 72%|███████▏  | 6861/9500 [23:31:11<9:00:02, 12.28s/it]08/03/2024 21:28:41 - INFO - __main__ -   Step: 6861, LR: 5.726436882963606e-06, Loss: 394.22015380859375
2024-08-04T04:28:53.170890562Z 
 72%|███████▏  | 6862/9500 [23:31:23<8:57:11, 12.22s/it]08/03/2024 21:28:53 - INFO - __main__ -   Step: 6862, LR: 5.724266339276328e-06, Loss: 310.4234619140625
2024-08-04T04:29:05.576003553Z 
 72%|███████▏  | 6863/9500 [23:31:35<8:59:27, 12.27s/it]08/03/2024 21:29:05 - INFO - __main__ -   Step: 6863, LR: 5.722095795589049e-06, Loss: 404.2484130859375
2024-08-04T04:29:17.838670903Z 
 72%|███████▏  | 6864/9500 [23:31:47<8:59:05, 12.27s/it]08/03/2024 21:29:17 - INFO - __main__ -   Step: 6864, LR: 5.71992525190177e-06, Loss: 501.1763610839844
2024-08-04T04:29:29.821297693Z 
 72%|███████▏  | 6865/9500 [23:31:59<8:55:05, 12.18s/it]08/03/2024 21:29:29 - INFO - __main__ -   Step: 6865, LR: 5.717754708214491e-06, Loss: 372.97357177734375
2024-08-04T04:29:42.351450698Z 
 72%|███████▏  | 6866/9500 [23:32:12<8:59:26, 12.29s/it]08/03/2024 21:29:42 - INFO - __main__ -   Step: 6866, LR: 5.715584164527212e-06, Loss: 381.3782653808594
2024-08-04T04:29:54.857901110Z 
 72%|███████▏  | 6867/9500 [23:32:24<9:02:06, 12.35s/it]08/03/2024 21:29:54 - INFO - __main__ -   Step: 6867, LR: 5.713413620839932e-06, Loss: 467.72418212890625
2024-08-04T04:30:07.008244448Z 
 72%|███████▏  | 6868/9500 [23:32:36<8:59:14, 12.29s/it]08/03/2024 21:30:07 - INFO - __main__ -   Step: 6868, LR: 5.711243077152654e-06, Loss: 358.77728271484375
2024-08-04T04:30:19.909243593Z 
 72%|███████▏  | 6869/9500 [23:32:49<9:07:02, 12.48s/it]08/03/2024 21:30:19 - INFO - __main__ -   Step: 6869, LR: 5.7090725334653755e-06, Loss: 531.4315185546875
2024-08-04T04:30:32.581408575Z 
 72%|███████▏  | 6870/9500 [23:33:02<9:09:25, 12.53s/it]08/03/2024 21:30:32 - INFO - __main__ -   Step: 6870, LR: 5.706901989778096e-06, Loss: 504.0185546875
2024-08-04T04:30:44.731122952Z 
 72%|███████▏  | 6871/9500 [23:33:14<9:04:09, 12.42s/it]08/03/2024 21:30:44 - INFO - __main__ -   Step: 6871, LR: 5.704731446090818e-06, Loss: 470.7530517578125
2024-08-04T04:30:57.376168359Z 
 72%|███████▏  | 6872/9500 [23:33:27<9:06:55, 12.49s/it]08/03/2024 21:30:57 - INFO - __main__ -   Step: 6872, LR: 5.702560902403539e-06, Loss: 576.94677734375
2024-08-04T04:31:09.616868878Z 
 72%|███████▏  | 6873/9500 [23:33:39<9:03:28, 12.41s/it]08/03/2024 21:31:09 - INFO - __main__ -   Step: 6873, LR: 5.700390358716259e-06, Loss: 368.0987548828125
2024-08-04T04:31:21.777564832Z 
 72%|███████▏  | 6874/9500 [23:33:51<8:59:57, 12.34s/it]08/03/2024 21:31:21 - INFO - __main__ -   Step: 6874, LR: 5.69821981502898e-06, Loss: 434.228271484375
2024-08-04T04:31:33.786778068Z 
 72%|███████▏  | 6875/9500 [23:34:03<8:55:27, 12.24s/it]08/03/2024 21:31:33 - INFO - __main__ -   Step: 6875, LR: 5.6960492713417014e-06, Loss: 365.01641845703125
2024-08-04T04:31:46.316972356Z 
 72%|███████▏  | 6876/9500 [23:34:16<8:59:04, 12.33s/it]08/03/2024 21:31:46 - INFO - __main__ -   Step: 6876, LR: 5.693878727654423e-06, Loss: 445.9635009765625
2024-08-04T04:31:58.660629767Z 
 72%|███████▏  | 6877/9500 [23:34:28<8:59:05, 12.33s/it]08/03/2024 21:31:58 - INFO - __main__ -   Step: 6877, LR: 5.691708183967144e-06, Loss: 461.11474609375
2024-08-04T04:32:10.813070740Z 
 72%|███████▏  | 6878/9500 [23:34:40<8:56:32, 12.28s/it]08/03/2024 21:32:10 - INFO - __main__ -   Step: 6878, LR: 5.689537640279865e-06, Loss: 282.80120849609375
2024-08-04T04:32:23.431075897Z 
 72%|███████▏  | 6879/9500 [23:34:53<9:00:47, 12.38s/it]08/03/2024 21:32:23 - INFO - __main__ -   Step: 6879, LR: 5.687367096592587e-06, Loss: 353.57452392578125
2024-08-04T04:32:35.732815935Z 
 72%|███████▏  | 6880/9500 [23:35:05<8:59:33, 12.36s/it]08/03/2024 21:32:35 - INFO - __main__ -   Step: 6880, LR: 5.685196552905307e-06, Loss: 408.19281005859375
2024-08-04T04:32:47.798729490Z 
 72%|███████▏  | 6881/9500 [23:35:17<8:55:33, 12.27s/it]08/03/2024 21:32:47 - INFO - __main__ -   Step: 6881, LR: 5.683026009218028e-06, Loss: 419.0954895019531
2024-08-04T04:33:00.507370841Z 
 72%|███████▏  | 6882/9500 [23:35:30<9:01:06, 12.40s/it]08/03/2024 21:33:00 - INFO - __main__ -   Step: 6882, LR: 5.680855465530749e-06, Loss: 434.1602478027344
2024-08-04T04:33:12.650804920Z 
 72%|███████▏  | 6883/9500 [23:35:42<8:57:30, 12.32s/it]08/03/2024 21:33:12 - INFO - __main__ -   Step: 6883, LR: 5.6786849218434705e-06, Loss: 418.80596923828125
2024-08-04T04:33:25.056694654Z 
 72%|███████▏  | 6884/9500 [23:35:54<8:58:23, 12.35s/it]08/03/2024 21:33:25 - INFO - __main__ -   Step: 6884, LR: 5.676514378156191e-06, Loss: 474.72259521484375
2024-08-04T04:33:37.705668104Z 
 72%|███████▏  | 6885/9500 [23:36:07<9:02:07, 12.44s/it]08/03/2024 21:33:37 - INFO - __main__ -   Step: 6885, LR: 5.674343834468913e-06, Loss: 470.6571044921875
2024-08-04T04:33:50.471944096Z 
 72%|███████▏  | 6886/9500 [23:36:20<9:06:11, 12.54s/it]08/03/2024 21:33:50 - INFO - __main__ -   Step: 6886, LR: 5.672173290781634e-06, Loss: 567.4112548828125
2024-08-04T04:34:02.675050599Z 
 72%|███████▏  | 6887/9500 [23:36:32<9:01:37, 12.44s/it]08/03/2024 21:34:02 - INFO - __main__ -   Step: 6887, LR: 5.670002747094354e-06, Loss: 518.7500610351562
2024-08-04T04:34:15.485851325Z 
 73%|███████▎  | 6888/9500 [23:36:45<9:06:17, 12.55s/it]08/03/2024 21:34:15 - INFO - __main__ -   Step: 6888, LR: 5.667832203407076e-06, Loss: 449.0034484863281
2024-08-04T04:34:27.691944145Z 
 73%|███████▎  | 6889/9500 [23:36:57<9:01:36, 12.45s/it]08/03/2024 21:34:27 - INFO - __main__ -   Step: 6889, LR: 5.6656616597197965e-06, Loss: 379.1101379394531
2024-08-04T04:34:40.002727642Z 
 73%|███████▎  | 6890/9500 [23:37:09<8:59:38, 12.41s/it]08/03/2024 21:34:40 - INFO - __main__ -   Step: 6890, LR: 5.663491116032518e-06, Loss: 421.47967529296875
2024-08-04T04:34:52.542211464Z 
 73%|███████▎  | 6891/9500 [23:37:22<9:01:10, 12.45s/it]08/03/2024 21:34:52 - INFO - __main__ -   Step: 6891, LR: 5.661320572345239e-06, Loss: 519.8863525390625
2024-08-04T04:35:04.947327376Z 
 73%|███████▎  | 6892/9500 [23:37:34<9:00:26, 12.43s/it]08/03/2024 21:35:04 - INFO - __main__ -   Step: 6892, LR: 5.65915002865796e-06, Loss: 535.865966796875
2024-08-04T04:35:17.325332054Z 
 73%|███████▎  | 6893/9500 [23:37:47<8:59:30, 12.42s/it]08/03/2024 21:35:17 - INFO - __main__ -   Step: 6893, LR: 5.656979484970682e-06, Loss: 482.21759033203125
2024-08-04T04:35:29.965306908Z 
 73%|███████▎  | 6894/9500 [23:37:59<9:02:12, 12.48s/it]08/03/2024 21:35:29 - INFO - __main__ -   Step: 6894, LR: 5.654808941283402e-06, Loss: 420.3891906738281
2024-08-04T04:35:42.462918700Z 
 73%|███████▎  | 6895/9500 [23:38:12<9:02:10, 12.49s/it]08/03/2024 21:35:42 - INFO - __main__ -   Step: 6895, LR: 5.652638397596123e-06, Loss: 420.99127197265625
2024-08-04T04:35:54.451558083Z 
 73%|███████▎  | 6896/9500 [23:38:24<8:55:28, 12.34s/it]08/03/2024 21:35:54 - INFO - __main__ -   Step: 6896, LR: 5.650467853908844e-06, Loss: 420.4593200683594
2024-08-04T04:36:07.059309502Z 
 73%|███████▎  | 6897/9500 [23:38:36<8:58:46, 12.42s/it]08/03/2024 21:36:07 - INFO - __main__ -   Step: 6897, LR: 5.648297310221566e-06, Loss: 391.19146728515625
2024-08-04T04:36:19.419004975Z 
 73%|███████▎  | 6898/9500 [23:38:49<8:57:48, 12.40s/it]08/03/2024 21:36:19 - INFO - __main__ -   Step: 6898, LR: 5.646126766534286e-06, Loss: 448.2720947265625
2024-08-04T04:36:31.551680691Z 
 73%|███████▎  | 6899/9500 [23:39:01<8:54:06, 12.32s/it]08/03/2024 21:36:31 - INFO - __main__ -   Step: 6899, LR: 5.643956222847008e-06, Loss: 376.69354248046875
2024-08-04T04:36:44.115199246Z 
 73%|███████▎  | 6900/9500 [23:39:14<8:57:03, 12.39s/it]08/03/2024 21:36:44 - INFO - __main__ -   Step: 6900, LR: 5.641785679159729e-06, Loss: 401.96221923828125
2024-08-04T04:36:56.210151168Z 
 73%|███████▎  | 6901/9500 [23:39:26<8:52:57, 12.30s/it]08/03/2024 21:36:56 - INFO - __main__ -   Step: 6901, LR: 5.639615135472449e-06, Loss: 363.33740234375
2024-08-04T04:37:08.318531106Z 
 73%|███████▎  | 6902/9500 [23:39:38<8:50:13, 12.25s/it]08/03/2024 21:37:08 - INFO - __main__ -   Step: 6902, LR: 5.637444591785171e-06, Loss: 343.5814208984375
2024-08-04T04:37:20.879602378Z 
 73%|███████▎  | 6903/9500 [23:39:50<8:54:07, 12.34s/it]08/03/2024 21:37:20 - INFO - __main__ -   Step: 6903, LR: 5.6352740480978915e-06, Loss: 426.17498779296875
2024-08-04T04:37:33.095405967Z 
 73%|███████▎  | 6904/9500 [23:40:03<8:52:17, 12.30s/it]08/03/2024 21:37:33 - INFO - __main__ -   Step: 6904, LR: 5.633103504410613e-06, Loss: 348.3299560546875
2024-08-04T04:37:45.232707075Z 
 73%|███████▎  | 6905/9500 [23:40:15<8:49:56, 12.25s/it]08/03/2024 21:37:45 - INFO - __main__ -   Step: 6905, LR: 5.630932960723335e-06, Loss: 458.33099365234375
2024-08-04T04:37:57.822327897Z 
 73%|███████▎  | 6906/9500 [23:40:27<8:54:04, 12.35s/it]08/03/2024 21:37:57 - INFO - __main__ -   Step: 6906, LR: 5.628762417036055e-06, Loss: 331.88348388671875
2024-08-04T04:38:10.203026394Z 
 73%|███████▎  | 6907/9500 [23:40:40<8:54:15, 12.36s/it]08/03/2024 21:38:10 - INFO - __main__ -   Step: 6907, LR: 5.626591873348777e-06, Loss: 416.2406005859375
2024-08-04T04:38:22.312768213Z 
 73%|███████▎  | 6908/9500 [23:40:52<8:50:46, 12.29s/it]08/03/2024 21:38:22 - INFO - __main__ -   Step: 6908, LR: 5.624421329661497e-06, Loss: 365.228271484375
2024-08-04T04:38:34.668692827Z 
 73%|███████▎  | 6909/9500 [23:41:04<8:51:28, 12.31s/it]08/03/2024 21:38:34 - INFO - __main__ -   Step: 6909, LR: 5.622250785974218e-06, Loss: 486.4176025390625
2024-08-04T04:38:46.769451920Z 
 73%|███████▎  | 6910/9500 [23:41:16<8:48:35, 12.25s/it]08/03/2024 21:38:46 - INFO - __main__ -   Step: 6910, LR: 5.620080242286939e-06, Loss: 379.10137939453125
2024-08-04T04:38:58.990098767Z 
 73%|███████▎  | 6911/9500 [23:41:28<8:48:04, 12.24s/it]08/03/2024 21:38:58 - INFO - __main__ -   Step: 6911, LR: 5.617909698599661e-06, Loss: 446.8504943847656
2024-08-04T04:39:11.386394569Z 
 73%|███████▎  | 6912/9500 [23:41:41<8:49:54, 12.29s/it]08/03/2024 21:39:11 - INFO - __main__ -   Step: 6912, LR: 5.615739154912382e-06, Loss: 369.4288024902344
2024-08-04T04:39:23.281025096Z 
 73%|███████▎  | 6913/9500 [23:41:53<8:44:38, 12.17s/it]08/03/2024 21:39:23 - INFO - __main__ -   Step: 6913, LR: 5.613568611225103e-06, Loss: 357.35369873046875
2024-08-04T04:39:35.605824558Z 
 73%|███████▎  | 6914/9500 [23:42:05<8:46:28, 12.22s/it]08/03/2024 21:39:35 - INFO - __main__ -   Step: 6914, LR: 5.6113980675378245e-06, Loss: 516.2584228515625
2024-08-04T04:39:48.000111415Z 
 73%|███████▎  | 6915/9500 [23:42:17<8:48:35, 12.27s/it]08/03/2024 21:39:47 - INFO - __main__ -   Step: 6915, LR: 5.609227523850544e-06, Loss: 460.3800964355469
2024-08-04T04:40:00.635074432Z 
 73%|███████▎  | 6916/9500 [23:42:30<8:53:06, 12.38s/it]08/03/2024 21:40:00 - INFO - __main__ -   Step: 6916, LR: 5.607056980163266e-06, Loss: 482.01129150390625
2024-08-04T04:40:12.706105637Z 
 73%|███████▎  | 6917/9500 [23:42:42<8:48:55, 12.29s/it]08/03/2024 21:40:12 - INFO - __main__ -   Step: 6917, LR: 5.604886436475987e-06, Loss: 455.69696044921875
2024-08-04T04:40:24.812112810Z 
 73%|███████▎  | 6918/9500 [23:42:54<8:46:23, 12.23s/it]08/03/2024 21:40:24 - INFO - __main__ -   Step: 6918, LR: 5.602715892788708e-06, Loss: 434.15447998046875
2024-08-04T04:40:37.357888719Z 
 73%|███████▎  | 6919/9500 [23:43:07<8:50:14, 12.33s/it]08/03/2024 21:40:37 - INFO - __main__ -   Step: 6919, LR: 5.60054534910143e-06, Loss: 522.9221801757812
2024-08-04T04:40:49.504454171Z 
 73%|███████▎  | 6920/9500 [23:43:19<8:47:42, 12.27s/it]08/03/2024 21:40:49 - INFO - __main__ -   Step: 6920, LR: 5.5983748054141504e-06, Loss: 385.94500732421875
2024-08-04T04:41:01.684859035Z 
 73%|███████▎  | 6921/9500 [23:43:31<8:46:19, 12.24s/it]08/03/2024 21:41:01 - INFO - __main__ -   Step: 6921, LR: 5.596204261726872e-06, Loss: 419.6121826171875
2024-08-04T04:41:14.160404674Z 
 73%|███████▎  | 6922/9500 [23:43:44<8:49:05, 12.31s/it]08/03/2024 21:41:14 - INFO - __main__ -   Step: 6922, LR: 5.594033718039592e-06, Loss: 409.9808044433594
2024-08-04T04:41:26.693355091Z 
 73%|███████▎  | 6923/9500 [23:43:56<8:51:42, 12.38s/it]08/03/2024 21:41:26 - INFO - __main__ -   Step: 6923, LR: 5.591863174352313e-06, Loss: 404.9891052246094
2024-08-04T04:41:39.107236089Z 
 73%|███████▎  | 6924/9500 [23:44:09<8:51:56, 12.39s/it]08/03/2024 21:41:39 - INFO - __main__ -   Step: 6924, LR: 5.589692630665034e-06, Loss: 436.4321594238281
2024-08-04T04:41:51.902667533Z 
 73%|███████▎  | 6925/9500 [23:44:21<8:56:57, 12.51s/it]08/03/2024 21:41:51 - INFO - __main__ -   Step: 6925, LR: 5.587522086977756e-06, Loss: 480.42529296875
2024-08-04T04:42:04.075239126Z 
 73%|███████▎  | 6926/9500 [23:44:34<8:52:23, 12.41s/it]08/03/2024 21:42:04 - INFO - __main__ -   Step: 6926, LR: 5.585351543290477e-06, Loss: 348.4600830078125
2024-08-04T04:42:16.378366076Z 
 73%|███████▎  | 6927/9500 [23:44:46<8:50:48, 12.38s/it]08/03/2024 21:42:16 - INFO - __main__ -   Step: 6927, LR: 5.583180999603198e-06, Loss: 452.94061279296875
2024-08-04T04:42:28.651119686Z 
 73%|███████▎  | 6928/9500 [23:44:58<8:49:14, 12.35s/it]08/03/2024 21:42:28 - INFO - __main__ -   Step: 6928, LR: 5.5810104559159195e-06, Loss: 325.73565673828125
2024-08-04T04:42:41.229920565Z 
 73%|███████▎  | 6929/9500 [23:45:11<8:52:01, 12.42s/it]08/03/2024 21:42:41 - INFO - __main__ -   Step: 6929, LR: 5.578839912228639e-06, Loss: 430.94989013671875
2024-08-04T04:42:53.781775489Z 
 73%|███████▎  | 6930/9500 [23:45:23<8:53:34, 12.46s/it]08/03/2024 21:42:53 - INFO - __main__ -   Step: 6930, LR: 5.576669368541361e-06, Loss: 449.6050109863281
2024-08-04T04:43:06.233186117Z 
 73%|███████▎  | 6931/9500 [23:45:36<8:53:17, 12.46s/it]08/03/2024 21:43:06 - INFO - __main__ -   Step: 6931, LR: 5.5744988248540825e-06, Loss: 401.72613525390625
2024-08-04T04:43:18.226417891Z 
 73%|███████▎  | 6932/9500 [23:45:48<8:47:09, 12.32s/it]08/03/2024 21:43:18 - INFO - __main__ -   Step: 6932, LR: 5.572328281166803e-06, Loss: 388.113525390625
2024-08-04T04:43:30.625459312Z 
 73%|███████▎  | 6933/9500 [23:46:00<8:48:00, 12.34s/it]08/03/2024 21:43:30 - INFO - __main__ -   Step: 6933, LR: 5.570157737479525e-06, Loss: 458.3950500488281
2024-08-04T04:43:43.330632352Z 
 73%|███████▎  | 6934/9500 [23:46:13<8:52:27, 12.45s/it]08/03/2024 21:43:43 - INFO - __main__ -   Step: 6934, LR: 5.5679871937922455e-06, Loss: 395.3591613769531
2024-08-04T04:43:55.805750900Z 
 73%|███████▎  | 6935/9500 [23:46:25<8:52:34, 12.46s/it]08/03/2024 21:43:55 - INFO - __main__ -   Step: 6935, LR: 5.565816650104967e-06, Loss: 495.239501953125
2024-08-04T04:44:07.739796333Z 
 73%|███████▎  | 6936/9500 [23:46:37<8:45:39, 12.30s/it]08/03/2024 21:44:07 - INFO - __main__ -   Step: 6936, LR: 5.563646106417687e-06, Loss: 434.4810791015625
2024-08-04T04:44:20.231336826Z 
 73%|███████▎  | 6937/9500 [23:46:50<8:47:53, 12.36s/it]08/03/2024 21:44:20 - INFO - __main__ -   Step: 6937, LR: 5.5614755627304085e-06, Loss: 385.96563720703125
2024-08-04T04:44:32.583372628Z 
 73%|███████▎  | 6938/9500 [23:47:02<8:47:36, 12.36s/it]08/03/2024 21:44:32 - INFO - __main__ -   Step: 6938, LR: 5.55930501904313e-06, Loss: 427.5118103027344
2024-08-04T04:44:44.681814245Z 
 73%|███████▎  | 6939/9500 [23:47:14<8:44:06, 12.28s/it]08/03/2024 21:44:44 - INFO - __main__ -   Step: 6939, LR: 5.557134475355851e-06, Loss: 412.5328369140625
2024-08-04T04:44:57.302340736Z 
 73%|███████▎  | 6940/9500 [23:47:27<8:48:15, 12.38s/it]08/03/2024 21:44:57 - INFO - __main__ -   Step: 6940, LR: 5.554963931668572e-06, Loss: 362.287109375
2024-08-04T04:45:09.540275225Z 
 73%|███████▎  | 6941/9500 [23:47:39<8:46:14, 12.34s/it]08/03/2024 21:45:09 - INFO - __main__ -   Step: 6941, LR: 5.552793387981293e-06, Loss: 338.6004943847656
2024-08-04T04:45:21.329094814Z 
 73%|███████▎  | 6942/9500 [23:47:51<8:38:59, 12.17s/it]08/03/2024 21:45:21 - INFO - __main__ -   Step: 6942, LR: 5.550622844294015e-06, Loss: 367.7809143066406
2024-08-04T04:45:33.918345977Z 
 73%|███████▎  | 6943/9500 [23:48:03<8:44:06, 12.30s/it]08/03/2024 21:45:33 - INFO - __main__ -   Step: 6943, LR: 5.5484523006067344e-06, Loss: 555.6531372070312
2024-08-04T04:45:46.234100785Z 
 73%|███████▎  | 6944/9500 [23:48:16<8:44:07, 12.30s/it]08/03/2024 21:45:46 - INFO - __main__ -   Step: 6944, LR: 5.546281756919456e-06, Loss: 438.064453125
2024-08-04T04:45:58.241015485Z 
 73%|███████▎  | 6945/9500 [23:48:28<8:40:08, 12.21s/it]08/03/2024 21:45:58 - INFO - __main__ -   Step: 6945, LR: 5.5441112132321776e-06, Loss: 369.4017333984375
2024-08-04T04:46:10.615393024Z 
 73%|███████▎  | 6946/9500 [23:48:40<8:41:58, 12.26s/it]08/03/2024 21:46:10 - INFO - __main__ -   Step: 6946, LR: 5.541940669544898e-06, Loss: 305.0174865722656
2024-08-04T04:46:22.644684212Z 
 73%|███████▎  | 6947/9500 [23:48:52<8:38:47, 12.19s/it]08/03/2024 21:46:22 - INFO - __main__ -   Step: 6947, LR: 5.53977012585762e-06, Loss: 388.884765625
2024-08-04T04:46:34.672748621Z 
 73%|███████▎  | 6948/9500 [23:49:04<8:36:29, 12.14s/it]08/03/2024 21:46:34 - INFO - __main__ -   Step: 6948, LR: 5.537599582170341e-06, Loss: 331.12353515625
2024-08-04T04:46:47.264090260Z 
 73%|███████▎  | 6949/9500 [23:49:17<8:42:00, 12.28s/it]08/03/2024 21:46:47 - INFO - __main__ -   Step: 6949, LR: 5.535429038483062e-06, Loss: 463.8323059082031
2024-08-04T04:46:59.478955570Z 
 73%|███████▎  | 6950/9500 [23:49:29<8:41:00, 12.26s/it]08/03/2024 21:46:59 - INFO - __main__ -   Step: 6950, LR: 5.533258494795782e-06, Loss: 411.8582763671875
2024-08-04T04:47:12.042155855Z 
 73%|███████▎  | 6951/9500 [23:49:41<8:44:40, 12.35s/it]08/03/2024 21:47:12 - INFO - __main__ -   Step: 6951, LR: 5.5310879511085035e-06, Loss: 456.8966064453125
2024-08-04T04:47:24.984978432Z 
 73%|███████▎  | 6952/9500 [23:49:54<8:52:01, 12.53s/it]08/03/2024 21:47:24 - INFO - __main__ -   Step: 6952, LR: 5.528917407421225e-06, Loss: 525.8544921875
2024-08-04T04:47:37.177637671Z 
 73%|███████▎  | 6953/9500 [23:50:07<8:47:32, 12.43s/it]08/03/2024 21:47:37 - INFO - __main__ -   Step: 6953, LR: 5.526746863733946e-06, Loss: 306.33148193359375
2024-08-04T04:47:49.398692770Z 
 73%|███████▎  | 6954/9500 [23:50:19<8:44:42, 12.37s/it]08/03/2024 21:47:49 - INFO - __main__ -   Step: 6954, LR: 5.524576320046667e-06, Loss: 471.3456726074219
2024-08-04T04:48:02.338597091Z 
 73%|███████▎  | 6955/9500 [23:50:32<8:51:48, 12.54s/it]08/03/2024 21:48:02 - INFO - __main__ -   Step: 6955, LR: 5.522405776359389e-06, Loss: 380.92755126953125
2024-08-04T04:48:14.482438555Z 
 73%|███████▎  | 6956/9500 [23:50:44<8:46:35, 12.42s/it]08/03/2024 21:48:14 - INFO - __main__ -   Step: 6956, LR: 5.52023523267211e-06, Loss: 428.0227355957031
2024-08-04T04:48:26.608156549Z 
 73%|███████▎  | 6957/9500 [23:50:56<8:42:38, 12.33s/it]08/03/2024 21:48:26 - INFO - __main__ -   Step: 6957, LR: 5.51806468898483e-06, Loss: 509.1954345703125
2024-08-04T04:48:39.087193674Z 
 73%|███████▎  | 6958/9500 [23:51:09<8:44:19, 12.38s/it]08/03/2024 21:48:39 - INFO - __main__ -   Step: 6958, LR: 5.515894145297551e-06, Loss: 436.917236328125
2024-08-04T04:48:50.709391263Z 
 73%|███████▎  | 6959/9500 [23:51:20<8:34:32, 12.15s/it]08/03/2024 21:48:50 - INFO - __main__ -   Step: 6959, LR: 5.513723601610273e-06, Loss: 315.01019287109375
2024-08-04T04:49:03.053581091Z 
 73%|███████▎  | 6960/9500 [23:51:32<8:36:48, 12.21s/it]08/03/2024 21:49:03 - INFO - __main__ -   Step: 6960, LR: 5.511553057922993e-06, Loss: 416.2923583984375
2024-08-04T04:49:15.190556417Z 
 73%|███████▎  | 6961/9500 [23:51:45<8:35:42, 12.19s/it]08/03/2024 21:49:15 - INFO - __main__ -   Step: 6961, LR: 5.509382514235715e-06, Loss: 454.0018310546875
2024-08-04T04:49:28.087914683Z 
 73%|███████▎  | 6962/9500 [23:51:58<8:44:31, 12.40s/it]08/03/2024 21:49:28 - INFO - __main__ -   Step: 6962, LR: 5.5072119705484365e-06, Loss: 445.34503173828125
2024-08-04T04:49:40.529151402Z 
 73%|███████▎  | 6963/9500 [23:52:10<8:44:49, 12.41s/it]08/03/2024 21:49:40 - INFO - __main__ -   Step: 6963, LR: 5.505041426861157e-06, Loss: 490.3559875488281
2024-08-04T04:49:52.608683666Z 
 73%|███████▎  | 6964/9500 [23:52:22<8:40:24, 12.31s/it]08/03/2024 21:49:52 - INFO - __main__ -   Step: 6964, LR: 5.502870883173878e-06, Loss: 363.1734313964844
2024-08-04T04:50:05.007742685Z 
 73%|███████▎  | 6965/9500 [23:52:34<8:41:18, 12.34s/it]08/03/2024 21:50:05 - INFO - __main__ -   Step: 6965, LR: 5.500700339486599e-06, Loss: 426.12884521484375
2024-08-04T04:50:17.135309808Z 
 73%|███████▎  | 6966/9500 [23:52:47<8:38:25, 12.28s/it]08/03/2024 21:50:17 - INFO - __main__ -   Step: 6966, LR: 5.49852979579932e-06, Loss: 360.7305603027344
2024-08-04T04:50:29.239850009Z 
 73%|███████▎  | 6967/9500 [23:52:59<8:36:03, 12.22s/it]08/03/2024 21:50:29 - INFO - __main__ -   Step: 6967, LR: 5.496359252112041e-06, Loss: 371.78338623046875
2024-08-04T04:50:41.713980979Z 
 73%|███████▎  | 6968/9500 [23:53:11<8:39:01, 12.30s/it]08/03/2024 21:50:41 - INFO - __main__ -   Step: 6968, LR: 5.494188708424762e-06, Loss: 395.8525085449219
2024-08-04T04:50:53.959538562Z 
 73%|███████▎  | 6969/9500 [23:53:23<8:38:08, 12.28s/it]08/03/2024 21:50:53 - INFO - __main__ -   Step: 6969, LR: 5.492018164737484e-06, Loss: 426.6292724609375
2024-08-04T04:51:06.217495482Z 
 73%|███████▎  | 6970/9500 [23:53:36<8:37:37, 12.28s/it]08/03/2024 21:51:06 - INFO - __main__ -   Step: 6970, LR: 5.489847621050205e-06, Loss: 440.90576171875
2024-08-04T04:51:18.706123265Z 
 73%|███████▎  | 6971/9500 [23:53:48<8:40:06, 12.34s/it]08/03/2024 21:51:18 - INFO - __main__ -   Step: 6971, LR: 5.487677077362925e-06, Loss: 413.06195068359375
2024-08-04T04:51:31.003628938Z 
 73%|███████▎  | 6972/9500 [23:54:00<8:39:22, 12.33s/it]08/03/2024 21:51:31 - INFO - __main__ -   Step: 6972, LR: 5.485506533675646e-06, Loss: 435.85760498046875
2024-08-04T04:51:42.953220551Z 
 73%|███████▎  | 6973/9500 [23:54:12<8:34:24, 12.21s/it]08/03/2024 21:51:42 - INFO - __main__ -   Step: 6973, LR: 5.483335989988368e-06, Loss: 404.1086120605469
2024-08-04T04:51:55.462866995Z 
 73%|███████▎  | 6974/9500 [23:54:25<8:37:56, 12.30s/it]08/03/2024 21:51:55 - INFO - __main__ -   Step: 6974, LR: 5.481165446301089e-06, Loss: 369.33160400390625
2024-08-04T04:52:07.492186475Z 
 73%|███████▎  | 6975/9500 [23:54:37<8:34:16, 12.22s/it]08/03/2024 21:52:07 - INFO - __main__ -   Step: 6975, LR: 5.47899490261381e-06, Loss: 394.57763671875
2024-08-04T04:52:19.716172386Z 
 73%|███████▎  | 6976/9500 [23:54:49<8:34:07, 12.22s/it]08/03/2024 21:52:19 - INFO - __main__ -   Step: 6976, LR: 5.4768243589265315e-06, Loss: 488.32666015625
2024-08-04T04:52:32.127720395Z 
 73%|███████▎  | 6977/9500 [23:55:02<8:36:18, 12.28s/it]08/03/2024 21:52:32 - INFO - __main__ -   Step: 6977, LR: 5.474653815239252e-06, Loss: 374.30987548828125
2024-08-04T04:52:44.410033882Z 
 73%|███████▎  | 6978/9500 [23:55:14<8:36:08, 12.28s/it]08/03/2024 21:52:44 - INFO - __main__ -   Step: 6978, LR: 5.472483271551973e-06, Loss: 301.5240783691406
2024-08-04T04:52:56.542942748Z 
 73%|███████▎  | 6979/9500 [23:55:26<8:34:06, 12.24s/it]08/03/2024 21:52:56 - INFO - __main__ -   Step: 6979, LR: 5.470312727864694e-06, Loss: 418.01544189453125
2024-08-04T04:53:09.365601560Z 
 73%|███████▎  | 6980/9500 [23:55:39<8:41:17, 12.41s/it]08/03/2024 21:53:09 - INFO - __main__ -   Step: 6980, LR: 5.468142184177415e-06, Loss: 428.88507080078125
2024-08-04T04:53:21.542382725Z 
 73%|███████▎  | 6981/9500 [23:55:51<8:38:07, 12.34s/it]08/03/2024 21:53:21 - INFO - __main__ -   Step: 6981, LR: 5.465971640490137e-06, Loss: 371.7474365234375
2024-08-04T04:53:33.544102216Z 
 73%|███████▎  | 6982/9500 [23:56:03<8:33:38, 12.24s/it]08/03/2024 21:53:33 - INFO - __main__ -   Step: 6982, LR: 5.4638010968028575e-06, Loss: 397.1250915527344
2024-08-04T04:53:45.908873522Z 
 74%|███████▎  | 6983/9500 [23:56:15<8:35:01, 12.28s/it]08/03/2024 21:53:45 - INFO - __main__ -   Step: 6983, LR: 5.461630553115579e-06, Loss: 382.315673828125
2024-08-04T04:53:58.406315655Z 
 74%|███████▎  | 6984/9500 [23:56:28<8:37:35, 12.34s/it]08/03/2024 21:53:58 - INFO - __main__ -   Step: 6984, LR: 5.4594600094283e-06, Loss: 463.6612243652344
2024-08-04T04:54:10.671257716Z 
 74%|███████▎  | 6985/9500 [23:56:40<8:36:23, 12.32s/it]08/03/2024 21:54:10 - INFO - __main__ -   Step: 6985, LR: 5.4572894657410205e-06, Loss: 295.884521484375
2024-08-04T04:54:23.328584422Z 
 74%|███████▎  | 6986/9500 [23:56:53<8:40:26, 12.42s/it]08/03/2024 21:54:23 - INFO - __main__ -   Step: 6986, LR: 5.455118922053741e-06, Loss: 511.82208251953125
2024-08-04T04:54:35.872640165Z 
 74%|███████▎  | 6987/9500 [23:57:05<8:41:46, 12.46s/it]08/03/2024 21:54:35 - INFO - __main__ -   Step: 6987, LR: 5.452948378366463e-06, Loss: 404.22320556640625
2024-08-04T04:54:47.872614494Z 
 74%|███████▎  | 6988/9500 [23:57:17<8:35:49, 12.32s/it]08/03/2024 21:54:47 - INFO - __main__ -   Step: 6988, LR: 5.450777834679184e-06, Loss: 307.67303466796875
2024-08-04T04:55:00.333991475Z 
 74%|███████▎  | 6989/9500 [23:57:30<8:37:22, 12.36s/it]08/03/2024 21:55:00 - INFO - __main__ -   Step: 6989, LR: 5.448607290991905e-06, Loss: 353.02490234375
2024-08-04T04:55:12.333420001Z 
 74%|███████▎  | 6990/9500 [23:57:42<8:32:37, 12.25s/it]08/03/2024 21:55:12 - INFO - __main__ -   Step: 6990, LR: 5.4464367473046266e-06, Loss: 331.7502746582031
2024-08-04T04:55:24.435697515Z 
 74%|███████▎  | 6991/9500 [23:57:54<8:30:30, 12.21s/it]08/03/2024 21:55:24 - INFO - __main__ -   Step: 6991, LR: 5.444266203617347e-06, Loss: 336.1821594238281
2024-08-04T04:55:37.297894218Z 
 74%|███████▎  | 6992/9500 [23:58:07<8:38:30, 12.40s/it]08/03/2024 21:55:37 - INFO - __main__ -   Step: 6992, LR: 5.442095659930068e-06, Loss: 478.3584289550781
2024-08-04T04:55:49.311966105Z 
 74%|███████▎  | 6993/9500 [23:58:19<8:33:24, 12.29s/it]08/03/2024 21:55:49 - INFO - __main__ -   Step: 6993, LR: 5.439925116242789e-06, Loss: 566.728759765625
2024-08-04T04:56:01.646620686Z 
 74%|███████▎  | 6994/9500 [23:58:31<8:33:47, 12.30s/it]08/03/2024 21:56:01 - INFO - __main__ -   Step: 6994, LR: 5.43775457255551e-06, Loss: 373.1898193359375
2024-08-04T04:56:14.173458826Z 
 74%|███████▎  | 6995/9500 [23:58:44<8:36:24, 12.37s/it]08/03/2024 21:56:14 - INFO - __main__ -   Step: 6995, LR: 5.435584028868232e-06, Loss: 454.63348388671875
2024-08-04T04:56:26.159928475Z 
 74%|███████▎  | 6996/9500 [23:58:56<8:31:24, 12.25s/it]08/03/2024 21:56:26 - INFO - __main__ -   Step: 6996, LR: 5.4334134851809525e-06, Loss: 360.4184265136719
2024-08-04T04:56:38.424037506Z 
 74%|███████▎  | 6997/9500 [23:59:08<8:31:19, 12.26s/it]08/03/2024 21:56:38 - INFO - __main__ -   Step: 6997, LR: 5.431242941493674e-06, Loss: 399.3968505859375
2024-08-04T04:56:50.982769214Z 
 74%|███████▎  | 6998/9500 [23:59:20<8:34:54, 12.35s/it]08/03/2024 21:56:50 - INFO - __main__ -   Step: 6998, LR: 5.429072397806396e-06, Loss: 437.38238525390625
2024-08-04T04:57:03.067126215Z 
 74%|███████▎  | 6999/9500 [23:59:33<8:31:23, 12.27s/it]08/03/2024 21:57:03 - INFO - __main__ -   Step: 6999, LR: 5.4269018541191155e-06, Loss: 412.0661926269531
2024-08-04T04:57:15.271580451Z 
 74%|███████▎  | 7000/9500 [23:59:45<8:30:23, 12.25s/it]08/03/2024 21:57:15 - INFO - __main__ -   Step: 7000, LR: 5.424731310431836e-06, Loss: 368.5515441894531
2024-08-04T04:57:27.362388729Z 
 74%|███████▎  | 7001/9500 [23:59:57<8:28:12, 12.20s/it]08/03/2024 21:57:27 - INFO - __main__ -   Step: 7001, LR: 5.422560766744558e-06, Loss: 292.8687744140625
2024-08-04T04:57:39.526534910Z 
 74%|███████▎  | 7002/9500 [24:00:09<8:27:31, 12.19s/it]08/03/2024 21:57:39 - INFO - __main__ -   Step: 7002, LR: 5.420390223057279e-06, Loss: 431.4515380859375
2024-08-04T04:57:51.759819053Z 
 74%|███████▎  | 7003/9500 [24:00:21<8:27:51, 12.20s/it]08/03/2024 21:57:51 - INFO - __main__ -   Step: 7003, LR: 5.41821967937e-06, Loss: 420.8780517578125
2024-08-04T04:58:03.887738973Z 
 74%|███████▎  | 7004/9500 [24:00:33<8:26:43, 12.18s/it]08/03/2024 21:58:03 - INFO - __main__ -   Step: 7004, LR: 5.416049135682722e-06, Loss: 457.863525390625
2024-08-04T04:58:16.868192664Z 
 74%|███████▎  | 7005/9500 [24:00:46<8:36:29, 12.42s/it]08/03/2024 21:58:16 - INFO - __main__ -   Step: 7005, LR: 5.413878591995443e-06, Loss: 439.0805969238281
2024-08-04T04:58:29.592860905Z 
 74%|███████▎  | 7006/9500 [24:00:59<8:40:04, 12.51s/it]08/03/2024 21:58:29 - INFO - __main__ -   Step: 7006, LR: 5.411708048308163e-06, Loss: 369.5620422363281
2024-08-04T04:58:41.731253975Z 
 74%|███████▍  | 7007/9500 [24:01:11<8:35:12, 12.40s/it]08/03/2024 21:58:41 - INFO - __main__ -   Step: 7007, LR: 5.409537504620885e-06, Loss: 345.99151611328125
2024-08-04T04:58:54.157174981Z 
 74%|███████▍  | 7008/9500 [24:01:24<8:35:19, 12.41s/it]08/03/2024 21:58:54 - INFO - __main__ -   Step: 7008, LR: 5.407366960933605e-06, Loss: 296.3992004394531
2024-08-04T04:59:06.298945155Z 
 74%|███████▍  | 7009/9500 [24:01:36<8:31:48, 12.33s/it]08/03/2024 21:59:06 - INFO - __main__ -   Step: 7009, LR: 5.405196417246327e-06, Loss: 350.55206298828125
2024-08-04T04:59:18.786230918Z 
 74%|███████▍  | 7010/9500 [24:01:48<8:33:35, 12.38s/it]08/03/2024 21:59:18 - INFO - __main__ -   Step: 7010, LR: 5.403025873559048e-06, Loss: 476.43487548828125
2024-08-04T04:59:31.681734987Z 
 74%|███████▍  | 7011/9500 [24:02:01<8:39:51, 12.53s/it]08/03/2024 21:59:31 - INFO - __main__ -   Step: 7011, LR: 5.400855329871769e-06, Loss: 404.73822021484375
2024-08-04T04:59:43.911420065Z 
 74%|███████▍  | 7012/9500 [24:02:13<8:35:53, 12.44s/it]08/03/2024 21:59:43 - INFO - __main__ -   Step: 7012, LR: 5.398684786184489e-06, Loss: 395.1075744628906
2024-08-04T04:59:55.846551459Z 
 74%|███████▍  | 7013/9500 [24:02:25<8:29:23, 12.29s/it]08/03/2024 21:59:55 - INFO - __main__ -   Step: 7013, LR: 5.396514242497211e-06, Loss: 438.58538818359375
2024-08-04T05:00:08.138465996Z 
 74%|███████▍  | 7014/9500 [24:02:38<8:29:13, 12.29s/it]08/03/2024 22:00:08 - INFO - __main__ -   Step: 7014, LR: 5.394343698809932e-06, Loss: 367.30633544921875
2024-08-04T05:00:20.434604523Z 
 74%|███████▍  | 7015/9500 [24:02:50<8:29:05, 12.29s/it]08/03/2024 22:00:20 - INFO - __main__ -   Step: 7015, LR: 5.392173155122653e-06, Loss: 336.0482177734375
2024-08-04T05:00:32.556859006Z 
 74%|███████▍  | 7016/9500 [24:03:02<8:26:46, 12.24s/it]08/03/2024 22:00:32 - INFO - __main__ -   Step: 7016, LR: 5.390002611435374e-06, Loss: 452.4081726074219
2024-08-04T05:00:45.035799969Z 
 74%|███████▍  | 7017/9500 [24:03:14<8:29:31, 12.31s/it]08/03/2024 22:00:45 - INFO - __main__ -   Step: 7017, LR: 5.387832067748095e-06, Loss: 405.444580078125
2024-08-04T05:00:57.038350980Z 
 74%|███████▍  | 7018/9500 [24:03:26<8:25:28, 12.22s/it]08/03/2024 22:00:57 - INFO - __main__ -   Step: 7018, LR: 5.385661524060817e-06, Loss: 369.04339599609375
2024-08-04T05:01:09.029691413Z 
 74%|███████▍  | 7019/9500 [24:03:38<8:22:26, 12.15s/it]08/03/2024 22:01:09 - INFO - __main__ -   Step: 7019, LR: 5.3834909803735365e-06, Loss: 395.05828857421875
2024-08-04T05:01:21.647747680Z 
 74%|███████▍  | 7020/9500 [24:03:51<8:28:02, 12.29s/it]08/03/2024 22:01:21 - INFO - __main__ -   Step: 7020, LR: 5.381320436686258e-06, Loss: 484.21331787109375
2024-08-04T05:01:33.986872285Z 
 74%|███████▍  | 7021/9500 [24:04:03<8:28:25, 12.31s/it]08/03/2024 22:01:33 - INFO - __main__ -   Step: 7021, LR: 5.37914989299898e-06, Loss: 394.6422119140625
2024-08-04T05:01:46.172545648Z 
 74%|███████▍  | 7022/9500 [24:04:16<8:26:43, 12.27s/it]08/03/2024 22:01:46 - INFO - __main__ -   Step: 7022, LR: 5.3769793493117e-06, Loss: 490.3612060546875
2024-08-04T05:01:58.993676964Z 
 74%|███████▍  | 7023/9500 [24:04:28<8:33:21, 12.44s/it]08/03/2024 22:01:58 - INFO - __main__ -   Step: 7023, LR: 5.374808805624422e-06, Loss: 381.47796630859375
2024-08-04T05:02:11.357180282Z 
 74%|███████▍  | 7024/9500 [24:04:41<8:32:16, 12.41s/it]08/03/2024 22:02:11 - INFO - __main__ -   Step: 7024, LR: 5.3726382619371435e-06, Loss: 406.4522705078125
2024-08-04T05:02:23.540357498Z 
 74%|███████▍  | 7025/9500 [24:04:53<8:29:12, 12.34s/it]08/03/2024 22:02:23 - INFO - __main__ -   Step: 7025, LR: 5.370467718249864e-06, Loss: 312.9676818847656
2024-08-04T05:02:36.025592251Z 
 74%|███████▍  | 7026/9500 [24:05:05<8:30:44, 12.39s/it]08/03/2024 22:02:36 - INFO - __main__ -   Step: 7026, LR: 5.368297174562584e-06, Loss: 436.99212646484375
2024-08-04T05:02:48.005423801Z 
 74%|███████▍  | 7027/9500 [24:05:17<8:25:30, 12.26s/it]08/03/2024 22:02:48 - INFO - __main__ -   Step: 7027, LR: 5.366126630875306e-06, Loss: 362.16693115234375
2024-08-04T05:03:00.488758996Z 
 74%|███████▍  | 7028/9500 [24:05:30<8:28:00, 12.33s/it]08/03/2024 22:03:00 - INFO - __main__ -   Step: 7028, LR: 5.363956087188027e-06, Loss: 467.89923095703125
2024-08-04T05:03:13.009886274Z 
 74%|███████▍  | 7029/9500 [24:05:42<8:30:09, 12.39s/it]08/03/2024 22:03:13 - INFO - __main__ -   Step: 7029, LR: 5.361785543500748e-06, Loss: 434.8809814453125
2024-08-04T05:03:25.596397141Z 
 74%|███████▍  | 7030/9500 [24:05:55<8:32:24, 12.45s/it]08/03/2024 22:03:25 - INFO - __main__ -   Step: 7030, LR: 5.3596149998134695e-06, Loss: 419.03387451171875
2024-08-04T05:03:37.485568549Z 
 74%|███████▍  | 7031/9500 [24:06:07<8:25:18, 12.28s/it]08/03/2024 22:03:37 - INFO - __main__ -   Step: 7031, LR: 5.357444456126191e-06, Loss: 411.6652526855469
2024-08-04T05:03:50.060415792Z 
 74%|███████▍  | 7032/9500 [24:06:19<8:28:45, 12.37s/it]08/03/2024 22:03:50 - INFO - __main__ -   Step: 7032, LR: 5.355273912438912e-06, Loss: 482.3494567871094
2024-08-04T05:04:02.040449001Z 
 74%|███████▍  | 7033/9500 [24:06:31<8:23:45, 12.25s/it]08/03/2024 22:04:02 - INFO - __main__ -   Step: 7033, LR: 5.3531033687516325e-06, Loss: 390.41265869140625
2024-08-04T05:04:14.301721060Z 
 74%|███████▍  | 7034/9500 [24:06:44<8:23:40, 12.25s/it]08/03/2024 22:04:14 - INFO - __main__ -   Step: 7034, LR: 5.350932825064353e-06, Loss: 423.37725830078125
2024-08-04T05:04:26.739236050Z 
 74%|███████▍  | 7035/9500 [24:06:56<8:25:43, 12.31s/it]08/03/2024 22:04:26 - INFO - __main__ -   Step: 7035, LR: 5.348762281377075e-06, Loss: 423.977783203125
2024-08-04T05:04:39.031921071Z 
 74%|███████▍  | 7036/9500 [24:07:08<8:25:18, 12.30s/it]08/03/2024 22:04:39 - INFO - __main__ -   Step: 7036, LR: 5.3465917376897954e-06, Loss: 398.1427307128906
2024-08-04T05:04:51.367511275Z 
 74%|███████▍  | 7037/9500 [24:07:21<8:25:29, 12.31s/it]08/03/2024 22:04:51 - INFO - __main__ -   Step: 7037, LR: 5.344421194002517e-06, Loss: 389.3632507324219
2024-08-04T05:05:03.909321015Z 
 74%|███████▍  | 7038/9500 [24:07:33<8:28:05, 12.38s/it]08/03/2024 22:05:03 - INFO - __main__ -   Step: 7038, LR: 5.3422506503152386e-06, Loss: 393.4718322753906
2024-08-04T05:05:15.951179926Z 
 74%|███████▍  | 7039/9500 [24:07:45<8:23:41, 12.28s/it]08/03/2024 22:05:15 - INFO - __main__ -   Step: 7039, LR: 5.340080106627959e-06, Loss: 347.2464904785156
2024-08-04T05:05:27.922602134Z 
 74%|███████▍  | 7040/9500 [24:07:57<8:19:41, 12.19s/it]08/03/2024 22:05:27 - INFO - __main__ -   Step: 7040, LR: 5.33790956294068e-06, Loss: 378.75799560546875
2024-08-04T05:05:40.556793378Z 
 74%|███████▍  | 7041/9500 [24:08:10<8:24:58, 12.32s/it]08/03/2024 22:05:40 - INFO - __main__ -   Step: 7041, LR: 5.335739019253401e-06, Loss: 424.88079833984375
2024-08-04T05:05:52.503810416Z 
 74%|███████▍  | 7042/9500 [24:08:22<8:20:09, 12.21s/it]08/03/2024 22:05:52 - INFO - __main__ -   Step: 7042, LR: 5.333568475566122e-06, Loss: 406.13494873046875
2024-08-04T05:06:05.109718262Z 
 74%|███████▍  | 7043/9500 [24:08:35<8:24:50, 12.33s/it]08/03/2024 22:06:05 - INFO - __main__ -   Step: 7043, LR: 5.331397931878843e-06, Loss: 326.0389099121094
2024-08-04T05:06:17.760485368Z 
 74%|███████▍  | 7044/9500 [24:08:47<8:28:35, 12.42s/it]08/03/2024 22:06:17 - INFO - __main__ -   Step: 7044, LR: 5.3292273881915645e-06, Loss: 382.7516784667969
2024-08-04T05:06:29.888342409Z 
 74%|███████▍  | 7045/9500 [24:08:59<8:24:44, 12.34s/it]08/03/2024 22:06:29 - INFO - __main__ -   Step: 7045, LR: 5.327056844504286e-06, Loss: 425.677734375
2024-08-04T05:06:42.275275295Z 
 74%|███████▍  | 7046/9500 [24:09:12<8:25:09, 12.35s/it]08/03/2024 22:06:42 - INFO - __main__ -   Step: 7046, LR: 5.324886300817007e-06, Loss: 429.7961120605469
2024-08-04T05:06:54.584561773Z 
 74%|███████▍  | 7047/9500 [24:09:24<8:24:26, 12.34s/it]08/03/2024 22:06:54 - INFO - __main__ -   Step: 7047, LR: 5.3227157571297275e-06, Loss: 449.20123291015625
2024-08-04T05:07:07.294259680Z 
 74%|███████▍  | 7048/9500 [24:09:37<8:28:47, 12.45s/it]08/03/2024 22:07:07 - INFO - __main__ -   Step: 7048, LR: 5.320545213442448e-06, Loss: 505.557861328125
2024-08-04T05:07:19.783373154Z 
 74%|███████▍  | 7049/9500 [24:09:49<8:29:03, 12.46s/it]08/03/2024 22:07:19 - INFO - __main__ -   Step: 7049, LR: 5.31837466975517e-06, Loss: 493.3431396484375
2024-08-04T05:07:31.986573385Z 
 74%|███████▍  | 7050/9500 [24:10:01<8:25:41, 12.38s/it]08/03/2024 22:07:31 - INFO - __main__ -   Step: 7050, LR: 5.316204126067891e-06, Loss: 353.72369384765625
2024-08-04T05:07:44.417099671Z 
 74%|███████▍  | 7051/9500 [24:10:14<8:26:02, 12.40s/it]08/03/2024 22:07:44 - INFO - __main__ -   Step: 7051, LR: 5.314033582380612e-06, Loss: 365.55950927734375
2024-08-04T05:07:56.539850342Z 
 74%|███████▍  | 7052/9500 [24:10:26<8:22:28, 12.32s/it]08/03/2024 22:07:56 - INFO - __main__ -   Step: 7052, LR: 5.311863038693334e-06, Loss: 257.30596923828125
2024-08-04T05:08:08.573509689Z 
 74%|███████▍  | 7053/9500 [24:10:38<8:18:48, 12.23s/it]08/03/2024 22:08:08 - INFO - __main__ -   Step: 7053, LR: 5.309692495006054e-06, Loss: 359.14453125
2024-08-04T05:08:21.345939808Z 
 74%|███████▍  | 7054/9500 [24:10:51<8:25:13, 12.39s/it]08/03/2024 22:08:21 - INFO - __main__ -   Step: 7054, LR: 5.307521951318775e-06, Loss: 468.1061096191406
2024-08-04T05:08:33.620638697Z 
 74%|███████▍  | 7055/9500 [24:11:03<8:23:34, 12.36s/it]08/03/2024 22:08:33 - INFO - __main__ -   Step: 7055, LR: 5.305351407631496e-06, Loss: 370.6864318847656
2024-08-04T05:08:45.828481297Z 
 74%|███████▍  | 7056/9500 [24:11:15<8:21:32, 12.31s/it]08/03/2024 22:08:45 - INFO - __main__ -   Step: 7056, LR: 5.303180863944217e-06, Loss: 455.30078125
2024-08-04T05:08:58.318840538Z 
 74%|███████▍  | 7057/9500 [24:11:28<8:23:30, 12.37s/it]08/03/2024 22:08:58 - INFO - __main__ -   Step: 7057, LR: 5.301010320256939e-06, Loss: 246.96690368652344
2024-08-04T05:09:10.677382602Z 
 74%|███████▍  | 7058/9500 [24:11:40<8:23:12, 12.36s/it]08/03/2024 22:09:10 - INFO - __main__ -   Step: 7058, LR: 5.29883977656966e-06, Loss: 405.82281494140625
2024-08-04T05:09:22.790468954Z 
 74%|███████▍  | 7059/9500 [24:11:52<8:19:56, 12.29s/it]08/03/2024 22:09:22 - INFO - __main__ -   Step: 7059, LR: 5.296669232882381e-06, Loss: 398.9610290527344
2024-08-04T05:09:35.537354511Z 
 74%|███████▍  | 7060/9500 [24:12:05<8:25:19, 12.43s/it]08/03/2024 22:09:35 - INFO - __main__ -   Step: 7060, LR: 5.294498689195102e-06, Loss: 541.8722534179688
2024-08-04T05:09:47.954539360Z 
 74%|███████▍  | 7061/9500 [24:12:17<8:25:00, 12.42s/it]08/03/2024 22:09:47 - INFO - __main__ -   Step: 7061, LR: 5.2923281455078226e-06, Loss: 505.8905029296875
2024-08-04T05:10:00.004034645Z 
 74%|███████▍  | 7062/9500 [24:12:29<8:20:14, 12.31s/it]08/03/2024 22:10:00 - INFO - __main__ -   Step: 7062, LR: 5.290157601820543e-06, Loss: 381.8678894042969
2024-08-04T05:10:12.400123242Z 
 74%|███████▍  | 7063/9500 [24:12:42<8:21:04, 12.34s/it]08/03/2024 22:10:12 - INFO - __main__ -   Step: 7063, LR: 5.287987058133265e-06, Loss: 398.1973876953125
2024-08-04T05:10:24.662655185Z 
 74%|███████▍  | 7064/9500 [24:12:54<8:19:58, 12.31s/it]08/03/2024 22:10:24 - INFO - __main__ -   Step: 7064, LR: 5.285816514445986e-06, Loss: 468.36431884765625
2024-08-04T05:10:36.777770905Z 
 74%|███████▍  | 7065/9500 [24:13:06<8:17:20, 12.25s/it]08/03/2024 22:10:36 - INFO - __main__ -   Step: 7065, LR: 5.283645970758707e-06, Loss: 408.24725341796875
2024-08-04T05:10:49.331579459Z 
 74%|███████▍  | 7066/9500 [24:13:19<8:20:46, 12.34s/it]08/03/2024 22:10:49 - INFO - __main__ -   Step: 7066, LR: 5.281475427071429e-06, Loss: 490.5498046875
2024-08-04T05:11:01.835007743Z 
 74%|███████▍  | 7067/9500 [24:13:31<8:22:29, 12.39s/it]08/03/2024 22:11:01 - INFO - __main__ -   Step: 7067, LR: 5.27930488338415e-06, Loss: 502.90814208984375
2024-08-04T05:11:14.010533869Z 
 74%|███████▍  | 7068/9500 [24:13:43<8:19:39, 12.33s/it]08/03/2024 22:11:14 - INFO - __main__ -   Step: 7068, LR: 5.27713433969687e-06, Loss: 430.04742431640625
2024-08-04T05:11:26.601801338Z 
 74%|███████▍  | 7069/9500 [24:13:56<8:22:40, 12.41s/it]08/03/2024 22:11:26 - INFO - __main__ -   Step: 7069, LR: 5.274963796009591e-06, Loss: 466.8625183105469
2024-08-04T05:11:38.868197822Z 
 74%|███████▍  | 7070/9500 [24:14:08<8:20:45, 12.36s/it]08/03/2024 22:11:38 - INFO - __main__ -   Step: 7070, LR: 5.272793252322312e-06, Loss: 481.98095703125
2024-08-04T05:11:51.082121512Z 
 74%|███████▍  | 7071/9500 [24:14:21<8:18:43, 12.32s/it]08/03/2024 22:11:51 - INFO - __main__ -   Step: 7071, LR: 5.270622708635034e-06, Loss: 310.18304443359375
2024-08-04T05:12:03.542504444Z 
 74%|███████▍  | 7072/9500 [24:14:33<8:20:13, 12.36s/it]08/03/2024 22:12:03 - INFO - __main__ -   Step: 7072, LR: 5.268452164947755e-06, Loss: 364.30548095703125
2024-08-04T05:12:15.537952754Z 
 74%|███████▍  | 7073/9500 [24:14:45<8:15:34, 12.25s/it]08/03/2024 22:12:15 - INFO - __main__ -   Step: 7073, LR: 5.266281621260476e-06, Loss: 443.60498046875
2024-08-04T05:12:27.970223886Z 
 74%|███████▍  | 7074/9500 [24:14:57<8:17:34, 12.31s/it]08/03/2024 22:12:27 - INFO - __main__ -   Step: 7074, LR: 5.264111077573198e-06, Loss: 430.85064697265625
2024-08-04T05:12:40.654387474Z 
 74%|███████▍  | 7075/9500 [24:15:10<8:21:57, 12.42s/it]08/03/2024 22:12:40 - INFO - __main__ -   Step: 7075, LR: 5.261940533885918e-06, Loss: 469.78253173828125
2024-08-04T05:12:52.753471902Z 
 74%|███████▍  | 7076/9500 [24:15:22<8:17:51, 12.32s/it]08/03/2024 22:12:52 - INFO - __main__ -   Step: 7076, LR: 5.259769990198639e-06, Loss: 455.20166015625
2024-08-04T05:13:05.021880609Z 
 74%|███████▍  | 7077/9500 [24:15:34<8:16:59, 12.31s/it]08/03/2024 22:13:05 - INFO - __main__ -   Step: 7077, LR: 5.25759944651136e-06, Loss: 459.1375427246094
2024-08-04T05:13:17.833600667Z 
 75%|███████▍  | 7078/9500 [24:15:47<8:22:54, 12.46s/it]08/03/2024 22:13:17 - INFO - __main__ -   Step: 7078, LR: 5.2554289028240815e-06, Loss: 348.13140869140625
2024-08-04T05:13:30.295670910Z 
 75%|███████▍  | 7079/9500 [24:16:00<8:22:44, 12.46s/it]08/03/2024 22:13:30 - INFO - __main__ -   Step: 7079, LR: 5.253258359136802e-06, Loss: 504.04669189453125
2024-08-04T05:13:42.630160178Z 
 75%|███████▍  | 7080/9500 [24:16:12<8:21:01, 12.42s/it]08/03/2024 22:13:42 - INFO - __main__ -   Step: 7080, LR: 5.251087815449524e-06, Loss: 462.88397216796875
2024-08-04T05:13:55.199459120Z 
 75%|███████▍  | 7081/9500 [24:16:25<8:22:35, 12.47s/it]08/03/2024 22:13:55 - INFO - __main__ -   Step: 7081, LR: 5.248917271762245e-06, Loss: 439.75543212890625
2024-08-04T05:14:07.556349175Z 
 75%|███████▍  | 7082/9500 [24:16:37<8:21:03, 12.43s/it]08/03/2024 22:14:07 - INFO - __main__ -   Step: 7082, LR: 5.246746728074965e-06, Loss: 407.6409912109375
2024-08-04T05:14:19.735715851Z 
 75%|███████▍  | 7083/9500 [24:16:49<8:17:47, 12.36s/it]08/03/2024 22:14:19 - INFO - __main__ -   Step: 7083, LR: 5.244576184387687e-06, Loss: 364.1403503417969
2024-08-04T05:14:32.508331462Z 
 75%|███████▍  | 7084/9500 [24:17:02<8:22:36, 12.48s/it]08/03/2024 22:14:32 - INFO - __main__ -   Step: 7084, LR: 5.2424056407004074e-06, Loss: 418.1795959472656
2024-08-04T05:14:44.756727280Z 
 75%|███████▍  | 7085/9500 [24:17:14<8:19:34, 12.41s/it]08/03/2024 22:14:44 - INFO - __main__ -   Step: 7085, LR: 5.240235097013129e-06, Loss: 377.5350646972656
2024-08-04T05:14:56.859433351Z 
 75%|███████▍  | 7086/9500 [24:17:26<8:15:38, 12.32s/it]08/03/2024 22:14:56 - INFO - __main__ -   Step: 7086, LR: 5.23806455332585e-06, Loss: 309.8067626953125
2024-08-04T05:15:09.305588517Z 
 75%|███████▍  | 7087/9500 [24:17:39<8:16:57, 12.36s/it]08/03/2024 22:15:09 - INFO - __main__ -   Step: 7087, LR: 5.235894009638571e-06, Loss: 462.99853515625
2024-08-04T05:15:21.487673915Z 
 75%|███████▍  | 7088/9500 [24:17:51<8:14:38, 12.30s/it]08/03/2024 22:15:21 - INFO - __main__ -   Step: 7088, LR: 5.233723465951293e-06, Loss: 413.09552001953125
2024-08-04T05:15:33.606694675Z 
 75%|███████▍  | 7089/9500 [24:18:03<8:12:12, 12.25s/it]08/03/2024 22:15:33 - INFO - __main__ -   Step: 7089, LR: 5.231552922264013e-06, Loss: 443.635498046875
2024-08-04T05:15:45.844079268Z 
 75%|███████▍  | 7090/9500 [24:18:15<8:11:51, 12.25s/it]08/03/2024 22:15:45 - INFO - __main__ -   Step: 7090, LR: 5.229382378576734e-06, Loss: 431.24591064453125
2024-08-04T05:15:58.587670870Z 
 75%|███████▍  | 7091/9500 [24:18:28<8:17:39, 12.39s/it]08/03/2024 22:15:58 - INFO - __main__ -   Step: 7091, LR: 5.227211834889455e-06, Loss: 553.936279296875
2024-08-04T05:16:11.006534617Z 
 75%|███████▍  | 7092/9500 [24:18:40<8:17:44, 12.40s/it]08/03/2024 22:16:11 - INFO - __main__ -   Step: 7092, LR: 5.2250412912021765e-06, Loss: 403.48486328125
2024-08-04T05:16:23.507135099Z 
 75%|███████▍  | 7093/9500 [24:18:53<8:18:43, 12.43s/it]08/03/2024 22:16:23 - INFO - __main__ -   Step: 7093, LR: 5.222870747514897e-06, Loss: 533.4671630859375
2024-08-04T05:16:36.026114636Z 
 75%|███████▍  | 7094/9500 [24:19:05<8:19:33, 12.46s/it]08/03/2024 22:16:36 - INFO - __main__ -   Step: 7094, LR: 5.220700203827619e-06, Loss: 430.32928466796875
2024-08-04T05:16:48.393372476Z 
 75%|███████▍  | 7095/9500 [24:19:18<8:18:15, 12.43s/it]08/03/2024 22:16:48 - INFO - __main__ -   Step: 7095, LR: 5.21852966014034e-06, Loss: 393.275390625
2024-08-04T05:17:00.530656645Z 
 75%|███████▍  | 7096/9500 [24:19:30<8:14:31, 12.34s/it]08/03/2024 22:17:00 - INFO - __main__ -   Step: 7096, LR: 5.21635911645306e-06, Loss: 443.1925964355469
2024-08-04T05:17:13.350474758Z 
 75%|███████▍  | 7097/9500 [24:19:43<8:20:03, 12.49s/it]08/03/2024 22:17:13 - INFO - __main__ -   Step: 7097, LR: 5.214188572765782e-06, Loss: 476.557861328125
2024-08-04T05:17:25.957068726Z 
 75%|███████▍  | 7098/9500 [24:19:55<8:21:17, 12.52s/it]08/03/2024 22:17:25 - INFO - __main__ -   Step: 7098, LR: 5.2120180290785025e-06, Loss: 387.6826171875
2024-08-04T05:17:38.440061636Z 
 75%|███████▍  | 7099/9500 [24:20:08<8:20:37, 12.51s/it]08/03/2024 22:17:38 - INFO - __main__ -   Step: 7099, LR: 5.209847485391224e-06, Loss: 481.78155517578125
2024-08-04T05:17:51.012574582Z 
 75%|███████▍  | 7100/9500 [24:20:20<8:21:09, 12.53s/it]08/03/2024 22:17:51 - INFO - __main__ -   Step: 7100, LR: 5.207676941703946e-06, Loss: 352.72613525390625
2024-08-04T05:18:03.526557807Z 
 75%|███████▍  | 7101/9500 [24:20:33<8:20:46, 12.52s/it]08/03/2024 22:18:03 - INFO - __main__ -   Step: 7101, LR: 5.205506398016666e-06, Loss: 430.1250305175781
2024-08-04T05:18:15.314035742Z 
 75%|███████▍  | 7102/9500 [24:20:45<8:11:43, 12.30s/it]08/03/2024 22:18:15 - INFO - __main__ -   Step: 7102, LR: 5.203335854329388e-06, Loss: 407.1019592285156
2024-08-04T05:18:27.586594980Z 
 75%|███████▍  | 7103/9500 [24:20:57<8:11:09, 12.29s/it]08/03/2024 22:18:27 - INFO - __main__ -   Step: 7103, LR: 5.201165310642108e-06, Loss: 382.284912109375
2024-08-04T05:18:39.868577617Z 
 75%|███████▍  | 7104/9500 [24:21:09<8:10:48, 12.29s/it]08/03/2024 22:18:39 - INFO - __main__ -   Step: 7104, LR: 5.198994766954829e-06, Loss: 417.0443115234375
2024-08-04T05:18:52.230284033Z 
 75%|███████▍  | 7105/9500 [24:21:22<8:11:26, 12.31s/it]08/03/2024 22:18:52 - INFO - __main__ -   Step: 7105, LR: 5.19682422326755e-06, Loss: 458.31927490234375
2024-08-04T05:19:04.757494469Z 
 75%|███████▍  | 7106/9500 [24:21:34<8:13:49, 12.38s/it]08/03/2024 22:19:04 - INFO - __main__ -   Step: 7106, LR: 5.1946536795802716e-06, Loss: 387.581787109375
2024-08-04T05:19:16.987090935Z 
 75%|███████▍  | 7107/9500 [24:21:46<8:11:51, 12.33s/it]08/03/2024 22:19:16 - INFO - __main__ -   Step: 7107, LR: 5.192483135892993e-06, Loss: 467.4116516113281
2024-08-04T05:19:29.258293671Z 
 75%|███████▍  | 7108/9500 [24:21:59<8:10:55, 12.31s/it]08/03/2024 22:19:29 - INFO - __main__ -   Step: 7108, LR: 5.190312592205714e-06, Loss: 401.7191162109375
2024-08-04T05:19:41.645145897Z 
 75%|███████▍  | 7109/9500 [24:22:11<8:11:35, 12.34s/it]08/03/2024 22:19:41 - INFO - __main__ -   Step: 7109, LR: 5.188142048518435e-06, Loss: 421.81988525390625
2024-08-04T05:19:54.020532602Z 
 75%|███████▍  | 7110/9500 [24:22:23<8:11:50, 12.35s/it]08/03/2024 22:19:54 - INFO - __main__ -   Step: 7110, LR: 5.185971504831155e-06, Loss: 365.1363525390625
2024-08-04T05:20:06.179741522Z 
 75%|███████▍  | 7111/9500 [24:22:36<8:09:23, 12.29s/it]08/03/2024 22:20:06 - INFO - __main__ -   Step: 7111, LR: 5.183800961143877e-06, Loss: 438.6568908691406
2024-08-04T05:20:18.530725429Z 
 75%|███████▍  | 7112/9500 [24:22:48<8:09:54, 12.31s/it]08/03/2024 22:20:18 - INFO - __main__ -   Step: 7112, LR: 5.1816304174565975e-06, Loss: 442.2261962890625
2024-08-04T05:20:30.619483339Z 
 75%|███████▍  | 7113/9500 [24:23:00<8:07:04, 12.24s/it]08/03/2024 22:20:30 - INFO - __main__ -   Step: 7113, LR: 5.179459873769319e-06, Loss: 391.2227478027344
2024-08-04T05:20:42.558481156Z 
 75%|███████▍  | 7114/9500 [24:23:12<8:03:14, 12.15s/it]08/03/2024 22:20:42 - INFO - __main__ -   Step: 7114, LR: 5.177289330082041e-06, Loss: 396.1859130859375
2024-08-04T05:20:55.094964846Z 
 75%|███████▍  | 7115/9500 [24:23:25<8:07:37, 12.27s/it]08/03/2024 22:20:55 - INFO - __main__ -   Step: 7115, LR: 5.175118786394761e-06, Loss: 548.820556640625
2024-08-04T05:21:07.338545304Z 
 75%|███████▍  | 7116/9500 [24:23:37<8:07:08, 12.26s/it]08/03/2024 22:21:07 - INFO - __main__ -   Step: 7116, LR: 5.172948242707483e-06, Loss: 335.54498291015625
2024-08-04T05:21:19.505279495Z 
 75%|███████▍  | 7117/9500 [24:23:49<8:05:49, 12.23s/it]08/03/2024 22:21:19 - INFO - __main__ -   Step: 7117, LR: 5.170777699020203e-06, Loss: 384.7774658203125
2024-08-04T05:21:31.797634205Z 
 75%|███████▍  | 7118/9500 [24:24:01<8:06:19, 12.25s/it]08/03/2024 22:21:31 - INFO - __main__ -   Step: 7118, LR: 5.168607155332924e-06, Loss: 367.0580139160156
2024-08-04T05:21:44.020107345Z 
 75%|███████▍  | 7119/9500 [24:24:13<8:05:47, 12.24s/it]08/03/2024 22:21:44 - INFO - __main__ -   Step: 7119, LR: 5.166436611645645e-06, Loss: 462.35748291015625
2024-08-04T05:21:56.416151988Z 
 75%|███████▍  | 7120/9500 [24:24:26<8:07:25, 12.29s/it]08/03/2024 22:21:56 - INFO - __main__ -   Step: 7120, LR: 5.164266067958367e-06, Loss: 358.2093811035156
2024-08-04T05:22:08.712072475Z 
 75%|███████▍  | 7121/9500 [24:24:38<8:07:18, 12.29s/it]08/03/2024 22:22:08 - INFO - __main__ -   Step: 7121, LR: 5.162095524271088e-06, Loss: 358.5883483886719
2024-08-04T05:22:20.686471016Z 
 75%|███████▍  | 7122/9500 [24:24:50<8:03:21, 12.20s/it]08/03/2024 22:22:20 - INFO - __main__ -   Step: 7122, LR: 5.159924980583809e-06, Loss: 458.26141357421875
2024-08-04T05:22:33.035233204Z 
 75%|███████▍  | 7123/9500 [24:25:02<8:04:58, 12.24s/it]08/03/2024 22:22:33 - INFO - __main__ -   Step: 7123, LR: 5.1577544368965305e-06, Loss: 286.4940490722656
2024-08-04T05:22:46.013065458Z 
 75%|███████▍  | 7124/9500 [24:25:15<8:13:30, 12.46s/it]08/03/2024 22:22:46 - INFO - __main__ -   Step: 7124, LR: 5.15558389320925e-06, Loss: 360.563720703125
2024-08-04T05:22:58.451806430Z 
 75%|███████▌  | 7125/9500 [24:25:28<8:13:01, 12.46s/it]08/03/2024 22:22:58 - INFO - __main__ -   Step: 7125, LR: 5.153413349521972e-06, Loss: 539.82958984375
2024-08-04T05:23:10.952835244Z 
 75%|███████▌  | 7126/9500 [24:25:40<8:13:21, 12.47s/it]08/03/2024 22:23:10 - INFO - __main__ -   Step: 7126, LR: 5.1512428058346934e-06, Loss: 488.11370849609375
2024-08-04T05:23:23.388634516Z 
 75%|███████▌  | 7127/9500 [24:25:53<8:12:45, 12.46s/it]08/03/2024 22:23:23 - INFO - __main__ -   Step: 7127, LR: 5.149072262147414e-06, Loss: 422.6669006347656
2024-08-04T05:23:35.528629357Z 
 75%|███████▌  | 7128/9500 [24:26:05<8:08:45, 12.36s/it]08/03/2024 22:23:35 - INFO - __main__ -   Step: 7128, LR: 5.146901718460136e-06, Loss: 411.3612976074219
2024-08-04T05:23:48.087478588Z 
 75%|███████▌  | 7129/9500 [24:26:18<8:10:52, 12.42s/it]08/03/2024 22:23:48 - INFO - __main__ -   Step: 7129, LR: 5.1447311747728564e-06, Loss: 492.4788513183594
2024-08-04T05:24:00.658312582Z 
 75%|███████▌  | 7130/9500 [24:26:30<8:12:25, 12.47s/it]08/03/2024 22:24:00 - INFO - __main__ -   Step: 7130, LR: 5.142560631085578e-06, Loss: 314.07562255859375
2024-08-04T05:24:12.887975271Z 
 75%|███████▌  | 7131/9500 [24:26:42<8:09:24, 12.40s/it]08/03/2024 22:24:12 - INFO - __main__ -   Step: 7131, LR: 5.140390087398298e-06, Loss: 543.489013671875
2024-08-04T05:24:24.966334310Z 
 75%|███████▌  | 7132/9500 [24:26:54<8:05:27, 12.30s/it]08/03/2024 22:24:24 - INFO - __main__ -   Step: 7132, LR: 5.138219543711019e-06, Loss: 399.54132080078125
2024-08-04T05:24:37.096938189Z 
 75%|███████▌  | 7133/9500 [24:27:07<8:03:14, 12.25s/it]08/03/2024 22:24:37 - INFO - __main__ -   Step: 7133, LR: 5.136049000023741e-06, Loss: 388.42364501953125
2024-08-04T05:24:49.772850012Z 
 75%|███████▌  | 7134/9500 [24:27:19<8:08:04, 12.38s/it]08/03/2024 22:24:49 - INFO - __main__ -   Step: 7134, LR: 5.133878456336462e-06, Loss: 394.7488708496094
2024-08-04T05:25:02.007098425Z 
 75%|███████▌  | 7135/9500 [24:27:31<8:06:11, 12.33s/it]08/03/2024 22:25:02 - INFO - __main__ -   Step: 7135, LR: 5.131707912649183e-06, Loss: 394.21392822265625
2024-08-04T05:25:14.195783941Z 
 75%|███████▌  | 7136/9500 [24:27:44<8:04:15, 12.29s/it]08/03/2024 22:25:14 - INFO - __main__ -   Step: 7136, LR: 5.129537368961904e-06, Loss: 384.82867431640625
2024-08-04T05:25:26.643695675Z 
 75%|███████▌  | 7137/9500 [24:27:56<8:05:54, 12.34s/it]08/03/2024 22:25:26 - INFO - __main__ -   Step: 7137, LR: 5.1273668252746255e-06, Loss: 373.2171325683594
2024-08-04T05:25:39.483900606Z 
 75%|███████▌  | 7138/9500 [24:28:09<8:11:38, 12.49s/it]08/03/2024 22:25:39 - INFO - __main__ -   Step: 7138, LR: 5.125196281587345e-06, Loss: 420.924560546875
2024-08-04T05:25:51.505448610Z 
 75%|███████▌  | 7139/9500 [24:28:21<8:05:54, 12.35s/it]08/03/2024 22:25:51 - INFO - __main__ -   Step: 7139, LR: 5.123025737900067e-06, Loss: 352.43475341796875
2024-08-04T05:26:04.160233511Z 
 75%|███████▌  | 7140/9500 [24:28:34<8:09:19, 12.44s/it]08/03/2024 22:26:04 - INFO - __main__ -   Step: 7140, LR: 5.1208551942127885e-06, Loss: 350.5919494628906
2024-08-04T05:26:16.303100228Z 
 75%|███████▌  | 7141/9500 [24:28:46<8:05:36, 12.35s/it]08/03/2024 22:26:16 - INFO - __main__ -   Step: 7141, LR: 5.118684650525509e-06, Loss: 387.3594055175781
2024-08-04T05:26:28.721313259Z 
 75%|███████▌  | 7142/9500 [24:28:58<8:06:11, 12.37s/it]08/03/2024 22:26:28 - INFO - __main__ -   Step: 7142, LR: 5.116514106838231e-06, Loss: 449.15118408203125
2024-08-04T05:26:40.933520935Z 
 75%|███████▌  | 7143/9500 [24:29:10<8:04:06, 12.32s/it]08/03/2024 22:26:40 - INFO - __main__ -   Step: 7143, LR: 5.114343563150952e-06, Loss: 320.8729553222656
2024-08-04T05:26:53.397520707Z 
 75%|███████▌  | 7144/9500 [24:29:23<8:05:33, 12.37s/it]08/03/2024 22:26:53 - INFO - __main__ -   Step: 7144, LR: 5.112173019463673e-06, Loss: 312.99267578125
2024-08-04T05:27:05.368440846Z 
 75%|███████▌  | 7145/9500 [24:29:35<8:00:42, 12.25s/it]08/03/2024 22:27:05 - INFO - __main__ -   Step: 7145, LR: 5.110002475776393e-06, Loss: 386.3697509765625
2024-08-04T05:27:17.934213186Z 
 75%|███████▌  | 7146/9500 [24:29:47<8:04:14, 12.34s/it]08/03/2024 22:27:17 - INFO - __main__ -   Step: 7146, LR: 5.1078319320891145e-06, Loss: 436.40887451171875
2024-08-04T05:27:30.220099297Z 
 75%|███████▌  | 7147/9500 [24:30:00<8:03:22, 12.33s/it]08/03/2024 22:27:30 - INFO - __main__ -   Step: 7147, LR: 5.105661388401836e-06, Loss: 394.81207275390625
2024-08-04T05:27:42.514133703Z 
 75%|███████▌  | 7148/9500 [24:30:12<8:02:47, 12.32s/it]08/03/2024 22:27:42 - INFO - __main__ -   Step: 7148, LR: 5.103490844714557e-06, Loss: 467.27166748046875
2024-08-04T05:27:54.838524180Z 
 75%|███████▌  | 7149/9500 [24:30:24<8:02:41, 12.32s/it]08/03/2024 22:27:54 - INFO - __main__ -   Step: 7149, LR: 5.101320301027278e-06, Loss: 397.92816162109375
2024-08-04T05:28:07.028974966Z 
 75%|███████▌  | 7150/9500 [24:30:36<8:00:58, 12.28s/it]08/03/2024 22:28:07 - INFO - __main__ -   Step: 7150, LR: 5.09914975734e-06, Loss: 290.4599914550781
2024-08-04T05:28:19.486726497Z 
 75%|███████▌  | 7151/9500 [24:30:49<8:02:51, 12.33s/it]08/03/2024 22:28:19 - INFO - __main__ -   Step: 7151, LR: 5.0969792136527206e-06, Loss: 495.9660339355469
2024-08-04T05:28:31.664659209Z 
 75%|███████▌  | 7152/9500 [24:31:01<8:00:49, 12.29s/it]08/03/2024 22:28:31 - INFO - __main__ -   Step: 7152, LR: 5.094808669965441e-06, Loss: 383.6749267578125
2024-08-04T05:28:44.053059743Z 
 75%|███████▌  | 7153/9500 [24:31:13<8:01:48, 12.32s/it]08/03/2024 22:28:44 - INFO - __main__ -   Step: 7153, LR: 5.092638126278162e-06, Loss: 455.0997619628906
2024-08-04T05:28:56.209230008Z 
 75%|███████▌  | 7154/9500 [24:31:26<7:59:42, 12.27s/it]08/03/2024 22:28:56 - INFO - __main__ -   Step: 7154, LR: 5.0904675825908836e-06, Loss: 339.80517578125
2024-08-04T05:29:08.781205023Z 
 75%|███████▌  | 7155/9500 [24:31:38<8:03:03, 12.36s/it]08/03/2024 22:29:08 - INFO - __main__ -   Step: 7155, LR: 5.088297038903604e-06, Loss: 373.5227355957031
2024-08-04T05:29:21.179958596Z 
 75%|███████▌  | 7156/9500 [24:31:51<8:03:18, 12.37s/it]08/03/2024 22:29:21 - INFO - __main__ -   Step: 7156, LR: 5.086126495216326e-06, Loss: 543.2831420898438
2024-08-04T05:29:33.366924394Z 
 75%|███████▌  | 7157/9500 [24:32:03<8:00:56, 12.32s/it]08/03/2024 22:29:33 - INFO - __main__ -   Step: 7157, LR: 5.083955951529047e-06, Loss: 413.15191650390625
2024-08-04T05:29:46.260152684Z 
 75%|███████▌  | 7158/9500 [24:32:16<8:07:29, 12.49s/it]08/03/2024 22:29:46 - INFO - __main__ -   Step: 7158, LR: 5.081785407841768e-06, Loss: 372.3138427734375
2024-08-04T05:29:58.281739043Z 
 75%|███████▌  | 7159/9500 [24:32:28<8:01:48, 12.35s/it]08/03/2024 22:29:58 - INFO - __main__ -   Step: 7159, LR: 5.079614864154489e-06, Loss: 387.0052185058594
2024-08-04T05:30:10.481477298Z 
 75%|███████▌  | 7160/9500 [24:32:40<7:59:51, 12.30s/it]08/03/2024 22:30:10 - INFO - __main__ -   Step: 7160, LR: 5.0774443204672095e-06, Loss: 431.806640625
2024-08-04T05:30:23.165493915Z 
 75%|███████▌  | 7161/9500 [24:32:53<8:04:06, 12.42s/it]08/03/2024 22:30:23 - INFO - __main__ -   Step: 7161, LR: 5.075273776779931e-06, Loss: 465.8765869140625
2024-08-04T05:30:35.464936224Z 
 75%|███████▌  | 7162/9500 [24:33:05<8:02:30, 12.38s/it]08/03/2024 22:30:35 - INFO - __main__ -   Step: 7162, LR: 5.073103233092652e-06, Loss: 369.6068420410156
2024-08-04T05:30:47.921162713Z 
 75%|███████▌  | 7163/9500 [24:33:17<8:03:09, 12.40s/it]08/03/2024 22:30:47 - INFO - __main__ -   Step: 7163, LR: 5.070932689405373e-06, Loss: 408.2451171875
2024-08-04T05:31:00.261945688Z 
 75%|███████▌  | 7164/9500 [24:33:30<8:02:11, 12.39s/it]08/03/2024 22:31:00 - INFO - __main__ -   Step: 7164, LR: 5.068762145718095e-06, Loss: 387.6067810058594
2024-08-04T05:31:12.350893324Z 
 75%|███████▌  | 7165/9500 [24:33:42<7:58:32, 12.30s/it]08/03/2024 22:31:12 - INFO - __main__ -   Step: 7165, LR: 5.066591602030816e-06, Loss: 465.6144104003906
2024-08-04T05:31:24.540916787Z 
 75%|███████▌  | 7166/9500 [24:33:54<7:57:05, 12.26s/it]08/03/2024 22:31:24 - INFO - __main__ -   Step: 7166, LR: 5.064421058343536e-06, Loss: 413.0997009277344
2024-08-04T05:31:37.310136973Z 
 75%|███████▌  | 7167/9500 [24:34:07<8:02:46, 12.42s/it]08/03/2024 22:31:37 - INFO - __main__ -   Step: 7167, LR: 5.062250514656257e-06, Loss: 343.0748291015625
2024-08-04T05:31:49.172825134Z 
 75%|███████▌  | 7168/9500 [24:34:19<7:56:07, 12.25s/it]08/03/2024 22:31:49 - INFO - __main__ -   Step: 7168, LR: 5.060079970968979e-06, Loss: 271.02691650390625
2024-08-04T05:32:01.618427951Z 
 75%|███████▌  | 7169/9500 [24:34:31<7:58:11, 12.31s/it]08/03/2024 22:32:01 - INFO - __main__ -   Step: 7169, LR: 5.0579094272817e-06, Loss: 541.7012329101562
2024-08-04T05:32:14.039909221Z 
 75%|███████▌  | 7170/9500 [24:34:43<7:59:18, 12.34s/it]08/03/2024 22:32:14 - INFO - __main__ -   Step: 7170, LR: 5.055738883594421e-06, Loss: 438.00531005859375
2024-08-04T05:32:26.213386820Z 
 75%|███████▌  | 7171/9500 [24:34:56<7:57:07, 12.29s/it]08/03/2024 22:32:26 - INFO - __main__ -   Step: 7171, LR: 5.0535683399071424e-06, Loss: 403.29833984375
2024-08-04T05:32:38.511543121Z 
 75%|███████▌  | 7172/9500 [24:35:08<7:56:59, 12.29s/it]08/03/2024 22:32:38 - INFO - __main__ -   Step: 7172, LR: 5.051397796219862e-06, Loss: 440.3652038574219
2024-08-04T05:32:50.738058456Z 
 76%|███████▌  | 7173/9500 [24:35:20<7:56:00, 12.27s/it]08/03/2024 22:32:50 - INFO - __main__ -   Step: 7173, LR: 5.049227252532584e-06, Loss: 294.16546630859375
2024-08-04T05:33:02.781282435Z 
 76%|███████▌  | 7174/9500 [24:35:32<7:53:07, 12.20s/it]08/03/2024 22:33:02 - INFO - __main__ -   Step: 7174, LR: 5.047056708845305e-06, Loss: 390.05712890625
2024-08-04T05:33:14.892176277Z 
 76%|███████▌  | 7175/9500 [24:35:44<7:51:50, 12.18s/it]08/03/2024 22:33:14 - INFO - __main__ -   Step: 7175, LR: 5.044886165158026e-06, Loss: 268.3177185058594
2024-08-04T05:33:27.312435140Z 
 76%|███████▌  | 7176/9500 [24:35:57<7:54:27, 12.25s/it]08/03/2024 22:33:27 - INFO - __main__ -   Step: 7176, LR: 5.042715621470748e-06, Loss: 608.3287353515625
2024-08-04T05:33:39.672627976Z 
 76%|███████▌  | 7177/9500 [24:36:09<7:55:32, 12.28s/it]08/03/2024 22:33:39 - INFO - __main__ -   Step: 7177, LR: 5.040545077783468e-06, Loss: 498.65234375
2024-08-04T05:33:51.770546032Z 
 76%|███████▌  | 7178/9500 [24:36:21<7:53:11, 12.23s/it]08/03/2024 22:33:51 - INFO - __main__ -   Step: 7178, LR: 5.03837453409619e-06, Loss: 469.7755126953125
2024-08-04T05:34:03.799469655Z 
 76%|███████▌  | 7179/9500 [24:36:33<7:50:41, 12.17s/it]08/03/2024 22:34:03 - INFO - __main__ -   Step: 7179, LR: 5.03620399040891e-06, Loss: 354.0854187011719
2024-08-04T05:34:16.197889234Z 
 76%|███████▌  | 7180/9500 [24:36:46<7:53:09, 12.24s/it]08/03/2024 22:34:16 - INFO - __main__ -   Step: 7180, LR: 5.034033446721631e-06, Loss: 357.1189880371094
2024-08-04T05:34:28.294459216Z 
 76%|███████▌  | 7181/9500 [24:36:58<7:51:19, 12.19s/it]08/03/2024 22:34:28 - INFO - __main__ -   Step: 7181, LR: 5.031862903034352e-06, Loss: 385.10467529296875
2024-08-04T05:34:40.761569859Z 
 76%|███████▌  | 7182/9500 [24:37:10<7:54:17, 12.28s/it]08/03/2024 22:34:40 - INFO - __main__ -   Step: 7182, LR: 5.029692359347074e-06, Loss: 406.46649169921875
2024-08-04T05:34:53.245078787Z 
 76%|███████▌  | 7183/9500 [24:37:23<7:56:28, 12.34s/it]08/03/2024 22:34:53 - INFO - __main__ -   Step: 7183, LR: 5.027521815659795e-06, Loss: 271.20965576171875
2024-08-04T05:35:05.284407573Z 
 76%|███████▌  | 7184/9500 [24:37:35<7:52:48, 12.25s/it]08/03/2024 22:35:05 - INFO - __main__ -   Step: 7184, LR: 5.025351271972516e-06, Loss: 430.8453063964844
2024-08-04T05:35:17.671864734Z 
 76%|███████▌  | 7185/9500 [24:37:47<7:54:12, 12.29s/it]08/03/2024 22:35:17 - INFO - __main__ -   Step: 7185, LR: 5.0231807282852375e-06, Loss: 465.345703125
2024-08-04T05:35:30.171422311Z 
 76%|███████▌  | 7186/9500 [24:38:00<7:56:25, 12.35s/it]08/03/2024 22:35:30 - INFO - __main__ -   Step: 7186, LR: 5.021010184597957e-06, Loss: 368.46624755859375
2024-08-04T05:35:42.435226733Z 
 76%|███████▌  | 7187/9500 [24:38:12<7:55:10, 12.33s/it]08/03/2024 22:35:42 - INFO - __main__ -   Step: 7187, LR: 5.018839640910679e-06, Loss: 336.48046875
2024-08-04T05:35:54.923171055Z 
 76%|███████▌  | 7188/9500 [24:38:24<7:56:50, 12.37s/it]08/03/2024 22:35:54 - INFO - __main__ -   Step: 7188, LR: 5.0166690972234e-06, Loss: 412.995361328125
2024-08-04T05:36:07.256377029Z 
 76%|███████▌  | 7189/9500 [24:38:37<7:56:09, 12.36s/it]08/03/2024 22:36:07 - INFO - __main__ -   Step: 7189, LR: 5.014498553536121e-06, Loss: 407.8650817871094
2024-08-04T05:36:19.424491894Z 
 76%|███████▌  | 7190/9500 [24:38:49<7:53:42, 12.30s/it]08/03/2024 22:36:19 - INFO - __main__ -   Step: 7190, LR: 5.012328009848843e-06, Loss: 529.6060180664062
2024-08-04T05:36:31.612406952Z 
 76%|███████▌  | 7191/9500 [24:39:01<7:52:09, 12.27s/it]08/03/2024 22:36:31 - INFO - __main__ -   Step: 7191, LR: 5.0101574661615635e-06, Loss: 522.516845703125
2024-08-04T05:36:44.230182940Z 
 76%|███████▌  | 7192/9500 [24:39:14<7:55:58, 12.37s/it]08/03/2024 22:36:44 - INFO - __main__ -   Step: 7192, LR: 5.007986922474285e-06, Loss: 622.6588134765625
2024-08-04T05:36:56.634810793Z 
 76%|███████▌  | 7193/9500 [24:39:26<7:56:07, 12.38s/it]08/03/2024 22:36:56 - INFO - __main__ -   Step: 7193, LR: 5.005816378787005e-06, Loss: 424.45513916015625
2024-08-04T05:37:08.818383643Z 
 76%|███████▌  | 7194/9500 [24:39:38<7:53:37, 12.32s/it]08/03/2024 22:37:08 - INFO - __main__ -   Step: 7194, LR: 5.0036458350997265e-06, Loss: 326.29705810546875
2024-08-04T05:37:21.148836856Z 
 76%|███████▌  | 7195/9500 [24:39:51<7:53:30, 12.33s/it]08/03/2024 22:37:21 - INFO - __main__ -   Step: 7195, LR: 5.001475291412447e-06, Loss: 436.466064453125
2024-08-04T05:37:33.319461151Z 
 76%|███████▌  | 7196/9500 [24:40:03<7:51:30, 12.28s/it]08/03/2024 22:37:33 - INFO - __main__ -   Step: 7196, LR: 4.999304747725169e-06, Loss: 355.62286376953125
2024-08-04T05:37:45.438700238Z 
 76%|███████▌  | 7197/9500 [24:40:15<7:49:27, 12.23s/it]08/03/2024 22:37:45 - INFO - __main__ -   Step: 7197, LR: 4.99713420403789e-06, Loss: 365.3177185058594
2024-08-04T05:37:57.800682296Z 
 76%|███████▌  | 7198/9500 [24:40:27<7:50:46, 12.27s/it]08/03/2024 22:37:57 - INFO - __main__ -   Step: 7198, LR: 4.994963660350611e-06, Loss: 493.50164794921875
2024-08-04T05:38:09.851785900Z 
 76%|███████▌  | 7199/9500 [24:40:39<7:48:02, 12.20s/it]08/03/2024 22:38:09 - INFO - __main__ -   Step: 7199, LR: 4.992793116663332e-06, Loss: 320.976806640625
2024-08-04T05:38:22.112933283Z 
 76%|███████▌  | 7200/9500 [24:40:52<7:48:29, 12.22s/it]08/03/2024 22:38:22 - INFO - __main__ -   Step: 7200, LR: 4.990622572976053e-06, Loss: 332.23028564453125
2024-08-04T05:38:34.412304757Z 
 76%|███████▌  | 7201/9500 [24:41:04<7:49:10, 12.24s/it]08/03/2024 22:38:34 - INFO - __main__ -   Step: 7201, LR: 4.988452029288775e-06, Loss: 339.7655029296875
2024-08-04T05:38:46.425216106Z 
 76%|███████▌  | 7202/9500 [24:41:16<7:46:18, 12.18s/it]08/03/2024 22:38:46 - INFO - __main__ -   Step: 7202, LR: 4.9862814856014955e-06, Loss: 361.104736328125
2024-08-04T05:38:58.750974592Z 
 76%|███████▌  | 7203/9500 [24:41:28<7:47:50, 12.22s/it]08/03/2024 22:38:58 - INFO - __main__ -   Step: 7203, LR: 4.984110941914216e-06, Loss: 469.8773498535156
2024-08-04T05:39:11.318697224Z 
 76%|███████▌  | 7204/9500 [24:41:41<7:51:37, 12.32s/it]08/03/2024 22:39:11 - INFO - __main__ -   Step: 7204, LR: 4.981940398226938e-06, Loss: 451.9773864746094
2024-08-04T05:39:23.482141715Z 
 76%|███████▌  | 7205/9500 [24:41:53<7:49:34, 12.28s/it]08/03/2024 22:39:23 - INFO - __main__ -   Step: 7205, LR: 4.9797698545396585e-06, Loss: 412.255126953125
2024-08-04T05:39:35.714517992Z 
 76%|███████▌  | 7206/9500 [24:42:05<7:48:51, 12.26s/it]08/03/2024 22:39:35 - INFO - __main__ -   Step: 7206, LR: 4.977599310852379e-06, Loss: 437.6783752441406
2024-08-04T05:39:48.322589423Z 
 76%|███████▌  | 7207/9500 [24:42:18<7:52:36, 12.37s/it]08/03/2024 22:39:48 - INFO - __main__ -   Step: 7207, LR: 4.975428767165101e-06, Loss: 427.59112548828125
2024-08-04T05:40:00.346707757Z 
 76%|███████▌  | 7208/9500 [24:42:30<7:48:28, 12.26s/it]08/03/2024 22:40:00 - INFO - __main__ -   Step: 7208, LR: 4.973258223477822e-06, Loss: 384.914306640625
2024-08-04T05:40:12.577919003Z 
 76%|███████▌  | 7209/9500 [24:42:42<7:47:54, 12.25s/it]08/03/2024 22:40:12 - INFO - __main__ -   Step: 7209, LR: 4.971087679790543e-06, Loss: 336.2281799316406
2024-08-04T05:40:25.515148917Z 
 76%|███████▌  | 7210/9500 [24:42:55<7:55:31, 12.46s/it]08/03/2024 22:40:25 - INFO - __main__ -   Step: 7210, LR: 4.968917136103264e-06, Loss: 348.9320983886719
2024-08-04T05:40:37.645996329Z 
 76%|███████▌  | 7211/9500 [24:43:07<7:51:33, 12.36s/it]08/03/2024 22:40:37 - INFO - __main__ -   Step: 7211, LR: 4.966746592415985e-06, Loss: 360.5393371582031
2024-08-04T05:40:49.899119078Z 
 76%|███████▌  | 7212/9500 [24:43:19<7:50:06, 12.33s/it]08/03/2024 22:40:49 - INFO - __main__ -   Step: 7212, LR: 4.964576048728706e-06, Loss: 350.6313171386719
2024-08-04T05:41:02.978611397Z 
 76%|███████▌  | 7213/9500 [24:43:32<7:58:30, 12.55s/it]08/03/2024 22:41:02 - INFO - __main__ -   Step: 7213, LR: 4.962405505041427e-06, Loss: 495.6376953125
2024-08-04T05:41:14.896452104Z 
 76%|███████▌  | 7214/9500 [24:43:44<7:51:01, 12.36s/it]08/03/2024 22:41:14 - INFO - __main__ -   Step: 7214, LR: 4.960234961354148e-06, Loss: 288.15118408203125
2024-08-04T05:41:27.132089167Z 
 76%|███████▌  | 7215/9500 [24:43:57<7:49:22, 12.32s/it]08/03/2024 22:41:27 - INFO - __main__ -   Step: 7215, LR: 4.95806441766687e-06, Loss: 336.502685546875
2024-08-04T05:41:39.476435885Z 
 76%|███████▌  | 7216/9500 [24:44:09<7:49:23, 12.33s/it]08/03/2024 22:41:39 - INFO - __main__ -   Step: 7216, LR: 4.955893873979591e-06, Loss: 415.7066650390625
2024-08-04T05:41:51.774496583Z 
 76%|███████▌  | 7217/9500 [24:44:21<7:48:48, 12.32s/it]08/03/2024 22:41:51 - INFO - __main__ -   Step: 7217, LR: 4.953723330292311e-06, Loss: 433.38037109375
2024-08-04T05:42:04.077130691Z 
 76%|███████▌  | 7218/9500 [24:44:34<7:48:23, 12.32s/it]08/03/2024 22:42:04 - INFO - __main__ -   Step: 7218, LR: 4.951552786605033e-06, Loss: 389.5965270996094
2024-08-04T05:42:16.113242159Z 
 76%|███████▌  | 7219/9500 [24:44:46<7:45:00, 12.23s/it]08/03/2024 22:42:16 - INFO - __main__ -   Step: 7219, LR: 4.949382242917754e-06, Loss: 378.5362548828125
2024-08-04T05:42:28.947327322Z 
 76%|███████▌  | 7220/9500 [24:44:58<7:51:40, 12.41s/it]08/03/2024 22:42:28 - INFO - __main__ -   Step: 7220, LR: 4.947211699230475e-06, Loss: 434.66448974609375
2024-08-04T05:42:41.354843568Z 
 76%|███████▌  | 7221/9500 [24:45:11<7:51:24, 12.41s/it]08/03/2024 22:42:41 - INFO - __main__ -   Step: 7221, LR: 4.945041155543196e-06, Loss: 358.08343505859375
2024-08-04T05:42:53.817110616Z 
 76%|███████▌  | 7222/9500 [24:45:23<7:51:47, 12.43s/it]08/03/2024 22:42:53 - INFO - __main__ -   Step: 7222, LR: 4.942870611855917e-06, Loss: 504.9726257324219
2024-08-04T05:43:06.664302353Z 
 76%|███████▌  | 7223/9500 [24:45:36<7:56:22, 12.55s/it]08/03/2024 22:43:06 - INFO - __main__ -   Step: 7223, LR: 4.940700068168638e-06, Loss: 561.1796875
2024-08-04T05:43:19.036210050Z 
 76%|███████▌  | 7224/9500 [24:45:48<7:54:06, 12.50s/it]08/03/2024 22:43:19 - INFO - __main__ -   Step: 7224, LR: 4.938529524481359e-06, Loss: 468.26446533203125
2024-08-04T05:43:31.210975475Z 
 76%|███████▌  | 7225/9500 [24:46:01<7:50:12, 12.40s/it]08/03/2024 22:43:31 - INFO - __main__ -   Step: 7225, LR: 4.93635898079408e-06, Loss: 400.7958984375
2024-08-04T05:43:43.550313625Z 
 76%|███████▌  | 7226/9500 [24:46:13<7:49:18, 12.38s/it]08/03/2024 22:43:43 - INFO - __main__ -   Step: 7226, LR: 4.934188437106801e-06, Loss: 357.7595520019531
2024-08-04T05:43:55.514965272Z 
 76%|███████▌  | 7227/9500 [24:46:25<7:44:20, 12.26s/it]08/03/2024 22:43:55 - INFO - __main__ -   Step: 7227, LR: 4.932017893419523e-06, Loss: 437.80657958984375
2024-08-04T05:44:07.775880193Z 
 76%|███████▌  | 7228/9500 [24:46:37<7:44:10, 12.26s/it]08/03/2024 22:44:07 - INFO - __main__ -   Step: 7228, LR: 4.929847349732243e-06, Loss: 512.5781860351562
2024-08-04T05:44:20.316003542Z 
 76%|███████▌  | 7229/9500 [24:46:50<7:47:10, 12.34s/it]08/03/2024 22:44:20 - INFO - __main__ -   Step: 7229, LR: 4.927676806044965e-06, Loss: 440.920166015625
2024-08-04T05:44:32.665205727Z 
 76%|███████▌  | 7230/9500 [24:47:02<7:47:02, 12.34s/it]08/03/2024 22:44:32 - INFO - __main__ -   Step: 7230, LR: 4.925506262357686e-06, Loss: 346.7000732421875
2024-08-04T05:44:44.877372150Z 
 76%|███████▌  | 7231/9500 [24:47:14<7:45:20, 12.31s/it]08/03/2024 22:44:44 - INFO - __main__ -   Step: 7231, LR: 4.923335718670406e-06, Loss: 380.7373962402344
2024-08-04T05:44:57.651671472Z 
 76%|███████▌  | 7232/9500 [24:47:27<7:50:27, 12.45s/it]08/03/2024 22:44:57 - INFO - __main__ -   Step: 7232, LR: 4.921165174983128e-06, Loss: 449.1994323730469
2024-08-04T05:45:09.857981576Z 
 76%|███████▌  | 7233/9500 [24:47:39<7:47:31, 12.37s/it]08/03/2024 22:45:09 - INFO - __main__ -   Step: 7233, LR: 4.918994631295849e-06, Loss: 334.6684265136719
2024-08-04T05:45:22.176811339Z 
 76%|███████▌  | 7234/9500 [24:47:52<7:46:41, 12.36s/it]08/03/2024 22:45:22 - INFO - __main__ -   Step: 7234, LR: 4.91682408760857e-06, Loss: 268.61517333984375
2024-08-04T05:45:34.776368958Z 
 76%|███████▌  | 7235/9500 [24:48:04<7:49:14, 12.43s/it]08/03/2024 22:45:34 - INFO - __main__ -   Step: 7235, LR: 4.914653543921291e-06, Loss: 516.8831787109375
2024-08-04T05:45:47.097671508Z 
 76%|███████▌  | 7236/9500 [24:48:17<7:47:47, 12.40s/it]08/03/2024 22:45:47 - INFO - __main__ -   Step: 7236, LR: 4.9124830002340125e-06, Loss: 422.21051025390625
2024-08-04T05:45:59.524767655Z 
 76%|███████▌  | 7237/9500 [24:48:29<7:47:55, 12.41s/it]08/03/2024 22:45:59 - INFO - __main__ -   Step: 7237, LR: 4.910312456546733e-06, Loss: 473.5536193847656
2024-08-04T05:46:12.585427547Z 
 76%|███████▌  | 7238/9500 [24:48:42<7:55:07, 12.60s/it]08/03/2024 22:46:12 - INFO - __main__ -   Step: 7238, LR: 4.908141912859454e-06, Loss: 363.9444885253906
2024-08-04T05:46:24.961010723Z 
 76%|███████▌  | 7239/9500 [24:48:54<7:52:20, 12.53s/it]08/03/2024 22:46:24 - INFO - __main__ -   Step: 7239, LR: 4.9059713691721755e-06, Loss: 408.6473388671875
2024-08-04T05:46:36.935846760Z 
 76%|███████▌  | 7240/9500 [24:49:06<7:45:48, 12.37s/it]08/03/2024 22:46:36 - INFO - __main__ -   Step: 7240, LR: 4.903800825484896e-06, Loss: 358.8369445800781
2024-08-04T05:46:49.271057469Z 
 76%|███████▌  | 7241/9500 [24:49:19<7:45:14, 12.36s/it]08/03/2024 22:46:49 - INFO - __main__ -   Step: 7241, LR: 4.901630281797618e-06, Loss: 374.44964599609375
2024-08-04T05:47:01.312958534Z 
 76%|███████▌  | 7242/9500 [24:49:31<7:41:28, 12.26s/it]08/03/2024 22:47:01 - INFO - __main__ -   Step: 7242, LR: 4.8994597381103384e-06, Loss: 426.4764404296875
2024-08-04T05:47:13.454818979Z 
 76%|███████▌  | 7243/9500 [24:49:43<7:39:54, 12.23s/it]08/03/2024 22:47:13 - INFO - __main__ -   Step: 7243, LR: 4.89728919442306e-06, Loss: 475.65838623046875
2024-08-04T05:47:25.895576330Z 
 76%|███████▋  | 7244/9500 [24:49:55<7:42:07, 12.29s/it]08/03/2024 22:47:25 - INFO - __main__ -   Step: 7244, LR: 4.895118650735781e-06, Loss: 409.777587890625
2024-08-04T05:47:37.943250883Z 
 76%|███████▋  | 7245/9500 [24:50:07<7:39:11, 12.22s/it]08/03/2024 22:47:37 - INFO - __main__ -   Step: 7245, LR: 4.892948107048502e-06, Loss: 395.496337890625
2024-08-04T05:47:50.060352350Z 
 76%|███████▋  | 7246/9500 [24:50:19<7:37:51, 12.19s/it]08/03/2024 22:47:50 - INFO - __main__ -   Step: 7246, LR: 4.890777563361223e-06, Loss: 466.71484375
2024-08-04T05:48:02.556495302Z 
 76%|███████▋  | 7247/9500 [24:50:32<7:41:07, 12.28s/it]08/03/2024 22:48:02 - INFO - __main__ -   Step: 7247, LR: 4.888607019673944e-06, Loss: 334.3070373535156
2024-08-04T05:48:14.626672434Z 
 76%|███████▋  | 7248/9500 [24:50:44<7:38:32, 12.22s/it]08/03/2024 22:48:14 - INFO - __main__ -   Step: 7248, LR: 4.886436475986665e-06, Loss: 404.53948974609375
2024-08-04T05:48:27.107926040Z 
 76%|███████▋  | 7249/9500 [24:50:57<7:41:19, 12.30s/it]08/03/2024 22:48:27 - INFO - __main__ -   Step: 7249, LR: 4.884265932299386e-06, Loss: 411.47088623046875
2024-08-04T05:48:39.888020938Z 
 76%|███████▋  | 7250/9500 [24:51:09<7:46:33, 12.44s/it]08/03/2024 22:48:39 - INFO - __main__ -   Step: 7250, LR: 4.8820953886121075e-06, Loss: 378.5191650390625
2024-08-04T05:48:52.067547454Z 
 76%|███████▋  | 7251/9500 [24:51:22<7:43:24, 12.36s/it]08/03/2024 22:48:52 - INFO - __main__ -   Step: 7251, LR: 4.879924844924828e-06, Loss: 369.302490234375
2024-08-04T05:49:04.179805751Z 
 76%|███████▋  | 7252/9500 [24:51:34<7:40:22, 12.29s/it]08/03/2024 22:49:04 - INFO - __main__ -   Step: 7252, LR: 4.87775430123755e-06, Loss: 409.29986572265625
2024-08-04T05:49:16.578025450Z 
 76%|███████▋  | 7253/9500 [24:51:46<7:41:24, 12.32s/it]08/03/2024 22:49:16 - INFO - __main__ -   Step: 7253, LR: 4.8755837575502705e-06, Loss: 551.3333740234375
2024-08-04T05:49:28.708339646Z 
 76%|███████▋  | 7254/9500 [24:51:58<7:39:04, 12.26s/it]08/03/2024 22:49:28 - INFO - __main__ -   Step: 7254, LR: 4.873413213862991e-06, Loss: 454.180419921875
2024-08-04T05:49:41.025810074Z 
 76%|███████▋  | 7255/9500 [24:52:10<7:39:28, 12.28s/it]08/03/2024 22:49:41 - INFO - __main__ -   Step: 7255, LR: 4.871242670175713e-06, Loss: 347.95269775390625
2024-08-04T05:49:53.791257513Z 
 76%|███████▋  | 7256/9500 [24:52:23<7:44:42, 12.43s/it]08/03/2024 22:49:53 - INFO - __main__ -   Step: 7256, LR: 4.8690721264884335e-06, Loss: 439.27191162109375
2024-08-04T05:50:06.026517738Z 
 76%|███████▋  | 7257/9500 [24:52:35<7:42:22, 12.37s/it]08/03/2024 22:50:06 - INFO - __main__ -   Step: 7257, LR: 4.866901582801155e-06, Loss: 323.35455322265625
2024-08-04T05:50:18.333497870Z 
 76%|███████▋  | 7258/9500 [24:52:48<7:41:28, 12.35s/it]08/03/2024 22:50:18 - INFO - __main__ -   Step: 7258, LR: 4.864731039113876e-06, Loss: 279.60748291015625
2024-08-04T05:50:30.753859262Z 
 76%|███████▋  | 7259/9500 [24:53:00<7:42:03, 12.37s/it]08/03/2024 22:50:30 - INFO - __main__ -   Step: 7259, LR: 4.862560495426597e-06, Loss: 401.24029541015625
2024-08-04T05:50:43.089627858Z 
 76%|███████▋  | 7260/9500 [24:53:13<7:41:27, 12.36s/it]08/03/2024 22:50:43 - INFO - __main__ -   Step: 7260, LR: 4.860389951739318e-06, Loss: 478.22601318359375
2024-08-04T05:50:55.423018837Z 
 76%|███████▋  | 7261/9500 [24:53:25<7:40:57, 12.35s/it]08/03/2024 22:50:55 - INFO - __main__ -   Step: 7261, LR: 4.858219408052039e-06, Loss: 431.4117736816406
2024-08-04T05:51:07.544959750Z 
 76%|███████▋  | 7262/9500 [24:53:37<7:38:09, 12.28s/it]08/03/2024 22:51:07 - INFO - __main__ -   Step: 7262, LR: 4.85604886436476e-06, Loss: 328.04388427734375
2024-08-04T05:51:19.863330423Z 
 76%|███████▋  | 7263/9500 [24:53:49<7:38:21, 12.29s/it]08/03/2024 22:51:19 - INFO - __main__ -   Step: 7263, LR: 4.853878320677481e-06, Loss: 344.74261474609375
2024-08-04T05:51:31.951466797Z 
 76%|███████▋  | 7264/9500 [24:54:01<7:35:50, 12.23s/it]08/03/2024 22:51:31 - INFO - __main__ -   Step: 7264, LR: 4.851707776990203e-06, Loss: 409.40447998046875
2024-08-04T05:51:44.274693703Z 
 76%|███████▋  | 7265/9500 [24:54:14<7:36:39, 12.26s/it]08/03/2024 22:51:44 - INFO - __main__ -   Step: 7265, LR: 4.849537233302923e-06, Loss: 403.75616455078125
2024-08-04T05:51:56.747459158Z 
 76%|███████▋  | 7266/9500 [24:54:26<7:38:50, 12.32s/it]08/03/2024 22:51:56 - INFO - __main__ -   Step: 7266, LR: 4.847366689615645e-06, Loss: 348.38897705078125
2024-08-04T05:52:08.824869435Z 
 76%|███████▋  | 7267/9500 [24:54:38<7:35:53, 12.25s/it]08/03/2024 22:52:08 - INFO - __main__ -   Step: 7267, LR: 4.8451961459283656e-06, Loss: 284.40972900390625
2024-08-04T05:52:21.134888141Z 
 77%|███████▋  | 7268/9500 [24:54:51<7:36:21, 12.27s/it]08/03/2024 22:52:21 - INFO - __main__ -   Step: 7268, LR: 4.843025602241086e-06, Loss: 472.9999694824219
2024-08-04T05:52:34.017522283Z 
 77%|███████▋  | 7269/9500 [24:55:03<7:43:00, 12.45s/it]08/03/2024 22:52:34 - INFO - __main__ -   Step: 7269, LR: 4.840855058553808e-06, Loss: 427.41876220703125
2024-08-04T05:52:46.012040204Z 
 77%|███████▋  | 7270/9500 [24:55:15<7:37:42, 12.31s/it]08/03/2024 22:52:46 - INFO - __main__ -   Step: 7270, LR: 4.838684514866529e-06, Loss: 279.0242919921875
2024-08-04T05:52:58.142314671Z 
 77%|███████▋  | 7271/9500 [24:55:28<7:35:26, 12.26s/it]08/03/2024 22:52:58 - INFO - __main__ -   Step: 7271, LR: 4.83651397117925e-06, Loss: 379.40435791015625
2024-08-04T05:53:10.890101838Z 
 77%|███████▋  | 7272/9500 [24:55:40<7:40:40, 12.41s/it]08/03/2024 22:53:10 - INFO - __main__ -   Step: 7272, LR: 4.834343427491971e-06, Loss: 396.32696533203125
2024-08-04T05:53:22.774468558Z 
 77%|███████▋  | 7273/9500 [24:55:52<7:34:39, 12.25s/it]08/03/2024 22:53:22 - INFO - __main__ -   Step: 7273, LR: 4.832172883804692e-06, Loss: 359.65203857421875
2024-08-04T05:53:34.780524399Z 
 77%|███████▋  | 7274/9500 [24:56:04<7:31:44, 12.18s/it]08/03/2024 22:53:34 - INFO - __main__ -   Step: 7274, LR: 4.830002340117413e-06, Loss: 329.89666748046875
2024-08-04T05:53:47.521221680Z 
 77%|███████▋  | 7275/9500 [24:56:17<7:37:48, 12.35s/it]08/03/2024 22:53:47 - INFO - __main__ -   Step: 7275, LR: 4.827831796430134e-06, Loss: 421.86907958984375
2024-08-04T05:54:00.103800161Z 
 77%|███████▋  | 7276/9500 [24:56:30<7:40:15, 12.42s/it]08/03/2024 22:54:00 - INFO - __main__ -   Step: 7276, LR: 4.825661252742855e-06, Loss: 448.9765625
2024-08-04T05:54:12.227649209Z 
 77%|███████▋  | 7277/9500 [24:56:42<7:36:47, 12.33s/it]08/03/2024 22:54:12 - INFO - __main__ -   Step: 7277, LR: 4.823490709055577e-06, Loss: 354.2207946777344
2024-08-04T05:54:25.036736006Z 
 77%|███████▋  | 7278/9500 [24:56:54<7:41:55, 12.47s/it]08/03/2024 22:54:25 - INFO - __main__ -   Step: 7278, LR: 4.821320165368298e-06, Loss: 440.61285400390625
2024-08-04T05:54:37.236826349Z 
 77%|███████▋  | 7279/9500 [24:57:07<7:38:40, 12.39s/it]08/03/2024 22:54:37 - INFO - __main__ -   Step: 7279, LR: 4.819149621681018e-06, Loss: 423.1708679199219
2024-08-04T05:54:49.343611530Z 
 77%|███████▋  | 7280/9500 [24:57:19<7:35:18, 12.31s/it]08/03/2024 22:54:49 - INFO - __main__ -   Step: 7280, LR: 4.81697907799374e-06, Loss: 383.74365234375
2024-08-04T05:55:02.375491834Z 
 77%|███████▋  | 7281/9500 [24:57:32<7:43:09, 12.52s/it]08/03/2024 22:55:02 - INFO - __main__ -   Step: 7281, LR: 4.814808534306461e-06, Loss: 433.6674499511719
2024-08-04T05:55:14.473535809Z 
 77%|███████▋  | 7282/9500 [24:57:44<7:38:13, 12.40s/it]08/03/2024 22:55:14 - INFO - __main__ -   Step: 7282, LR: 4.812637990619181e-06, Loss: 382.4657287597656
2024-08-04T05:55:27.116145282Z 
 77%|███████▋  | 7283/9500 [24:57:57<7:40:46, 12.47s/it]08/03/2024 22:55:27 - INFO - __main__ -   Step: 7283, LR: 4.810467446931903e-06, Loss: 432.5268249511719
2024-08-04T05:55:39.662536787Z 
 77%|███████▋  | 7284/9500 [24:58:09<7:41:24, 12.49s/it]08/03/2024 22:55:39 - INFO - __main__ -   Step: 7284, LR: 4.8082969032446245e-06, Loss: 395.17340087890625
2024-08-04T05:55:51.974939395Z 
 77%|███████▋  | 7285/9500 [24:58:21<7:39:11, 12.44s/it]08/03/2024 22:55:51 - INFO - __main__ -   Step: 7285, LR: 4.806126359557345e-06, Loss: 449.9422302246094
2024-08-04T05:56:04.220839727Z 
 77%|███████▋  | 7286/9500 [24:58:34<7:36:51, 12.38s/it]08/03/2024 22:56:04 - INFO - __main__ -   Step: 7286, LR: 4.803955815870066e-06, Loss: 442.9217224121094
2024-08-04T05:56:16.920575186Z 
 77%|███████▋  | 7287/9500 [24:58:46<7:40:10, 12.48s/it]08/03/2024 22:56:16 - INFO - __main__ -   Step: 7287, LR: 4.8017852721827874e-06, Loss: 374.62890625
2024-08-04T05:56:29.103913659Z 
 77%|███████▋  | 7288/9500 [24:58:59<7:36:43, 12.39s/it]08/03/2024 22:56:29 - INFO - __main__ -   Step: 7288, LR: 4.799614728495508e-06, Loss: 456.2424621582031
2024-08-04T05:56:41.329628712Z 
 77%|███████▋  | 7289/9500 [24:59:11<7:34:42, 12.34s/it]08/03/2024 22:56:41 - INFO - __main__ -   Step: 7289, LR: 4.797444184808229e-06, Loss: 328.2801208496094
2024-08-04T05:56:54.254131511Z 
 77%|███████▋  | 7290/9500 [24:59:24<7:40:58, 12.52s/it]08/03/2024 22:56:54 - INFO - __main__ -   Step: 7290, LR: 4.7952736411209504e-06, Loss: 452.05029296875
2024-08-04T05:57:06.337367990Z 
 77%|███████▋  | 7291/9500 [24:59:36<7:35:59, 12.39s/it]08/03/2024 22:57:06 - INFO - __main__ -   Step: 7291, LR: 4.793103097433672e-06, Loss: 430.26251220703125
2024-08-04T05:57:18.550976636Z 
 77%|███████▋  | 7292/9500 [24:59:48<7:33:53, 12.33s/it]08/03/2024 22:57:18 - INFO - __main__ -   Step: 7292, LR: 4.790932553746393e-06, Loss: 458.7073669433594
2024-08-04T05:57:31.258351027Z 
 77%|███████▋  | 7293/9500 [25:00:01<7:37:48, 12.45s/it]08/03/2024 22:57:31 - INFO - __main__ -   Step: 7293, LR: 4.788762010059113e-06, Loss: 451.13446044921875
2024-08-04T05:57:43.288997256Z 
 77%|███████▋  | 7294/9500 [25:00:13<7:33:01, 12.32s/it]08/03/2024 22:57:43 - INFO - __main__ -   Step: 7294, LR: 4.786591466371835e-06, Loss: 390.37652587890625
2024-08-04T05:57:55.381648280Z 
 77%|███████▋  | 7295/9500 [25:00:25<7:30:17, 12.25s/it]08/03/2024 22:57:55 - INFO - __main__ -   Step: 7295, LR: 4.7844209226845565e-06, Loss: 395.4827880859375
2024-08-04T05:58:07.973535389Z 
 77%|███████▋  | 7296/9500 [25:00:37<7:33:49, 12.35s/it]08/03/2024 22:58:07 - INFO - __main__ -   Step: 7296, LR: 4.782250378997277e-06, Loss: 391.114990234375
2024-08-04T05:58:20.180197160Z 
 77%|███████▋  | 7297/9500 [25:00:50<7:31:59, 12.31s/it]08/03/2024 22:58:20 - INFO - __main__ -   Step: 7297, LR: 4.780079835309998e-06, Loss: 434.2251892089844
2024-08-04T05:58:32.305772520Z 
 77%|███████▋  | 7298/9500 [25:01:02<7:29:45, 12.25s/it]08/03/2024 22:58:32 - INFO - __main__ -   Step: 7298, LR: 4.7779092916227195e-06, Loss: 432.8206481933594
2024-08-04T05:58:44.738057615Z 
 77%|███████▋  | 7299/9500 [25:01:14<7:31:30, 12.31s/it]08/03/2024 22:58:44 - INFO - __main__ -   Step: 7299, LR: 4.77573874793544e-06, Loss: 397.18310546875
2024-08-04T05:58:56.740384862Z 
 77%|███████▋  | 7300/9500 [25:01:26<7:27:55, 12.22s/it]08/03/2024 22:58:56 - INFO - __main__ -   Step: 7300, LR: 4.773568204248161e-06, Loss: 411.10687255859375
2024-08-04T05:59:09.063264480Z 
 77%|███████▋  | 7301/9500 [25:01:39<7:28:54, 12.25s/it]08/03/2024 22:59:09 - INFO - __main__ -   Step: 7301, LR: 4.7713976605608825e-06, Loss: 473.00775146484375
2024-08-04T05:59:21.726194588Z 
 77%|███████▋  | 7302/9500 [25:01:51<7:33:15, 12.37s/it]08/03/2024 22:59:21 - INFO - __main__ -   Step: 7302, LR: 4.769227116873604e-06, Loss: 412.8273620605469
2024-08-04T05:59:34.280271417Z 
 77%|███████▋  | 7303/9500 [25:02:04<7:35:02, 12.43s/it]08/03/2024 22:59:34 - INFO - __main__ -   Step: 7303, LR: 4.767056573186325e-06, Loss: 365.54150390625
2024-08-04T05:59:46.393808232Z 
 77%|███████▋  | 7304/9500 [25:02:16<7:31:23, 12.33s/it]08/03/2024 22:59:46 - INFO - __main__ -   Step: 7304, LR: 4.7648860294990455e-06, Loss: 319.720703125
2024-08-04T05:59:58.608521407Z 
 77%|███████▋  | 7305/9500 [25:02:28<7:29:52, 12.30s/it]08/03/2024 22:59:58 - INFO - __main__ -   Step: 7305, LR: 4.762715485811767e-06, Loss: 367.36810302734375
2024-08-04T06:00:11.206879951Z 
 77%|███████▋  | 7306/9500 [25:02:41<7:32:58, 12.39s/it]08/03/2024 23:00:11 - INFO - __main__ -   Step: 7306, LR: 4.760544942124488e-06, Loss: 377.52801513671875
2024-08-04T06:00:23.404376864Z 
 77%|███████▋  | 7307/9500 [25:02:53<7:30:41, 12.33s/it]08/03/2024 23:00:23 - INFO - __main__ -   Step: 7307, LR: 4.7583743984372085e-06, Loss: 413.4964294433594
2024-08-04T06:00:35.419795750Z 
 77%|███████▋  | 7308/9500 [25:03:05<7:27:01, 12.24s/it]08/03/2024 23:00:35 - INFO - __main__ -   Step: 7308, LR: 4.75620385474993e-06, Loss: 381.38861083984375
2024-08-04T06:00:47.957769205Z 
 77%|███████▋  | 7309/9500 [25:03:17<7:30:07, 12.33s/it]08/03/2024 23:00:47 - INFO - __main__ -   Step: 7309, LR: 4.754033311062652e-06, Loss: 410.51800537109375
2024-08-04T06:01:00.013901113Z 
 77%|███████▋  | 7310/9500 [25:03:29<7:26:57, 12.25s/it]08/03/2024 23:01:00 - INFO - __main__ -   Step: 7310, LR: 4.751862767375372e-06, Loss: 387.82568359375
2024-08-04T06:01:12.443982166Z 
 77%|███████▋  | 7311/9500 [25:03:42<7:28:46, 12.30s/it]08/03/2024 23:01:12 - INFO - __main__ -   Step: 7311, LR: 4.749692223688093e-06, Loss: 381.98553466796875
2024-08-04T06:01:24.955054844Z 
 77%|███████▋  | 7312/9500 [25:03:54<7:30:52, 12.36s/it]08/03/2024 23:01:24 - INFO - __main__ -   Step: 7312, LR: 4.747521680000815e-06, Loss: 460.1242980957031
2024-08-04T06:01:36.980560621Z 
 77%|███████▋  | 7313/9500 [25:04:06<7:26:58, 12.26s/it]08/03/2024 23:01:36 - INFO - __main__ -   Step: 7313, LR: 4.745351136313535e-06, Loss: 438.1925048828125
2024-08-04T06:01:49.460531923Z 
 77%|███████▋  | 7314/9500 [25:04:19<7:29:08, 12.33s/it]08/03/2024 23:01:49 - INFO - __main__ -   Step: 7314, LR: 4.743180592626256e-06, Loss: 400.72540283203125
2024-08-04T06:02:01.784429187Z 
 77%|███████▋  | 7315/9500 [25:04:31<7:28:53, 12.33s/it]08/03/2024 23:02:01 - INFO - __main__ -   Step: 7315, LR: 4.7410100489389776e-06, Loss: 315.3593444824219
2024-08-04T06:02:14.116809433Z 
 77%|███████▋  | 7316/9500 [25:04:44<7:28:44, 12.33s/it]08/03/2024 23:02:14 - INFO - __main__ -   Step: 7316, LR: 4.738839505251699e-06, Loss: 404.8675537109375
2024-08-04T06:02:26.388550094Z 
 77%|███████▋  | 7317/9500 [25:04:56<7:27:55, 12.31s/it]08/03/2024 23:02:26 - INFO - __main__ -   Step: 7317, LR: 4.73666896156442e-06, Loss: 381.57098388671875
2024-08-04T06:02:39.112429232Z 
 77%|███████▋  | 7318/9500 [25:05:09<7:32:13, 12.44s/it]08/03/2024 23:02:39 - INFO - __main__ -   Step: 7318, LR: 4.7344984178771405e-06, Loss: 388.3005676269531
2024-08-04T06:02:51.308309027Z 
 77%|███████▋  | 7319/9500 [25:05:21<7:29:24, 12.36s/it]08/03/2024 23:02:51 - INFO - __main__ -   Step: 7319, LR: 4.732327874189862e-06, Loss: 480.5692443847656
2024-08-04T06:03:03.408494800Z 
 77%|███████▋  | 7320/9500 [25:05:33<7:26:19, 12.28s/it]08/03/2024 23:03:03 - INFO - __main__ -   Step: 7320, LR: 4.730157330502583e-06, Loss: 444.23492431640625
2024-08-04T06:03:15.982577601Z 
 77%|███████▋  | 7321/9500 [25:05:45<7:29:17, 12.37s/it]08/03/2024 23:03:15 - INFO - __main__ -   Step: 7321, LR: 4.727986786815304e-06, Loss: 387.8934020996094
2024-08-04T06:03:27.897351266Z 
 77%|███████▋  | 7322/9500 [25:05:57<7:24:06, 12.23s/it]08/03/2024 23:03:27 - INFO - __main__ -   Step: 7322, LR: 4.725816243128025e-06, Loss: 331.7923583984375
2024-08-04T06:03:39.883312913Z 
 77%|███████▋  | 7323/9500 [25:06:09<7:21:12, 12.16s/it]08/03/2024 23:03:39 - INFO - __main__ -   Step: 7323, LR: 4.723645699440747e-06, Loss: 407.97369384765625
2024-08-04T06:03:52.240371121Z 
 77%|███████▋  | 7324/9500 [25:06:22<7:23:08, 12.22s/it]08/03/2024 23:03:52 - INFO - __main__ -   Step: 7324, LR: 4.721475155753467e-06, Loss: 340.32391357421875
2024-08-04T06:04:04.741752339Z 
 77%|███████▋  | 7325/9500 [25:06:34<7:26:00, 12.30s/it]08/03/2024 23:04:04 - INFO - __main__ -   Step: 7325, LR: 4.719304612066188e-06, Loss: 449.599853515625
2024-08-04T06:04:17.437387252Z 
 77%|███████▋  | 7326/9500 [25:06:47<7:30:03, 12.42s/it]08/03/2024 23:04:17 - INFO - __main__ -   Step: 7326, LR: 4.71713406837891e-06, Loss: 353.6159973144531
2024-08-04T06:04:29.814110942Z 
 77%|███████▋  | 7327/9500 [25:06:59<7:29:22, 12.41s/it]08/03/2024 23:04:29 - INFO - __main__ -   Step: 7327, LR: 4.71496352469163e-06, Loss: 389.3601989746094
2024-08-04T06:04:41.742543067Z 
 77%|███████▋  | 7328/9500 [25:07:11<7:23:57, 12.26s/it]08/03/2024 23:04:41 - INFO - __main__ -   Step: 7328, LR: 4.712792981004352e-06, Loss: 386.9659729003906
2024-08-04T06:04:54.197952640Z 
 77%|███████▋  | 7329/9500 [25:07:24<7:25:49, 12.32s/it]08/03/2024 23:04:54 - INFO - __main__ -   Step: 7329, LR: 4.710622437317073e-06, Loss: 479.3002624511719
2024-08-04T06:05:06.772072704Z 
 77%|███████▋  | 7330/9500 [25:07:36<7:28:22, 12.40s/it]08/03/2024 23:05:06 - INFO - __main__ -   Step: 7330, LR: 4.708451893629794e-06, Loss: 383.6322326660156
2024-08-04T06:05:19.204617336Z 
 77%|███████▋  | 7331/9500 [25:07:49<7:28:32, 12.41s/it]08/03/2024 23:05:19 - INFO - __main__ -   Step: 7331, LR: 4.706281349942515e-06, Loss: 330.3087463378906
2024-08-04T06:05:31.224290157Z 
 77%|███████▋  | 7332/9500 [25:08:01<7:24:07, 12.29s/it]08/03/2024 23:05:31 - INFO - __main__ -   Step: 7332, LR: 4.704110806255236e-06, Loss: 345.31536865234375
2024-08-04T06:05:43.838218833Z 
 77%|███████▋  | 7333/9500 [25:08:13<7:27:25, 12.39s/it]08/03/2024 23:05:43 - INFO - __main__ -   Step: 7333, LR: 4.701940262567957e-06, Loss: 377.1876220703125
2024-08-04T06:05:55.850314041Z 
 77%|███████▋  | 7334/9500 [25:08:25<7:23:08, 12.28s/it]08/03/2024 23:05:55 - INFO - __main__ -   Step: 7334, LR: 4.699769718880678e-06, Loss: 380.9742431640625
2024-08-04T06:06:07.888900896Z 
 77%|███████▋  | 7335/9500 [25:08:37<7:20:21, 12.20s/it]08/03/2024 23:06:07 - INFO - __main__ -   Step: 7335, LR: 4.6975991751933994e-06, Loss: 486.7168884277344
2024-08-04T06:06:20.311988164Z 
 77%|███████▋  | 7336/9500 [25:08:50<7:22:32, 12.27s/it]08/03/2024 23:06:20 - INFO - __main__ -   Step: 7336, LR: 4.69542863150612e-06, Loss: 378.1289367675781
2024-08-04T06:06:32.591413822Z 
 77%|███████▋  | 7337/9500 [25:09:02<7:22:26, 12.27s/it]08/03/2024 23:06:32 - INFO - __main__ -   Step: 7337, LR: 4.693258087818842e-06, Loss: 496.21728515625
2024-08-04T06:06:45.266670057Z 
 77%|███████▋  | 7338/9500 [25:09:15<7:26:34, 12.39s/it]08/03/2024 23:06:45 - INFO - __main__ -   Step: 7338, LR: 4.691087544131562e-06, Loss: 410.7344970703125
2024-08-04T06:06:57.645168726Z 
 77%|███████▋  | 7339/9500 [25:09:27<7:26:12, 12.39s/it]08/03/2024 23:06:57 - INFO - __main__ -   Step: 7339, LR: 4.688917000444283e-06, Loss: 439.5823974609375
2024-08-04T06:07:09.717019968Z 
 77%|███████▋  | 7340/9500 [25:09:39<7:22:34, 12.29s/it]08/03/2024 23:07:09 - INFO - __main__ -   Step: 7340, LR: 4.686746456757005e-06, Loss: 296.72015380859375
2024-08-04T06:07:21.869841825Z 
 77%|███████▋  | 7341/9500 [25:09:51<7:20:51, 12.25s/it]08/03/2024 23:07:21 - INFO - __main__ -   Step: 7341, LR: 4.684575913069725e-06, Loss: 421.0434875488281
2024-08-04T06:07:34.576902585Z 
 77%|███████▋  | 7342/9500 [25:10:04<7:25:33, 12.39s/it]08/03/2024 23:07:34 - INFO - __main__ -   Step: 7342, LR: 4.682405369382447e-06, Loss: 380.531494140625
2024-08-04T06:07:46.696805414Z 
 77%|███████▋  | 7343/9500 [25:10:16<7:22:27, 12.31s/it]08/03/2024 23:07:46 - INFO - __main__ -   Step: 7343, LR: 4.680234825695168e-06, Loss: 357.28094482421875
2024-08-04T06:07:58.753716292Z 
 77%|███████▋  | 7344/9500 [25:10:28<7:19:33, 12.23s/it]08/03/2024 23:07:58 - INFO - __main__ -   Step: 7344, LR: 4.678064282007889e-06, Loss: 409.5367431640625
2024-08-04T06:08:11.276838583Z 
 77%|███████▋  | 7345/9500 [25:10:41<7:22:29, 12.32s/it]08/03/2024 23:08:11 - INFO - __main__ -   Step: 7345, LR: 4.67589373832061e-06, Loss: 496.818359375
2024-08-04T06:08:23.291350534Z 
 77%|███████▋  | 7346/9500 [25:10:53<7:18:59, 12.23s/it]08/03/2024 23:08:23 - INFO - __main__ -   Step: 7346, LR: 4.6737231946333315e-06, Loss: 388.0881652832031
2024-08-04T06:08:35.379986156Z 
 77%|███████▋  | 7347/9500 [25:11:05<7:17:17, 12.19s/it]08/03/2024 23:08:35 - INFO - __main__ -   Step: 7347, LR: 4.671552650946052e-06, Loss: 385.19525146484375
2024-08-04T06:08:48.081164114Z 
 77%|███████▋  | 7348/9500 [25:11:18<7:22:37, 12.34s/it]08/03/2024 23:08:48 - INFO - __main__ -   Step: 7348, LR: 4.669382107258773e-06, Loss: 398.0943908691406
2024-08-04T06:09:00.368390797Z 
 77%|███████▋  | 7349/9500 [25:11:30<7:21:50, 12.32s/it]08/03/2024 23:09:00 - INFO - __main__ -   Step: 7349, LR: 4.6672115635714945e-06, Loss: 286.2572021484375
2024-08-04T06:09:12.337511637Z 
 77%|███████▋  | 7350/9500 [25:11:42<7:17:48, 12.22s/it]08/03/2024 23:09:12 - INFO - __main__ -   Step: 7350, LR: 4.665041019884215e-06, Loss: 339.7789306640625
2024-08-04T06:09:24.727533856Z 
 77%|███████▋  | 7351/9500 [25:11:54<7:19:27, 12.27s/it]08/03/2024 23:09:24 - INFO - __main__ -   Step: 7351, LR: 4.662870476196937e-06, Loss: 421.6508483886719
2024-08-04T06:09:37.145373867Z 
 77%|███████▋  | 7352/9500 [25:12:07<7:20:50, 12.31s/it]08/03/2024 23:09:37 - INFO - __main__ -   Step: 7352, LR: 4.6606999325096575e-06, Loss: 336.0472717285156
2024-08-04T06:09:49.515932082Z 
 77%|███████▋  | 7353/9500 [25:12:19<7:21:14, 12.33s/it]08/03/2024 23:09:49 - INFO - __main__ -   Step: 7353, LR: 4.658529388822379e-06, Loss: 424.20916748046875
2024-08-04T06:10:01.593101943Z 
 77%|███████▋  | 7354/9500 [25:12:31<7:18:18, 12.25s/it]08/03/2024 23:10:01 - INFO - __main__ -   Step: 7354, LR: 4.6563588451351e-06, Loss: 363.7428894042969
2024-08-04T06:10:14.230625569Z 
 77%|███████▋  | 7355/9500 [25:12:44<7:22:13, 12.37s/it]08/03/2024 23:10:14 - INFO - __main__ -   Step: 7355, LR: 4.6541883014478205e-06, Loss: 496.91357421875
2024-08-04T06:10:26.510872702Z 
 77%|███████▋  | 7356/9500 [25:12:56<7:21:02, 12.34s/it]08/03/2024 23:10:26 - INFO - __main__ -   Step: 7356, LR: 4.652017757760542e-06, Loss: 390.1005859375
2024-08-04T06:10:39.204529202Z 
 77%|███████▋  | 7357/9500 [25:13:09<7:24:36, 12.45s/it]08/03/2024 23:10:39 - INFO - __main__ -   Step: 7357, LR: 4.649847214073263e-06, Loss: 530.6986083984375
2024-08-04T06:10:51.820867898Z 
 77%|███████▋  | 7358/9500 [25:13:21<7:26:11, 12.50s/it]08/03/2024 23:10:51 - INFO - __main__ -   Step: 7358, LR: 4.647676670385984e-06, Loss: 400.7732238769531
2024-08-04T06:11:03.833864187Z 
 77%|███████▋  | 7359/9500 [25:13:33<7:20:47, 12.35s/it]08/03/2024 23:11:03 - INFO - __main__ -   Step: 7359, LR: 4.645506126698705e-06, Loss: 396.57745361328125
2024-08-04T06:11:15.919619752Z 
 77%|███████▋  | 7360/9500 [25:13:45<7:17:43, 12.27s/it]08/03/2024 23:11:15 - INFO - __main__ -   Step: 7360, LR: 4.6433355830114266e-06, Loss: 464.32745361328125
2024-08-04T06:11:28.153493677Z 
 77%|███████▋  | 7361/9500 [25:13:58<7:17:06, 12.26s/it]08/03/2024 23:11:28 - INFO - __main__ -   Step: 7361, LR: 4.641165039324147e-06, Loss: 314.42340087890625
2024-08-04T06:11:40.262383023Z 
 77%|███████▋  | 7362/9500 [25:14:10<7:15:16, 12.22s/it]08/03/2024 23:11:40 - INFO - __main__ -   Step: 7362, LR: 4.638994495636868e-06, Loss: 467.128173828125
2024-08-04T06:11:52.566709634Z 
 78%|███████▊  | 7363/9500 [25:14:22<7:16:01, 12.24s/it]08/03/2024 23:11:52 - INFO - __main__ -   Step: 7363, LR: 4.6368239519495895e-06, Loss: 488.12969970703125
2024-08-04T06:12:05.211070553Z 
 78%|███████▊  | 7364/9500 [25:14:35<7:20:07, 12.36s/it]08/03/2024 23:12:05 - INFO - __main__ -   Step: 7364, LR: 4.634653408262311e-06, Loss: 384.2159118652344
2024-08-04T06:12:17.348909774Z 
 78%|███████▊  | 7365/9500 [25:14:47<7:17:30, 12.30s/it]08/03/2024 23:12:17 - INFO - __main__ -   Step: 7365, LR: 4.632482864575032e-06, Loss: 456.69378662109375
2024-08-04T06:12:29.739291523Z 
 78%|███████▊  | 7366/9500 [25:14:59<7:18:19, 12.32s/it]08/03/2024 23:12:29 - INFO - __main__ -   Step: 7366, LR: 4.6303123208877525e-06, Loss: 442.6886291503906
2024-08-04T06:12:42.477433833Z 
 78%|███████▊  | 7367/9500 [25:15:12<7:22:31, 12.45s/it]08/03/2024 23:12:42 - INFO - __main__ -   Step: 7367, LR: 4.628141777200474e-06, Loss: 472.49774169921875
2024-08-04T06:12:54.650909732Z 
 78%|███████▊  | 7368/9500 [25:15:24<7:19:23, 12.37s/it]08/03/2024 23:12:54 - INFO - __main__ -   Step: 7368, LR: 4.625971233513195e-06, Loss: 510.1015625
2024-08-04T06:13:06.912365916Z 
 78%|███████▊  | 7369/9500 [25:15:36<7:18:04, 12.33s/it]08/03/2024 23:13:06 - INFO - __main__ -   Step: 7369, LR: 4.6238006898259155e-06, Loss: 363.796142578125
2024-08-04T06:13:19.587017064Z 
 78%|███████▊  | 7370/9500 [25:15:49<7:21:29, 12.44s/it]08/03/2024 23:13:19 - INFO - __main__ -   Step: 7370, LR: 4.621630146138637e-06, Loss: 352.23126220703125
2024-08-04T06:13:31.509968462Z 
 78%|███████▊  | 7371/9500 [25:16:01<7:15:49, 12.28s/it]08/03/2024 23:13:31 - INFO - __main__ -   Step: 7371, LR: 4.619459602451359e-06, Loss: 345.79052734375
2024-08-04T06:13:43.780540667Z 
 78%|███████▊  | 7372/9500 [25:16:13<7:15:29, 12.28s/it]08/03/2024 23:13:43 - INFO - __main__ -   Step: 7372, LR: 4.617289058764079e-06, Loss: 316.750732421875
2024-08-04T06:13:56.494406810Z 
 78%|███████▊  | 7373/9500 [25:16:26<7:19:54, 12.41s/it]08/03/2024 23:13:56 - INFO - __main__ -   Step: 7373, LR: 4.6151185150768e-06, Loss: 485.6663818359375
2024-08-04T06:14:08.582289962Z 
 78%|███████▊  | 7374/9500 [25:16:38<7:16:17, 12.31s/it]08/03/2024 23:14:08 - INFO - __main__ -   Step: 7374, LR: 4.612947971389522e-06, Loss: 446.46026611328125
2024-08-04T06:14:20.823423864Z 
 78%|███████▊  | 7375/9500 [25:16:50<7:15:19, 12.29s/it]08/03/2024 23:14:20 - INFO - __main__ -   Step: 7375, LR: 4.610777427702242e-06, Loss: 392.1219482421875
2024-08-04T06:14:33.722122068Z 
 78%|███████▊  | 7376/9500 [25:17:03<7:21:33, 12.47s/it]08/03/2024 23:14:33 - INFO - __main__ -   Step: 7376, LR: 4.608606884014963e-06, Loss: 478.280517578125
2024-08-04T06:14:45.879777110Z 
 78%|███████▊  | 7377/9500 [25:17:15<7:17:59, 12.38s/it]08/03/2024 23:14:45 - INFO - __main__ -   Step: 7377, LR: 4.606436340327685e-06, Loss: 387.11688232421875
2024-08-04T06:14:58.181308004Z 
 78%|███████▊  | 7378/9500 [25:17:28<7:16:58, 12.36s/it]08/03/2024 23:14:58 - INFO - __main__ -   Step: 7378, LR: 4.604265796640406e-06, Loss: 444.1282958984375
2024-08-04T06:15:10.769810273Z 
 78%|███████▊  | 7379/9500 [25:17:40<7:19:14, 12.43s/it]08/03/2024 23:15:10 - INFO - __main__ -   Step: 7379, LR: 4.602095252953127e-06, Loss: 432.32244873046875
2024-08-04T06:15:23.027463545Z 
 78%|███████▊  | 7380/9500 [25:17:52<7:17:15, 12.38s/it]08/03/2024 23:15:23 - INFO - __main__ -   Step: 7380, LR: 4.599924709265848e-06, Loss: 361.1593322753906
2024-08-04T06:15:35.049224978Z 
 78%|███████▊  | 7381/9500 [25:18:04<7:13:18, 12.27s/it]08/03/2024 23:15:35 - INFO - __main__ -   Step: 7381, LR: 4.597754165578569e-06, Loss: 383.159423828125
2024-08-04T06:15:48.042377587Z 
 78%|███████▊  | 7382/9500 [25:18:17<7:20:46, 12.49s/it]08/03/2024 23:15:48 - INFO - __main__ -   Step: 7382, LR: 4.59558362189129e-06, Loss: 471.20794677734375
2024-08-04T06:16:00.262800860Z 
 78%|███████▊  | 7383/9500 [25:18:30<7:17:44, 12.41s/it]08/03/2024 23:16:00 - INFO - __main__ -   Step: 7383, LR: 4.5934130782040106e-06, Loss: 440.2921142578125
2024-08-04T06:16:12.774334876Z 
 78%|███████▊  | 7384/9500 [25:18:42<7:18:38, 12.44s/it]08/03/2024 23:16:12 - INFO - __main__ -   Step: 7384, LR: 4.591242534516732e-06, Loss: 345.46649169921875
2024-08-04T06:16:25.299781834Z 
 78%|███████▊  | 7385/9500 [25:18:55<7:19:21, 12.46s/it]08/03/2024 23:16:25 - INFO - __main__ -   Step: 7385, LR: 4.589071990829454e-06, Loss: 350.18402099609375
2024-08-04T06:16:37.516794957Z 
 78%|███████▊  | 7386/9500 [25:19:07<7:16:32, 12.39s/it]08/03/2024 23:16:37 - INFO - __main__ -   Step: 7386, LR: 4.586901447142174e-06, Loss: 327.2347106933594
2024-08-04T06:16:49.712248703Z 
 78%|███████▊  | 7387/9500 [25:19:19<7:14:16, 12.33s/it]08/03/2024 23:16:49 - INFO - __main__ -   Step: 7387, LR: 4.584730903454895e-06, Loss: 411.95562744140625
2024-08-04T06:17:02.386541756Z 
 78%|███████▊  | 7388/9500 [25:19:32<7:17:41, 12.43s/it]08/03/2024 23:17:02 - INFO - __main__ -   Step: 7388, LR: 4.582560359767617e-06, Loss: 398.75347900390625
2024-08-04T06:17:14.764025201Z 
 78%|███████▊  | 7389/9500 [25:19:44<7:16:53, 12.42s/it]08/03/2024 23:17:14 - INFO - __main__ -   Step: 7389, LR: 4.580389816080338e-06, Loss: 376.012939453125
2024-08-04T06:17:27.295426019Z 
 78%|███████▊  | 7390/9500 [25:19:57<7:17:52, 12.45s/it]08/03/2024 23:17:27 - INFO - __main__ -   Step: 7390, LR: 4.578219272393058e-06, Loss: 441.6372985839844
2024-08-04T06:17:39.526357822Z 
 78%|███████▊  | 7391/9500 [25:20:09<7:15:20, 12.39s/it]08/03/2024 23:17:39 - INFO - __main__ -   Step: 7391, LR: 4.57604872870578e-06, Loss: 400.47174072265625
2024-08-04T06:17:52.004118956Z 
 78%|███████▊  | 7392/9500 [25:20:21<7:16:06, 12.41s/it]08/03/2024 23:17:52 - INFO - __main__ -   Step: 7392, LR: 4.573878185018501e-06, Loss: 364.5722351074219
2024-08-04T06:18:04.157043544Z 
 78%|███████▊  | 7393/9500 [25:20:34<7:13:10, 12.34s/it]08/03/2024 23:18:04 - INFO - __main__ -   Step: 7393, LR: 4.571707641331222e-06, Loss: 409.02618408203125
2024-08-04T06:18:16.355777570Z 
 78%|███████▊  | 7394/9500 [25:20:46<7:11:31, 12.29s/it]08/03/2024 23:18:16 - INFO - __main__ -   Step: 7394, LR: 4.569537097643943e-06, Loss: 338.93634033203125
2024-08-04T06:18:28.847626342Z 
 78%|███████▊  | 7395/9500 [25:20:58<7:13:24, 12.35s/it]08/03/2024 23:18:28 - INFO - __main__ -   Step: 7395, LR: 4.567366553956664e-06, Loss: 263.53472900390625
2024-08-04T06:18:40.871105392Z 
 78%|███████▊  | 7396/9500 [25:21:10<7:09:43, 12.25s/it]08/03/2024 23:18:40 - INFO - __main__ -   Step: 7396, LR: 4.565196010269386e-06, Loss: 352.8129577636719
2024-08-04T06:18:53.064752437Z 
 78%|███████▊  | 7397/9500 [25:21:23<7:08:52, 12.24s/it]08/03/2024 23:18:53 - INFO - __main__ -   Step: 7397, LR: 4.5630254665821065e-06, Loss: 441.2969970703125
2024-08-04T06:19:05.445572574Z 
 78%|███████▊  | 7398/9500 [25:21:35<7:10:11, 12.28s/it]08/03/2024 23:19:05 - INFO - __main__ -   Step: 7398, LR: 4.560854922894827e-06, Loss: 342.0793151855469
2024-08-04T06:19:17.546507150Z 
 78%|███████▊  | 7399/9500 [25:21:47<7:08:06, 12.23s/it]08/03/2024 23:19:17 - INFO - __main__ -   Step: 7399, LR: 4.558684379207549e-06, Loss: 342.7845458984375
2024-08-04T06:19:29.755460185Z 
 78%|███████▊  | 7400/9500 [25:21:59<7:07:43, 12.22s/it]08/03/2024 23:19:29 - INFO - __main__ -   Step: 7400, LR: 4.5565138355202695e-06, Loss: 501.31103515625
2024-08-04T06:19:42.613252182Z 
 78%|███████▊  | 7401/9500 [25:22:12<7:14:11, 12.41s/it]08/03/2024 23:19:42 - INFO - __main__ -   Step: 7401, LR: 4.55434329183299e-06, Loss: 444.8125305175781
2024-08-04T06:19:54.644269755Z 
 78%|███████▊  | 7402/9500 [25:22:24<7:10:00, 12.30s/it]08/03/2024 23:19:54 - INFO - __main__ -   Step: 7402, LR: 4.552172748145712e-06, Loss: 394.5599670410156
2024-08-04T06:20:06.756253678Z 
 78%|███████▊  | 7403/9500 [25:22:36<7:07:51, 12.24s/it]08/03/2024 23:20:06 - INFO - __main__ -   Step: 7403, LR: 4.550002204458433e-06, Loss: 359.2770690917969
2024-08-04T06:20:19.461192863Z 
 78%|███████▊  | 7404/9500 [25:22:49<7:12:30, 12.38s/it]08/03/2024 23:20:19 - INFO - __main__ -   Step: 7404, LR: 4.547831660771154e-06, Loss: 378.6617736816406
2024-08-04T06:20:31.462634430Z 
 78%|███████▊  | 7405/9500 [25:23:01<7:08:19, 12.27s/it]08/03/2024 23:20:31 - INFO - __main__ -   Step: 7405, LR: 4.545661117083875e-06, Loss: 377.29205322265625
2024-08-04T06:20:43.524555288Z 
 78%|███████▊  | 7406/9500 [25:23:13<7:05:58, 12.21s/it]08/03/2024 23:20:43 - INFO - __main__ -   Step: 7406, LR: 4.543490573396596e-06, Loss: 345.26470947265625
2024-08-04T06:20:56.122773194Z 
 78%|███████▊  | 7407/9500 [25:23:26<7:09:52, 12.32s/it]08/03/2024 23:20:56 - INFO - __main__ -   Step: 7407, LR: 4.541320029709317e-06, Loss: 486.9954833984375
2024-08-04T06:21:08.507298392Z 
 78%|███████▊  | 7408/9500 [25:23:38<7:10:18, 12.34s/it]08/03/2024 23:21:08 - INFO - __main__ -   Step: 7408, LR: 4.539149486022038e-06, Loss: 352.0484924316406
2024-08-04T06:21:20.995619469Z 
 78%|███████▊  | 7409/9500 [25:23:50<7:11:38, 12.39s/it]08/03/2024 23:21:20 - INFO - __main__ -   Step: 7409, LR: 4.536978942334759e-06, Loss: 408.90313720703125
2024-08-04T06:21:33.446489497Z 
 78%|███████▊  | 7410/9500 [25:24:03<7:12:06, 12.41s/it]08/03/2024 23:21:33 - INFO - __main__ -   Step: 7410, LR: 4.534808398647481e-06, Loss: 363.35107421875
2024-08-04T06:21:45.514635076Z 
 78%|███████▊  | 7411/9500 [25:24:15<7:08:23, 12.30s/it]08/03/2024 23:21:45 - INFO - __main__ -   Step: 7411, LR: 4.5326378549602015e-06, Loss: 522.5511474609375
2024-08-04T06:21:57.506034982Z 
 78%|███████▊  | 7412/9500 [25:24:27<7:04:55, 12.21s/it]08/03/2024 23:21:57 - INFO - __main__ -   Step: 7412, LR: 4.530467311272922e-06, Loss: 317.93341064453125
2024-08-04T06:22:10.392025204Z 
 78%|███████▊  | 7413/9500 [25:24:40<7:11:45, 12.41s/it]08/03/2024 23:22:10 - INFO - __main__ -   Step: 7413, LR: 4.528296767585644e-06, Loss: 338.0128173828125
2024-08-04T06:22:22.718231302Z 
 78%|███████▊  | 7414/9500 [25:24:52<7:10:39, 12.39s/it]08/03/2024 23:22:22 - INFO - __main__ -   Step: 7414, LR: 4.5261262238983645e-06, Loss: 370.81060791015625
2024-08-04T06:22:35.537342320Z 
 78%|███████▊  | 7415/9500 [25:25:05<7:14:57, 12.52s/it]08/03/2024 23:22:35 - INFO - __main__ -   Step: 7415, LR: 4.523955680211086e-06, Loss: 389.8017272949219
2024-08-04T06:22:48.031912940Z 
 78%|███████▊  | 7416/9500 [25:25:17<7:14:30, 12.51s/it]08/03/2024 23:22:48 - INFO - __main__ -   Step: 7416, LR: 4.521785136523807e-06, Loss: 351.2601013183594
2024-08-04T06:23:00.051261927Z 
 78%|███████▊  | 7417/9500 [25:25:29<7:09:11, 12.36s/it]08/03/2024 23:23:00 - INFO - __main__ -   Step: 7417, LR: 4.519614592836528e-06, Loss: 341.994384765625
2024-08-04T06:23:12.170280095Z 
 78%|███████▊  | 7418/9500 [25:25:42<7:06:27, 12.29s/it]08/03/2024 23:23:12 - INFO - __main__ -   Step: 7418, LR: 4.517444049149249e-06, Loss: 369.5128173828125
2024-08-04T06:23:25.034227052Z 
 78%|███████▊  | 7419/9500 [25:25:54<7:12:13, 12.46s/it]08/03/2024 23:23:25 - INFO - __main__ -   Step: 7419, LR: 4.51527350546197e-06, Loss: 519.9774169921875
2024-08-04T06:23:36.964345887Z 
 78%|███████▊  | 7420/9500 [25:26:06<7:06:28, 12.30s/it]08/03/2024 23:23:36 - INFO - __main__ -   Step: 7420, LR: 4.513102961774691e-06, Loss: 315.210205078125
2024-08-04T06:23:49.301345795Z 
 78%|███████▊  | 7421/9500 [25:26:19<7:06:38, 12.31s/it]08/03/2024 23:23:49 - INFO - __main__ -   Step: 7421, LR: 4.510932418087412e-06, Loss: 478.5041198730469
2024-08-04T06:24:01.756673384Z 
 78%|███████▊  | 7422/9500 [25:26:31<7:07:54, 12.36s/it]08/03/2024 23:24:01 - INFO - __main__ -   Step: 7422, LR: 4.508761874400134e-06, Loss: 395.22601318359375
2024-08-04T06:24:13.796458719Z 
 78%|███████▊  | 7423/9500 [25:26:43<7:04:25, 12.26s/it]08/03/2024 23:24:13 - INFO - __main__ -   Step: 7423, LR: 4.506591330712854e-06, Loss: 367.9940185546875
2024-08-04T06:24:25.904867606Z 
 78%|███████▊  | 7424/9500 [25:26:55<7:02:38, 12.22s/it]08/03/2024 23:24:25 - INFO - __main__ -   Step: 7424, LR: 4.504420787025576e-06, Loss: 338.78546142578125
2024-08-04T06:24:38.325440036Z 
 78%|███████▊  | 7425/9500 [25:27:08<7:04:34, 12.28s/it]08/03/2024 23:24:38 - INFO - __main__ -   Step: 7425, LR: 4.502250243338297e-06, Loss: 384.6278076171875
2024-08-04T06:24:50.320797893Z 
 78%|███████▊  | 7426/9500 [25:27:20<7:01:27, 12.19s/it]08/03/2024 23:24:50 - INFO - __main__ -   Step: 7426, LR: 4.500079699651017e-06, Loss: 425.271728515625
2024-08-04T06:25:02.727275699Z 
 78%|███████▊  | 7427/9500 [25:27:32<7:03:27, 12.26s/it]08/03/2024 23:25:02 - INFO - __main__ -   Step: 7427, LR: 4.497909155963739e-06, Loss: 413.6678466796875
2024-08-04T06:25:15.329340018Z 
 78%|███████▊  | 7428/9500 [25:27:45<7:06:50, 12.36s/it]08/03/2024 23:25:15 - INFO - __main__ -   Step: 7428, LR: 4.49573861227646e-06, Loss: 391.56719970703125
2024-08-04T06:25:27.476352172Z 
 78%|███████▊  | 7429/9500 [25:27:57<7:04:25, 12.30s/it]08/03/2024 23:25:27 - INFO - __main__ -   Step: 7429, LR: 4.493568068589181e-06, Loss: 326.5732421875
2024-08-04T06:25:39.872894384Z 
 78%|███████▊  | 7430/9500 [25:28:09<7:05:15, 12.33s/it]08/03/2024 23:25:39 - INFO - __main__ -   Step: 7430, LR: 4.491397524901902e-06, Loss: 452.7373046875
2024-08-04T06:25:52.428068783Z 
 78%|███████▊  | 7431/9500 [25:28:22<7:07:25, 12.40s/it]08/03/2024 23:25:52 - INFO - __main__ -   Step: 7431, LR: 4.489226981214623e-06, Loss: 432.0331115722656
2024-08-04T06:26:04.539538774Z 
 78%|███████▊  | 7432/9500 [25:28:34<7:04:16, 12.31s/it]08/03/2024 23:26:04 - INFO - __main__ -   Step: 7432, LR: 4.487056437527344e-06, Loss: 481.0788269042969
2024-08-04T06:26:16.512012263Z 
 78%|███████▊  | 7433/9500 [25:28:46<7:00:35, 12.21s/it]08/03/2024 23:26:16 - INFO - __main__ -   Step: 7433, LR: 4.484885893840065e-06, Loss: 315.67059326171875
2024-08-04T06:26:28.761627196Z 
 78%|███████▊  | 7434/9500 [25:28:58<7:00:48, 12.22s/it]08/03/2024 23:26:28 - INFO - __main__ -   Step: 7434, LR: 4.482715350152786e-06, Loss: 452.3644104003906
2024-08-04T06:26:41.266135483Z 
 78%|███████▊  | 7435/9500 [25:29:11<7:03:32, 12.31s/it]08/03/2024 23:26:41 - INFO - __main__ -   Step: 7435, LR: 4.480544806465507e-06, Loss: 324.96527099609375
2024-08-04T06:26:53.330399266Z 
 78%|███████▊  | 7436/9500 [25:29:23<7:00:49, 12.23s/it]08/03/2024 23:26:53 - INFO - __main__ -   Step: 7436, LR: 4.478374262778229e-06, Loss: 434.98822021484375
2024-08-04T06:27:05.699647905Z 
 78%|███████▊  | 7437/9500 [25:29:35<7:02:01, 12.27s/it]08/03/2024 23:27:05 - INFO - __main__ -   Step: 7437, LR: 4.476203719090949e-06, Loss: 394.1964416503906
2024-08-04T06:27:18.148469048Z 
 78%|███████▊  | 7438/9500 [25:29:48<7:03:37, 12.33s/it]08/03/2024 23:27:18 - INFO - __main__ -   Step: 7438, LR: 4.474033175403671e-06, Loss: 329.7160339355469
2024-08-04T06:27:30.185562425Z 
 78%|███████▊  | 7439/9500 [25:30:00<7:00:26, 12.24s/it]08/03/2024 23:27:30 - INFO - __main__ -   Step: 7439, LR: 4.471862631716392e-06, Loss: 299.72113037109375
2024-08-04T06:27:42.789411136Z 
 78%|███████▊  | 7440/9500 [25:30:12<7:03:58, 12.35s/it]08/03/2024 23:27:42 - INFO - __main__ -   Step: 7440, LR: 4.469692088029113e-06, Loss: 349.2559509277344
2024-08-04T06:27:55.505572098Z 
 78%|███████▊  | 7441/9500 [25:30:25<7:07:33, 12.46s/it]08/03/2024 23:27:55 - INFO - __main__ -   Step: 7441, LR: 4.467521544341834e-06, Loss: 452.293212890625
2024-08-04T06:28:07.968381265Z 
 78%|███████▊  | 7442/9500 [25:30:37<7:07:23, 12.46s/it]08/03/2024 23:28:07 - INFO - __main__ -   Step: 7442, LR: 4.465351000654555e-06, Loss: 521.4298095703125
2024-08-04T06:28:20.141444483Z 
 78%|███████▊  | 7443/9500 [25:30:50<7:04:13, 12.37s/it]08/03/2024 23:28:20 - INFO - __main__ -   Step: 7443, LR: 4.463180456967276e-06, Loss: 481.096435546875
2024-08-04T06:28:32.517714642Z 
 78%|███████▊  | 7444/9500 [25:31:02<7:04:02, 12.37s/it]08/03/2024 23:28:32 - INFO - __main__ -   Step: 7444, LR: 4.461009913279997e-06, Loss: 354.426025390625
2024-08-04T06:28:45.239652764Z 
 78%|███████▊  | 7445/9500 [25:31:15<7:07:24, 12.48s/it]08/03/2024 23:28:45 - INFO - __main__ -   Step: 7445, LR: 4.4588393695927185e-06, Loss: 487.7666015625
2024-08-04T06:28:57.664881892Z 
 78%|███████▊  | 7446/9500 [25:31:27<7:06:38, 12.46s/it]08/03/2024 23:28:57 - INFO - __main__ -   Step: 7446, LR: 4.456668825905439e-06, Loss: 420.5617370605469
2024-08-04T06:29:10.571462411Z 
 78%|███████▊  | 7447/9500 [25:31:40<7:10:59, 12.60s/it]08/03/2024 23:29:10 - INFO - __main__ -   Step: 7447, LR: 4.454498282218161e-06, Loss: 475.5069885253906
2024-08-04T06:29:22.494848136Z 
 78%|███████▊  | 7448/9500 [25:31:52<7:03:53, 12.39s/it]08/03/2024 23:29:22 - INFO - __main__ -   Step: 7448, LR: 4.4523277385308815e-06, Loss: 287.5682678222656
2024-08-04T06:29:34.378853923Z 
 78%|███████▊  | 7449/9500 [25:32:04<6:58:26, 12.24s/it]08/03/2024 23:29:34 - INFO - __main__ -   Step: 7449, LR: 4.450157194843602e-06, Loss: 373.4573974609375
2024-08-04T06:29:47.207178312Z 
 78%|███████▊  | 7450/9500 [25:32:17<7:04:15, 12.42s/it]08/03/2024 23:29:47 - INFO - __main__ -   Step: 7450, LR: 4.447986651156324e-06, Loss: 516.6170654296875
2024-08-04T06:29:59.455904921Z 
 78%|███████▊  | 7451/9500 [25:32:29<7:02:19, 12.37s/it]08/03/2024 23:29:59 - INFO - __main__ -   Step: 7451, LR: 4.4458161074690444e-06, Loss: 401.825439453125
2024-08-04T06:30:11.446266347Z 
 78%|███████▊  | 7452/9500 [25:32:41<6:58:15, 12.25s/it]08/03/2024 23:30:11 - INFO - __main__ -   Step: 7452, LR: 4.443645563781766e-06, Loss: 323.6878967285156
2024-08-04T06:30:24.048307683Z 
 78%|███████▊  | 7453/9500 [25:32:53<7:01:37, 12.36s/it]08/03/2024 23:30:24 - INFO - __main__ -   Step: 7453, LR: 4.441475020094487e-06, Loss: 370.67694091796875
2024-08-04T06:30:36.386133830Z 
 78%|███████▊  | 7454/9500 [25:33:06<7:01:12, 12.35s/it]08/03/2024 23:30:36 - INFO - __main__ -   Step: 7454, LR: 4.439304476407208e-06, Loss: 364.3919677734375
2024-08-04T06:30:49.032572398Z 
 78%|███████▊  | 7455/9500 [25:33:18<7:04:00, 12.44s/it]08/03/2024 23:30:49 - INFO - __main__ -   Step: 7455, LR: 4.437133932719929e-06, Loss: 495.4725036621094
2024-08-04T06:31:01.779154284Z 
 78%|███████▊  | 7456/9500 [25:33:31<7:06:55, 12.53s/it]08/03/2024 23:31:01 - INFO - __main__ -   Step: 7456, LR: 4.43496338903265e-06, Loss: 392.9948425292969
2024-08-04T06:31:14.143450612Z 
 78%|███████▊  | 7457/9500 [25:33:44<7:05:00, 12.48s/it]08/03/2024 23:31:14 - INFO - __main__ -   Step: 7457, LR: 4.432792845345371e-06, Loss: 395.90557861328125
2024-08-04T06:31:26.219744567Z 
 79%|███████▊  | 7458/9500 [25:33:56<7:00:39, 12.36s/it]08/03/2024 23:31:26 - INFO - __main__ -   Step: 7458, LR: 4.430622301658092e-06, Loss: 394.0460205078125
2024-08-04T06:31:38.799116514Z 
 79%|███████▊  | 7459/9500 [25:34:08<7:02:41, 12.43s/it]08/03/2024 23:31:38 - INFO - __main__ -   Step: 7459, LR: 4.428451757970813e-06, Loss: 430.15582275390625
2024-08-04T06:31:50.975528898Z 
 79%|███████▊  | 7460/9500 [25:34:20<6:59:56, 12.35s/it]08/03/2024 23:31:50 - INFO - __main__ -   Step: 7460, LR: 4.426281214283534e-06, Loss: 360.69818115234375
2024-08-04T06:32:03.436649900Z 
 79%|███████▊  | 7461/9500 [25:34:33<7:00:51, 12.38s/it]08/03/2024 23:32:03 - INFO - __main__ -   Step: 7461, LR: 4.424110670596256e-06, Loss: 476.94757080078125
2024-08-04T06:32:16.361255221Z 
 79%|███████▊  | 7462/9500 [25:34:46<7:06:09, 12.55s/it]08/03/2024 23:32:16 - INFO - __main__ -   Step: 7462, LR: 4.4219401269089765e-06, Loss: 348.54229736328125
2024-08-04T06:32:28.785881056Z 
 79%|███████▊  | 7463/9500 [25:34:58<7:04:42, 12.51s/it]08/03/2024 23:32:28 - INFO - __main__ -   Step: 7463, LR: 4.419769583221697e-06, Loss: 529.34033203125
2024-08-04T06:32:40.841947515Z 
 79%|███████▊  | 7464/9500 [25:35:10<6:59:52, 12.37s/it]08/03/2024 23:32:40 - INFO - __main__ -   Step: 7464, LR: 4.417599039534419e-06, Loss: 405.04046630859375
2024-08-04T06:32:53.719674321Z 
 79%|███████▊  | 7465/9500 [25:35:23<7:04:48, 12.52s/it]08/03/2024 23:32:53 - INFO - __main__ -   Step: 7465, LR: 4.41542849584714e-06, Loss: 502.3738098144531
2024-08-04T06:33:06.008199288Z 
 79%|███████▊  | 7466/9500 [25:35:35<7:02:11, 12.45s/it]08/03/2024 23:33:06 - INFO - __main__ -   Step: 7466, LR: 4.413257952159861e-06, Loss: 525.6165771484375
2024-08-04T06:33:18.208929946Z 
 79%|███████▊  | 7467/9500 [25:35:48<6:59:24, 12.38s/it]08/03/2024 23:33:18 - INFO - __main__ -   Step: 7467, LR: 4.411087408472582e-06, Loss: 397.9087829589844
2024-08-04T06:33:30.820521012Z 
 79%|███████▊  | 7468/9500 [25:36:00<7:01:34, 12.45s/it]08/03/2024 23:33:30 - INFO - __main__ -   Step: 7468, LR: 4.408916864785303e-06, Loss: 395.5797119140625
2024-08-04T06:33:43.041369434Z 
 79%|███████▊  | 7469/9500 [25:36:12<6:59:03, 12.38s/it]08/03/2024 23:33:43 - INFO - __main__ -   Step: 7469, LR: 4.406746321098024e-06, Loss: 345.2499694824219
2024-08-04T06:33:55.137759110Z 
 79%|███████▊  | 7470/9500 [25:36:25<6:55:58, 12.29s/it]08/03/2024 23:33:55 - INFO - __main__ -   Step: 7470, LR: 4.404575777410745e-06, Loss: 398.71820068359375
2024-08-04T06:34:07.872565375Z 
 79%|███████▊  | 7471/9500 [25:36:37<7:00:14, 12.43s/it]08/03/2024 23:34:07 - INFO - __main__ -   Step: 7471, LR: 4.402405233723466e-06, Loss: 480.9451599121094
2024-08-04T06:34:20.166215639Z 
 79%|███████▊  | 7472/9500 [25:36:50<6:58:40, 12.39s/it]08/03/2024 23:34:20 - INFO - __main__ -   Step: 7472, LR: 4.400234690036188e-06, Loss: 322.3359375
2024-08-04T06:34:32.590697728Z 
 79%|███████▊  | 7473/9500 [25:37:02<6:58:51, 12.40s/it]08/03/2024 23:34:32 - INFO - __main__ -   Step: 7473, LR: 4.398064146348909e-06, Loss: 394.98590087890625
2024-08-04T06:34:45.147083540Z 
 79%|███████▊  | 7474/9500 [25:37:15<7:00:14, 12.45s/it]08/03/2024 23:34:45 - INFO - __main__ -   Step: 7474, LR: 4.395893602661629e-06, Loss: 414.0181884765625
2024-08-04T06:34:57.144358873Z 
 79%|███████▊  | 7475/9500 [25:37:27<6:55:30, 12.31s/it]08/03/2024 23:34:57 - INFO - __main__ -   Step: 7475, LR: 4.393723058974351e-06, Loss: 405.73089599609375
2024-08-04T06:35:09.465543609Z 
 79%|███████▊  | 7476/9500 [25:37:39<6:55:23, 12.31s/it]08/03/2024 23:35:09 - INFO - __main__ -   Step: 7476, LR: 4.3915525152870716e-06, Loss: 420.06707763671875
2024-08-04T06:35:21.676558409Z 
 79%|███████▊  | 7477/9500 [25:37:51<6:54:08, 12.28s/it]08/03/2024 23:35:21 - INFO - __main__ -   Step: 7477, LR: 4.389381971599792e-06, Loss: 555.355224609375
2024-08-04T06:35:34.107855606Z 
 79%|███████▊  | 7478/9500 [25:38:04<6:55:26, 12.33s/it]08/03/2024 23:35:34 - INFO - __main__ -   Step: 7478, LR: 4.387211427912514e-06, Loss: 419.30584716796875
2024-08-04T06:35:46.322216374Z 
 79%|███████▊  | 7479/9500 [25:38:16<6:54:05, 12.29s/it]08/03/2024 23:35:46 - INFO - __main__ -   Step: 7479, LR: 4.385040884225235e-06, Loss: 317.8189697265625
2024-08-04T06:35:58.778170545Z 
 79%|███████▊  | 7480/9500 [25:38:28<6:55:31, 12.34s/it]08/03/2024 23:35:58 - INFO - __main__ -   Step: 7480, LR: 4.382870340537956e-06, Loss: 418.20233154296875
2024-08-04T06:36:11.207931932Z 
 79%|███████▊  | 7481/9500 [25:38:41<6:56:12, 12.37s/it]08/03/2024 23:36:11 - INFO - __main__ -   Step: 7481, LR: 4.380699796850677e-06, Loss: 357.5765686035156
2024-08-04T06:36:23.396801420Z 
 79%|███████▉  | 7482/9500 [25:38:53<6:54:11, 12.31s/it]08/03/2024 23:36:23 - INFO - __main__ -   Step: 7482, LR: 4.378529253163398e-06, Loss: 490.8083190917969
2024-08-04T06:36:35.440566248Z 
 79%|███████▉  | 7483/9500 [25:39:05<6:51:14, 12.23s/it]08/03/2024 23:36:35 - INFO - __main__ -   Step: 7483, LR: 4.376358709476119e-06, Loss: 371.26971435546875
2024-08-04T06:36:48.435881266Z 
 79%|███████▉  | 7484/9500 [25:39:18<6:58:43, 12.46s/it]08/03/2024 23:36:48 - INFO - __main__ -   Step: 7484, LR: 4.37418816578884e-06, Loss: 500.21002197265625
2024-08-04T06:37:00.830164977Z 
 79%|███████▉  | 7485/9500 [25:39:30<6:57:49, 12.44s/it]08/03/2024 23:37:00 - INFO - __main__ -   Step: 7485, LR: 4.372017622101561e-06, Loss: 451.212646484375
2024-08-04T06:37:13.167612685Z 
 79%|███████▉  | 7486/9500 [25:39:43<6:56:34, 12.41s/it]08/03/2024 23:37:13 - INFO - __main__ -   Step: 7486, LR: 4.369847078414283e-06, Loss: 379.30389404296875
2024-08-04T06:37:25.735787910Z 
 79%|███████▉  | 7487/9500 [25:39:55<6:57:57, 12.46s/it]08/03/2024 23:37:25 - INFO - __main__ -   Step: 7487, LR: 4.367676534727004e-06, Loss: 468.9593505859375
2024-08-04T06:37:37.947571897Z 
 79%|███████▉  | 7488/9500 [25:40:07<6:55:16, 12.38s/it]08/03/2024 23:37:37 - INFO - __main__ -   Step: 7488, LR: 4.365505991039724e-06, Loss: 446.615966796875
2024-08-04T06:37:49.922567156Z 
 79%|███████▉  | 7489/9500 [25:40:19<6:50:57, 12.26s/it]08/03/2024 23:37:49 - INFO - __main__ -   Step: 7489, LR: 4.363335447352446e-06, Loss: 351.20037841796875
2024-08-04T06:38:02.655027990Z 
 79%|███████▉  | 7490/9500 [25:40:32<6:55:29, 12.40s/it]08/03/2024 23:38:02 - INFO - __main__ -   Step: 7490, LR: 4.3611649036651675e-06, Loss: 390.1802978515625
2024-08-04T06:38:14.806446392Z 
 79%|███████▉  | 7491/9500 [25:40:44<6:52:45, 12.33s/it]08/03/2024 23:38:14 - INFO - __main__ -   Step: 7491, LR: 4.358994359977888e-06, Loss: 410.4839782714844
2024-08-04T06:38:26.913690957Z 
 79%|███████▉  | 7492/9500 [25:40:56<6:50:20, 12.26s/it]08/03/2024 23:38:26 - INFO - __main__ -   Step: 7492, LR: 4.356823816290609e-06, Loss: 555.32861328125
2024-08-04T06:38:39.535766103Z 
 79%|███████▉  | 7493/9500 [25:41:09<6:53:45, 12.37s/it]08/03/2024 23:38:39 - INFO - __main__ -   Step: 7493, LR: 4.3546532726033305e-06, Loss: 443.88140869140625
2024-08-04T06:38:51.981048753Z 
 79%|███████▉  | 7494/9500 [25:41:21<6:54:18, 12.39s/it]08/03/2024 23:38:51 - INFO - __main__ -   Step: 7494, LR: 4.352482728916051e-06, Loss: 426.7188720703125
2024-08-04T06:39:04.327840488Z 
 79%|███████▉  | 7495/9500 [25:41:34<6:53:39, 12.38s/it]08/03/2024 23:39:04 - INFO - __main__ -   Step: 7495, LR: 4.350312185228772e-06, Loss: 383.1763916015625
2024-08-04T06:39:16.681624450Z 
 79%|███████▉  | 7496/9500 [25:41:46<6:53:11, 12.37s/it]08/03/2024 23:39:16 - INFO - __main__ -   Step: 7496, LR: 4.3481416415414934e-06, Loss: 379.98968505859375
2024-08-04T06:39:28.718625315Z 
 79%|███████▉  | 7497/9500 [25:41:58<6:49:38, 12.27s/it]08/03/2024 23:39:28 - INFO - __main__ -   Step: 7497, LR: 4.345971097854215e-06, Loss: 424.08831787109375
2024-08-04T06:39:40.817908517Z 
 79%|███████▉  | 7498/9500 [25:42:10<6:47:43, 12.22s/it]08/03/2024 23:39:40 - INFO - __main__ -   Step: 7498, LR: 4.343800554166936e-06, Loss: 326.018310546875
2024-08-04T06:39:53.299934150Z 
 79%|███████▉  | 7499/9500 [25:42:23<6:50:08, 12.30s/it]08/03/2024 23:39:53 - INFO - __main__ -   Step: 7499, LR: 4.3416300104796564e-06, Loss: 314.8291015625
2024-08-04T06:40:05.594664388Z 
 79%|███████▉  | 7500/9500 [25:42:35<6:49:54, 12.30s/it]08/03/2024 23:40:05 - INFO - __main__ -   Step: 7500, LR: 4.339459466792378e-06, Loss: 445.48480224609375
2024-08-04T06:40:17.728973680Z 
 79%|███████▉  | 7501/9500 [25:42:47<6:48:04, 12.25s/it]08/03/2024 23:40:17 - INFO - __main__ -   Step: 7501, LR: 4.337288923105099e-06, Loss: 287.1636657714844
2024-08-04T06:40:30.780807026Z 
 79%|███████▉  | 7502/9500 [25:43:00<6:55:53, 12.49s/it]08/03/2024 23:40:30 - INFO - __main__ -   Step: 7502, LR: 4.335118379417819e-06, Loss: 519.7803955078125
2024-08-04T06:40:42.900903632Z 
 79%|███████▉  | 7503/9500 [25:43:12<6:52:00, 12.38s/it]08/03/2024 23:40:42 - INFO - __main__ -   Step: 7503, LR: 4.332947835730541e-06, Loss: 371.09765625
2024-08-04T06:40:55.149918189Z 
 79%|███████▉  | 7504/9500 [25:43:25<6:50:29, 12.34s/it]08/03/2024 23:40:55 - INFO - __main__ -   Step: 7504, LR: 4.3307772920432625e-06, Loss: 428.8233947753906
2024-08-04T06:41:07.874294058Z 
 79%|███████▉  | 7505/9500 [25:43:37<6:54:07, 12.46s/it]08/03/2024 23:41:07 - INFO - __main__ -   Step: 7505, LR: 4.328606748355983e-06, Loss: 456.77764892578125
2024-08-04T06:41:20.009413334Z 
 79%|███████▉  | 7506/9500 [25:43:49<6:50:44, 12.36s/it]08/03/2024 23:41:20 - INFO - __main__ -   Step: 7506, LR: 4.326436204668704e-06, Loss: 300.25201416015625
2024-08-04T06:41:32.274212497Z 
 79%|███████▉  | 7507/9500 [25:44:02<6:49:35, 12.33s/it]08/03/2024 23:41:32 - INFO - __main__ -   Step: 7507, LR: 4.3242656609814255e-06, Loss: 401.7912292480469
2024-08-04T06:41:44.856255406Z 
 79%|███████▉  | 7508/9500 [25:44:14<6:51:53, 12.41s/it]08/03/2024 23:41:44 - INFO - __main__ -   Step: 7508, LR: 4.322095117294146e-06, Loss: 433.42352294921875
2024-08-04T06:41:57.166715041Z 
 79%|███████▉  | 7509/9500 [25:44:27<6:50:43, 12.38s/it]08/03/2024 23:41:57 - INFO - __main__ -   Step: 7509, LR: 4.319924573606867e-06, Loss: 501.84539794921875
2024-08-04T06:42:09.363810191Z 
 79%|███████▉  | 7510/9500 [25:44:39<6:48:43, 12.32s/it]08/03/2024 23:42:09 - INFO - __main__ -   Step: 7510, LR: 4.3177540299195885e-06, Loss: 415.34979248046875
2024-08-04T06:42:21.991036790Z 
 79%|███████▉  | 7511/9500 [25:44:51<6:51:32, 12.41s/it]08/03/2024 23:42:21 - INFO - __main__ -   Step: 7511, LR: 4.31558348623231e-06, Loss: 355.88262939453125
2024-08-04T06:42:34.169507762Z 
 79%|███████▉  | 7512/9500 [25:45:04<6:48:59, 12.34s/it]08/03/2024 23:42:34 - INFO - __main__ -   Step: 7512, LR: 4.313412942545031e-06, Loss: 341.9080810546875
2024-08-04T06:42:46.259714991Z 
 79%|███████▉  | 7513/9500 [25:45:16<6:46:15, 12.27s/it]08/03/2024 23:42:46 - INFO - __main__ -   Step: 7513, LR: 4.3112423988577515e-06, Loss: 415.7928466796875
2024-08-04T06:42:59.125768546Z 
 79%|███████▉  | 7514/9500 [25:45:29<6:52:00, 12.45s/it]08/03/2024 23:42:59 - INFO - __main__ -   Step: 7514, LR: 4.309071855170473e-06, Loss: 444.1630859375
2024-08-04T06:43:11.275635332Z 
 79%|███████▉  | 7515/9500 [25:45:41<6:48:50, 12.36s/it]08/03/2024 23:43:11 - INFO - __main__ -   Step: 7515, LR: 4.306901311483194e-06, Loss: 336.00140380859375
2024-08-04T06:43:23.429764338Z 
 79%|███████▉  | 7516/9500 [25:45:53<6:46:36, 12.30s/it]08/03/2024 23:43:23 - INFO - __main__ -   Step: 7516, LR: 4.304730767795915e-06, Loss: 441.8780517578125
2024-08-04T06:43:35.937315284Z 
 79%|███████▉  | 7517/9500 [25:46:05<6:48:29, 12.36s/it]08/03/2024 23:43:35 - INFO - __main__ -   Step: 7517, LR: 4.302560224108636e-06, Loss: 416.37890625
2024-08-04T06:43:48.149645933Z 
 79%|███████▉  | 7518/9500 [25:46:18<6:46:49, 12.32s/it]08/03/2024 23:43:48 - INFO - __main__ -   Step: 7518, LR: 4.300389680421358e-06, Loss: 464.8447570800781
2024-08-04T06:44:00.253367383Z 
 79%|███████▉  | 7519/9500 [25:46:30<6:44:31, 12.25s/it]08/03/2024 23:44:00 - INFO - __main__ -   Step: 7519, LR: 4.298219136734078e-06, Loss: 441.42938232421875
2024-08-04T06:44:12.313149618Z 
 79%|███████▉  | 7520/9500 [25:46:42<6:42:25, 12.19s/it]08/03/2024 23:44:12 - INFO - __main__ -   Step: 7520, LR: 4.296048593046799e-06, Loss: 312.59857177734375
2024-08-04T06:44:25.039582018Z 
 79%|███████▉  | 7521/9500 [25:46:54<6:47:28, 12.35s/it]08/03/2024 23:44:25 - INFO - __main__ -   Step: 7521, LR: 4.2938780493595206e-06, Loss: 389.2674255371094
2024-08-04T06:44:37.072872240Z 
 79%|███████▉  | 7522/9500 [25:47:07<6:44:05, 12.26s/it]08/03/2024 23:44:37 - INFO - __main__ -   Step: 7522, LR: 4.291707505672241e-06, Loss: 357.49591064453125
2024-08-04T06:44:48.902110614Z 
 79%|███████▉  | 7523/9500 [25:47:18<6:39:39, 12.13s/it]08/03/2024 23:44:48 - INFO - __main__ -   Step: 7523, LR: 4.289536961984963e-06, Loss: 313.28936767578125
2024-08-04T06:45:01.462838432Z 
 79%|███████▉  | 7524/9500 [25:47:31<6:43:43, 12.26s/it]08/03/2024 23:45:01 - INFO - __main__ -   Step: 7524, LR: 4.2873664182976836e-06, Loss: 451.112060546875
2024-08-04T06:45:13.830144577Z 
 79%|███████▉  | 7525/9500 [25:47:43<6:44:35, 12.29s/it]08/03/2024 23:45:13 - INFO - __main__ -   Step: 7525, LR: 4.285195874610405e-06, Loss: 403.6197204589844
2024-08-04T06:45:25.824196281Z 
 79%|███████▉  | 7526/9500 [25:47:55<6:41:27, 12.20s/it]08/03/2024 23:45:25 - INFO - __main__ -   Step: 7526, LR: 4.283025330923126e-06, Loss: 391.4759216308594
2024-08-04T06:45:38.349168694Z 
 79%|███████▉  | 7527/9500 [25:48:08<6:44:25, 12.30s/it]08/03/2024 23:45:38 - INFO - __main__ -   Step: 7527, LR: 4.2808547872358465e-06, Loss: 398.0688171386719
2024-08-04T06:45:50.706802289Z 
 79%|███████▉  | 7528/9500 [25:48:20<6:44:48, 12.32s/it]08/03/2024 23:45:50 - INFO - __main__ -   Step: 7528, LR: 4.278684243548568e-06, Loss: 413.95440673828125
2024-08-04T06:46:03.323677295Z 
 79%|███████▉  | 7529/9500 [25:48:33<6:47:33, 12.41s/it]08/03/2024 23:46:03 - INFO - __main__ -   Step: 7529, LR: 4.276513699861289e-06, Loss: 409.33673095703125
2024-08-04T06:46:15.458895243Z 
 79%|███████▉  | 7530/9500 [25:48:45<6:44:40, 12.33s/it]08/03/2024 23:46:15 - INFO - __main__ -   Step: 7530, LR: 4.27434315617401e-06, Loss: 371.19305419921875
2024-08-04T06:46:27.650343290Z 
 79%|███████▉  | 7531/9500 [25:48:57<6:43:09, 12.29s/it]08/03/2024 23:46:27 - INFO - __main__ -   Step: 7531, LR: 4.272172612486731e-06, Loss: 398.81475830078125
2024-08-04T06:46:39.864176793Z 
 79%|███████▉  | 7532/9500 [25:49:09<6:42:15, 12.26s/it]08/03/2024 23:46:39 - INFO - __main__ -   Step: 7532, LR: 4.270002068799453e-06, Loss: 523.0494384765625
2024-08-04T06:46:52.345971435Z 
 79%|███████▉  | 7533/9500 [25:49:22<6:44:11, 12.33s/it]08/03/2024 23:46:52 - INFO - __main__ -   Step: 7533, LR: 4.267831525112173e-06, Loss: 438.9566955566406
2024-08-04T06:47:04.482009809Z 
 79%|███████▉  | 7534/9500 [25:49:34<6:42:05, 12.27s/it]08/03/2024 23:47:04 - INFO - __main__ -   Step: 7534, LR: 4.265660981424894e-06, Loss: 446.4599609375
2024-08-04T06:47:16.575548804Z 
 79%|███████▉  | 7535/9500 [25:49:46<6:40:08, 12.22s/it]08/03/2024 23:47:16 - INFO - __main__ -   Step: 7535, LR: 4.263490437737616e-06, Loss: 404.1611328125
2024-08-04T06:47:29.017332012Z 
 79%|███████▉  | 7536/9500 [25:49:58<6:42:07, 12.29s/it]08/03/2024 23:47:29 - INFO - __main__ -   Step: 7536, LR: 4.261319894050336e-06, Loss: 390.7701416015625
2024-08-04T06:47:41.254991000Z 
 79%|███████▉  | 7537/9500 [25:50:11<6:41:27, 12.27s/it]08/03/2024 23:47:41 - INFO - __main__ -   Step: 7537, LR: 4.259149350363058e-06, Loss: 489.3699645996094
2024-08-04T06:47:53.492415427Z 
 79%|███████▉  | 7538/9500 [25:50:23<6:40:55, 12.26s/it]08/03/2024 23:47:53 - INFO - __main__ -   Step: 7538, LR: 4.256978806675779e-06, Loss: 364.5689392089844
2024-08-04T06:48:05.923723931Z 
 79%|███████▉  | 7539/9500 [25:50:35<6:42:23, 12.31s/it]08/03/2024 23:48:05 - INFO - __main__ -   Step: 7539, LR: 4.254808262988499e-06, Loss: 478.8451232910156
2024-08-04T06:48:18.061617871Z 
 79%|███████▉  | 7540/9500 [25:50:47<6:40:29, 12.26s/it]08/03/2024 23:48:18 - INFO - __main__ -   Step: 7540, LR: 4.252637719301221e-06, Loss: 398.51171875
2024-08-04T06:48:29.903238581Z 
 79%|███████▉  | 7541/9500 [25:50:59<6:36:11, 12.13s/it]08/03/2024 23:48:29 - INFO - __main__ -   Step: 7541, LR: 4.2504671756139424e-06, Loss: 342.6361999511719
2024-08-04T06:48:42.390666772Z 
 79%|███████▉  | 7542/9500 [25:51:12<6:39:26, 12.24s/it]08/03/2024 23:48:42 - INFO - __main__ -   Step: 7542, LR: 4.248296631926663e-06, Loss: 365.6758117675781
2024-08-04T06:48:54.339548084Z 
 79%|███████▉  | 7543/9500 [25:51:24<6:36:22, 12.15s/it]08/03/2024 23:48:54 - INFO - __main__ -   Step: 7543, LR: 4.246126088239384e-06, Loss: 511.79547119140625
2024-08-04T06:49:06.373093533Z 
 79%|███████▉  | 7544/9500 [25:51:36<6:35:00, 12.12s/it]08/03/2024 23:49:06 - INFO - __main__ -   Step: 7544, LR: 4.2439555445521054e-06, Loss: 470.96270751953125
2024-08-04T06:49:18.907302708Z 
 79%|███████▉  | 7545/9500 [25:51:48<6:38:53, 12.24s/it]08/03/2024 23:49:18 - INFO - __main__ -   Step: 7545, LR: 4.241785000864826e-06, Loss: 411.56036376953125
2024-08-04T06:49:30.915780514Z 
 79%|███████▉  | 7546/9500 [25:52:00<6:36:24, 12.17s/it]08/03/2024 23:49:30 - INFO - __main__ -   Step: 7546, LR: 4.239614457177547e-06, Loss: 353.77001953125
2024-08-04T06:49:42.675131911Z 
 79%|███████▉  | 7547/9500 [25:52:12<6:32:10, 12.05s/it]08/03/2024 23:49:42 - INFO - __main__ -   Step: 7547, LR: 4.237443913490268e-06, Loss: 445.5255126953125
2024-08-04T06:49:55.349459884Z 
 79%|███████▉  | 7548/9500 [25:52:25<6:38:04, 12.24s/it]08/03/2024 23:49:55 - INFO - __main__ -   Step: 7548, LR: 4.23527336980299e-06, Loss: 445.7434997558594
2024-08-04T06:50:07.405173721Z 
 79%|███████▉  | 7549/9500 [25:52:37<6:36:06, 12.18s/it]08/03/2024 23:50:07 - INFO - __main__ -   Step: 7549, LR: 4.233102826115711e-06, Loss: 491.03466796875
2024-08-04T06:50:19.455123085Z 
 79%|███████▉  | 7550/9500 [25:52:49<6:34:37, 12.14s/it]08/03/2024 23:50:19 - INFO - __main__ -   Step: 7550, LR: 4.230932282428431e-06, Loss: 431.845458984375
2024-08-04T06:50:31.800509055Z 
 79%|███████▉  | 7551/9500 [25:53:01<6:36:24, 12.20s/it]08/03/2024 23:50:31 - INFO - __main__ -   Step: 7551, LR: 4.228761738741153e-06, Loss: 430.393310546875
2024-08-04T06:50:44.033312079Z 
 79%|███████▉  | 7552/9500 [25:53:13<6:36:29, 12.21s/it]08/03/2024 23:50:44 - INFO - __main__ -   Step: 7552, LR: 4.226591195053874e-06, Loss: 427.5540771484375
2024-08-04T06:50:56.004261267Z 
 80%|███████▉  | 7553/9500 [25:53:25<6:33:56, 12.14s/it]08/03/2024 23:50:56 - INFO - __main__ -   Step: 7553, LR: 4.224420651366594e-06, Loss: 352.9577331542969
2024-08-04T06:51:08.836416915Z 
 80%|███████▉  | 7554/9500 [25:53:38<6:40:27, 12.35s/it]08/03/2024 23:51:08 - INFO - __main__ -   Step: 7554, LR: 4.222250107679316e-06, Loss: 487.14447021484375
2024-08-04T06:51:20.988040637Z 
 80%|███████▉  | 7555/9500 [25:53:50<6:38:21, 12.29s/it]08/03/2024 23:51:20 - INFO - __main__ -   Step: 7555, LR: 4.2200795639920375e-06, Loss: 366.1973876953125
2024-08-04T06:51:32.893319705Z 
 80%|███████▉  | 7556/9500 [25:54:02<6:34:25, 12.17s/it]08/03/2024 23:51:32 - INFO - __main__ -   Step: 7556, LR: 4.217909020304758e-06, Loss: 471.2641906738281
2024-08-04T06:51:45.278321749Z 
 80%|███████▉  | 7557/9500 [25:54:15<6:36:16, 12.24s/it]08/03/2024 23:51:45 - INFO - __main__ -   Step: 7557, LR: 4.215738476617479e-06, Loss: 392.4815673828125
2024-08-04T06:51:57.261583247Z 
 80%|███████▉  | 7558/9500 [25:54:27<6:33:36, 12.16s/it]08/03/2024 23:51:57 - INFO - __main__ -   Step: 7558, LR: 4.2135679329302005e-06, Loss: 337.14251708984375
2024-08-04T06:52:09.624902192Z 
 80%|███████▉  | 7559/9500 [25:54:39<6:35:22, 12.22s/it]08/03/2024 23:52:09 - INFO - __main__ -   Step: 7559, LR: 4.211397389242922e-06, Loss: 402.7417297363281
2024-08-04T06:52:22.067291254Z 
 80%|███████▉  | 7560/9500 [25:54:52<6:37:18, 12.29s/it]08/03/2024 23:52:22 - INFO - __main__ -   Step: 7560, LR: 4.209226845555642e-06, Loss: 394.91754150390625
2024-08-04T06:52:34.200908974Z 
 80%|███████▉  | 7561/9500 [25:55:04<6:35:36, 12.24s/it]08/03/2024 23:52:34 - INFO - __main__ -   Step: 7561, LR: 4.2070563018683635e-06, Loss: 424.4599304199219
2024-08-04T06:52:46.333612360Z 
 80%|███████▉  | 7562/9500 [25:55:16<6:34:20, 12.21s/it]08/03/2024 23:52:46 - INFO - __main__ -   Step: 7562, LR: 4.204885758181085e-06, Loss: 475.1239013671875
2024-08-04T06:52:58.443249051Z 
 80%|███████▉  | 7563/9500 [25:55:28<6:33:11, 12.18s/it]08/03/2024 23:52:58 - INFO - __main__ -   Step: 7563, LR: 4.202715214493806e-06, Loss: 452.91424560546875
2024-08-04T06:53:11.552636746Z 
 80%|███████▉  | 7564/9500 [25:55:41<6:41:59, 12.46s/it]08/03/2024 23:53:11 - INFO - __main__ -   Step: 7564, LR: 4.2005446708065265e-06, Loss: 387.4989013671875
2024-08-04T06:53:23.905335401Z 
 80%|███████▉  | 7565/9500 [25:55:53<6:40:45, 12.43s/it]08/03/2024 23:53:23 - INFO - __main__ -   Step: 7565, LR: 4.198374127119248e-06, Loss: 352.1603698730469
2024-08-04T06:53:35.973871889Z 
 80%|███████▉  | 7566/9500 [25:56:05<6:37:05, 12.32s/it]08/03/2024 23:53:35 - INFO - __main__ -   Step: 7566, LR: 4.1962035834319696e-06, Loss: 386.13385009765625
2024-08-04T06:53:48.653330995Z 
 80%|███████▉  | 7567/9500 [25:56:18<6:40:21, 12.43s/it]08/03/2024 23:53:48 - INFO - __main__ -   Step: 7567, LR: 4.19403303974469e-06, Loss: 514.4523315429688
2024-08-04T06:54:00.649192586Z 
 80%|███████▉  | 7568/9500 [25:56:30<6:35:59, 12.30s/it]08/03/2024 23:54:00 - INFO - __main__ -   Step: 7568, LR: 4.191862496057411e-06, Loss: 410.28228759765625
2024-08-04T06:54:13.209647332Z 
 80%|███████▉  | 7569/9500 [25:56:43<6:38:19, 12.38s/it]08/03/2024 23:54:13 - INFO - __main__ -   Step: 7569, LR: 4.1896919523701326e-06, Loss: 499.93426513671875
2024-08-04T06:54:25.733113994Z 
 80%|███████▉  | 7570/9500 [25:56:55<6:39:31, 12.42s/it]08/03/2024 23:54:25 - INFO - __main__ -   Step: 7570, LR: 4.187521408682853e-06, Loss: 587.2363891601562
2024-08-04T06:54:37.916496444Z 
 80%|███████▉  | 7571/9500 [25:57:07<6:37:02, 12.35s/it]08/03/2024 23:54:37 - INFO - __main__ -   Step: 7571, LR: 4.185350864995574e-06, Loss: 470.4019775390625
2024-08-04T06:54:49.902473005Z 
 80%|███████▉  | 7572/9500 [25:57:19<6:33:19, 12.24s/it]08/03/2024 23:54:49 - INFO - __main__ -   Step: 7572, LR: 4.1831803213082955e-06, Loss: 302.4996337890625
2024-08-04T06:55:02.352396269Z 
 80%|███████▉  | 7573/9500 [25:57:32<6:35:08, 12.30s/it]08/03/2024 23:55:02 - INFO - __main__ -   Step: 7573, LR: 4.181009777621017e-06, Loss: 415.08026123046875
2024-08-04T06:55:14.601177001Z 
 80%|███████▉  | 7574/9500 [25:57:44<6:34:24, 12.29s/it]08/03/2024 23:55:14 - INFO - __main__ -   Step: 7574, LR: 4.178839233933738e-06, Loss: 479.4449462890625
2024-08-04T06:55:26.840356604Z 
 80%|███████▉  | 7575/9500 [25:57:56<6:33:44, 12.27s/it]08/03/2024 23:55:26 - INFO - __main__ -   Step: 7575, LR: 4.1766686902464585e-06, Loss: 341.22900390625
2024-08-04T06:55:39.603649653Z 
 80%|███████▉  | 7576/9500 [25:58:09<6:38:15, 12.42s/it]08/03/2024 23:55:39 - INFO - __main__ -   Step: 7576, LR: 4.17449814655918e-06, Loss: 295.8616943359375
2024-08-04T06:55:51.687917316Z 
 80%|███████▉  | 7577/9500 [25:58:21<6:34:49, 12.32s/it]08/03/2024 23:55:51 - INFO - __main__ -   Step: 7577, LR: 4.172327602871901e-06, Loss: 397.19415283203125
2024-08-04T06:56:03.947100361Z 
 80%|███████▉  | 7578/9500 [25:58:33<6:34:02, 12.30s/it]08/03/2024 23:56:03 - INFO - __main__ -   Step: 7578, LR: 4.1701570591846215e-06, Loss: 457.51702880859375
2024-08-04T06:56:16.736075567Z 
 80%|███████▉  | 7579/9500 [25:58:46<6:38:31, 12.45s/it]08/03/2024 23:56:16 - INFO - __main__ -   Step: 7579, LR: 4.167986515497343e-06, Loss: 392.3065185546875
2024-08-04T06:56:28.603960342Z 
 80%|███████▉  | 7580/9500 [25:58:58<6:32:45, 12.27s/it]08/03/2024 23:56:28 - INFO - __main__ -   Step: 7580, LR: 4.165815971810065e-06, Loss: 328.5616455078125
2024-08-04T06:56:40.557831695Z 
 80%|███████▉  | 7581/9500 [25:59:10<6:29:28, 12.18s/it]08/03/2024 23:56:40 - INFO - __main__ -   Step: 7581, LR: 4.163645428122785e-06, Loss: 388.7822265625
2024-08-04T06:56:53.630136777Z 
 80%|███████▉  | 7582/9500 [25:59:23<6:37:51, 12.45s/it]08/03/2024 23:56:53 - INFO - __main__ -   Step: 7582, LR: 4.161474884435506e-06, Loss: 447.90216064453125
2024-08-04T06:57:06.145834319Z 
 80%|███████▉  | 7583/9500 [25:59:36<6:38:19, 12.47s/it]08/03/2024 23:57:06 - INFO - __main__ -   Step: 7583, LR: 4.159304340748228e-06, Loss: 499.81793212890625
2024-08-04T06:57:18.281651586Z 
 80%|███████▉  | 7584/9500 [25:59:48<6:34:56, 12.37s/it]08/03/2024 23:57:18 - INFO - __main__ -   Step: 7584, LR: 4.157133797060949e-06, Loss: 455.2803955078125
2024-08-04T06:57:30.844523258Z 
 80%|███████▉  | 7585/9500 [26:00:00<6:36:36, 12.43s/it]08/03/2024 23:57:30 - INFO - __main__ -   Step: 7585, LR: 4.154963253373669e-06, Loss: 377.5907897949219
2024-08-04T06:57:43.345526345Z 
 80%|███████▉  | 7586/9500 [26:00:13<6:37:06, 12.45s/it]08/03/2024 23:57:43 - INFO - __main__ -   Step: 7586, LR: 4.152792709686391e-06, Loss: 460.99176025390625
2024-08-04T06:57:55.777926096Z 
 80%|███████▉  | 7587/9500 [26:00:25<6:36:44, 12.44s/it]08/03/2024 23:57:55 - INFO - __main__ -   Step: 7587, LR: 4.150622165999112e-06, Loss: 368.8262023925781
2024-08-04T06:58:08.401866952Z 
 80%|███████▉  | 7588/9500 [26:00:38<6:38:15, 12.50s/it]08/03/2024 23:58:08 - INFO - __main__ -   Step: 7588, LR: 4.148451622311833e-06, Loss: 502.6658020019531
2024-08-04T06:58:20.892105274Z 
 80%|███████▉  | 7589/9500 [26:00:50<6:37:58, 12.50s/it]08/03/2024 23:58:20 - INFO - __main__ -   Step: 7589, LR: 4.146281078624554e-06, Loss: 606.01953125
2024-08-04T06:58:32.858072231Z 
 80%|███████▉  | 7590/9500 [26:01:02<6:32:43, 12.34s/it]08/03/2024 23:58:32 - INFO - __main__ -   Step: 7590, LR: 4.144110534937275e-06, Loss: 315.45697021484375
2024-08-04T06:58:45.208640784Z 
 80%|███████▉  | 7591/9500 [26:01:15<6:32:38, 12.34s/it]08/03/2024 23:58:45 - INFO - __main__ -   Step: 7591, LR: 4.141939991249997e-06, Loss: 334.3544616699219
2024-08-04T06:58:57.363508676Z 
 80%|███████▉  | 7592/9500 [26:01:27<6:30:39, 12.29s/it]08/03/2024 23:58:57 - INFO - __main__ -   Step: 7592, LR: 4.139769447562717e-06, Loss: 461.8348693847656
2024-08-04T06:59:09.689141051Z 
 80%|███████▉  | 7593/9500 [26:01:39<6:30:50, 12.30s/it]08/03/2024 23:59:09 - INFO - __main__ -   Step: 7593, LR: 4.137598903875438e-06, Loss: 451.3505859375
2024-08-04T06:59:22.227922644Z 
 80%|███████▉  | 7594/9500 [26:01:52<6:32:56, 12.37s/it]08/03/2024 23:59:22 - INFO - __main__ -   Step: 7594, LR: 4.13542836018816e-06, Loss: 331.49853515625
2024-08-04T06:59:34.042814134Z 
 80%|███████▉  | 7595/9500 [26:02:03<6:27:27, 12.20s/it]08/03/2024 23:59:34 - INFO - __main__ -   Step: 7595, LR: 4.13325781650088e-06, Loss: 356.7769775390625
2024-08-04T06:59:46.289640117Z 
 80%|███████▉  | 7596/9500 [26:02:16<6:27:39, 12.22s/it]08/03/2024 23:59:46 - INFO - __main__ -   Step: 7596, LR: 4.131087272813601e-06, Loss: 413.988037109375
2024-08-04T06:59:59.207318205Z 
 80%|███████▉  | 7597/9500 [26:02:29<6:34:08, 12.43s/it]08/03/2024 23:59:59 - INFO - __main__ -   Step: 7597, LR: 4.128916729126323e-06, Loss: 610.8236083984375
2024-08-04T07:00:11.427959278Z 
 80%|███████▉  | 7598/9500 [26:02:41<6:31:58, 12.36s/it]08/04/2024 00:00:11 - INFO - __main__ -   Step: 7598, LR: 4.126746185439044e-06, Loss: 388.8188781738281
2024-08-04T07:00:23.743053912Z 
 80%|███████▉  | 7599/9500 [26:02:53<6:31:17, 12.35s/it]08/04/2024 00:00:23 - INFO - __main__ -   Step: 7599, LR: 4.124575641751765e-06, Loss: 346.61871337890625
2024-08-04T07:00:36.450332570Z 
 80%|████████  | 7600/9500 [26:03:06<6:34:28, 12.46s/it]08/04/2024 00:00:36 - INFO - __main__ -   Step: 7600, LR: 4.122405098064486e-06, Loss: 415.2851257324219
2024-08-04T07:00:49.116865731Z 
 80%|████████  | 7601/9500 [26:03:19<6:36:15, 12.52s/it]08/04/2024 00:00:49 - INFO - __main__ -   Step: 7601, LR: 4.120234554377207e-06, Loss: 482.5203552246094
2024-08-04T07:01:01.742302927Z 
 80%|████████  | 7602/9500 [26:03:31<6:37:03, 12.55s/it]08/04/2024 00:01:01 - INFO - __main__ -   Step: 7602, LR: 4.118064010689928e-06, Loss: 442.13616943359375
2024-08-04T07:01:14.354131715Z 
 80%|████████  | 7603/9500 [26:03:44<6:37:24, 12.57s/it]08/04/2024 00:01:14 - INFO - __main__ -   Step: 7603, LR: 4.115893467002649e-06, Loss: 453.35906982421875
2024-08-04T07:01:26.981298388Z 
 80%|████████  | 7604/9500 [26:03:56<6:37:44, 12.59s/it]08/04/2024 00:01:26 - INFO - __main__ -   Step: 7604, LR: 4.11372292331537e-06, Loss: 423.2261962890625
2024-08-04T07:01:39.100386451Z 
 80%|████████  | 7605/9500 [26:04:09<6:33:06, 12.45s/it]08/04/2024 00:01:39 - INFO - __main__ -   Step: 7605, LR: 4.111552379628092e-06, Loss: 371.8619384765625
2024-08-04T07:01:51.277406409Z 
 80%|████████  | 7606/9500 [26:04:21<6:30:20, 12.37s/it]08/04/2024 00:01:51 - INFO - __main__ -   Step: 7606, LR: 4.1093818359408125e-06, Loss: 452.703369140625
2024-08-04T07:02:04.128390467Z 
 80%|████████  | 7607/9500 [26:04:34<6:34:43, 12.51s/it]08/04/2024 00:02:04 - INFO - __main__ -   Step: 7607, LR: 4.107211292253533e-06, Loss: 452.983154296875
2024-08-04T07:02:16.336406923Z 
 80%|████████  | 7608/9500 [26:04:46<6:31:39, 12.42s/it]08/04/2024 00:02:16 - INFO - __main__ -   Step: 7608, LR: 4.105040748566255e-06, Loss: 426.4994812011719
2024-08-04T07:02:28.829837015Z 
 80%|████████  | 7609/9500 [26:04:58<6:32:08, 12.44s/it]08/04/2024 00:02:28 - INFO - __main__ -   Step: 7609, LR: 4.1028702048789755e-06, Loss: 498.0738830566406
2024-08-04T07:02:41.164929206Z 
 80%|████████  | 7610/9500 [26:05:11<6:30:55, 12.41s/it]08/04/2024 00:02:41 - INFO - __main__ -   Step: 7610, LR: 4.100699661191697e-06, Loss: 309.7962951660156
2024-08-04T07:02:53.658939941Z 
 80%|████████  | 7611/9500 [26:05:23<6:31:30, 12.44s/it]08/04/2024 00:02:53 - INFO - __main__ -   Step: 7611, LR: 4.098529117504418e-06, Loss: 463.43646240234375
2024-08-04T07:03:05.988244286Z 
 80%|████████  | 7612/9500 [26:05:35<6:30:17, 12.40s/it]08/04/2024 00:03:05 - INFO - __main__ -   Step: 7612, LR: 4.096358573817139e-06, Loss: 444.630126953125
2024-08-04T07:03:18.341132852Z 
 80%|████████  | 7613/9500 [26:05:48<6:29:36, 12.39s/it]08/04/2024 00:03:18 - INFO - __main__ -   Step: 7613, LR: 4.09418803012986e-06, Loss: 325.76409912109375
2024-08-04T07:03:30.623883510Z 
 80%|████████  | 7614/9500 [26:06:00<6:28:24, 12.36s/it]08/04/2024 00:03:30 - INFO - __main__ -   Step: 7614, LR: 4.092017486442581e-06, Loss: 568.370849609375
2024-08-04T07:03:42.836354720Z 
 80%|████████  | 7615/9500 [26:06:12<6:26:50, 12.31s/it]08/04/2024 00:03:42 - INFO - __main__ -   Step: 7615, LR: 4.089846942755302e-06, Loss: 391.7010498046875
2024-08-04T07:03:55.416818215Z 
 80%|████████  | 7616/9500 [26:06:25<6:29:09, 12.39s/it]08/04/2024 00:03:55 - INFO - __main__ -   Step: 7616, LR: 4.087676399068023e-06, Loss: 315.80938720703125
2024-08-04T07:04:07.611141042Z 
 80%|████████  | 7617/9500 [26:06:37<6:27:04, 12.33s/it]08/04/2024 00:04:07 - INFO - __main__ -   Step: 7617, LR: 4.0855058553807445e-06, Loss: 392.9941711425781
2024-08-04T07:04:20.075745584Z 
 80%|████████  | 7618/9500 [26:06:50<6:28:06, 12.37s/it]08/04/2024 00:04:20 - INFO - __main__ -   Step: 7618, LR: 4.083335311693465e-06, Loss: 355.56671142578125
2024-08-04T07:04:33.253263008Z 
 80%|████████  | 7619/9500 [26:07:03<6:35:27, 12.61s/it]08/04/2024 00:04:33 - INFO - __main__ -   Step: 7619, LR: 4.081164768006186e-06, Loss: 452.05120849609375
2024-08-04T07:04:45.475490838Z 
 80%|████████  | 7620/9500 [26:07:15<6:31:33, 12.50s/it]08/04/2024 00:04:45 - INFO - __main__ -   Step: 7620, LR: 4.0789942243189075e-06, Loss: 453.717529296875
2024-08-04T07:04:57.752300681Z 
 80%|████████  | 7621/9500 [26:07:27<6:29:17, 12.43s/it]08/04/2024 00:04:57 - INFO - __main__ -   Step: 7621, LR: 4.076823680631628e-06, Loss: 426.74859619140625
2024-08-04T07:05:10.545998662Z 
 80%|████████  | 7622/9500 [26:07:40<6:32:29, 12.54s/it]08/04/2024 00:05:10 - INFO - __main__ -   Step: 7622, LR: 4.07465313694435e-06, Loss: 445.2166442871094
2024-08-04T07:05:22.571512616Z 
 80%|████████  | 7623/9500 [26:07:52<6:27:27, 12.39s/it]08/04/2024 00:05:22 - INFO - __main__ -   Step: 7623, LR: 4.0724825932570705e-06, Loss: 384.3504638671875
2024-08-04T07:05:34.369038876Z 
 80%|████████  | 7624/9500 [26:08:04<6:21:44, 12.21s/it]08/04/2024 00:05:34 - INFO - __main__ -   Step: 7624, LR: 4.070312049569792e-06, Loss: 507.10821533203125
2024-08-04T07:05:47.498289594Z 
 80%|████████  | 7625/9500 [26:08:17<6:30:09, 12.49s/it]08/04/2024 00:05:47 - INFO - __main__ -   Step: 7625, LR: 4.068141505882513e-06, Loss: 461.03619384765625
2024-08-04T07:05:59.560038951Z 
 80%|████████  | 7626/9500 [26:08:29<6:25:59, 12.36s/it]08/04/2024 00:05:59 - INFO - __main__ -   Step: 7626, LR: 4.0659709621952335e-06, Loss: 331.39093017578125
2024-08-04T07:06:11.504732771Z 
 80%|████████  | 7627/9500 [26:08:41<6:21:54, 12.23s/it]08/04/2024 00:06:11 - INFO - __main__ -   Step: 7627, LR: 4.063800418507955e-06, Loss: 349.264404296875
2024-08-04T07:06:24.017144946Z 
 80%|████████  | 7628/9500 [26:08:53<6:24:18, 12.32s/it]08/04/2024 00:06:24 - INFO - __main__ -   Step: 7628, LR: 4.061629874820676e-06, Loss: 465.5361328125
2024-08-04T07:06:36.264729483Z 
 80%|████████  | 7629/9500 [26:09:06<6:23:26, 12.30s/it]08/04/2024 00:06:36 - INFO - __main__ -   Step: 7629, LR: 4.059459331133397e-06, Loss: 472.98486328125
2024-08-04T07:06:48.309440492Z 
 80%|████████  | 7630/9500 [26:09:18<6:20:53, 12.22s/it]08/04/2024 00:06:48 - INFO - __main__ -   Step: 7630, LR: 4.057288787446118e-06, Loss: 384.553466796875
2024-08-04T07:07:00.871140192Z 
 80%|████████  | 7631/9500 [26:09:30<6:23:52, 12.32s/it]08/04/2024 00:07:00 - INFO - __main__ -   Step: 7631, LR: 4.05511824375884e-06, Loss: 413.947509765625
2024-08-04T07:07:12.809720121Z 
 80%|████████  | 7632/9500 [26:09:42<6:20:04, 12.21s/it]08/04/2024 00:07:12 - INFO - __main__ -   Step: 7632, LR: 4.05294770007156e-06, Loss: 297.4548645019531
2024-08-04T07:07:25.138354827Z 
 80%|████████  | 7633/9500 [26:09:55<6:20:59, 12.24s/it]08/04/2024 00:07:25 - INFO - __main__ -   Step: 7633, LR: 4.050777156384281e-06, Loss: 457.29486083984375
2024-08-04T07:07:38.075108810Z 
 80%|████████  | 7634/9500 [26:10:08<6:27:15, 12.45s/it]08/04/2024 00:07:38 - INFO - __main__ -   Step: 7634, LR: 4.048606612697003e-06, Loss: 497.83636474609375
2024-08-04T07:07:50.009047169Z 
 80%|████████  | 7635/9500 [26:10:19<6:22:12, 12.30s/it]08/04/2024 00:07:50 - INFO - __main__ -   Step: 7635, LR: 4.046436069009724e-06, Loss: 317.822509765625
2024-08-04T07:08:01.984573361Z 
 80%|████████  | 7636/9500 [26:10:31<6:19:01, 12.20s/it]08/04/2024 00:08:01 - INFO - __main__ -   Step: 7636, LR: 4.044265525322445e-06, Loss: 406.43511962890625
2024-08-04T07:08:14.380817366Z 
 80%|████████  | 7637/9500 [26:10:44<6:20:38, 12.26s/it]08/04/2024 00:08:14 - INFO - __main__ -   Step: 7637, LR: 4.0420949816351656e-06, Loss: 380.3060302734375
2024-08-04T07:08:26.385436970Z 
 80%|████████  | 7638/9500 [26:10:56<6:18:04, 12.18s/it]08/04/2024 00:08:26 - INFO - __main__ -   Step: 7638, LR: 4.039924437947887e-06, Loss: 322.98358154296875
2024-08-04T07:08:38.658810050Z 
 80%|████████  | 7639/9500 [26:11:08<6:18:42, 12.21s/it]08/04/2024 00:08:38 - INFO - __main__ -   Step: 7639, LR: 4.037753894260608e-06, Loss: 317.3688659667969
2024-08-04T07:08:51.268341694Z 
 80%|████████  | 7640/9500 [26:11:21<6:22:13, 12.33s/it]08/04/2024 00:08:51 - INFO - __main__ -   Step: 7640, LR: 4.0355833505733286e-06, Loss: 432.1431884765625
2024-08-04T07:09:03.421677572Z 
 80%|████████  | 7641/9500 [26:11:33<6:20:22, 12.28s/it]08/04/2024 00:09:03 - INFO - __main__ -   Step: 7641, LR: 4.03341280688605e-06, Loss: 399.1820068359375
2024-08-04T07:09:15.493367159Z 
 80%|████████  | 7642/9500 [26:11:45<6:18:16, 12.22s/it]08/04/2024 00:09:15 - INFO - __main__ -   Step: 7642, LR: 4.031242263198772e-06, Loss: 343.21832275390625
2024-08-04T07:09:28.128503079Z 
 80%|████████  | 7643/9500 [26:11:58<6:21:57, 12.34s/it]08/04/2024 00:09:28 - INFO - __main__ -   Step: 7643, LR: 4.029071719511492e-06, Loss: 333.39874267578125
2024-08-04T07:09:40.718496102Z 
 80%|████████  | 7644/9500 [26:12:10<6:24:03, 12.42s/it]08/04/2024 00:09:40 - INFO - __main__ -   Step: 7644, LR: 4.026901175824213e-06, Loss: 394.8235778808594
2024-08-04T07:09:52.797669375Z 
 80%|████████  | 7645/9500 [26:12:22<6:20:43, 12.31s/it]08/04/2024 00:09:52 - INFO - __main__ -   Step: 7645, LR: 4.024730632136935e-06, Loss: 372.18841552734375
2024-08-04T07:10:05.427546510Z 
 80%|████████  | 7646/9500 [26:12:35<6:23:27, 12.41s/it]08/04/2024 00:10:05 - INFO - __main__ -   Step: 7646, LR: 4.022560088449655e-06, Loss: 348.66607666015625
2024-08-04T07:10:17.592331822Z 
 80%|████████  | 7647/9500 [26:12:47<6:20:58, 12.34s/it]08/04/2024 00:10:17 - INFO - __main__ -   Step: 7647, LR: 4.020389544762376e-06, Loss: 508.7563781738281
2024-08-04T07:10:29.572438434Z 
 81%|████████  | 7648/9500 [26:12:59<6:17:28, 12.23s/it]08/04/2024 00:10:29 - INFO - __main__ -   Step: 7648, LR: 4.018219001075098e-06, Loss: 441.02313232421875
2024-08-04T07:10:41.574667158Z 
 81%|████████  | 7649/9500 [26:13:11<6:15:10, 12.16s/it]08/04/2024 00:10:41 - INFO - __main__ -   Step: 7649, LR: 4.016048457387819e-06, Loss: 452.6836242675781
2024-08-04T07:10:54.165618141Z 
 81%|████████  | 7650/9500 [26:13:24<6:18:56, 12.29s/it]08/04/2024 00:10:54 - INFO - __main__ -   Step: 7650, LR: 4.01387791370054e-06, Loss: 412.2627258300781
2024-08-04T07:11:06.486252943Z 
 81%|████████  | 7651/9500 [26:13:36<6:19:01, 12.30s/it]08/04/2024 00:11:06 - INFO - __main__ -   Step: 7651, LR: 4.011707370013261e-06, Loss: 470.27178955078125
2024-08-04T07:11:18.393406480Z 
 81%|████████  | 7652/9500 [26:13:48<6:15:11, 12.18s/it]08/04/2024 00:11:18 - INFO - __main__ -   Step: 7652, LR: 4.009536826325982e-06, Loss: 358.940673828125
2024-08-04T07:11:31.216457372Z 
 81%|████████  | 7653/9500 [26:14:01<6:20:54, 12.37s/it]08/04/2024 00:11:31 - INFO - __main__ -   Step: 7653, LR: 4.007366282638703e-06, Loss: 392.2015075683594
2024-08-04T07:11:43.046039255Z 
 81%|████████  | 7654/9500 [26:14:12<6:15:41, 12.21s/it]08/04/2024 00:11:43 - INFO - __main__ -   Step: 7654, LR: 4.005195738951424e-06, Loss: 255.608154296875
2024-08-04T07:11:55.324603034Z 
 81%|████████  | 7655/9500 [26:14:25<6:16:06, 12.23s/it]08/04/2024 00:11:55 - INFO - __main__ -   Step: 7655, LR: 4.003025195264145e-06, Loss: 381.92425537109375
2024-08-04T07:12:08.520312872Z 
 81%|████████  | 7656/9500 [26:14:38<6:24:47, 12.52s/it]08/04/2024 00:12:08 - INFO - __main__ -   Step: 7656, LR: 4.000854651576867e-06, Loss: 334.4665222167969
2024-08-04T07:12:20.711849601Z 
 81%|████████  | 7657/9500 [26:14:50<6:21:33, 12.42s/it]08/04/2024 00:12:20 - INFO - __main__ -   Step: 7657, LR: 3.9986841078895874e-06, Loss: 408.4591979980469
2024-08-04T07:12:32.849406662Z 
 81%|████████  | 7658/9500 [26:15:02<6:18:43, 12.34s/it]08/04/2024 00:12:32 - INFO - __main__ -   Step: 7658, LR: 3.996513564202308e-06, Loss: 440.84405517578125
2024-08-04T07:12:45.374535628Z 
 81%|████████  | 7659/9500 [26:15:15<6:20:15, 12.39s/it]08/04/2024 00:12:45 - INFO - __main__ -   Step: 7659, LR: 3.99434302051503e-06, Loss: 492.66876220703125
2024-08-04T07:12:57.281876130Z 
 81%|████████  | 7660/9500 [26:15:27<6:15:34, 12.25s/it]08/04/2024 00:12:57 - INFO - __main__ -   Step: 7660, LR: 3.992172476827751e-06, Loss: 268.75006103515625
2024-08-04T07:13:09.570679337Z 
 81%|████████  | 7661/9500 [26:15:39<6:15:45, 12.26s/it]08/04/2024 00:13:09 - INFO - __main__ -   Step: 7661, LR: 3.990001933140472e-06, Loss: 439.0715637207031
2024-08-04T07:13:22.425986461Z 
 81%|████████  | 7662/9500 [26:15:52<6:21:01, 12.44s/it]08/04/2024 00:13:22 - INFO - __main__ -   Step: 7662, LR: 3.987831389453193e-06, Loss: 427.34136962890625
2024-08-04T07:13:34.430932879Z 
 81%|████████  | 7663/9500 [26:16:04<6:16:50, 12.31s/it]08/04/2024 00:13:34 - INFO - __main__ -   Step: 7663, LR: 3.985660845765914e-06, Loss: 366.12750244140625
2024-08-04T07:13:46.780832909Z 
 81%|████████  | 7664/9500 [26:16:16<6:17:01, 12.32s/it]08/04/2024 00:13:46 - INFO - __main__ -   Step: 7664, LR: 3.983490302078635e-06, Loss: 342.8420715332031
2024-08-04T07:13:59.280225077Z 
 81%|████████  | 7665/9500 [26:16:29<6:18:27, 12.37s/it]08/04/2024 00:13:59 - INFO - __main__ -   Step: 7665, LR: 3.981319758391356e-06, Loss: 421.86566162109375
2024-08-04T07:14:11.389397997Z 
 81%|████████  | 7666/9500 [26:16:41<6:15:48, 12.29s/it]08/04/2024 00:14:11 - INFO - __main__ -   Step: 7666, LR: 3.979149214704077e-06, Loss: 407.3544006347656
2024-08-04T07:14:23.606174124Z 
 81%|████████  | 7667/9500 [26:16:53<6:14:53, 12.27s/it]08/04/2024 00:14:23 - INFO - __main__ -   Step: 7667, LR: 3.976978671016799e-06, Loss: 310.78314208984375
2024-08-04T07:14:36.067067978Z 
 81%|████████  | 7668/9500 [26:17:06<6:16:24, 12.33s/it]08/04/2024 00:14:36 - INFO - __main__ -   Step: 7668, LR: 3.9748081273295195e-06, Loss: 348.4772644042969
2024-08-04T07:14:48.366189447Z 
 81%|████████  | 7669/9500 [26:17:18<6:15:57, 12.32s/it]08/04/2024 00:14:48 - INFO - __main__ -   Step: 7669, LR: 3.97263758364224e-06, Loss: 413.85919189453125
2024-08-04T07:15:00.362814122Z 
 81%|████████  | 7670/9500 [26:17:30<6:12:47, 12.22s/it]08/04/2024 00:15:00 - INFO - __main__ -   Step: 7670, LR: 3.970467039954962e-06, Loss: 358.3647766113281
2024-08-04T07:15:12.815680017Z 
 81%|████████  | 7671/9500 [26:17:42<6:14:41, 12.29s/it]08/04/2024 00:15:12 - INFO - __main__ -   Step: 7671, LR: 3.9682964962676825e-06, Loss: 311.95843505859375
2024-08-04T07:15:24.967677610Z 
 81%|████████  | 7672/9500 [26:17:54<6:13:12, 12.25s/it]08/04/2024 00:15:24 - INFO - __main__ -   Step: 7672, LR: 3.966125952580403e-06, Loss: 414.21429443359375
2024-08-04T07:15:36.776560429Z 
 81%|████████  | 7673/9500 [26:18:06<6:08:58, 12.12s/it]08/04/2024 00:15:36 - INFO - __main__ -   Step: 7673, LR: 3.963955408893125e-06, Loss: 325.5693664550781
2024-08-04T07:15:49.503459959Z 
 81%|████████  | 7674/9500 [26:18:19<6:14:20, 12.30s/it]08/04/2024 00:15:49 - INFO - __main__ -   Step: 7674, LR: 3.961784865205846e-06, Loss: 377.6761474609375
2024-08-04T07:16:01.471064902Z 
 81%|████████  | 7675/9500 [26:18:31<6:11:06, 12.20s/it]08/04/2024 00:16:01 - INFO - __main__ -   Step: 7675, LR: 3.959614321518567e-06, Loss: 340.8511962890625
2024-08-04T07:16:13.400677019Z 
 81%|████████  | 7676/9500 [26:18:43<6:08:25, 12.12s/it]08/04/2024 00:16:13 - INFO - __main__ -   Step: 7676, LR: 3.957443777831288e-06, Loss: 335.397216796875
2024-08-04T07:16:25.883313211Z 
 81%|████████  | 7677/9500 [26:18:55<6:11:32, 12.23s/it]08/04/2024 00:16:25 - INFO - __main__ -   Step: 7677, LR: 3.955273234144009e-06, Loss: 394.6917419433594
2024-08-04T07:16:38.415640343Z 
 81%|████████  | 7678/9500 [26:19:08<6:14:06, 12.32s/it]08/04/2024 00:16:38 - INFO - __main__ -   Step: 7678, LR: 3.95310269045673e-06, Loss: 448.896728515625
2024-08-04T07:16:50.851657108Z 
 81%|████████  | 7679/9500 [26:19:20<6:14:57, 12.35s/it]08/04/2024 00:16:50 - INFO - __main__ -   Step: 7679, LR: 3.950932146769451e-06, Loss: 437.8864440917969
2024-08-04T07:17:03.458867923Z 
 81%|████████  | 7680/9500 [26:19:33<6:17:03, 12.43s/it]08/04/2024 00:17:03 - INFO - __main__ -   Step: 7680, LR: 3.948761603082172e-06, Loss: 413.2483215332031
2024-08-04T07:17:15.561031339Z 
 81%|████████  | 7681/9500 [26:19:45<6:13:51, 12.33s/it]08/04/2024 00:17:15 - INFO - __main__ -   Step: 7681, LR: 3.946591059394894e-06, Loss: 342.6333312988281
2024-08-04T07:17:27.938925816Z 
 81%|████████  | 7682/9500 [26:19:57<6:14:04, 12.35s/it]08/04/2024 00:17:27 - INFO - __main__ -   Step: 7682, LR: 3.9444205157076146e-06, Loss: 414.3858642578125
2024-08-04T07:17:40.279262571Z 
 81%|████████  | 7683/9500 [26:20:10<6:13:49, 12.34s/it]08/04/2024 00:17:40 - INFO - __main__ -   Step: 7683, LR: 3.942249972020335e-06, Loss: 406.1315002441406
2024-08-04T07:17:52.295021606Z 
 81%|████████  | 7684/9500 [26:20:22<6:10:38, 12.25s/it]08/04/2024 00:17:52 - INFO - __main__ -   Step: 7684, LR: 3.940079428333057e-06, Loss: 411.3683776855469
2024-08-04T07:18:04.444485514Z 
 81%|████████  | 7685/9500 [26:20:34<6:09:33, 12.22s/it]08/04/2024 00:18:04 - INFO - __main__ -   Step: 7685, LR: 3.937908884645778e-06, Loss: 479.3254089355469
2024-08-04T07:18:16.547767780Z 
 81%|████████  | 7686/9500 [26:20:46<6:08:19, 12.18s/it]08/04/2024 00:18:16 - INFO - __main__ -   Step: 7686, LR: 3.935738340958499e-06, Loss: 390.8875732421875
2024-08-04T07:18:28.772139858Z 
 81%|████████  | 7687/9500 [26:20:58<6:08:29, 12.20s/it]08/04/2024 00:18:28 - INFO - __main__ -   Step: 7687, LR: 3.93356779727122e-06, Loss: 379.6776123046875
2024-08-04T07:18:40.826753945Z 
 81%|████████  | 7688/9500 [26:21:10<6:07:01, 12.15s/it]08/04/2024 00:18:40 - INFO - __main__ -   Step: 7688, LR: 3.931397253583941e-06, Loss: 386.9276428222656
2024-08-04T07:18:53.445461920Z 
 81%|████████  | 7689/9500 [26:21:23<6:11:02, 12.29s/it]08/04/2024 00:18:53 - INFO - __main__ -   Step: 7689, LR: 3.929226709896662e-06, Loss: 350.4731750488281
2024-08-04T07:19:05.792112594Z 
 81%|████████  | 7690/9500 [26:21:35<6:11:19, 12.31s/it]08/04/2024 00:19:05 - INFO - __main__ -   Step: 7690, LR: 3.927056166209383e-06, Loss: 437.4496154785156
2024-08-04T07:19:17.866463822Z 
 81%|████████  | 7691/9500 [26:21:47<6:08:59, 12.24s/it]08/04/2024 00:19:17 - INFO - __main__ -   Step: 7691, LR: 3.924885622522104e-06, Loss: 434.3543395996094
2024-08-04T07:19:30.010209532Z 
 81%|████████  | 7692/9500 [26:21:59<6:07:55, 12.21s/it]08/04/2024 00:19:30 - INFO - __main__ -   Step: 7692, LR: 3.922715078834826e-06, Loss: 485.0429382324219
2024-08-04T07:19:42.641910006Z 
 81%|████████  | 7693/9500 [26:22:12<6:11:32, 12.34s/it]08/04/2024 00:19:42 - INFO - __main__ -   Step: 7693, LR: 3.920544535147547e-06, Loss: 494.5306091308594
2024-08-04T07:19:55.168120699Z 
 81%|████████  | 7694/9500 [26:22:25<6:13:02, 12.39s/it]08/04/2024 00:19:55 - INFO - __main__ -   Step: 7694, LR: 3.918373991460267e-06, Loss: 400.23406982421875
2024-08-04T07:20:07.375997997Z 
 81%|████████  | 7695/9500 [26:22:37<6:11:09, 12.34s/it]08/04/2024 00:20:07 - INFO - __main__ -   Step: 7695, LR: 3.916203447772989e-06, Loss: 305.7637023925781
2024-08-04T07:20:20.066281957Z 
 81%|████████  | 7696/9500 [26:22:50<6:14:08, 12.44s/it]08/04/2024 00:20:20 - INFO - __main__ -   Step: 7696, LR: 3.91403290408571e-06, Loss: 401.422119140625
2024-08-04T07:20:32.008350109Z 
 81%|████████  | 7697/9500 [26:23:01<6:09:24, 12.29s/it]08/04/2024 00:20:32 - INFO - __main__ -   Step: 7697, LR: 3.91186236039843e-06, Loss: 451.0585021972656
2024-08-04T07:20:44.214525221Z 
 81%|████████  | 7698/9500 [26:23:14<6:08:24, 12.27s/it]08/04/2024 00:20:44 - INFO - __main__ -   Step: 7698, LR: 3.909691816711152e-06, Loss: 465.61346435546875
2024-08-04T07:20:56.765378216Z 
 81%|████████  | 7699/9500 [26:23:26<6:10:46, 12.35s/it]08/04/2024 00:20:56 - INFO - __main__ -   Step: 7699, LR: 3.9075212730238735e-06, Loss: 353.6101989746094
2024-08-04T07:21:08.793139516Z 
 81%|████████  | 7700/9500 [26:23:38<6:07:38, 12.25s/it]08/04/2024 00:21:08 - INFO - __main__ -   Step: 7700, LR: 3.905350729336594e-06, Loss: 409.76947021484375
2024-08-04T07:21:21.235784493Z 
 81%|████████  | 7701/9500 [26:23:51<6:09:07, 12.31s/it]08/04/2024 00:21:21 - INFO - __main__ -   Step: 7701, LR: 3.903180185649315e-06, Loss: 335.36309814453125
2024-08-04T07:21:34.063899675Z 
 81%|████████  | 7702/9500 [26:24:04<6:13:34, 12.47s/it]08/04/2024 00:21:34 - INFO - __main__ -   Step: 7702, LR: 3.9010096419620364e-06, Loss: 491.4715270996094
2024-08-04T07:21:45.967931181Z 
 81%|████████  | 7703/9500 [26:24:15<6:08:18, 12.30s/it]08/04/2024 00:21:45 - INFO - __main__ -   Step: 7703, LR: 3.898839098274757e-06, Loss: 378.6298828125
2024-08-04T07:21:58.265547036Z 
 81%|████████  | 7704/9500 [26:24:28<6:08:06, 12.30s/it]08/04/2024 00:21:58 - INFO - __main__ -   Step: 7704, LR: 3.896668554587478e-06, Loss: 429.05096435546875
2024-08-04T07:22:10.816881778Z 
 81%|████████  | 7705/9500 [26:24:40<6:10:10, 12.37s/it]08/04/2024 00:22:10 - INFO - __main__ -   Step: 7705, LR: 3.8944980109001994e-06, Loss: 404.5794372558594
2024-08-04T07:22:23.018528848Z 
 81%|████████  | 7706/9500 [26:24:52<6:08:25, 12.32s/it]08/04/2024 00:22:23 - INFO - __main__ -   Step: 7706, LR: 3.89232746721292e-06, Loss: 341.035888671875
2024-08-04T07:22:35.274835632Z 
 81%|████████  | 7707/9500 [26:25:05<6:07:37, 12.30s/it]08/04/2024 00:22:35 - INFO - __main__ -   Step: 7707, LR: 3.890156923525642e-06, Loss: 358.04217529296875
2024-08-04T07:22:47.712204619Z 
 81%|████████  | 7708/9500 [26:25:17<6:08:38, 12.34s/it]08/04/2024 00:22:47 - INFO - __main__ -   Step: 7708, LR: 3.887986379838362e-06, Loss: 339.6131591796875
2024-08-04T07:22:59.764889482Z 
 81%|████████  | 7709/9500 [26:25:29<6:05:50, 12.26s/it]08/04/2024 00:22:59 - INFO - __main__ -   Step: 7709, LR: 3.885815836151084e-06, Loss: 317.73779296875
2024-08-04T07:23:11.724658747Z 
 81%|████████  | 7710/9500 [26:25:41<6:02:58, 12.17s/it]08/04/2024 00:23:11 - INFO - __main__ -   Step: 7710, LR: 3.883645292463805e-06, Loss: 292.25555419921875
2024-08-04T07:23:24.236500384Z 
 81%|████████  | 7711/9500 [26:25:54<6:05:51, 12.27s/it]08/04/2024 00:23:24 - INFO - __main__ -   Step: 7711, LR: 3.881474748776526e-06, Loss: 435.3412170410156
2024-08-04T07:23:36.484121073Z 
 81%|████████  | 7712/9500 [26:26:06<6:05:27, 12.26s/it]08/04/2024 00:23:36 - INFO - __main__ -   Step: 7712, LR: 3.879304205089247e-06, Loss: 423.6690979003906
2024-08-04T07:23:48.613416697Z 
 81%|████████  | 7713/9500 [26:26:18<6:04:03, 12.22s/it]08/04/2024 00:23:48 - INFO - __main__ -   Step: 7713, LR: 3.877133661401968e-06, Loss: 404.2156982421875
2024-08-04T07:24:01.603592864Z 
 81%|████████  | 7714/9500 [26:26:31<6:10:41, 12.45s/it]08/04/2024 00:24:01 - INFO - __main__ -   Step: 7714, LR: 3.874963117714689e-06, Loss: 415.532958984375
2024-08-04T07:24:13.349754563Z 
 81%|████████  | 7715/9500 [26:26:43<6:04:10, 12.24s/it]08/04/2024 00:24:13 - INFO - __main__ -   Step: 7715, LR: 3.87279257402741e-06, Loss: 292.9129638671875
2024-08-04T07:24:25.356147498Z 
 81%|████████  | 7716/9500 [26:26:55<6:01:52, 12.17s/it]08/04/2024 00:24:25 - INFO - __main__ -   Step: 7716, LR: 3.8706220303401315e-06, Loss: 389.43048095703125
2024-08-04T07:24:37.873094208Z 
 81%|████████  | 7717/9500 [26:27:07<6:04:45, 12.27s/it]08/04/2024 00:24:37 - INFO - __main__ -   Step: 7717, LR: 3.868451486652852e-06, Loss: 391.931640625
2024-08-04T07:24:50.178105790Z 
 81%|████████  | 7718/9500 [26:27:20<6:04:49, 12.28s/it]08/04/2024 00:24:50 - INFO - __main__ -   Step: 7718, LR: 3.866280942965574e-06, Loss: 398.58135986328125
2024-08-04T07:25:02.177836247Z 
 81%|████████▏ | 7719/9500 [26:27:32<6:02:05, 12.20s/it]08/04/2024 00:25:02 - INFO - __main__ -   Step: 7719, LR: 3.8641103992782945e-06, Loss: 408.56011962890625
2024-08-04T07:25:14.645914290Z 
 81%|████████▏ | 7720/9500 [26:27:44<6:04:17, 12.28s/it]08/04/2024 00:25:14 - INFO - __main__ -   Step: 7720, LR: 3.861939855591015e-06, Loss: 473.9931640625
2024-08-04T07:25:26.945660895Z 
 81%|████████▏ | 7721/9500 [26:27:56<6:04:15, 12.29s/it]08/04/2024 00:25:26 - INFO - __main__ -   Step: 7721, LR: 3.859769311903737e-06, Loss: 398.9568176269531
2024-08-04T07:25:38.993475387Z 
 81%|████████▏ | 7722/9500 [26:28:08<6:01:56, 12.21s/it]08/04/2024 00:25:38 - INFO - __main__ -   Step: 7722, LR: 3.8575987682164575e-06, Loss: 275.9710388183594
2024-08-04T07:25:51.062116927Z 
 81%|████████▏ | 7723/9500 [26:28:20<6:00:26, 12.17s/it]08/04/2024 00:25:51 - INFO - __main__ -   Step: 7723, LR: 3.855428224529179e-06, Loss: 244.44818115234375
2024-08-04T07:26:03.154116660Z 
 81%|████████▏ | 7724/9500 [26:28:33<5:59:33, 12.15s/it]08/04/2024 00:26:03 - INFO - __main__ -   Step: 7724, LR: 3.8532576808419e-06, Loss: 430.4513244628906
2024-08-04T07:26:15.246046145Z 
 81%|████████▏ | 7725/9500 [26:28:45<5:58:51, 12.13s/it]08/04/2024 00:26:15 - INFO - __main__ -   Step: 7725, LR: 3.851087137154621e-06, Loss: 394.21875
2024-08-04T07:26:27.848621023Z 
 81%|████████▏ | 7726/9500 [26:28:57<6:02:50, 12.27s/it]08/04/2024 00:26:27 - INFO - __main__ -   Step: 7726, LR: 3.848916593467342e-06, Loss: 393.86932373046875
2024-08-04T07:26:39.982093224Z 
 81%|████████▏ | 7727/9500 [26:29:09<6:01:24, 12.23s/it]08/04/2024 00:26:39 - INFO - __main__ -   Step: 7727, LR: 3.846746049780063e-06, Loss: 400.0504150390625
2024-08-04T07:26:52.174286834Z 
 81%|████████▏ | 7728/9500 [26:29:22<6:00:51, 12.22s/it]08/04/2024 00:26:52 - INFO - __main__ -   Step: 7728, LR: 3.844575506092784e-06, Loss: 410.832763671875
2024-08-04T07:27:04.858897738Z 
 81%|████████▏ | 7729/9500 [26:29:34<6:04:47, 12.36s/it]08/04/2024 00:27:04 - INFO - __main__ -   Step: 7729, LR: 3.842404962405505e-06, Loss: 433.6248474121094
2024-08-04T07:27:16.972476290Z 
 81%|████████▏ | 7730/9500 [26:29:46<6:02:24, 12.29s/it]08/04/2024 00:27:16 - INFO - __main__ -   Step: 7730, LR: 3.8402344187182266e-06, Loss: 346.2037353515625
2024-08-04T07:27:28.946570317Z 
 81%|████████▏ | 7731/9500 [26:29:58<5:59:27, 12.19s/it]08/04/2024 00:27:28 - INFO - __main__ -   Step: 7731, LR: 3.838063875030947e-06, Loss: 366.92584228515625
2024-08-04T07:27:41.584706963Z 
 81%|████████▏ | 7732/9500 [26:30:11<6:03:11, 12.33s/it]08/04/2024 00:27:41 - INFO - __main__ -   Step: 7732, LR: 3.835893331343669e-06, Loss: 431.4194641113281
2024-08-04T07:27:53.655932060Z 
 81%|████████▏ | 7733/9500 [26:30:23<6:00:44, 12.25s/it]08/04/2024 00:27:53 - INFO - __main__ -   Step: 7733, LR: 3.8337227876563895e-06, Loss: 367.2703857421875
2024-08-04T07:28:05.923020711Z 
 81%|████████▏ | 7734/9500 [26:30:35<6:00:41, 12.25s/it]08/04/2024 00:28:05 - INFO - __main__ -   Step: 7734, LR: 3.83155224396911e-06, Loss: 418.2861022949219
2024-08-04T07:28:17.926430351Z 
 81%|████████▏ | 7735/9500 [26:30:47<5:58:16, 12.18s/it]08/04/2024 00:28:17 - INFO - __main__ -   Step: 7735, LR: 3.829381700281832e-06, Loss: 389.9403076171875
2024-08-04T07:28:30.668452492Z 
 81%|████████▏ | 7736/9500 [26:31:00<6:03:02, 12.35s/it]08/04/2024 00:28:30 - INFO - __main__ -   Step: 7736, LR: 3.827211156594553e-06, Loss: 360.165771484375
2024-08-04T07:28:42.716181122Z 
 81%|████████▏ | 7737/9500 [26:31:12<6:00:10, 12.26s/it]08/04/2024 00:28:42 - INFO - __main__ -   Step: 7737, LR: 3.825040612907274e-06, Loss: 362.56060791015625
2024-08-04T07:28:55.108021058Z 
 81%|████████▏ | 7738/9500 [26:31:25<6:01:09, 12.30s/it]08/04/2024 00:28:55 - INFO - __main__ -   Step: 7738, LR: 3.822870069219995e-06, Loss: 455.0787658691406
2024-08-04T07:29:07.941961737Z 
 81%|████████▏ | 7739/9500 [26:31:37<6:05:40, 12.46s/it]08/04/2024 00:29:07 - INFO - __main__ -   Step: 7739, LR: 3.820699525532716e-06, Loss: 376.21112060546875
2024-08-04T07:29:20.182886365Z 
 81%|████████▏ | 7740/9500 [26:31:50<6:03:32, 12.39s/it]08/04/2024 00:29:20 - INFO - __main__ -   Step: 7740, LR: 3.818528981845437e-06, Loss: 360.9273376464844
2024-08-04T07:29:32.761415582Z 
 81%|████████▏ | 7741/9500 [26:32:02<6:04:57, 12.45s/it]08/04/2024 00:29:32 - INFO - __main__ -   Step: 7741, LR: 3.816358438158158e-06, Loss: 445.4705505371094
2024-08-04T07:29:45.348988748Z 
 81%|████████▏ | 7742/9500 [26:32:15<6:05:58, 12.49s/it]08/04/2024 00:29:45 - INFO - __main__ -   Step: 7742, LR: 3.8141878944708793e-06, Loss: 379.2605285644531
2024-08-04T07:29:57.981913378Z 
 82%|████████▏ | 7743/9500 [26:32:27<6:07:01, 12.53s/it]08/04/2024 00:29:57 - INFO - __main__ -   Step: 7743, LR: 3.8120173507836005e-06, Loss: 501.0295715332031
2024-08-04T07:30:10.315363969Z 
 82%|████████▏ | 7744/9500 [26:32:40<6:05:03, 12.47s/it]08/04/2024 00:30:10 - INFO - __main__ -   Step: 7744, LR: 3.8098468070963216e-06, Loss: 529.4942626953125
2024-08-04T07:30:23.101576973Z 
 82%|████████▏ | 7745/9500 [26:32:53<6:07:35, 12.57s/it]08/04/2024 00:30:23 - INFO - __main__ -   Step: 7745, LR: 3.8076762634090423e-06, Loss: 458.9397888183594
2024-08-04T07:30:35.220527666Z 
 82%|████████▏ | 7746/9500 [26:33:05<6:03:27, 12.43s/it]08/04/2024 00:30:35 - INFO - __main__ -   Step: 7746, LR: 3.805505719721764e-06, Loss: 394.49066162109375
2024-08-04T07:30:47.216851360Z 
 82%|████████▏ | 7747/9500 [26:33:17<5:59:25, 12.30s/it]08/04/2024 00:30:47 - INFO - __main__ -   Step: 7747, LR: 3.803335176034485e-06, Loss: 400.05670166015625
2024-08-04T07:30:59.876232420Z 
 82%|████████▏ | 7748/9500 [26:33:29<6:02:20, 12.41s/it]08/04/2024 00:30:59 - INFO - __main__ -   Step: 7748, LR: 3.8011646323472057e-06, Loss: 337.607666015625
2024-08-04T07:31:12.063656441Z 
 82%|████████▏ | 7749/9500 [26:33:42<6:00:11, 12.34s/it]08/04/2024 00:31:12 - INFO - __main__ -   Step: 7749, LR: 3.798994088659927e-06, Loss: 434.14593505859375
2024-08-04T07:31:24.185532818Z 
 82%|████████▏ | 7750/9500 [26:33:54<5:58:03, 12.28s/it]08/04/2024 00:31:24 - INFO - __main__ -   Step: 7750, LR: 3.796823544972648e-06, Loss: 453.7276916503906
2024-08-04T07:31:36.964158259Z 
 82%|████████▏ | 7751/9500 [26:34:06<6:02:14, 12.43s/it]08/04/2024 00:31:36 - INFO - __main__ -   Step: 7751, LR: 3.7946530012853696e-06, Loss: 462.8890686035156
2024-08-04T07:31:48.971849234Z 
 82%|████████▏ | 7752/9500 [26:34:18<5:58:22, 12.30s/it]08/04/2024 00:31:48 - INFO - __main__ -   Step: 7752, LR: 3.7924824575980903e-06, Loss: 348.4877014160156
2024-08-04T07:32:01.084756676Z 
 82%|████████▏ | 7753/9500 [26:34:31<5:56:30, 12.24s/it]08/04/2024 00:32:01 - INFO - __main__ -   Step: 7753, LR: 3.7903119139108114e-06, Loss: 341.01055908203125
2024-08-04T07:32:13.932466661Z 
 82%|████████▏ | 7754/9500 [26:34:43<6:01:35, 12.43s/it]08/04/2024 00:32:13 - INFO - __main__ -   Step: 7754, LR: 3.7881413702235326e-06, Loss: 453.53570556640625
2024-08-04T07:32:26.382047094Z 
 82%|████████▏ | 7755/9500 [26:34:56<6:01:35, 12.43s/it]08/04/2024 00:32:26 - INFO - __main__ -   Step: 7755, LR: 3.7859708265362533e-06, Loss: 384.8787536621094
2024-08-04T07:32:37.938124190Z 
 82%|████████▏ | 7756/9500 [26:35:07<5:53:44, 12.17s/it]08/04/2024 00:32:37 - INFO - __main__ -   Step: 7756, LR: 3.7838002828489744e-06, Loss: 310.99578857421875
2024-08-04T07:32:50.155938726Z 
 82%|████████▏ | 7757/9500 [26:35:20<5:53:57, 12.18s/it]08/04/2024 00:32:50 - INFO - __main__ -   Step: 7757, LR: 3.7816297391616955e-06, Loss: 266.4793701171875
2024-08-04T07:33:02.457638337Z 
 82%|████████▏ | 7758/9500 [26:35:32<5:54:46, 12.22s/it]08/04/2024 00:33:02 - INFO - __main__ -   Step: 7758, LR: 3.779459195474417e-06, Loss: 342.12109375
2024-08-04T07:33:14.330248390Z 
 82%|████████▏ | 7759/9500 [26:35:44<5:51:32, 12.12s/it]08/04/2024 00:33:14 - INFO - __main__ -   Step: 7759, LR: 3.777288651787138e-06, Loss: 325.8421936035156
2024-08-04T07:33:26.796058962Z 
 82%|████████▏ | 7760/9500 [26:35:56<5:54:23, 12.22s/it]08/04/2024 00:33:26 - INFO - __main__ -   Step: 7760, LR: 3.775118108099859e-06, Loss: 369.8387451171875
2024-08-04T07:33:39.048964755Z 
 82%|████████▏ | 7761/9500 [26:36:08<5:54:28, 12.23s/it]08/04/2024 00:33:39 - INFO - __main__ -   Step: 7761, LR: 3.77294756441258e-06, Loss: 442.51434326171875
2024-08-04T07:33:51.063282631Z 
 82%|████████▏ | 7762/9500 [26:36:21<5:52:23, 12.17s/it]08/04/2024 00:33:51 - INFO - __main__ -   Step: 7762, LR: 3.770777020725301e-06, Loss: 374.45654296875
2024-08-04T07:34:03.568414985Z 
 82%|████████▏ | 7763/9500 [26:36:33<5:55:08, 12.27s/it]08/04/2024 00:34:03 - INFO - __main__ -   Step: 7763, LR: 3.768606477038022e-06, Loss: 397.57891845703125
2024-08-04T07:34:16.098340723Z 
 82%|████████▏ | 7764/9500 [26:36:46<5:57:12, 12.35s/it]08/04/2024 00:34:16 - INFO - __main__ -   Step: 7764, LR: 3.7664359333507435e-06, Loss: 339.38800048828125
2024-08-04T07:34:28.259149986Z 
 82%|████████▏ | 7765/9500 [26:36:58<5:55:23, 12.29s/it]08/04/2024 00:34:28 - INFO - __main__ -   Step: 7765, LR: 3.7642653896634646e-06, Loss: 451.33587646484375
2024-08-04T07:34:40.729265883Z 
 82%|████████▏ | 7766/9500 [26:37:10<5:56:45, 12.34s/it]08/04/2024 00:34:40 - INFO - __main__ -   Step: 7766, LR: 3.7620948459761853e-06, Loss: 303.13525390625
2024-08-04T07:34:52.827794820Z 
 82%|████████▏ | 7767/9500 [26:37:22<5:54:24, 12.27s/it]08/04/2024 00:34:52 - INFO - __main__ -   Step: 7767, LR: 3.7599243022889065e-06, Loss: 517.2337036132812
2024-08-04T07:35:04.976556333Z 
 82%|████████▏ | 7768/9500 [26:37:34<5:53:09, 12.23s/it]08/04/2024 00:35:04 - INFO - __main__ -   Step: 7768, LR: 3.7577537586016276e-06, Loss: 519.8301391601562
2024-08-04T07:35:17.893971059Z 
 82%|████████▏ | 7769/9500 [26:37:47<5:58:52, 12.44s/it]08/04/2024 00:35:17 - INFO - __main__ -   Step: 7769, LR: 3.7555832149143483e-06, Loss: 420.94647216796875
2024-08-04T07:35:30.163937712Z 
 82%|████████▏ | 7770/9500 [26:38:00<5:57:11, 12.39s/it]08/04/2024 00:35:30 - INFO - __main__ -   Step: 7770, LR: 3.7534126712270695e-06, Loss: 433.0223388671875
2024-08-04T07:35:42.250569535Z 
 82%|████████▏ | 7771/9500 [26:38:12<5:54:22, 12.30s/it]08/04/2024 00:35:42 - INFO - __main__ -   Step: 7771, LR: 3.751242127539791e-06, Loss: 407.5330810546875
2024-08-04T07:35:55.010318943Z 
 82%|████████▏ | 7772/9500 [26:38:24<5:58:10, 12.44s/it]08/04/2024 00:35:55 - INFO - __main__ -   Step: 7772, LR: 3.749071583852512e-06, Loss: 465.349609375
2024-08-04T07:36:07.583392002Z 
 82%|████████▏ | 7773/9500 [26:38:37<5:59:08, 12.48s/it]08/04/2024 00:36:07 - INFO - __main__ -   Step: 7773, LR: 3.746901040165233e-06, Loss: 587.671630859375
2024-08-04T07:36:19.652765851Z 
 82%|████████▏ | 7774/9500 [26:38:49<5:55:24, 12.35s/it]08/04/2024 00:36:19 - INFO - __main__ -   Step: 7774, LR: 3.744730496477954e-06, Loss: 489.2975158691406
2024-08-04T07:36:32.321853744Z 
 82%|████████▏ | 7775/9500 [26:39:02<5:57:55, 12.45s/it]08/04/2024 00:36:32 - INFO - __main__ -   Step: 7775, LR: 3.742559952790675e-06, Loss: 432.0660705566406
2024-08-04T07:36:45.063867793Z 
 82%|████████▏ | 7776/9500 [26:39:15<6:00:13, 12.54s/it]08/04/2024 00:36:45 - INFO - __main__ -   Step: 7776, LR: 3.740389409103396e-06, Loss: 388.75555419921875
2024-08-04T07:36:57.435339871Z 
 82%|████████▏ | 7777/9500 [26:39:27<5:58:35, 12.49s/it]08/04/2024 00:36:57 - INFO - __main__ -   Step: 7777, LR: 3.7382188654161174e-06, Loss: 304.8354187011719
2024-08-04T07:37:09.676549165Z 
 82%|████████▏ | 7778/9500 [26:39:39<5:56:16, 12.41s/it]08/04/2024 00:37:09 - INFO - __main__ -   Step: 7778, LR: 3.7360483217288385e-06, Loss: 461.18719482421875
2024-08-04T07:37:22.233410892Z 
 82%|████████▏ | 7779/9500 [26:39:52<5:57:17, 12.46s/it]08/04/2024 00:37:22 - INFO - __main__ -   Step: 7779, LR: 3.7338777780415597e-06, Loss: 442.2781982421875
2024-08-04T07:37:34.578055045Z 
 82%|████████▏ | 7780/9500 [26:40:04<5:56:07, 12.42s/it]08/04/2024 00:37:34 - INFO - __main__ -   Step: 7780, LR: 3.7317072343542804e-06, Loss: 378.63739013671875
2024-08-04T07:37:46.387722150Z 
 82%|████████▏ | 7781/9500 [26:40:16<5:50:38, 12.24s/it]08/04/2024 00:37:46 - INFO - __main__ -   Step: 7781, LR: 3.7295366906670015e-06, Loss: 354.7440185546875
2024-08-04T07:37:59.067070670Z 
 82%|████████▏ | 7782/9500 [26:40:29<5:54:13, 12.37s/it]08/04/2024 00:37:59 - INFO - __main__ -   Step: 7782, LR: 3.7273661469797227e-06, Loss: 331.6265869140625
2024-08-04T07:38:11.134722810Z 
 82%|████████▏ | 7783/9500 [26:40:41<5:51:24, 12.28s/it]08/04/2024 00:38:11 - INFO - __main__ -   Step: 7783, LR: 3.7251956032924434e-06, Loss: 403.3739318847656
2024-08-04T07:38:23.289449797Z 
 82%|████████▏ | 7784/9500 [26:40:53<5:50:08, 12.24s/it]08/04/2024 00:38:23 - INFO - __main__ -   Step: 7784, LR: 3.723025059605165e-06, Loss: 516.761962890625
2024-08-04T07:38:36.388577065Z 
 82%|████████▏ | 7785/9500 [26:41:06<5:57:16, 12.50s/it]08/04/2024 00:38:36 - INFO - __main__ -   Step: 7785, LR: 3.720854515917886e-06, Loss: 475.1541748046875
2024-08-04T07:38:48.629369064Z 
 82%|████████▏ | 7786/9500 [26:41:18<5:54:51, 12.42s/it]08/04/2024 00:38:48 - INFO - __main__ -   Step: 7786, LR: 3.718683972230607e-06, Loss: 415.86212158203125
2024-08-04T07:39:00.913259753Z 
 82%|████████▏ | 7787/9500 [26:41:30<5:53:27, 12.38s/it]08/04/2024 00:39:00 - INFO - __main__ -   Step: 7787, LR: 3.716513428543328e-06, Loss: 350.43170166015625
2024-08-04T07:39:13.341172733Z 
 82%|████████▏ | 7788/9500 [26:41:43<5:53:39, 12.39s/it]08/04/2024 00:39:13 - INFO - __main__ -   Step: 7788, LR: 3.714342884856049e-06, Loss: 355.99041748046875
2024-08-04T07:39:25.399226814Z 
 82%|████████▏ | 7789/9500 [26:41:55<5:50:34, 12.29s/it]08/04/2024 00:39:25 - INFO - __main__ -   Step: 7789, LR: 3.7121723411687706e-06, Loss: 376.468505859375
2024-08-04T07:39:37.711636891Z 
 82%|████████▏ | 7790/9500 [26:42:07<5:50:31, 12.30s/it]08/04/2024 00:39:37 - INFO - __main__ -   Step: 7790, LR: 3.7100017974814913e-06, Loss: 407.70587158203125
2024-08-04T07:39:50.274077055Z 
 82%|████████▏ | 7791/9500 [26:42:20<5:52:34, 12.38s/it]08/04/2024 00:39:50 - INFO - __main__ -   Step: 7791, LR: 3.7078312537942125e-06, Loss: 389.5801086425781
2024-08-04T07:40:02.510397314Z 
 82%|████████▏ | 7792/9500 [26:42:32<5:51:09, 12.34s/it]08/04/2024 00:40:02 - INFO - __main__ -   Step: 7792, LR: 3.7056607101069336e-06, Loss: 464.2274169921875
2024-08-04T07:40:14.457385522Z 
 82%|████████▏ | 7793/9500 [26:42:44<5:47:37, 12.22s/it]08/04/2024 00:40:14 - INFO - __main__ -   Step: 7793, LR: 3.7034901664196543e-06, Loss: 303.30157470703125
2024-08-04T07:40:27.176861935Z 
 82%|████████▏ | 7794/9500 [26:42:57<5:51:41, 12.37s/it]08/04/2024 00:40:27 - INFO - __main__ -   Step: 7794, LR: 3.7013196227323755e-06, Loss: 468.05908203125
2024-08-04T07:40:39.211076510Z 
 82%|████████▏ | 7795/9500 [26:43:09<5:48:38, 12.27s/it]08/04/2024 00:40:39 - INFO - __main__ -   Step: 7795, LR: 3.6991490790450966e-06, Loss: 337.08551025390625
2024-08-04T07:40:51.425357058Z 
 82%|████████▏ | 7796/9500 [26:43:21<5:47:57, 12.25s/it]08/04/2024 00:40:51 - INFO - __main__ -   Step: 7796, LR: 3.696978535357818e-06, Loss: 476.53485107421875
2024-08-04T07:41:03.917111024Z 
 82%|████████▏ | 7797/9500 [26:43:33<5:49:48, 12.32s/it]08/04/2024 00:41:03 - INFO - __main__ -   Step: 7797, LR: 3.694807991670539e-06, Loss: 496.05523681640625
2024-08-04T07:41:16.073373974Z 
 82%|████████▏ | 7798/9500 [26:43:46<5:48:10, 12.27s/it]08/04/2024 00:41:16 - INFO - __main__ -   Step: 7798, LR: 3.69263744798326e-06, Loss: 356.62615966796875
2024-08-04T07:41:28.011179504Z 
 82%|████████▏ | 7799/9500 [26:43:57<5:45:06, 12.17s/it]08/04/2024 00:41:28 - INFO - __main__ -   Step: 7799, LR: 3.690466904295981e-06, Loss: 328.9513244628906
2024-08-04T07:41:40.540566937Z 
 82%|████████▏ | 7800/9500 [26:44:10<5:47:55, 12.28s/it]08/04/2024 00:41:40 - INFO - __main__ -   Step: 7800, LR: 3.688296360608702e-06, Loss: 468.42071533203125
2024-08-04T07:41:52.524151226Z 
 82%|████████▏ | 7801/9500 [26:44:22<5:45:12, 12.19s/it]08/04/2024 00:41:52 - INFO - __main__ -   Step: 7801, LR: 3.686125816921423e-06, Loss: 294.14385986328125
2024-08-04T07:42:04.804464688Z 
 82%|████████▏ | 7802/9500 [26:44:34<5:45:45, 12.22s/it]08/04/2024 00:42:04 - INFO - __main__ -   Step: 7802, LR: 3.6839552732341445e-06, Loss: 431.0078430175781
2024-08-04T07:42:17.899296944Z 
 82%|████████▏ | 7803/9500 [26:44:47<5:53:00, 12.48s/it]08/04/2024 00:42:17 - INFO - __main__ -   Step: 7803, LR: 3.6817847295468657e-06, Loss: 446.00445556640625
2024-08-04T07:42:30.342640948Z 
 82%|████████▏ | 7804/9500 [26:45:00<5:52:28, 12.47s/it]08/04/2024 00:42:30 - INFO - __main__ -   Step: 7804, LR: 3.6796141858595864e-06, Loss: 484.0642395019531
2024-08-04T07:42:42.387437281Z 
 82%|████████▏ | 7805/9500 [26:45:12<5:48:39, 12.34s/it]08/04/2024 00:42:42 - INFO - __main__ -   Step: 7805, LR: 3.6774436421723075e-06, Loss: 371.0019836425781
2024-08-04T07:42:54.707258626Z 
 82%|████████▏ | 7806/9500 [26:45:24<5:48:16, 12.34s/it]08/04/2024 00:42:54 - INFO - __main__ -   Step: 7806, LR: 3.6752730984850287e-06, Loss: 357.5714111328125
2024-08-04T07:43:07.062299122Z 
 82%|████████▏ | 7807/9500 [26:45:36<5:48:13, 12.34s/it]08/04/2024 00:43:07 - INFO - __main__ -   Step: 7807, LR: 3.6731025547977494e-06, Loss: 490.5997619628906
2024-08-04T07:43:19.393281207Z 
 82%|████████▏ | 7808/9500 [26:45:49<5:47:56, 12.34s/it]08/04/2024 00:43:19 - INFO - __main__ -   Step: 7808, LR: 3.6709320111104705e-06, Loss: 408.6087646484375
2024-08-04T07:43:32.163184712Z 
 82%|████████▏ | 7809/9500 [26:46:02<5:51:23, 12.47s/it]08/04/2024 00:43:32 - INFO - __main__ -   Step: 7809, LR: 3.668761467423192e-06, Loss: 499.47821044921875
2024-08-04T07:43:44.287699843Z 
 82%|████████▏ | 7810/9500 [26:46:14<5:48:16, 12.36s/it]08/04/2024 00:43:44 - INFO - __main__ -   Step: 7810, LR: 3.666590923735913e-06, Loss: 322.00421142578125
2024-08-04T07:43:56.196407694Z 
 82%|████████▏ | 7811/9500 [26:46:26<5:44:13, 12.23s/it]08/04/2024 00:43:56 - INFO - __main__ -   Step: 7811, LR: 3.664420380048634e-06, Loss: 371.26788330078125
2024-08-04T07:44:08.935169711Z 
 82%|████████▏ | 7812/9500 [26:46:38<5:48:19, 12.38s/it]08/04/2024 00:44:08 - INFO - __main__ -   Step: 7812, LR: 3.662249836361355e-06, Loss: 488.17816162109375
2024-08-04T07:44:21.081996043Z 
 82%|████████▏ | 7813/9500 [26:46:51<5:46:08, 12.31s/it]08/04/2024 00:44:21 - INFO - __main__ -   Step: 7813, LR: 3.660079292674076e-06, Loss: 364.8028564453125
2024-08-04T07:44:33.176756781Z 
 82%|████████▏ | 7814/9500 [26:47:03<5:44:06, 12.25s/it]08/04/2024 00:44:33 - INFO - __main__ -   Step: 7814, LR: 3.657908748986797e-06, Loss: 330.23486328125
2024-08-04T07:44:45.628831981Z 
 82%|████████▏ | 7815/9500 [26:47:15<5:45:38, 12.31s/it]08/04/2024 00:44:45 - INFO - __main__ -   Step: 7815, LR: 3.6557382052995185e-06, Loss: 347.3148498535156
2024-08-04T07:44:58.316253632Z 
 82%|████████▏ | 7816/9500 [26:47:28<5:48:38, 12.42s/it]08/04/2024 00:44:58 - INFO - __main__ -   Step: 7816, LR: 3.6535676616122396e-06, Loss: 372.4172668457031
2024-08-04T07:45:10.231264910Z 
 82%|████████▏ | 7817/9500 [26:47:40<5:44:09, 12.27s/it]08/04/2024 00:45:10 - INFO - __main__ -   Step: 7817, LR: 3.6513971179249607e-06, Loss: 324.680908203125
2024-08-04T07:45:22.726109251Z 
 82%|████████▏ | 7818/9500 [26:47:52<5:45:51, 12.34s/it]08/04/2024 00:45:22 - INFO - __main__ -   Step: 7818, LR: 3.6492265742376814e-06, Loss: 567.2464599609375
2024-08-04T07:45:35.197163519Z 
 82%|████████▏ | 7819/9500 [26:48:05<5:46:46, 12.38s/it]08/04/2024 00:45:35 - INFO - __main__ -   Step: 7819, LR: 3.6470560305504026e-06, Loss: 518.4200439453125
2024-08-04T07:45:47.371546120Z 
 82%|████████▏ | 7820/9500 [26:48:17<5:44:51, 12.32s/it]08/04/2024 00:45:47 - INFO - __main__ -   Step: 7820, LR: 3.644885486863124e-06, Loss: 412.3428649902344
2024-08-04T07:45:59.725716669Z 
 82%|████████▏ | 7821/9500 [26:48:29<5:44:58, 12.33s/it]08/04/2024 00:45:59 - INFO - __main__ -   Step: 7821, LR: 3.6427149431758444e-06, Loss: 369.13800048828125
2024-08-04T07:46:12.372556545Z 
 82%|████████▏ | 7822/9500 [26:48:42<5:47:26, 12.42s/it]08/04/2024 00:46:12 - INFO - __main__ -   Step: 7822, LR: 3.640544399488566e-06, Loss: 428.64703369140625
2024-08-04T07:46:24.643482032Z 
 82%|████████▏ | 7823/9500 [26:48:54<5:45:57, 12.38s/it]08/04/2024 00:46:24 - INFO - __main__ -   Step: 7823, LR: 3.638373855801287e-06, Loss: 397.02691650390625
2024-08-04T07:46:36.605581772Z 
 82%|████████▏ | 7824/9500 [26:49:06<5:42:16, 12.25s/it]08/04/2024 00:46:36 - INFO - __main__ -   Step: 7824, LR: 3.6362033121140083e-06, Loss: 336.54986572265625
2024-08-04T07:46:48.907007616Z 
 82%|████████▏ | 7825/9500 [26:49:18<5:42:28, 12.27s/it]08/04/2024 00:46:48 - INFO - __main__ -   Step: 7825, LR: 3.634032768426729e-06, Loss: 321.959716796875
2024-08-04T07:47:01.066130611Z 
 82%|████████▏ | 7826/9500 [26:49:31<5:41:21, 12.24s/it]08/04/2024 00:47:01 - INFO - __main__ -   Step: 7826, LR: 3.63186222473945e-06, Loss: 327.9498291015625
2024-08-04T07:47:13.080955669Z 
 82%|████████▏ | 7827/9500 [26:49:43<5:39:18, 12.17s/it]08/04/2024 00:47:13 - INFO - __main__ -   Step: 7827, LR: 3.6296916810521717e-06, Loss: 406.23321533203125
2024-08-04T07:47:26.150046598Z 
 82%|████████▏ | 7828/9500 [26:49:56<5:46:38, 12.44s/it]08/04/2024 00:47:26 - INFO - __main__ -   Step: 7828, LR: 3.6275211373648924e-06, Loss: 488.05828857421875
2024-08-04T07:47:38.330888433Z 
 82%|████████▏ | 7829/9500 [26:50:08<5:44:16, 12.36s/it]08/04/2024 00:47:38 - INFO - __main__ -   Step: 7829, LR: 3.6253505936776135e-06, Loss: 435.5318908691406
2024-08-04T07:47:50.737203266Z 
 82%|████████▏ | 7830/9500 [26:50:20<5:44:26, 12.37s/it]08/04/2024 00:47:50 - INFO - __main__ -   Step: 7830, LR: 3.6231800499903347e-06, Loss: 404.991943359375
2024-08-04T07:48:03.464156862Z 
 82%|████████▏ | 7831/9500 [26:50:33<5:47:10, 12.48s/it]08/04/2024 00:48:03 - INFO - __main__ -   Step: 7831, LR: 3.621009506303056e-06, Loss: 393.517333984375
2024-08-04T07:48:15.895290084Z 
 82%|████████▏ | 7832/9500 [26:50:45<5:46:32, 12.47s/it]08/04/2024 00:48:15 - INFO - __main__ -   Step: 7832, LR: 3.6188389626157765e-06, Loss: 437.2676696777344
2024-08-04T07:48:28.147029117Z 
 82%|████████▏ | 7833/9500 [26:50:58<5:44:33, 12.40s/it]08/04/2024 00:48:28 - INFO - __main__ -   Step: 7833, LR: 3.6166684189284976e-06, Loss: 443.4878845214844
2024-08-04T07:48:40.602099949Z 
 82%|████████▏ | 7834/9500 [26:51:10<5:44:47, 12.42s/it]08/04/2024 00:48:40 - INFO - __main__ -   Step: 7834, LR: 3.614497875241219e-06, Loss: 365.45904541015625
2024-08-04T07:48:53.461719332Z 
 82%|████████▏ | 7835/9500 [26:51:23<5:48:16, 12.55s/it]08/04/2024 00:48:53 - INFO - __main__ -   Step: 7835, LR: 3.61232733155394e-06, Loss: 524.820556640625
2024-08-04T07:49:05.597856907Z 
 82%|████████▏ | 7836/9500 [26:51:35<5:44:36, 12.43s/it]08/04/2024 00:49:05 - INFO - __main__ -   Step: 7836, LR: 3.610156787866661e-06, Loss: 331.92181396484375
2024-08-04T07:49:18.192681952Z 
 82%|████████▏ | 7837/9500 [26:51:48<5:45:48, 12.48s/it]08/04/2024 00:49:18 - INFO - __main__ -   Step: 7837, LR: 3.607986244179382e-06, Loss: 514.95751953125
2024-08-04T07:49:30.743675882Z 
 83%|████████▎ | 7838/9500 [26:52:00<5:46:13, 12.50s/it]08/04/2024 00:49:30 - INFO - __main__ -   Step: 7838, LR: 3.6058157004921033e-06, Loss: 451.0498046875
2024-08-04T07:49:42.875494742Z 
 83%|████████▎ | 7839/9500 [26:52:12<5:42:57, 12.39s/it]08/04/2024 00:49:42 - INFO - __main__ -   Step: 7839, LR: 3.603645156804824e-06, Loss: 372.29632568359375
2024-08-04T07:49:55.347798466Z 
 83%|████████▎ | 7840/9500 [26:52:25<5:43:27, 12.41s/it]08/04/2024 00:49:55 - INFO - __main__ -   Step: 7840, LR: 3.6014746131175456e-06, Loss: 367.8036804199219
2024-08-04T07:50:07.432204879Z 
 83%|████████▎ | 7841/9500 [26:52:37<5:40:30, 12.32s/it]08/04/2024 00:50:07 - INFO - __main__ -   Step: 7841, LR: 3.5993040694302667e-06, Loss: 347.758544921875
2024-08-04T07:50:19.529378338Z 
 83%|████████▎ | 7842/9500 [26:52:49<5:38:29, 12.25s/it]08/04/2024 00:50:19 - INFO - __main__ -   Step: 7842, LR: 3.5971335257429874e-06, Loss: 437.87408447265625
2024-08-04T07:50:31.935834798Z 
 83%|████████▎ | 7843/9500 [26:53:01<5:39:35, 12.30s/it]08/04/2024 00:50:31 - INFO - __main__ -   Step: 7843, LR: 3.5949629820557086e-06, Loss: 425.5917053222656
2024-08-04T07:50:44.248924776Z 
 83%|████████▎ | 7844/9500 [26:53:14<5:39:31, 12.30s/it]08/04/2024 00:50:44 - INFO - __main__ -   Step: 7844, LR: 3.5927924383684297e-06, Loss: 379.9034423828125
2024-08-04T07:50:56.224956081Z 
 83%|████████▎ | 7845/9500 [26:53:26<5:36:37, 12.20s/it]08/04/2024 00:50:56 - INFO - __main__ -   Step: 7845, LR: 3.5906218946811513e-06, Loss: 397.4897766113281
2024-08-04T07:51:08.768010820Z 
 83%|████████▎ | 7846/9500 [26:53:38<5:39:13, 12.31s/it]08/04/2024 00:51:08 - INFO - __main__ -   Step: 7846, LR: 3.5884513509938716e-06, Loss: 309.6350402832031
2024-08-04T07:51:21.196462678Z 
 83%|████████▎ | 7847/9500 [26:53:51<5:40:02, 12.34s/it]08/04/2024 00:51:21 - INFO - __main__ -   Step: 7847, LR: 3.586280807306593e-06, Loss: 480.9171142578125
2024-08-04T07:51:33.389229423Z 
 83%|████████▎ | 7848/9500 [26:54:03<5:38:35, 12.30s/it]08/04/2024 00:51:33 - INFO - __main__ -   Step: 7848, LR: 3.5841102636193143e-06, Loss: 466.14202880859375
2024-08-04T07:51:45.877412648Z 
 83%|████████▎ | 7849/9500 [26:54:15<5:39:57, 12.35s/it]08/04/2024 00:51:45 - INFO - __main__ -   Step: 7849, LR: 3.581939719932035e-06, Loss: 333.26251220703125
2024-08-04T07:51:58.105043321Z 
 83%|████████▎ | 7850/9500 [26:54:28<5:38:42, 12.32s/it]08/04/2024 00:51:58 - INFO - __main__ -   Step: 7850, LR: 3.579769176244756e-06, Loss: 393.65802001953125
2024-08-04T07:52:10.398630452Z 
 83%|████████▎ | 7851/9500 [26:54:40<5:38:18, 12.31s/it]08/04/2024 00:52:10 - INFO - __main__ -   Step: 7851, LR: 3.5775986325574772e-06, Loss: 353.7017822265625
2024-08-04T07:52:22.878727138Z 
 83%|████████▎ | 7852/9500 [26:54:52<5:39:30, 12.36s/it]08/04/2024 00:52:22 - INFO - __main__ -   Step: 7852, LR: 3.575428088870199e-06, Loss: 346.7430114746094
2024-08-04T07:52:35.182962413Z 
 83%|████████▎ | 7853/9500 [26:55:05<5:38:50, 12.34s/it]08/04/2024 00:52:35 - INFO - __main__ -   Step: 7853, LR: 3.5732575451829195e-06, Loss: 339.5691223144531
2024-08-04T07:52:46.965833543Z 
 83%|████████▎ | 7854/9500 [26:55:16<5:34:00, 12.18s/it]08/04/2024 00:52:46 - INFO - __main__ -   Step: 7854, LR: 3.5710870014956407e-06, Loss: 354.25079345703125
2024-08-04T07:52:59.636105507Z 
 83%|████████▎ | 7855/9500 [26:55:29<5:37:52, 12.32s/it]08/04/2024 00:52:59 - INFO - __main__ -   Step: 7855, LR: 3.5689164578083618e-06, Loss: 412.8548583984375
2024-08-04T07:53:12.038241982Z 
 83%|████████▎ | 7856/9500 [26:55:41<5:38:19, 12.35s/it]08/04/2024 00:53:12 - INFO - __main__ -   Step: 7856, LR: 3.5667459141210825e-06, Loss: 327.63763427734375
2024-08-04T07:53:24.326219736Z 
 83%|████████▎ | 7857/9500 [26:55:54<5:37:37, 12.33s/it]08/04/2024 00:53:24 - INFO - __main__ -   Step: 7857, LR: 3.5645753704338036e-06, Loss: 462.90338134765625
2024-08-04T07:53:36.601501781Z 
 83%|████████▎ | 7858/9500 [26:56:06<5:36:58, 12.31s/it]08/04/2024 00:53:36 - INFO - __main__ -   Step: 7858, LR: 3.562404826746525e-06, Loss: 434.8843994140625
2024-08-04T07:53:48.800497649Z 
 83%|████████▎ | 7859/9500 [26:56:18<5:35:49, 12.28s/it]08/04/2024 00:53:48 - INFO - __main__ -   Step: 7859, LR: 3.5602342830592463e-06, Loss: 378.2796325683594
2024-08-04T07:54:01.105828675Z 
 83%|████████▎ | 7860/9500 [26:56:31<5:35:50, 12.29s/it]08/04/2024 00:54:01 - INFO - __main__ -   Step: 7860, LR: 3.558063739371967e-06, Loss: 425.9043884277344
2024-08-04T07:54:13.656844748Z 
 83%|████████▎ | 7861/9500 [26:56:43<5:37:48, 12.37s/it]08/04/2024 00:54:13 - INFO - __main__ -   Step: 7861, LR: 3.555893195684688e-06, Loss: 384.84722900390625
2024-08-04T07:54:25.760272693Z 
 83%|████████▎ | 7862/9500 [26:56:55<5:35:26, 12.29s/it]08/04/2024 00:54:25 - INFO - __main__ -   Step: 7862, LR: 3.5537226519974093e-06, Loss: 497.302001953125
2024-08-04T07:54:37.955961359Z 
 83%|████████▎ | 7863/9500 [26:57:07<5:34:29, 12.26s/it]08/04/2024 00:54:37 - INFO - __main__ -   Step: 7863, LR: 3.55155210831013e-06, Loss: 572.5283203125
2024-08-04T07:54:50.119191925Z 
 83%|████████▎ | 7864/9500 [26:57:20<5:33:29, 12.23s/it]08/04/2024 00:54:50 - INFO - __main__ -   Step: 7864, LR: 3.549381564622851e-06, Loss: 382.0207824707031
2024-08-04T07:55:03.031136304Z 
 83%|████████▎ | 7865/9500 [26:57:32<5:38:51, 12.44s/it]08/04/2024 00:55:03 - INFO - __main__ -   Step: 7865, LR: 3.5472110209355727e-06, Loss: 475.72064208984375
2024-08-04T07:55:15.235717428Z 
 83%|████████▎ | 7866/9500 [26:57:45<5:36:46, 12.37s/it]08/04/2024 00:55:15 - INFO - __main__ -   Step: 7866, LR: 3.5450404772482934e-06, Loss: 435.32452392578125
2024-08-04T07:55:27.497923872Z 
 83%|████████▎ | 7867/9500 [26:57:57<5:35:42, 12.33s/it]08/04/2024 00:55:27 - INFO - __main__ -   Step: 7867, LR: 3.5428699335610146e-06, Loss: 495.394775390625
2024-08-04T07:55:40.217075314Z 
 83%|████████▎ | 7868/9500 [26:58:10<5:38:38, 12.45s/it]08/04/2024 00:55:40 - INFO - __main__ -   Step: 7868, LR: 3.5406993898737357e-06, Loss: 370.7955322265625
2024-08-04T07:55:52.783868624Z 
 83%|████████▎ | 7869/9500 [26:58:22<5:39:23, 12.49s/it]08/04/2024 00:55:52 - INFO - __main__ -   Step: 7869, LR: 3.538528846186457e-06, Loss: 382.7194519042969
2024-08-04T07:56:04.716506107Z 
 83%|████████▎ | 7870/9500 [26:58:34<5:34:40, 12.32s/it]08/04/2024 00:56:04 - INFO - __main__ -   Step: 7870, LR: 3.5363583024991776e-06, Loss: 398.83856201171875
2024-08-04T07:56:17.501104167Z 
 83%|████████▎ | 7871/9500 [26:58:47<5:38:15, 12.46s/it]08/04/2024 00:56:17 - INFO - __main__ -   Step: 7871, LR: 3.534187758811899e-06, Loss: 446.87115478515625
2024-08-04T07:56:29.832491423Z 
 83%|████████▎ | 7872/9500 [26:58:59<5:37:01, 12.42s/it]08/04/2024 00:56:29 - INFO - __main__ -   Step: 7872, LR: 3.5320172151246203e-06, Loss: 453.63458251953125
2024-08-04T07:56:41.933022321Z 
 83%|████████▎ | 7873/9500 [26:59:11<5:34:12, 12.32s/it]08/04/2024 00:56:41 - INFO - __main__ -   Step: 7873, LR: 3.529846671437341e-06, Loss: 454.3326416015625
2024-08-04T07:56:54.288572548Z 
 83%|████████▎ | 7874/9500 [26:59:24<5:34:14, 12.33s/it]08/04/2024 00:56:54 - INFO - __main__ -   Step: 7874, LR: 3.527676127750062e-06, Loss: 375.2327880859375
2024-08-04T07:57:06.548914572Z 
 83%|████████▎ | 7875/9500 [26:59:36<5:33:26, 12.31s/it]08/04/2024 00:57:06 - INFO - __main__ -   Step: 7875, LR: 3.5255055840627832e-06, Loss: 396.515869140625
2024-08-04T07:57:18.679152697Z 
 83%|████████▎ | 7876/9500 [26:59:48<5:31:45, 12.26s/it]08/04/2024 00:57:18 - INFO - __main__ -   Step: 7876, LR: 3.5233350403755044e-06, Loss: 481.94866943359375
2024-08-04T07:57:31.435728094Z 
 83%|████████▎ | 7877/9500 [27:00:01<5:35:36, 12.41s/it]08/04/2024 00:57:31 - INFO - __main__ -   Step: 7877, LR: 3.521164496688225e-06, Loss: 406.0379638671875
2024-08-04T07:57:43.695496809Z 
 83%|████████▎ | 7878/9500 [27:00:13<5:34:12, 12.36s/it]08/04/2024 00:57:43 - INFO - __main__ -   Step: 7878, LR: 3.5189939530009466e-06, Loss: 376.18072509765625
2024-08-04T07:57:55.806202491Z 
 83%|████████▎ | 7879/9500 [27:00:25<5:31:57, 12.29s/it]08/04/2024 00:57:55 - INFO - __main__ -   Step: 7879, LR: 3.5168234093136678e-06, Loss: 440.4737243652344
2024-08-04T07:58:08.249277877Z 
 83%|████████▎ | 7880/9500 [27:00:38<5:33:01, 12.33s/it]08/04/2024 00:58:08 - INFO - __main__ -   Step: 7880, LR: 3.5146528656263885e-06, Loss: 389.1665344238281
2024-08-04T07:58:20.443620894Z 
 83%|████████▎ | 7881/9500 [27:00:50<5:31:40, 12.29s/it]08/04/2024 00:58:20 - INFO - __main__ -   Step: 7881, LR: 3.5124823219391096e-06, Loss: 386.761474609375
2024-08-04T07:58:32.577980204Z 
 83%|████████▎ | 7882/9500 [27:01:02<5:30:11, 12.24s/it]08/04/2024 00:58:32 - INFO - __main__ -   Step: 7882, LR: 3.5103117782518308e-06, Loss: 303.80731201171875
2024-08-04T07:58:45.325438194Z 
 83%|████████▎ | 7883/9500 [27:01:15<5:34:03, 12.40s/it]08/04/2024 00:58:45 - INFO - __main__ -   Step: 7883, LR: 3.5081412345645523e-06, Loss: 443.747314453125
2024-08-04T07:58:57.495000476Z 
 83%|████████▎ | 7884/9500 [27:01:27<5:32:01, 12.33s/it]08/04/2024 00:58:57 - INFO - __main__ -   Step: 7884, LR: 3.5059706908772726e-06, Loss: 376.4020690917969
2024-08-04T07:59:09.789402339Z 
 83%|████████▎ | 7885/9500 [27:01:39<5:31:33, 12.32s/it]08/04/2024 00:59:09 - INFO - __main__ -   Step: 7885, LR: 3.503800147189994e-06, Loss: 400.44439697265625
2024-08-04T07:59:22.448118103Z 
 83%|████████▎ | 7886/9500 [27:01:52<5:34:05, 12.42s/it]08/04/2024 00:59:22 - INFO - __main__ -   Step: 7886, LR: 3.5016296035027153e-06, Loss: 443.8137512207031
2024-08-04T07:59:34.742266062Z 
 83%|████████▎ | 7887/9500 [27:02:04<5:32:52, 12.38s/it]08/04/2024 00:59:34 - INFO - __main__ -   Step: 7887, LR: 3.499459059815436e-06, Loss: 419.3858642578125
2024-08-04T07:59:46.745347452Z 
 83%|████████▎ | 7888/9500 [27:02:16<5:29:36, 12.27s/it]08/04/2024 00:59:46 - INFO - __main__ -   Step: 7888, LR: 3.497288516128157e-06, Loss: 272.982666015625
2024-08-04T07:59:59.367368614Z 
 83%|████████▎ | 7889/9500 [27:02:29<5:32:15, 12.37s/it]08/04/2024 00:59:59 - INFO - __main__ -   Step: 7889, LR: 3.4951179724408783e-06, Loss: 347.0283508300781
2024-08-04T08:00:11.610483924Z 
 83%|████████▎ | 7890/9500 [27:02:41<5:30:59, 12.34s/it]08/04/2024 01:00:11 - INFO - __main__ -   Step: 7890, LR: 3.4929474287536e-06, Loss: 399.30877685546875
2024-08-04T08:00:23.807420049Z 
 83%|████████▎ | 7891/9500 [27:02:53<5:29:40, 12.29s/it]08/04/2024 01:00:23 - INFO - __main__ -   Step: 7891, LR: 3.4907768850663206e-06, Loss: 435.0958557128906
2024-08-04T08:00:36.396456556Z 
 83%|████████▎ | 7892/9500 [27:03:06<5:31:50, 12.38s/it]08/04/2024 01:00:36 - INFO - __main__ -   Step: 7892, LR: 3.4886063413790417e-06, Loss: 488.6932373046875
2024-08-04T08:00:48.593073182Z 
 83%|████████▎ | 7893/9500 [27:03:18<5:30:08, 12.33s/it]08/04/2024 01:00:48 - INFO - __main__ -   Step: 7893, LR: 3.486435797691763e-06, Loss: 432.87567138671875
2024-08-04T08:01:00.603806735Z 
 83%|████████▎ | 7894/9500 [27:03:30<5:27:24, 12.23s/it]08/04/2024 01:01:00 - INFO - __main__ -   Step: 7894, LR: 3.4842652540044836e-06, Loss: 345.9586181640625
2024-08-04T08:01:13.115637211Z 
 83%|████████▎ | 7895/9500 [27:03:43<5:29:26, 12.32s/it]08/04/2024 01:01:13 - INFO - __main__ -   Step: 7895, LR: 3.4820947103172047e-06, Loss: 396.313232421875
2024-08-04T08:01:25.955397209Z 
 83%|████████▎ | 7896/9500 [27:03:55<5:33:26, 12.47s/it]08/04/2024 01:01:25 - INFO - __main__ -   Step: 7896, LR: 3.4799241666299262e-06, Loss: 437.9564208984375
2024-08-04T08:01:38.406786269Z 
 83%|████████▎ | 7897/9500 [27:04:08<5:33:04, 12.47s/it]08/04/2024 01:01:38 - INFO - __main__ -   Step: 7897, LR: 3.4777536229426474e-06, Loss: 443.64654541015625
2024-08-04T08:01:50.702977561Z 
 83%|████████▎ | 7898/9500 [27:04:20<5:31:29, 12.42s/it]08/04/2024 01:01:50 - INFO - __main__ -   Step: 7898, LR: 3.475583079255368e-06, Loss: 430.91375732421875
2024-08-04T08:02:02.844995452Z 
 83%|████████▎ | 7899/9500 [27:04:32<5:29:05, 12.33s/it]08/04/2024 01:02:02 - INFO - __main__ -   Step: 7899, LR: 3.4734125355680892e-06, Loss: 328.23565673828125
2024-08-04T08:02:14.820334504Z 
 83%|████████▎ | 7900/9500 [27:04:44<5:26:01, 12.23s/it]08/04/2024 01:02:14 - INFO - __main__ -   Step: 7900, LR: 3.4712419918808104e-06, Loss: 346.9432373046875
2024-08-04T08:02:27.259988434Z 
 83%|████████▎ | 7901/9500 [27:04:57<5:27:31, 12.29s/it]08/04/2024 01:02:27 - INFO - __main__ -   Step: 7901, LR: 3.469071448193531e-06, Loss: 348.60626220703125
2024-08-04T08:02:39.895484401Z 
 83%|████████▎ | 7902/9500 [27:05:09<5:30:05, 12.39s/it]08/04/2024 01:02:39 - INFO - __main__ -   Step: 7902, LR: 3.4669009045062522e-06, Loss: 475.9158020019531
2024-08-04T08:02:52.211383273Z 
 83%|████████▎ | 7903/9500 [27:05:22<5:29:15, 12.37s/it]08/04/2024 01:02:52 - INFO - __main__ -   Step: 7903, LR: 3.4647303608189738e-06, Loss: 454.46649169921875
2024-08-04T08:03:04.797180195Z 
 83%|████████▎ | 7904/9500 [27:05:34<5:30:46, 12.44s/it]08/04/2024 01:03:04 - INFO - __main__ -   Step: 7904, LR: 3.462559817131695e-06, Loss: 368.92718505859375
2024-08-04T08:03:16.783066988Z 
 83%|████████▎ | 7905/9500 [27:05:46<5:26:58, 12.30s/it]08/04/2024 01:03:16 - INFO - __main__ -   Step: 7905, LR: 3.4603892734444156e-06, Loss: 377.2344665527344
2024-08-04T08:03:29.215043350Z 
 83%|████████▎ | 7906/9500 [27:05:59<5:27:49, 12.34s/it]08/04/2024 01:03:29 - INFO - __main__ -   Step: 7906, LR: 3.4582187297571368e-06, Loss: 359.98748779296875
2024-08-04T08:03:41.495470376Z 
 83%|████████▎ | 7907/9500 [27:06:11<5:27:08, 12.32s/it]08/04/2024 01:03:41 - INFO - __main__ -   Step: 7907, LR: 3.456048186069858e-06, Loss: 465.4825744628906
2024-08-04T08:03:54.155804644Z 
 83%|████████▎ | 7908/9500 [27:06:24<5:29:38, 12.42s/it]08/04/2024 01:03:54 - INFO - __main__ -   Step: 7908, LR: 3.4538776423825786e-06, Loss: 473.731689453125
2024-08-04T08:04:06.390172268Z 
 83%|████████▎ | 7909/9500 [27:06:36<5:27:55, 12.37s/it]08/04/2024 01:04:06 - INFO - __main__ -   Step: 7909, LR: 3.4517070986953e-06, Loss: 464.29669189453125
2024-08-04T08:04:18.660712575Z 
 83%|████████▎ | 7910/9500 [27:06:48<5:26:57, 12.34s/it]08/04/2024 01:04:18 - INFO - __main__ -   Step: 7910, LR: 3.4495365550080213e-06, Loss: 415.7525634765625
2024-08-04T08:04:31.383013412Z 
 83%|████████▎ | 7911/9500 [27:07:01<5:29:48, 12.45s/it]08/04/2024 01:04:31 - INFO - __main__ -   Step: 7911, LR: 3.4473660113207424e-06, Loss: 475.909912109375
2024-08-04T08:04:44.070664816Z 
 83%|████████▎ | 7912/9500 [27:07:14<5:31:27, 12.52s/it]08/04/2024 01:04:44 - INFO - __main__ -   Step: 7912, LR: 3.445195467633463e-06, Loss: 501.6861877441406
2024-08-04T08:04:56.367816944Z 
 83%|████████▎ | 7913/9500 [27:07:26<5:29:27, 12.46s/it]08/04/2024 01:04:56 - INFO - __main__ -   Step: 7913, LR: 3.4430249239461843e-06, Loss: 461.73724365234375
2024-08-04T08:05:08.839779783Z 
 83%|████████▎ | 7914/9500 [27:07:38<5:29:22, 12.46s/it]08/04/2024 01:05:08 - INFO - __main__ -   Step: 7914, LR: 3.4408543802589054e-06, Loss: 363.1164855957031
2024-08-04T08:05:21.429124473Z 
 83%|████████▎ | 7915/9500 [27:07:51<5:30:11, 12.50s/it]08/04/2024 01:05:21 - INFO - __main__ -   Step: 7915, LR: 3.438683836571626e-06, Loss: 368.8870849609375
2024-08-04T08:05:33.654001761Z 
 83%|████████▎ | 7916/9500 [27:08:03<5:27:48, 12.42s/it]08/04/2024 01:05:33 - INFO - __main__ -   Step: 7916, LR: 3.4365132928843477e-06, Loss: 349.1887512207031
2024-08-04T08:05:46.364883071Z 
 83%|████████▎ | 7917/9500 [27:08:16<5:29:55, 12.51s/it]08/04/2024 01:05:46 - INFO - __main__ -   Step: 7917, LR: 3.434342749197069e-06, Loss: 409.36749267578125
2024-08-04T08:05:58.846481989Z 
 83%|████████▎ | 7918/9500 [27:08:28<5:29:31, 12.50s/it]08/04/2024 01:05:58 - INFO - __main__ -   Step: 7918, LR: 3.43217220550979e-06, Loss: 399.8462829589844
2024-08-04T08:06:10.805281545Z 
 83%|████████▎ | 7919/9500 [27:08:40<5:25:03, 12.34s/it]08/04/2024 01:06:10 - INFO - __main__ -   Step: 7919, LR: 3.4300016618225107e-06, Loss: 361.0339050292969
2024-08-04T08:06:23.354775168Z 
 83%|████████▎ | 7920/9500 [27:08:53<5:26:32, 12.40s/it]08/04/2024 01:06:23 - INFO - __main__ -   Step: 7920, LR: 3.427831118135232e-06, Loss: 408.8072509765625
2024-08-04T08:06:35.541533408Z 
 83%|████████▎ | 7921/9500 [27:09:05<5:24:38, 12.34s/it]08/04/2024 01:06:35 - INFO - __main__ -   Step: 7921, LR: 3.4256605744479534e-06, Loss: 355.4884033203125
2024-08-04T08:06:47.778469648Z 
 83%|████████▎ | 7922/9500 [27:09:17<5:23:39, 12.31s/it]08/04/2024 01:06:47 - INFO - __main__ -   Step: 7922, LR: 3.4234900307606737e-06, Loss: 442.79156494140625
2024-08-04T08:07:00.202737500Z 
 83%|████████▎ | 7923/9500 [27:09:30<5:24:23, 12.34s/it]08/04/2024 01:07:00 - INFO - __main__ -   Step: 7923, LR: 3.4213194870733952e-06, Loss: 318.4918212890625
2024-08-04T08:07:12.513929292Z 
 83%|████████▎ | 7924/9500 [27:09:42<5:23:56, 12.33s/it]08/04/2024 01:07:12 - INFO - __main__ -   Step: 7924, LR: 3.4191489433861164e-06, Loss: 436.0511169433594
2024-08-04T08:07:24.628587843Z 
 83%|████████▎ | 7925/9500 [27:09:54<5:22:00, 12.27s/it]08/04/2024 01:07:24 - INFO - __main__ -   Step: 7925, LR: 3.4169783996988375e-06, Loss: 435.169677734375
2024-08-04T08:07:37.429394228Z 
 83%|████████▎ | 7926/9500 [27:10:07<5:26:00, 12.43s/it]08/04/2024 01:07:37 - INFO - __main__ -   Step: 7926, LR: 3.414807856011558e-06, Loss: 423.83843994140625
2024-08-04T08:07:49.681641668Z 
 83%|████████▎ | 7927/9500 [27:10:19<5:24:25, 12.37s/it]08/04/2024 01:07:49 - INFO - __main__ -   Step: 7927, LR: 3.4126373123242793e-06, Loss: 354.95599365234375
2024-08-04T08:08:01.625347082Z 
 83%|████████▎ | 7928/9500 [27:10:31<5:20:49, 12.25s/it]08/04/2024 01:08:01 - INFO - __main__ -   Step: 7928, LR: 3.410466768637001e-06, Loss: 391.5896911621094
2024-08-04T08:08:14.328056158Z 
 83%|████████▎ | 7929/9500 [27:10:44<5:24:13, 12.38s/it]08/04/2024 01:08:14 - INFO - __main__ -   Step: 7929, LR: 3.4082962249497216e-06, Loss: 376.2967529296875
2024-08-04T08:08:26.497671035Z 
 83%|████████▎ | 7930/9500 [27:10:56<5:22:20, 12.32s/it]08/04/2024 01:08:26 - INFO - __main__ -   Step: 7930, LR: 3.4061256812624428e-06, Loss: 434.34112548828125
2024-08-04T08:08:38.658259956Z 
 83%|████████▎ | 7931/9500 [27:11:08<5:20:53, 12.27s/it]08/04/2024 01:08:38 - INFO - __main__ -   Step: 7931, LR: 3.403955137575164e-06, Loss: 364.0632629394531
2024-08-04T08:08:51.191648223Z 
 83%|████████▎ | 7932/9500 [27:11:21<5:22:43, 12.35s/it]08/04/2024 01:08:51 - INFO - __main__ -   Step: 7932, LR: 3.401784593887885e-06, Loss: 525.8056030273438
2024-08-04T08:09:03.284339301Z 
 84%|████████▎ | 7933/9500 [27:11:33<5:20:31, 12.27s/it]08/04/2024 01:09:03 - INFO - __main__ -   Step: 7933, LR: 3.3996140502006057e-06, Loss: 324.8202209472656
2024-08-04T08:09:15.668401487Z 
 84%|████████▎ | 7934/9500 [27:11:45<5:21:11, 12.31s/it]08/04/2024 01:09:15 - INFO - __main__ -   Step: 7934, LR: 3.3974435065133273e-06, Loss: 334.4009704589844
2024-08-04T08:09:28.021852549Z 
 84%|████████▎ | 7935/9500 [27:11:57<5:21:21, 12.32s/it]08/04/2024 01:09:28 - INFO - __main__ -   Step: 7935, LR: 3.3952729628260484e-06, Loss: 334.43731689453125
2024-08-04T08:09:40.387194662Z 
 84%|████████▎ | 7936/9500 [27:12:10<5:21:30, 12.33s/it]08/04/2024 01:09:40 - INFO - __main__ -   Step: 7936, LR: 3.393102419138769e-06, Loss: 442.70269775390625
2024-08-04T08:09:52.515684575Z 
 84%|████████▎ | 7937/9500 [27:12:22<5:19:41, 12.27s/it]08/04/2024 01:09:52 - INFO - __main__ -   Step: 7937, LR: 3.3909318754514903e-06, Loss: 399.67938232421875
2024-08-04T08:10:04.992125812Z 
 84%|████████▎ | 7938/9500 [27:12:34<5:21:04, 12.33s/it]08/04/2024 01:10:04 - INFO - __main__ -   Step: 7938, LR: 3.3887613317642114e-06, Loss: 357.51263427734375
2024-08-04T08:10:17.506381671Z 
 84%|████████▎ | 7939/9500 [27:12:47<5:22:17, 12.39s/it]08/04/2024 01:10:17 - INFO - __main__ -   Step: 7939, LR: 3.3865907880769326e-06, Loss: 438.71368408203125
2024-08-04T08:10:29.798603938Z 
 84%|████████▎ | 7940/9500 [27:12:59<5:21:20, 12.36s/it]08/04/2024 01:10:29 - INFO - __main__ -   Step: 7940, LR: 3.3844202443896533e-06, Loss: 454.25823974609375
2024-08-04T08:10:42.649252715Z 
 84%|████████▎ | 7941/9500 [27:13:12<5:24:57, 12.51s/it]08/04/2024 01:10:42 - INFO - __main__ -   Step: 7941, LR: 3.382249700702375e-06, Loss: 488.7967529296875
2024-08-04T08:10:54.980978349Z 
 84%|████████▎ | 7942/9500 [27:13:24<5:23:23, 12.45s/it]08/04/2024 01:10:54 - INFO - __main__ -   Step: 7942, LR: 3.380079157015096e-06, Loss: 335.89874267578125
2024-08-04T08:11:06.998663836Z 
 84%|████████▎ | 7943/9500 [27:13:36<5:19:47, 12.32s/it]08/04/2024 01:11:06 - INFO - __main__ -   Step: 7943, LR: 3.3779086133278167e-06, Loss: 424.27490234375
2024-08-04T08:11:19.451163707Z 
 84%|████████▎ | 7944/9500 [27:13:49<5:20:35, 12.36s/it]08/04/2024 01:11:19 - INFO - __main__ -   Step: 7944, LR: 3.375738069640538e-06, Loss: 416.07122802734375
2024-08-04T08:11:31.708192725Z 
 84%|████████▎ | 7945/9500 [27:14:01<5:19:33, 12.33s/it]08/04/2024 01:11:31 - INFO - __main__ -   Step: 7945, LR: 3.373567525953259e-06, Loss: 397.04791259765625
2024-08-04T08:11:44.218651313Z 
 84%|████████▎ | 7946/9500 [27:14:14<5:20:45, 12.38s/it]08/04/2024 01:11:44 - INFO - __main__ -   Step: 7946, LR: 3.3713969822659797e-06, Loss: 362.2586669921875
2024-08-04T08:11:56.748039546Z 
 84%|████████▎ | 7947/9500 [27:14:26<5:21:40, 12.43s/it]08/04/2024 01:11:56 - INFO - __main__ -   Step: 7947, LR: 3.3692264385787012e-06, Loss: 447.268310546875
2024-08-04T08:12:09.095520413Z 
 84%|████████▎ | 7948/9500 [27:14:39<5:20:50, 12.40s/it]08/04/2024 01:12:09 - INFO - __main__ -   Step: 7948, LR: 3.3670558948914224e-06, Loss: 430.97320556640625
2024-08-04T08:12:21.132674689Z 
 84%|████████▎ | 7949/9500 [27:14:51<5:17:47, 12.29s/it]08/04/2024 01:12:21 - INFO - __main__ -   Step: 7949, LR: 3.3648853512041435e-06, Loss: 351.6942443847656
2024-08-04T08:12:33.205136373Z 
 84%|████████▎ | 7950/9500 [27:15:03<5:15:52, 12.23s/it]08/04/2024 01:12:33 - INFO - __main__ -   Step: 7950, LR: 3.362714807516864e-06, Loss: 345.5077819824219
2024-08-04T08:12:45.923174197Z 
 84%|████████▎ | 7951/9500 [27:15:15<5:19:28, 12.37s/it]08/04/2024 01:12:45 - INFO - __main__ -   Step: 7951, LR: 3.3605442638295853e-06, Loss: 383.9930419921875
2024-08-04T08:12:57.904523346Z 
 84%|████████▎ | 7952/9500 [27:15:27<5:16:13, 12.26s/it]08/04/2024 01:12:57 - INFO - __main__ -   Step: 7952, LR: 3.3583737201423065e-06, Loss: 351.3429870605469
2024-08-04T08:13:10.146641963Z 
 84%|████████▎ | 7953/9500 [27:15:40<5:15:54, 12.25s/it]08/04/2024 01:13:10 - INFO - __main__ -   Step: 7953, LR: 3.356203176455027e-06, Loss: 454.6651611328125
2024-08-04T08:13:22.747528270Z 
 84%|████████▎ | 7954/9500 [27:15:52<5:18:23, 12.36s/it]08/04/2024 01:13:22 - INFO - __main__ -   Step: 7954, LR: 3.3540326327677487e-06, Loss: 361.8581848144531
2024-08-04T08:13:35.050382443Z 
 84%|████████▎ | 7955/9500 [27:16:04<5:17:46, 12.34s/it]08/04/2024 01:13:35 - INFO - __main__ -   Step: 7955, LR: 3.35186208908047e-06, Loss: 421.7386169433594
2024-08-04T08:13:47.446061241Z 
 84%|████████▎ | 7956/9500 [27:16:17<5:17:59, 12.36s/it]08/04/2024 01:13:47 - INFO - __main__ -   Step: 7956, LR: 3.349691545393191e-06, Loss: 471.88739013671875
2024-08-04T08:13:59.904980437Z 
 84%|████████▍ | 7957/9500 [27:16:29<5:18:34, 12.39s/it]08/04/2024 01:13:59 - INFO - __main__ -   Step: 7957, LR: 3.3475210017059117e-06, Loss: 362.7715759277344
2024-08-04T08:14:11.972973128Z 
 84%|████████▍ | 7958/9500 [27:16:41<5:15:53, 12.29s/it]08/04/2024 01:14:11 - INFO - __main__ -   Step: 7958, LR: 3.345350458018633e-06, Loss: 296.28997802734375
2024-08-04T08:14:24.200199177Z 
 84%|████████▍ | 7959/9500 [27:16:54<5:15:11, 12.27s/it]08/04/2024 01:14:24 - INFO - __main__ -   Step: 7959, LR: 3.3431799143313544e-06, Loss: 339.836669921875
2024-08-04T08:14:36.601777386Z 
 84%|████████▍ | 7960/9500 [27:17:06<5:15:59, 12.31s/it]08/04/2024 01:14:36 - INFO - __main__ -   Step: 7960, LR: 3.341009370644075e-06, Loss: 399.34661865234375
2024-08-04T08:14:48.820018222Z 
 84%|████████▍ | 7961/9500 [27:17:18<5:15:03, 12.28s/it]08/04/2024 01:14:48 - INFO - __main__ -   Step: 7961, LR: 3.3388388269567963e-06, Loss: 444.87152099609375
2024-08-04T08:15:01.026282777Z 
 84%|████████▍ | 7962/9500 [27:17:30<5:14:16, 12.26s/it]08/04/2024 01:15:01 - INFO - __main__ -   Step: 7962, LR: 3.3366682832695174e-06, Loss: 434.6744689941406
2024-08-04T08:15:13.721831110Z 
 84%|████████▍ | 7963/9500 [27:17:43<5:17:24, 12.39s/it]08/04/2024 01:15:13 - INFO - __main__ -   Step: 7963, LR: 3.3344977395822385e-06, Loss: 270.61590576171875
2024-08-04T08:15:25.822391935Z 
 84%|████████▍ | 7964/9500 [27:17:55<5:14:58, 12.30s/it]08/04/2024 01:15:25 - INFO - __main__ -   Step: 7964, LR: 3.3323271958949593e-06, Loss: 401.660400390625
2024-08-04T08:15:37.894639234Z 
 84%|████████▍ | 7965/9500 [27:18:07<5:12:59, 12.23s/it]08/04/2024 01:15:37 - INFO - __main__ -   Step: 7965, LR: 3.3301566522076804e-06, Loss: 421.5502624511719
2024-08-04T08:15:50.347736127Z 
 84%|████████▍ | 7966/9500 [27:18:20<5:14:28, 12.30s/it]08/04/2024 01:15:50 - INFO - __main__ -   Step: 7966, LR: 3.327986108520402e-06, Loss: 456.6368408203125
2024-08-04T08:16:03.118470723Z 
 84%|████████▍ | 7967/9500 [27:18:33<5:17:52, 12.44s/it]08/04/2024 01:16:03 - INFO - __main__ -   Step: 7967, LR: 3.3258155648331227e-06, Loss: 501.7523193359375
2024-08-04T08:16:15.453448746Z 
 84%|████████▍ | 7968/9500 [27:18:45<5:16:51, 12.41s/it]08/04/2024 01:16:15 - INFO - __main__ -   Step: 7968, LR: 3.323645021145844e-06, Loss: 399.14788818359375
2024-08-04T08:16:28.162867568Z 
 84%|████████▍ | 7969/9500 [27:18:58<5:18:56, 12.50s/it]08/04/2024 01:16:28 - INFO - __main__ -   Step: 7969, LR: 3.321474477458565e-06, Loss: 425.9170837402344
2024-08-04T08:16:40.552574785Z 
 84%|████████▍ | 7970/9500 [27:19:10<5:17:53, 12.47s/it]08/04/2024 01:16:40 - INFO - __main__ -   Step: 7970, LR: 3.319303933771286e-06, Loss: 468.27301025390625
2024-08-04T08:16:52.816163952Z 
 84%|████████▍ | 7971/9500 [27:19:22<5:16:08, 12.41s/it]08/04/2024 01:16:52 - INFO - __main__ -   Step: 7971, LR: 3.317133390084007e-06, Loss: 405.43157958984375
2024-08-04T08:17:05.454006092Z 
 84%|████████▍ | 7972/9500 [27:19:35<5:17:42, 12.48s/it]08/04/2024 01:17:05 - INFO - __main__ -   Step: 7972, LR: 3.3149628463967283e-06, Loss: 374.91986083984375
2024-08-04T08:17:17.507039115Z 
 84%|████████▍ | 7973/9500 [27:19:47<5:14:16, 12.35s/it]08/04/2024 01:17:17 - INFO - __main__ -   Step: 7973, LR: 3.3127923027094495e-06, Loss: 445.309814453125
2024-08-04T08:17:29.671560363Z 
 84%|████████▍ | 7974/9500 [27:19:59<5:12:39, 12.29s/it]08/04/2024 01:17:29 - INFO - __main__ -   Step: 7974, LR: 3.31062175902217e-06, Loss: 347.1941833496094
2024-08-04T08:17:42.216562473Z 
 84%|████████▍ | 7975/9500 [27:20:12<5:14:22, 12.37s/it]08/04/2024 01:17:42 - INFO - __main__ -   Step: 7975, LR: 3.3084512153348913e-06, Loss: 324.89886474609375
2024-08-04T08:17:54.245033459Z 
 84%|████████▍ | 7976/9500 [27:20:24<5:11:34, 12.27s/it]08/04/2024 01:17:54 - INFO - __main__ -   Step: 7976, LR: 3.3062806716476125e-06, Loss: 357.10516357421875
2024-08-04T08:18:06.519777864Z 
 84%|████████▍ | 7977/9500 [27:20:36<5:11:25, 12.27s/it]08/04/2024 01:18:06 - INFO - __main__ -   Step: 7977, LR: 3.3041101279603336e-06, Loss: 297.1868896484375
2024-08-04T08:18:19.044589164Z 
 84%|████████▍ | 7978/9500 [27:20:48<5:13:10, 12.35s/it]08/04/2024 01:18:19 - INFO - __main__ -   Step: 7978, LR: 3.3019395842730543e-06, Loss: 491.2694091796875
2024-08-04T08:18:31.419769577Z 
 84%|████████▍ | 7979/9500 [27:21:01<5:13:11, 12.35s/it]08/04/2024 01:18:31 - INFO - __main__ -   Step: 7979, LR: 3.299769040585776e-06, Loss: 398.6521301269531
2024-08-04T08:18:43.636441208Z 
 84%|████████▍ | 7980/9500 [27:21:13<5:11:56, 12.31s/it]08/04/2024 01:18:43 - INFO - __main__ -   Step: 7980, LR: 3.297598496898497e-06, Loss: 464.98577880859375
2024-08-04T08:18:56.010138451Z 
 84%|████████▍ | 7981/9500 [27:21:25<5:12:11, 12.33s/it]08/04/2024 01:18:56 - INFO - __main__ -   Step: 7981, LR: 3.2954279532112177e-06, Loss: 364.3161926269531
2024-08-04T08:19:07.966444673Z 
 84%|████████▍ | 7982/9500 [27:21:37<5:09:08, 12.22s/it]08/04/2024 01:19:07 - INFO - __main__ -   Step: 7982, LR: 3.293257409523939e-06, Loss: 377.75518798828125
2024-08-04T08:19:20.161718416Z 
 84%|████████▍ | 7983/9500 [27:21:50<5:08:45, 12.21s/it]08/04/2024 01:19:20 - INFO - __main__ -   Step: 7983, LR: 3.29108686583666e-06, Loss: 348.36962890625
2024-08-04T08:19:32.568205353Z 
 84%|████████▍ | 7984/9500 [27:22:02<5:10:01, 12.27s/it]08/04/2024 01:19:32 - INFO - __main__ -   Step: 7984, LR: 3.2889163221493816e-06, Loss: 416.0150146484375
2024-08-04T08:19:45.088198543Z 
 84%|████████▍ | 7985/9500 [27:22:15<5:11:42, 12.35s/it]08/04/2024 01:19:45 - INFO - __main__ -   Step: 7985, LR: 3.2867457784621023e-06, Loss: 364.276123046875
2024-08-04T08:19:57.675329448Z 
 84%|████████▍ | 7986/9500 [27:22:27<5:13:20, 12.42s/it]08/04/2024 01:19:57 - INFO - __main__ -   Step: 7986, LR: 3.2845752347748234e-06, Loss: 438.0542907714844
2024-08-04T08:20:10.199039598Z 
 84%|████████▍ | 7987/9500 [27:22:40<5:13:56, 12.45s/it]08/04/2024 01:20:10 - INFO - __main__ -   Step: 7987, LR: 3.2824046910875445e-06, Loss: 380.29583740234375
2024-08-04T08:20:22.565797513Z 
 84%|████████▍ | 7988/9500 [27:22:52<5:13:06, 12.42s/it]08/04/2024 01:20:22 - INFO - __main__ -   Step: 7988, LR: 3.2802341474002653e-06, Loss: 414.6729736328125
2024-08-04T08:20:34.879435942Z 
 84%|████████▍ | 7989/9500 [27:23:04<5:12:03, 12.39s/it]08/04/2024 01:20:34 - INFO - __main__ -   Step: 7989, LR: 3.2780636037129864e-06, Loss: 422.6190490722656
2024-08-04T08:20:47.698079315Z 
 84%|████████▍ | 7990/9500 [27:23:17<5:15:04, 12.52s/it]08/04/2024 01:20:47 - INFO - __main__ -   Step: 7990, LR: 3.2758930600257075e-06, Loss: 498.90045166015625
2024-08-04T08:20:59.983634151Z 
 84%|████████▍ | 7991/9500 [27:23:29<5:13:06, 12.45s/it]08/04/2024 01:20:59 - INFO - __main__ -   Step: 7991, LR: 3.273722516338429e-06, Loss: 466.6870422363281
2024-08-04T08:21:12.180517446Z 
 84%|████████▍ | 7992/9500 [27:23:42<5:10:59, 12.37s/it]08/04/2024 01:21:12 - INFO - __main__ -   Step: 7992, LR: 3.27155197265115e-06, Loss: 425.44940185546875
2024-08-04T08:21:24.316838481Z 
 84%|████████▍ | 7993/9500 [27:23:54<5:08:59, 12.30s/it]08/04/2024 01:21:24 - INFO - __main__ -   Step: 7993, LR: 3.269381428963871e-06, Loss: 299.919189453125
2024-08-04T08:21:36.693934910Z 
 84%|████████▍ | 7994/9500 [27:24:06<5:09:21, 12.32s/it]08/04/2024 01:21:36 - INFO - __main__ -   Step: 7994, LR: 3.267210885276592e-06, Loss: 380.7767333984375
2024-08-04T08:21:49.019939245Z 
 84%|████████▍ | 7995/9500 [27:24:18<5:09:09, 12.33s/it]08/04/2024 01:21:49 - INFO - __main__ -   Step: 7995, LR: 3.2650403415893128e-06, Loss: 354.9847412109375
2024-08-04T08:22:00.728567532Z 
 84%|████████▍ | 7996/9500 [27:24:30<5:04:18, 12.14s/it]08/04/2024 01:22:00 - INFO - __main__ -   Step: 7996, LR: 3.262869797902034e-06, Loss: 283.9363708496094
2024-08-04T08:22:13.516551629Z 
 84%|████████▍ | 7997/9500 [27:24:43<5:08:58, 12.33s/it]08/04/2024 01:22:13 - INFO - __main__ -   Step: 7997, LR: 3.2606992542147555e-06, Loss: 478.87738037109375
2024-08-04T08:22:25.512591496Z 
 84%|████████▍ | 7998/9500 [27:24:55<5:06:14, 12.23s/it]08/04/2024 01:22:25 - INFO - __main__ -   Step: 7998, LR: 3.2585287105274766e-06, Loss: 358.6601867675781
2024-08-04T08:22:37.505871377Z 
 84%|████████▍ | 7999/9500 [27:25:07<5:04:13, 12.16s/it]08/04/2024 01:22:37 - INFO - __main__ -   Step: 7999, LR: 3.2563581668401973e-06, Loss: 436.88934326171875
2024-08-04T08:22:49.844552306Z 
 84%|████████▍ | 8000/9500 [27:25:19<5:05:21, 12.21s/it]08/04/2024 01:22:49 - INFO - __main__ -   Step: 8000, LR: 3.2541876231529185e-06, Loss: 346.55694580078125
2024-08-04T08:23:02.149993964Z 
 84%|████████▍ | 8001/9500 [27:25:32<5:05:50, 12.24s/it]08/04/2024 01:23:02 - INFO - __main__ -   Step: 8001, LR: 3.2520170794656396e-06, Loss: 511.1700134277344
2024-08-04T08:23:14.244292607Z 
 84%|████████▍ | 8002/9500 [27:25:44<5:04:31, 12.20s/it]08/04/2024 01:23:14 - INFO - __main__ -   Step: 8002, LR: 3.2498465357783603e-06, Loss: 517.4000244140625
2024-08-04T08:23:26.733900672Z 
 84%|████████▍ | 8003/9500 [27:25:56<5:06:30, 12.29s/it]08/04/2024 01:23:26 - INFO - __main__ -   Step: 8003, LR: 3.2476759920910814e-06, Loss: 348.1920471191406
2024-08-04T08:23:38.667726196Z 
 84%|████████▍ | 8004/9500 [27:26:08<5:03:40, 12.18s/it]08/04/2024 01:23:38 - INFO - __main__ -   Step: 8004, LR: 3.245505448403803e-06, Loss: 402.773681640625
2024-08-04T08:23:50.641317526Z 
 84%|████████▍ | 8005/9500 [27:26:20<5:01:56, 12.12s/it]08/04/2024 01:23:50 - INFO - __main__ -   Step: 8005, LR: 3.243334904716524e-06, Loss: 430.533447265625
2024-08-04T08:24:03.336794746Z 
 84%|████████▍ | 8006/9500 [27:26:33<5:06:02, 12.29s/it]08/04/2024 01:24:03 - INFO - __main__ -   Step: 8006, LR: 3.241164361029245e-06, Loss: 450.681396484375
2024-08-04T08:24:15.894869039Z 
 84%|████████▍ | 8007/9500 [27:26:45<5:07:50, 12.37s/it]08/04/2024 01:24:15 - INFO - __main__ -   Step: 8007, LR: 3.238993817341966e-06, Loss: 451.80560302734375
2024-08-04T08:24:28.145069068Z 
 84%|████████▍ | 8008/9500 [27:26:58<5:06:43, 12.33s/it]08/04/2024 01:24:28 - INFO - __main__ -   Step: 8008, LR: 3.236823273654687e-06, Loss: 392.7528991699219
2024-08-04T08:24:40.717791886Z 
 84%|████████▍ | 8009/9500 [27:27:10<5:08:17, 12.41s/it]08/04/2024 01:24:40 - INFO - __main__ -   Step: 8009, LR: 3.234652729967408e-06, Loss: 431.65850830078125
2024-08-04T08:24:53.359752848Z 
 84%|████████▍ | 8010/9500 [27:27:23<5:09:50, 12.48s/it]08/04/2024 01:24:53 - INFO - __main__ -   Step: 8010, LR: 3.2324821862801294e-06, Loss: 407.421630859375
2024-08-04T08:25:05.671692093Z 
 84%|████████▍ | 8011/9500 [27:27:35<5:08:24, 12.43s/it]08/04/2024 01:25:05 - INFO - __main__ -   Step: 8011, LR: 3.2303116425928505e-06, Loss: 329.20263671875
2024-08-04T08:25:18.665546823Z 
 84%|████████▍ | 8012/9500 [27:27:48<5:12:24, 12.60s/it]08/04/2024 01:25:18 - INFO - __main__ -   Step: 8012, LR: 3.2281410989055717e-06, Loss: 487.659423828125
2024-08-04T08:25:30.892281217Z 
 84%|████████▍ | 8013/9500 [27:28:00<5:09:27, 12.49s/it]08/04/2024 01:25:30 - INFO - __main__ -   Step: 8013, LR: 3.2259705552182924e-06, Loss: 428.9501953125
2024-08-04T08:25:42.932303406Z 
 84%|████████▍ | 8014/9500 [27:28:12<5:05:55, 12.35s/it]08/04/2024 01:25:42 - INFO - __main__ -   Step: 8014, LR: 3.2238000115310135e-06, Loss: 363.4468994140625
2024-08-04T08:25:55.424021532Z 
 84%|████████▍ | 8015/9500 [27:28:25<5:06:45, 12.39s/it]08/04/2024 01:25:55 - INFO - __main__ -   Step: 8015, LR: 3.221629467843735e-06, Loss: 461.09979248046875
2024-08-04T08:26:07.701734020Z 
 84%|████████▍ | 8016/9500 [27:28:37<5:05:41, 12.36s/it]08/04/2024 01:26:07 - INFO - __main__ -   Step: 8016, LR: 3.2194589241564554e-06, Loss: 474.1488037109375
2024-08-04T08:26:20.247817944Z 
 84%|████████▍ | 8017/9500 [27:28:50<5:06:51, 12.42s/it]08/04/2024 01:26:20 - INFO - __main__ -   Step: 8017, LR: 3.217288380469177e-06, Loss: 430.2208251953125
2024-08-04T08:26:32.898918487Z 
 84%|████████▍ | 8018/9500 [27:29:02<5:08:24, 12.49s/it]08/04/2024 01:26:32 - INFO - __main__ -   Step: 8018, LR: 3.215117836781898e-06, Loss: 375.9151306152344
2024-08-04T08:26:44.997092069Z 
 84%|████████▍ | 8019/9500 [27:29:14<5:05:19, 12.37s/it]08/04/2024 01:26:44 - INFO - __main__ -   Step: 8019, LR: 3.212947293094619e-06, Loss: 480.6463317871094
2024-08-04T08:26:57.510930983Z 
 84%|████████▍ | 8020/9500 [27:29:27<5:06:11, 12.41s/it]08/04/2024 01:26:57 - INFO - __main__ -   Step: 8020, LR: 3.21077674940734e-06, Loss: 336.2633972167969
2024-08-04T08:27:10.024220522Z 
 84%|████████▍ | 8021/9500 [27:29:39<5:06:42, 12.44s/it]08/04/2024 01:27:10 - INFO - __main__ -   Step: 8021, LR: 3.208606205720061e-06, Loss: 355.2811279296875
2024-08-04T08:27:22.063103843Z 
 84%|████████▍ | 8022/9500 [27:29:51<5:03:31, 12.32s/it]08/04/2024 01:27:22 - INFO - __main__ -   Step: 8022, LR: 3.2064356620327826e-06, Loss: 363.35552978515625
2024-08-04T08:27:34.579891618Z 
 84%|████████▍ | 8023/9500 [27:30:04<5:04:45, 12.38s/it]08/04/2024 01:27:34 - INFO - __main__ -   Step: 8023, LR: 3.2042651183455033e-06, Loss: 360.33905029296875
2024-08-04T08:27:47.184089096Z 
 84%|████████▍ | 8024/9500 [27:30:17<5:06:12, 12.45s/it]08/04/2024 01:27:47 - INFO - __main__ -   Step: 8024, LR: 3.2020945746582245e-06, Loss: 471.4025573730469
2024-08-04T08:27:59.307933133Z 
 84%|████████▍ | 8025/9500 [27:30:29<5:03:36, 12.35s/it]08/04/2024 01:27:59 - INFO - __main__ -   Step: 8025, LR: 3.1999240309709456e-06, Loss: 363.4774475097656
2024-08-04T08:28:11.615832506Z 
 84%|████████▍ | 8026/9500 [27:30:41<5:03:05, 12.34s/it]08/04/2024 01:28:11 - INFO - __main__ -   Step: 8026, LR: 3.1977534872836667e-06, Loss: 354.34259033203125
2024-08-04T08:28:24.286338188Z 
 84%|████████▍ | 8027/9500 [27:30:54<5:05:20, 12.44s/it]08/04/2024 01:28:24 - INFO - __main__ -   Step: 8027, LR: 3.1955829435963874e-06, Loss: 492.7589111328125
2024-08-04T08:28:36.377710065Z 
 85%|████████▍ | 8028/9500 [27:31:06<5:02:35, 12.33s/it]08/04/2024 01:28:36 - INFO - __main__ -   Step: 8028, LR: 3.1934123999091086e-06, Loss: 427.335693359375
2024-08-04T08:28:48.722995389Z 
 85%|████████▍ | 8029/9500 [27:31:18<5:02:27, 12.34s/it]08/04/2024 01:28:48 - INFO - __main__ -   Step: 8029, LR: 3.19124185622183e-06, Loss: 317.86627197265625
2024-08-04T08:29:01.371425711Z 
 85%|████████▍ | 8030/9500 [27:31:31<5:04:32, 12.43s/it]08/04/2024 01:29:01 - INFO - __main__ -   Step: 8030, LR: 3.189071312534551e-06, Loss: 397.338134765625
2024-08-04T08:29:13.266170686Z 
 85%|████████▍ | 8031/9500 [27:31:43<5:00:24, 12.27s/it]08/04/2024 01:29:13 - INFO - __main__ -   Step: 8031, LR: 3.186900768847272e-06, Loss: 369.6448059082031
2024-08-04T08:29:25.794426423Z 
 85%|████████▍ | 8032/9500 [27:31:55<5:02:05, 12.35s/it]08/04/2024 01:29:25 - INFO - __main__ -   Step: 8032, LR: 3.184730225159993e-06, Loss: 333.7857971191406
2024-08-04T08:29:38.727338101Z 
 85%|████████▍ | 8033/9500 [27:32:08<5:06:11, 12.52s/it]08/04/2024 01:29:38 - INFO - __main__ -   Step: 8033, LR: 3.182559681472714e-06, Loss: 484.6727294921875
2024-08-04T08:29:50.681619668Z 
 85%|████████▍ | 8034/9500 [27:32:20<5:01:48, 12.35s/it]08/04/2024 01:29:50 - INFO - __main__ -   Step: 8034, LR: 3.180389137785435e-06, Loss: 416.5273132324219
2024-08-04T08:30:03.204844705Z 
 85%|████████▍ | 8035/9500 [27:32:33<5:02:51, 12.40s/it]08/04/2024 01:30:03 - INFO - __main__ -   Step: 8035, LR: 3.1782185940981565e-06, Loss: 417.80670166015625
2024-08-04T08:30:15.105199805Z 
 85%|████████▍ | 8036/9500 [27:32:45<4:58:57, 12.25s/it]08/04/2024 01:30:15 - INFO - __main__ -   Step: 8036, LR: 3.1760480504108777e-06, Loss: 286.95477294921875
2024-08-04T08:30:27.601683808Z 
 85%|████████▍ | 8037/9500 [27:32:57<5:00:32, 12.33s/it]08/04/2024 01:30:27 - INFO - __main__ -   Step: 8037, LR: 3.1738775067235984e-06, Loss: 382.0722351074219
2024-08-04T08:30:40.033806845Z 
 85%|████████▍ | 8038/9500 [27:33:09<5:01:06, 12.36s/it]08/04/2024 01:30:40 - INFO - __main__ -   Step: 8038, LR: 3.1717069630363195e-06, Loss: 430.82379150390625
2024-08-04T08:30:52.186183538Z 
 85%|████████▍ | 8039/9500 [27:33:22<4:59:24, 12.30s/it]08/04/2024 01:30:52 - INFO - __main__ -   Step: 8039, LR: 3.1695364193490406e-06, Loss: 369.50689697265625
2024-08-04T08:31:04.632379556Z 
 85%|████████▍ | 8040/9500 [27:33:34<5:00:18, 12.34s/it]08/04/2024 01:31:04 - INFO - __main__ -   Step: 8040, LR: 3.1673658756617614e-06, Loss: 343.9112243652344
2024-08-04T08:31:17.036772174Z 
 85%|████████▍ | 8041/9500 [27:33:46<5:00:33, 12.36s/it]08/04/2024 01:31:17 - INFO - __main__ -   Step: 8041, LR: 3.1651953319744825e-06, Loss: 425.83935546875
2024-08-04T08:31:29.142030397Z 
 85%|████████▍ | 8042/9500 [27:33:59<4:58:29, 12.28s/it]08/04/2024 01:31:29 - INFO - __main__ -   Step: 8042, LR: 3.163024788287204e-06, Loss: 385.074462890625
2024-08-04T08:31:41.760899723Z 
 85%|████████▍ | 8043/9500 [27:34:11<5:00:43, 12.38s/it]08/04/2024 01:31:41 - INFO - __main__ -   Step: 8043, LR: 3.160854244599925e-06, Loss: 465.3248291015625
2024-08-04T08:31:54.001675230Z 
 85%|████████▍ | 8044/9500 [27:34:23<4:59:28, 12.34s/it]08/04/2024 01:31:54 - INFO - __main__ -   Step: 8044, LR: 3.158683700912646e-06, Loss: 418.982666015625
2024-08-04T08:32:06.082704686Z 
 85%|████████▍ | 8045/9500 [27:34:36<4:57:22, 12.26s/it]08/04/2024 01:32:06 - INFO - __main__ -   Step: 8045, LR: 3.156513157225367e-06, Loss: 317.64007568359375
2024-08-04T08:32:18.479055809Z 
 85%|████████▍ | 8046/9500 [27:34:48<4:58:08, 12.30s/it]08/04/2024 01:32:18 - INFO - __main__ -   Step: 8046, LR: 3.154342613538088e-06, Loss: 328.631103515625
2024-08-04T08:32:30.460322227Z 
 85%|████████▍ | 8047/9500 [27:35:00<4:55:36, 12.21s/it]08/04/2024 01:32:30 - INFO - __main__ -   Step: 8047, LR: 3.152172069850809e-06, Loss: 390.8018493652344
2024-08-04T08:32:42.985571777Z 
 85%|████████▍ | 8048/9500 [27:35:12<4:57:42, 12.30s/it]08/04/2024 01:32:42 - INFO - __main__ -   Step: 8048, LR: 3.1500015261635305e-06, Loss: 437.2563171386719
2024-08-04T08:32:55.816910301Z 
 85%|████████▍ | 8049/9500 [27:35:25<5:01:20, 12.46s/it]08/04/2024 01:32:55 - INFO - __main__ -   Step: 8049, LR: 3.1478309824762516e-06, Loss: 468.20831298828125
2024-08-04T08:33:07.862839905Z 
 85%|████████▍ | 8050/9500 [27:35:37<4:58:07, 12.34s/it]08/04/2024 01:33:07 - INFO - __main__ -   Step: 8050, LR: 3.1456604387889727e-06, Loss: 287.23797607421875
2024-08-04T08:33:20.041426965Z 
 85%|████████▍ | 8051/9500 [27:35:49<4:56:46, 12.29s/it]08/04/2024 01:33:20 - INFO - __main__ -   Step: 8051, LR: 3.1434898951016934e-06, Loss: 482.17559814453125
2024-08-04T08:33:32.413335085Z 
 85%|████████▍ | 8052/9500 [27:36:02<4:57:10, 12.31s/it]08/04/2024 01:33:32 - INFO - __main__ -   Step: 8052, LR: 3.1413193514144146e-06, Loss: 457.2913818359375
2024-08-04T08:33:44.546736802Z 
 85%|████████▍ | 8053/9500 [27:36:14<4:55:39, 12.26s/it]08/04/2024 01:33:44 - INFO - __main__ -   Step: 8053, LR: 3.139148807727136e-06, Loss: 380.7496337890625
2024-08-04T08:33:57.027122558Z 
 85%|████████▍ | 8054/9500 [27:36:26<4:57:03, 12.33s/it]08/04/2024 01:33:57 - INFO - __main__ -   Step: 8054, LR: 3.1369782640398564e-06, Loss: 330.3358459472656
2024-08-04T08:34:09.901546725Z 
 85%|████████▍ | 8055/9500 [27:36:39<5:00:48, 12.49s/it]08/04/2024 01:34:09 - INFO - __main__ -   Step: 8055, LR: 3.134807720352578e-06, Loss: 529.3228149414062
2024-08-04T08:34:22.376233429Z 
 85%|████████▍ | 8056/9500 [27:36:52<5:00:29, 12.49s/it]08/04/2024 01:34:22 - INFO - __main__ -   Step: 8056, LR: 3.132637176665299e-06, Loss: 456.42852783203125
2024-08-04T08:34:34.625183664Z 
 85%|████████▍ | 8057/9500 [27:37:04<4:58:34, 12.41s/it]08/04/2024 01:34:34 - INFO - __main__ -   Step: 8057, LR: 3.1304666329780203e-06, Loss: 364.32427978515625
2024-08-04T08:34:47.111929822Z 
 85%|████████▍ | 8058/9500 [27:37:17<4:58:53, 12.44s/it]08/04/2024 01:34:47 - INFO - __main__ -   Step: 8058, LR: 3.128296089290741e-06, Loss: 403.2135009765625
2024-08-04T08:34:59.131760337Z 
 85%|████████▍ | 8059/9500 [27:37:29<4:55:40, 12.31s/it]08/04/2024 01:34:59 - INFO - __main__ -   Step: 8059, LR: 3.126125545603462e-06, Loss: 399.095947265625
2024-08-04T08:35:11.748576370Z 
 85%|████████▍ | 8060/9500 [27:37:41<4:57:40, 12.40s/it]08/04/2024 01:35:11 - INFO - __main__ -   Step: 8060, LR: 3.1239550019161837e-06, Loss: 495.21710205078125
2024-08-04T08:35:24.267780829Z 
 85%|████████▍ | 8061/9500 [27:37:54<4:58:18, 12.44s/it]08/04/2024 01:35:24 - INFO - __main__ -   Step: 8061, LR: 3.1217844582289044e-06, Loss: 387.18231201171875
2024-08-04T08:35:36.304214829Z 
 85%|████████▍ | 8062/9500 [27:38:06<4:55:12, 12.32s/it]08/04/2024 01:35:36 - INFO - __main__ -   Step: 8062, LR: 3.1196139145416255e-06, Loss: 469.1184387207031
2024-08-04T08:35:48.847309789Z 
 85%|████████▍ | 8063/9500 [27:38:18<4:56:37, 12.39s/it]08/04/2024 01:35:48 - INFO - __main__ -   Step: 8063, LR: 3.1174433708543466e-06, Loss: 392.83258056640625
2024-08-04T08:36:01.307800706Z 
 85%|████████▍ | 8064/9500 [27:38:31<4:56:57, 12.41s/it]08/04/2024 01:36:01 - INFO - __main__ -   Step: 8064, LR: 3.1152728271670678e-06, Loss: 478.42755126953125
2024-08-04T08:36:13.971048346Z 
 85%|████████▍ | 8065/9500 [27:38:43<4:58:35, 12.48s/it]08/04/2024 01:36:13 - INFO - __main__ -   Step: 8065, LR: 3.1131022834797885e-06, Loss: 421.6029968261719
2024-08-04T08:36:26.209067485Z 
 85%|████████▍ | 8066/9500 [27:38:56<4:56:36, 12.41s/it]08/04/2024 01:36:26 - INFO - __main__ -   Step: 8066, LR: 3.11093173979251e-06, Loss: 369.03375244140625
2024-08-04T08:36:38.775436045Z 
 85%|████████▍ | 8067/9500 [27:39:08<4:57:31, 12.46s/it]08/04/2024 01:36:38 - INFO - __main__ -   Step: 8067, LR: 3.108761196105231e-06, Loss: 382.0674743652344
2024-08-04T08:36:50.993198332Z 
 85%|████████▍ | 8068/9500 [27:39:20<4:55:35, 12.39s/it]08/04/2024 01:36:50 - INFO - __main__ -   Step: 8068, LR: 3.106590652417952e-06, Loss: 460.8183288574219
2024-08-04T08:37:03.038573231Z 
 85%|████████▍ | 8069/9500 [27:39:32<4:52:57, 12.28s/it]08/04/2024 01:37:03 - INFO - __main__ -   Step: 8069, LR: 3.104420108730673e-06, Loss: 442.4211120605469
2024-08-04T08:37:15.362349493Z 
 85%|████████▍ | 8070/9500 [27:39:45<4:53:02, 12.30s/it]08/04/2024 01:37:15 - INFO - __main__ -   Step: 8070, LR: 3.102249565043394e-06, Loss: 359.4608154296875
2024-08-04T08:37:27.963618311Z 
 85%|████████▍ | 8071/9500 [27:39:57<4:55:01, 12.39s/it]08/04/2024 01:37:27 - INFO - __main__ -   Step: 8071, LR: 3.1000790213561153e-06, Loss: 379.32763671875
2024-08-04T08:37:40.408999003Z 
 85%|████████▍ | 8072/9500 [27:40:10<4:55:13, 12.40s/it]08/04/2024 01:37:40 - INFO - __main__ -   Step: 8072, LR: 3.097908477668836e-06, Loss: 422.02978515625
2024-08-04T08:37:53.083394975Z 
 85%|████████▍ | 8073/9500 [27:40:23<4:56:56, 12.49s/it]08/04/2024 01:37:53 - INFO - __main__ -   Step: 8073, LR: 3.0957379339815576e-06, Loss: 342.5057678222656
2024-08-04T08:38:05.255559718Z 
 85%|████████▍ | 8074/9500 [27:40:35<4:54:30, 12.39s/it]08/04/2024 01:38:05 - INFO - __main__ -   Step: 8074, LR: 3.0935673902942787e-06, Loss: 428.881591796875
2024-08-04T08:38:17.812261753Z 
 85%|████████▌ | 8075/9500 [27:40:47<4:55:28, 12.44s/it]08/04/2024 01:38:17 - INFO - __main__ -   Step: 8075, LR: 3.0913968466069994e-06, Loss: 475.606201171875
2024-08-04T08:38:30.416375766Z 
 85%|████████▌ | 8076/9500 [27:41:00<4:56:25, 12.49s/it]08/04/2024 01:38:30 - INFO - __main__ -   Step: 8076, LR: 3.0892263029197206e-06, Loss: 372.116943359375
2024-08-04T08:38:42.485979264Z 
 85%|████████▌ | 8077/9500 [27:41:12<4:53:13, 12.36s/it]08/04/2024 01:38:42 - INFO - __main__ -   Step: 8077, LR: 3.0870557592324417e-06, Loss: 357.5934143066406
2024-08-04T08:38:54.956545925Z 
 85%|████████▌ | 8078/9500 [27:41:24<4:53:47, 12.40s/it]08/04/2024 01:38:54 - INFO - __main__ -   Step: 8078, LR: 3.0848852155451633e-06, Loss: 390.05938720703125
2024-08-04T08:39:07.089585195Z 
 85%|████████▌ | 8079/9500 [27:41:37<4:51:42, 12.32s/it]08/04/2024 01:39:07 - INFO - __main__ -   Step: 8079, LR: 3.0827146718578835e-06, Loss: 334.28143310546875
2024-08-04T08:39:19.867033889Z 
 85%|████████▌ | 8080/9500 [27:41:49<4:54:46, 12.46s/it]08/04/2024 01:39:19 - INFO - __main__ -   Step: 8080, LR: 3.080544128170605e-06, Loss: 397.4130859375
2024-08-04T08:39:32.108627690Z 
 85%|████████▌ | 8081/9500 [27:42:02<4:53:03, 12.39s/it]08/04/2024 01:39:32 - INFO - __main__ -   Step: 8081, LR: 3.0783735844833262e-06, Loss: 445.26739501953125
2024-08-04T08:39:44.437167262Z 
 85%|████████▌ | 8082/9500 [27:42:14<4:52:24, 12.37s/it]08/04/2024 01:39:44 - INFO - __main__ -   Step: 8082, LR: 3.076203040796047e-06, Loss: 406.96795654296875
2024-08-04T08:39:57.252531998Z 
 85%|████████▌ | 8083/9500 [27:42:27<4:55:19, 12.51s/it]08/04/2024 01:39:57 - INFO - __main__ -   Step: 8083, LR: 3.074032497108768e-06, Loss: 368.9376220703125
2024-08-04T08:40:09.578430663Z 
 85%|████████▌ | 8084/9500 [27:42:39<4:53:51, 12.45s/it]08/04/2024 01:40:09 - INFO - __main__ -   Step: 8084, LR: 3.0718619534214892e-06, Loss: 432.04486083984375
2024-08-04T08:40:21.578669453Z 
 85%|████████▌ | 8085/9500 [27:42:51<4:50:27, 12.32s/it]08/04/2024 01:40:21 - INFO - __main__ -   Step: 8085, LR: 3.069691409734211e-06, Loss: 387.31329345703125
2024-08-04T08:40:34.102553287Z 
 85%|████████▌ | 8086/9500 [27:43:04<4:51:43, 12.38s/it]08/04/2024 01:40:34 - INFO - __main__ -   Step: 8086, LR: 3.0675208660469315e-06, Loss: 405.4101867675781
2024-08-04T08:40:46.312566861Z 
 85%|████████▌ | 8087/9500 [27:43:16<4:50:19, 12.33s/it]08/04/2024 01:40:46 - INFO - __main__ -   Step: 8087, LR: 3.0653503223596526e-06, Loss: 457.64837646484375
2024-08-04T08:40:58.804311824Z 
 85%|████████▌ | 8088/9500 [27:43:28<4:51:16, 12.38s/it]08/04/2024 01:40:58 - INFO - __main__ -   Step: 8088, LR: 3.0631797786723738e-06, Loss: 326.4461669921875
2024-08-04T08:41:11.468995396Z 
 85%|████████▌ | 8089/9500 [27:43:41<4:53:05, 12.46s/it]08/04/2024 01:41:11 - INFO - __main__ -   Step: 8089, LR: 3.0610092349850945e-06, Loss: 384.962158203125
2024-08-04T08:41:23.772807097Z 
 85%|████████▌ | 8090/9500 [27:43:53<4:51:45, 12.42s/it]08/04/2024 01:41:23 - INFO - __main__ -   Step: 8090, LR: 3.0588386912978156e-06, Loss: 469.3534240722656
2024-08-04T08:41:36.086098029Z 
 85%|████████▌ | 8091/9500 [27:44:06<4:50:50, 12.38s/it]08/04/2024 01:41:36 - INFO - __main__ -   Step: 8091, LR: 3.056668147610537e-06, Loss: 376.8329162597656
2024-08-04T08:41:48.344999911Z 
 85%|████████▌ | 8092/9500 [27:44:18<4:49:44, 12.35s/it]08/04/2024 01:41:48 - INFO - __main__ -   Step: 8092, LR: 3.0544976039232583e-06, Loss: 370.6722412109375
2024-08-04T08:42:00.790726063Z 
 85%|████████▌ | 8093/9500 [27:44:30<4:50:13, 12.38s/it]08/04/2024 01:42:00 - INFO - __main__ -   Step: 8093, LR: 3.052327060235979e-06, Loss: 540.7086181640625
2024-08-04T08:42:12.735073088Z 
 85%|████████▌ | 8094/9500 [27:44:42<4:46:59, 12.25s/it]08/04/2024 01:42:12 - INFO - __main__ -   Step: 8094, LR: 3.0501565165487e-06, Loss: 329.09588623046875
2024-08-04T08:42:25.536886085Z 
 85%|████████▌ | 8095/9500 [27:44:55<4:50:40, 12.41s/it]08/04/2024 01:42:25 - INFO - __main__ -   Step: 8095, LR: 3.0479859728614213e-06, Loss: 464.7832946777344
2024-08-04T08:42:37.599546913Z 
 85%|████████▌ | 8096/9500 [27:45:07<4:48:00, 12.31s/it]08/04/2024 01:42:37 - INFO - __main__ -   Step: 8096, LR: 3.045815429174142e-06, Loss: 372.34796142578125
2024-08-04T08:42:50.090842597Z 
 85%|████████▌ | 8097/9500 [27:45:20<4:49:05, 12.36s/it]08/04/2024 01:42:50 - INFO - __main__ -   Step: 8097, LR: 3.043644885486863e-06, Loss: 430.6424560546875
2024-08-04T08:43:02.584548652Z 
 85%|████████▌ | 8098/9500 [27:45:32<4:49:48, 12.40s/it]08/04/2024 01:43:02 - INFO - __main__ -   Step: 8098, LR: 3.0414743417995847e-06, Loss: 493.8755187988281
2024-08-04T08:43:14.537057720Z 
 85%|████████▌ | 8099/9500 [27:45:44<4:46:26, 12.27s/it]08/04/2024 01:43:14 - INFO - __main__ -   Step: 8099, LR: 3.039303798112306e-06, Loss: 386.268310546875
2024-08-04T08:43:26.887003827Z 
 85%|████████▌ | 8100/9500 [27:45:56<4:46:48, 12.29s/it]08/04/2024 01:43:26 - INFO - __main__ -   Step: 8100, LR: 3.0371332544250266e-06, Loss: 401.91497802734375
2024-08-04T08:43:39.429764042Z 
 85%|████████▌ | 8101/9500 [27:46:09<4:48:21, 12.37s/it]08/04/2024 01:43:39 - INFO - __main__ -   Step: 8101, LR: 3.0349627107377477e-06, Loss: 380.86920166015625
2024-08-04T08:43:51.306531587Z 
 85%|████████▌ | 8102/9500 [27:46:21<4:44:43, 12.22s/it]08/04/2024 01:43:51 - INFO - __main__ -   Step: 8102, LR: 3.032792167050469e-06, Loss: 387.0227355957031
2024-08-04T08:44:03.463603664Z 
 85%|████████▌ | 8103/9500 [27:46:33<4:44:05, 12.20s/it]08/04/2024 01:44:03 - INFO - __main__ -   Step: 8103, LR: 3.0306216233631895e-06, Loss: 305.09912109375
2024-08-04T08:44:16.242841184Z 
 85%|████████▌ | 8104/9500 [27:46:46<4:47:55, 12.37s/it]08/04/2024 01:44:16 - INFO - __main__ -   Step: 8104, LR: 3.028451079675911e-06, Loss: 359.7258605957031
2024-08-04T08:44:28.503262085Z 
 85%|████████▌ | 8105/9500 [27:46:58<4:46:54, 12.34s/it]08/04/2024 01:44:28 - INFO - __main__ -   Step: 8105, LR: 3.0262805359886322e-06, Loss: 434.39923095703125
2024-08-04T08:44:40.857428224Z 
 85%|████████▌ | 8106/9500 [27:47:10<4:46:48, 12.34s/it]08/04/2024 01:44:40 - INFO - __main__ -   Step: 8106, LR: 3.0241099923013534e-06, Loss: 429.458251953125
2024-08-04T08:44:53.689676041Z 
 85%|████████▌ | 8107/9500 [27:47:23<4:49:59, 12.49s/it]08/04/2024 01:44:53 - INFO - __main__ -   Step: 8107, LR: 3.021939448614074e-06, Loss: 345.35247802734375
2024-08-04T08:45:05.933341202Z 
 85%|████████▌ | 8108/9500 [27:47:35<4:48:04, 12.42s/it]08/04/2024 01:45:05 - INFO - __main__ -   Step: 8108, LR: 3.0197689049267952e-06, Loss: 454.5196533203125
2024-08-04T08:45:18.347252719Z 
 85%|████████▌ | 8109/9500 [27:47:48<4:47:50, 12.42s/it]08/04/2024 01:45:18 - INFO - __main__ -   Step: 8109, LR: 3.0175983612395164e-06, Loss: 436.1174621582031
2024-08-04T08:45:30.803846655Z 
 85%|████████▌ | 8110/9500 [27:48:00<4:47:54, 12.43s/it]08/04/2024 01:45:30 - INFO - __main__ -   Step: 8110, LR: 3.015427817552237e-06, Loss: 374.6716613769531
2024-08-04T08:45:43.106105027Z 
 85%|████████▌ | 8111/9500 [27:48:13<4:46:50, 12.39s/it]08/04/2024 01:45:43 - INFO - __main__ -   Step: 8111, LR: 3.0132572738649586e-06, Loss: 401.48114013671875
2024-08-04T08:45:55.679520497Z 
 85%|████████▌ | 8112/9500 [27:48:25<4:47:54, 12.45s/it]08/04/2024 01:45:55 - INFO - __main__ -   Step: 8112, LR: 3.0110867301776798e-06, Loss: 485.3427429199219
2024-08-04T08:46:07.819191877Z 
 85%|████████▌ | 8113/9500 [27:48:37<4:45:34, 12.35s/it]08/04/2024 01:46:07 - INFO - __main__ -   Step: 8113, LR: 3.0089161864904005e-06, Loss: 256.6783752441406
2024-08-04T08:46:19.858427624Z 
 85%|████████▌ | 8114/9500 [27:48:49<4:43:11, 12.26s/it]08/04/2024 01:46:19 - INFO - __main__ -   Step: 8114, LR: 3.0067456428031216e-06, Loss: 354.3554382324219
2024-08-04T08:46:32.263680561Z 
 85%|████████▌ | 8115/9500 [27:49:02<4:43:59, 12.30s/it]08/04/2024 01:46:32 - INFO - __main__ -   Step: 8115, LR: 3.0045750991158428e-06, Loss: 461.6329650878906
2024-08-04T08:46:44.855789492Z 
 85%|████████▌ | 8116/9500 [27:49:14<4:45:47, 12.39s/it]08/04/2024 01:46:44 - INFO - __main__ -   Step: 8116, LR: 3.0024045554285643e-06, Loss: 310.434326171875
2024-08-04T08:46:56.883539111Z 
 85%|████████▌ | 8117/9500 [27:49:26<4:43:04, 12.28s/it]08/04/2024 01:46:56 - INFO - __main__ -   Step: 8117, LR: 3.000234011741285e-06, Loss: 322.67877197265625
2024-08-04T08:47:08.972622671Z 
 85%|████████▌ | 8118/9500 [27:49:38<4:41:32, 12.22s/it]08/04/2024 01:47:08 - INFO - __main__ -   Step: 8118, LR: 2.998063468054006e-06, Loss: 302.9907531738281
2024-08-04T08:47:21.446219859Z 
 85%|████████▌ | 8119/9500 [27:49:51<4:43:04, 12.30s/it]08/04/2024 01:47:21 - INFO - __main__ -   Step: 8119, LR: 2.9958929243667273e-06, Loss: 350.935546875
2024-08-04T08:47:34.140495220Z 
 85%|████████▌ | 8120/9500 [27:50:04<4:45:35, 12.42s/it]08/04/2024 01:47:34 - INFO - __main__ -   Step: 8120, LR: 2.993722380679448e-06, Loss: 363.2049255371094
2024-08-04T08:47:46.392160272Z 
 85%|████████▌ | 8121/9500 [27:50:16<4:44:14, 12.37s/it]08/04/2024 01:47:46 - INFO - __main__ -   Step: 8121, LR: 2.991551836992169e-06, Loss: 397.4495544433594
2024-08-04T08:47:58.445240398Z 
 85%|████████▌ | 8122/9500 [27:50:28<4:41:52, 12.27s/it]08/04/2024 01:47:58 - INFO - __main__ -   Step: 8122, LR: 2.9893812933048903e-06, Loss: 417.24420166015625
2024-08-04T08:48:11.433598966Z 
 86%|████████▌ | 8123/9500 [27:50:41<4:46:35, 12.49s/it]08/04/2024 01:48:11 - INFO - __main__ -   Step: 8123, LR: 2.987210749617612e-06, Loss: 402.9075012207031
2024-08-04T08:48:23.970984888Z 
 86%|████████▌ | 8124/9500 [27:50:53<4:46:43, 12.50s/it]08/04/2024 01:48:23 - INFO - __main__ -   Step: 8124, LR: 2.9850402059303326e-06, Loss: 386.3216552734375
2024-08-04T08:48:36.138476845Z 
 86%|████████▌ | 8125/9500 [27:51:06<4:44:13, 12.40s/it]08/04/2024 01:48:36 - INFO - __main__ -   Step: 8125, LR: 2.9828696622430537e-06, Loss: 549.2130737304688
2024-08-04T08:48:48.758986416Z 
 86%|████████▌ | 8126/9500 [27:51:18<4:45:30, 12.47s/it]08/04/2024 01:48:48 - INFO - __main__ -   Step: 8126, LR: 2.980699118555775e-06, Loss: 490.05755615234375
2024-08-04T08:49:01.089250271Z 
 86%|████████▌ | 8127/9500 [27:51:31<4:44:21, 12.43s/it]08/04/2024 01:49:01 - INFO - __main__ -   Step: 8127, LR: 2.9785285748684955e-06, Loss: 343.03375244140625
2024-08-04T08:49:13.083828133Z 
 86%|████████▌ | 8128/9500 [27:51:43<4:41:11, 12.30s/it]08/04/2024 01:49:13 - INFO - __main__ -   Step: 8128, LR: 2.9763580311812167e-06, Loss: 374.0223388671875
2024-08-04T08:49:25.388361417Z 
 86%|████████▌ | 8129/9500 [27:51:55<4:41:02, 12.30s/it]08/04/2024 01:49:25 - INFO - __main__ -   Step: 8129, LR: 2.9741874874939382e-06, Loss: 285.04632568359375
2024-08-04T08:49:37.729234898Z 
 86%|████████▌ | 8130/9500 [27:52:07<4:41:06, 12.31s/it]08/04/2024 01:49:37 - INFO - __main__ -   Step: 8130, LR: 2.9720169438066594e-06, Loss: 409.55059814453125
2024-08-04T08:49:49.947851992Z 
 86%|████████▌ | 8131/9500 [27:52:19<4:40:16, 12.28s/it]08/04/2024 01:49:49 - INFO - __main__ -   Step: 8131, LR: 2.96984640011938e-06, Loss: 371.9742126464844
2024-08-04T08:50:02.363694609Z 
 86%|████████▌ | 8132/9500 [27:52:32<4:40:58, 12.32s/it]08/04/2024 01:50:02 - INFO - __main__ -   Step: 8132, LR: 2.9676758564321012e-06, Loss: 377.04632568359375
2024-08-04T08:50:14.606899587Z 
 86%|████████▌ | 8133/9500 [27:52:44<4:40:13, 12.30s/it]08/04/2024 01:50:14 - INFO - __main__ -   Step: 8133, LR: 2.9655053127448224e-06, Loss: 421.27294921875
2024-08-04T08:50:26.743430902Z 
 86%|████████▌ | 8134/9500 [27:52:56<4:38:54, 12.25s/it]08/04/2024 01:50:26 - INFO - __main__ -   Step: 8134, LR: 2.963334769057543e-06, Loss: 411.1826171875
2024-08-04T08:50:39.559316547Z 
 86%|████████▌ | 8135/9500 [27:53:09<4:42:33, 12.42s/it]08/04/2024 01:50:39 - INFO - __main__ -   Step: 8135, LR: 2.961164225370264e-06, Loss: 310.33880615234375
2024-08-04T08:50:51.859128599Z 
 86%|████████▌ | 8136/9500 [27:53:21<4:41:31, 12.38s/it]08/04/2024 01:50:51 - INFO - __main__ -   Step: 8136, LR: 2.9589936816829858e-06, Loss: 455.2528991699219
2024-08-04T08:51:04.104267265Z 
 86%|████████▌ | 8137/9500 [27:53:34<4:40:22, 12.34s/it]08/04/2024 01:51:04 - INFO - __main__ -   Step: 8137, LR: 2.956823137995707e-06, Loss: 382.0194091796875
2024-08-04T08:51:16.426539606Z 
 86%|████████▌ | 8138/9500 [27:53:46<4:40:02, 12.34s/it]08/04/2024 01:51:16 - INFO - __main__ -   Step: 8138, LR: 2.9546525943084276e-06, Loss: 296.0928955078125
2024-08-04T08:51:28.505367490Z 
 86%|████████▌ | 8139/9500 [27:53:58<4:38:04, 12.26s/it]08/04/2024 01:51:28 - INFO - __main__ -   Step: 8139, LR: 2.9524820506211487e-06, Loss: 510.97760009765625
2024-08-04T08:51:40.774045989Z 
 86%|████████▌ | 8140/9500 [27:54:10<4:37:56, 12.26s/it]08/04/2024 01:51:40 - INFO - __main__ -   Step: 8140, LR: 2.95031150693387e-06, Loss: 372.6488342285156
2024-08-04T08:51:53.511694293Z 
 86%|████████▌ | 8141/9500 [27:54:23<4:40:57, 12.40s/it]08/04/2024 01:51:53 - INFO - __main__ -   Step: 8141, LR: 2.9481409632465906e-06, Loss: 464.39202880859375
2024-08-04T08:52:05.737291069Z 
 86%|████████▌ | 8142/9500 [27:54:35<4:39:32, 12.35s/it]08/04/2024 01:52:05 - INFO - __main__ -   Step: 8142, LR: 2.945970419559312e-06, Loss: 384.126708984375
2024-08-04T08:52:17.987666417Z 
 86%|████████▌ | 8143/9500 [27:54:47<4:38:39, 12.32s/it]08/04/2024 01:52:17 - INFO - __main__ -   Step: 8143, LR: 2.9437998758720333e-06, Loss: 404.8497314453125
2024-08-04T08:52:30.353205335Z 
 86%|████████▌ | 8144/9500 [27:55:00<4:38:45, 12.33s/it]08/04/2024 01:52:30 - INFO - __main__ -   Step: 8144, LR: 2.9416293321847544e-06, Loss: 349.1548767089844
2024-08-04T08:52:42.478242699Z 
 86%|████████▌ | 8145/9500 [27:55:12<4:37:07, 12.27s/it]08/04/2024 01:52:42 - INFO - __main__ -   Step: 8145, LR: 2.939458788497475e-06, Loss: 379.07415771484375
2024-08-04T08:52:54.930835655Z 
 86%|████████▌ | 8146/9500 [27:55:24<4:38:09, 12.33s/it]08/04/2024 01:52:54 - INFO - __main__ -   Step: 8146, LR: 2.9372882448101963e-06, Loss: 420.7255859375
2024-08-04T08:53:07.379487591Z 
 86%|████████▌ | 8147/9500 [27:55:37<4:38:46, 12.36s/it]08/04/2024 01:53:07 - INFO - __main__ -   Step: 8147, LR: 2.9351177011229174e-06, Loss: 341.25604248046875
2024-08-04T08:53:19.633693051Z 
 86%|████████▌ | 8148/9500 [27:55:49<4:37:50, 12.33s/it]08/04/2024 01:53:19 - INFO - __main__ -   Step: 8148, LR: 2.932947157435638e-06, Loss: 299.562744140625
2024-08-04T08:53:32.222201847Z 
 86%|████████▌ | 8149/9500 [27:56:02<4:39:22, 12.41s/it]08/04/2024 01:53:32 - INFO - __main__ -   Step: 8149, LR: 2.9307766137483597e-06, Loss: 578.8458862304688
2024-08-04T08:53:44.760401573Z 
 86%|████████▌ | 8150/9500 [27:56:14<4:40:03, 12.45s/it]08/04/2024 01:53:44 - INFO - __main__ -   Step: 8150, LR: 2.928606070061081e-06, Loss: 389.9256896972656
2024-08-04T08:53:56.815505022Z 
 86%|████████▌ | 8151/9500 [27:56:26<4:37:12, 12.33s/it]08/04/2024 01:53:56 - INFO - __main__ -   Step: 8151, LR: 2.926435526373802e-06, Loss: 299.65185546875
2024-08-04T08:54:08.826854315Z 
 86%|████████▌ | 8152/9500 [27:56:38<4:34:51, 12.23s/it]08/04/2024 01:54:08 - INFO - __main__ -   Step: 8152, LR: 2.9242649826865227e-06, Loss: 385.1534118652344
2024-08-04T08:54:21.357252631Z 
 86%|████████▌ | 8153/9500 [27:56:51<4:36:38, 12.32s/it]08/04/2024 01:54:21 - INFO - __main__ -   Step: 8153, LR: 2.922094438999244e-06, Loss: 425.2777404785156
2024-08-04T08:54:33.569977684Z 
 86%|████████▌ | 8154/9500 [27:57:03<4:35:42, 12.29s/it]08/04/2024 01:54:33 - INFO - __main__ -   Step: 8154, LR: 2.9199238953119654e-06, Loss: 441.7900390625
2024-08-04T08:54:45.616762721Z 
 86%|████████▌ | 8155/9500 [27:57:15<4:33:51, 12.22s/it]08/04/2024 01:54:45 - INFO - __main__ -   Step: 8155, LR: 2.917753351624686e-06, Loss: 429.23394775390625
2024-08-04T08:54:58.189233662Z 
 86%|████████▌ | 8156/9500 [27:57:28<4:36:02, 12.32s/it]08/04/2024 01:54:58 - INFO - __main__ -   Step: 8156, LR: 2.915582807937407e-06, Loss: 391.827392578125
2024-08-04T08:55:10.151948872Z 
 86%|████████▌ | 8157/9500 [27:57:40<4:33:25, 12.22s/it]08/04/2024 01:55:10 - INFO - __main__ -   Step: 8157, LR: 2.9134122642501283e-06, Loss: 456.411376953125
2024-08-04T08:55:22.553480065Z 
 86%|████████▌ | 8158/9500 [27:57:52<4:34:27, 12.27s/it]08/04/2024 01:55:22 - INFO - __main__ -   Step: 8158, LR: 2.9112417205628495e-06, Loss: 471.2850341796875
2024-08-04T08:55:35.193406692Z 
 86%|████████▌ | 8159/9500 [27:58:05<4:36:44, 12.38s/it]08/04/2024 01:55:35 - INFO - __main__ -   Step: 8159, LR: 2.90907117687557e-06, Loss: 419.53192138671875
2024-08-04T08:55:47.455368545Z 
 86%|████████▌ | 8160/9500 [27:58:17<4:35:43, 12.35s/it]08/04/2024 01:55:47 - INFO - __main__ -   Step: 8160, LR: 2.9069006331882913e-06, Loss: 480.593017578125
2024-08-04T08:55:59.769311596Z 
 86%|████████▌ | 8161/9500 [27:58:29<4:35:18, 12.34s/it]08/04/2024 01:55:59 - INFO - __main__ -   Step: 8161, LR: 2.904730089501013e-06, Loss: 334.49713134765625
2024-08-04T08:56:12.332146011Z 
 86%|████████▌ | 8162/9500 [27:58:42<4:36:36, 12.40s/it]08/04/2024 01:56:12 - INFO - __main__ -   Step: 8162, LR: 2.9025595458137336e-06, Loss: 369.8572998046875
2024-08-04T08:56:24.661980304Z 
 86%|████████▌ | 8163/9500 [27:58:54<4:35:54, 12.38s/it]08/04/2024 01:56:24 - INFO - __main__ -   Step: 8163, LR: 2.9003890021264547e-06, Loss: 441.99176025390625
2024-08-04T08:56:36.508438216Z 
 86%|████████▌ | 8164/9500 [27:59:06<4:32:07, 12.22s/it]08/04/2024 01:56:36 - INFO - __main__ -   Step: 8164, LR: 2.898218458439176e-06, Loss: 288.6888732910156
2024-08-04T08:56:48.702842122Z 
 86%|████████▌ | 8165/9500 [27:59:18<4:31:44, 12.21s/it]08/04/2024 01:56:48 - INFO - __main__ -   Step: 8165, LR: 2.896047914751897e-06, Loss: 406.6143493652344
2024-08-04T08:57:01.269326698Z 
 86%|████████▌ | 8166/9500 [27:59:31<4:33:53, 12.32s/it]08/04/2024 01:57:01 - INFO - __main__ -   Step: 8166, LR: 2.8938773710646177e-06, Loss: 351.6376037597656
2024-08-04T08:57:13.196588085Z 
 86%|████████▌ | 8167/9500 [27:59:43<4:31:04, 12.20s/it]08/04/2024 01:57:13 - INFO - __main__ -   Step: 8167, LR: 2.8917068273773393e-06, Loss: 415.519775390625
2024-08-04T08:57:25.351258803Z 
 86%|████████▌ | 8168/9500 [27:59:55<4:30:33, 12.19s/it]08/04/2024 01:57:25 - INFO - __main__ -   Step: 8168, LR: 2.8895362836900604e-06, Loss: 401.9139099121094
2024-08-04T08:57:37.795501038Z 
 86%|████████▌ | 8169/9500 [28:00:07<4:32:04, 12.26s/it]08/04/2024 01:57:37 - INFO - __main__ -   Step: 8169, LR: 2.887365740002781e-06, Loss: 352.5034484863281
2024-08-04T08:57:50.464086813Z 
 86%|████████▌ | 8170/9500 [28:00:20<4:34:33, 12.39s/it]08/04/2024 01:57:50 - INFO - __main__ -   Step: 8170, LR: 2.8851951963155023e-06, Loss: 328.3021545410156
2024-08-04T08:58:02.611042179Z 
 86%|████████▌ | 8171/9500 [28:00:32<4:32:45, 12.31s/it]08/04/2024 01:58:02 - INFO - __main__ -   Step: 8171, LR: 2.8830246526282234e-06, Loss: 364.46783447265625
2024-08-04T08:58:14.999923998Z 
 86%|████████▌ | 8172/9500 [28:00:44<4:33:03, 12.34s/it]08/04/2024 01:58:14 - INFO - __main__ -   Step: 8172, LR: 2.8808541089409445e-06, Loss: 354.08074951171875
2024-08-04T08:58:27.023571585Z 
 86%|████████▌ | 8173/9500 [28:00:56<4:30:46, 12.24s/it]08/04/2024 01:58:27 - INFO - __main__ -   Step: 8173, LR: 2.8786835652536653e-06, Loss: 273.5914306640625
2024-08-04T08:58:39.459792840Z 
 86%|████████▌ | 8174/9500 [28:01:09<4:31:50, 12.30s/it]08/04/2024 01:58:39 - INFO - __main__ -   Step: 8174, LR: 2.876513021566387e-06, Loss: 403.252197265625
2024-08-04T08:58:52.135171931Z 
 86%|████████▌ | 8175/9500 [28:01:22<4:34:07, 12.41s/it]08/04/2024 01:58:52 - INFO - __main__ -   Step: 8175, LR: 2.874342477879108e-06, Loss: 383.5955810546875
2024-08-04T08:59:04.216539868Z 
 86%|████████▌ | 8176/9500 [28:01:34<4:31:43, 12.31s/it]08/04/2024 01:59:04 - INFO - __main__ -   Step: 8176, LR: 2.8721719341918287e-06, Loss: 457.8426208496094
2024-08-04T08:59:16.307578864Z 
 86%|████████▌ | 8177/9500 [28:01:46<4:30:02, 12.25s/it]08/04/2024 01:59:16 - INFO - __main__ -   Step: 8177, LR: 2.87000139050455e-06, Loss: 387.75897216796875
2024-08-04T08:59:28.760364564Z 
 86%|████████▌ | 8178/9500 [28:01:58<4:31:11, 12.31s/it]08/04/2024 01:59:28 - INFO - __main__ -   Step: 8178, LR: 2.867830846817271e-06, Loss: 391.44464111328125
2024-08-04T08:59:40.954567973Z 
 86%|████████▌ | 8179/9500 [28:02:10<4:30:14, 12.27s/it]08/04/2024 01:59:40 - INFO - __main__ -   Step: 8179, LR: 2.8656603031299925e-06, Loss: 411.90899658203125
2024-08-04T08:59:53.455406616Z 
 86%|████████▌ | 8180/9500 [28:02:23<4:31:31, 12.34s/it]08/04/2024 01:59:53 - INFO - __main__ -   Step: 8180, LR: 2.863489759442713e-06, Loss: 417.66973876953125
2024-08-04T09:00:06.332182383Z 
 86%|████████▌ | 8181/9500 [28:02:36<4:34:50, 12.50s/it]08/04/2024 02:00:06 - INFO - __main__ -   Step: 8181, LR: 2.8613192157554343e-06, Loss: 396.315185546875
2024-08-04T09:00:18.291668539Z 
 86%|████████▌ | 8182/9500 [28:02:48<4:31:03, 12.34s/it]08/04/2024 02:00:18 - INFO - __main__ -   Step: 8182, LR: 2.8591486720681555e-06, Loss: 367.87542724609375
2024-08-04T09:00:30.356413899Z 
 86%|████████▌ | 8183/9500 [28:03:00<4:29:02, 12.26s/it]08/04/2024 02:00:30 - INFO - __main__ -   Step: 8183, LR: 2.856978128380876e-06, Loss: 436.16864013671875
2024-08-04T09:00:42.834824886Z 
 86%|████████▌ | 8184/9500 [28:03:12<4:30:17, 12.32s/it]08/04/2024 02:00:42 - INFO - __main__ -   Step: 8184, LR: 2.8548075846935973e-06, Loss: 528.4457397460938
2024-08-04T09:00:55.124160279Z 
 86%|████████▌ | 8185/9500 [28:03:25<4:29:52, 12.31s/it]08/04/2024 02:00:55 - INFO - __main__ -   Step: 8185, LR: 2.8526370410063185e-06, Loss: 455.92388916015625
2024-08-04T09:01:07.589390972Z 
 86%|████████▌ | 8186/9500 [28:03:37<4:30:39, 12.36s/it]08/04/2024 02:01:07 - INFO - __main__ -   Step: 8186, LR: 2.85046649731904e-06, Loss: 405.583740234375
2024-08-04T09:01:20.286225822Z 
 86%|████████▌ | 8187/9500 [28:03:50<4:32:40, 12.46s/it]08/04/2024 02:01:20 - INFO - __main__ -   Step: 8187, LR: 2.8482959536317607e-06, Loss: 414.9360046386719
2024-08-04T09:01:32.425857014Z 
 86%|████████▌ | 8188/9500 [28:04:02<4:30:21, 12.36s/it]08/04/2024 02:01:32 - INFO - __main__ -   Step: 8188, LR: 2.846125409944482e-06, Loss: 364.3553161621094
2024-08-04T09:01:44.773959029Z 
 86%|████████▌ | 8189/9500 [28:04:14<4:30:02, 12.36s/it]08/04/2024 02:01:44 - INFO - __main__ -   Step: 8189, LR: 2.843954866257203e-06, Loss: 409.84796142578125
2024-08-04T09:01:57.405910726Z 
 86%|████████▌ | 8190/9500 [28:04:27<4:31:37, 12.44s/it]08/04/2024 02:01:57 - INFO - __main__ -   Step: 8190, LR: 2.8417843225699237e-06, Loss: 335.22662353515625
2024-08-04T09:02:09.525342786Z 
 86%|████████▌ | 8191/9500 [28:04:39<4:29:19, 12.34s/it]08/04/2024 02:02:09 - INFO - __main__ -   Step: 8191, LR: 2.839613778882645e-06, Loss: 410.16668701171875
2024-08-04T09:02:21.854333736Z 
 86%|████████▌ | 8192/9500 [28:04:51<4:29:00, 12.34s/it]08/04/2024 02:02:21 - INFO - __main__ -   Step: 8192, LR: 2.8374432351953664e-06, Loss: 412.70257568359375
2024-08-04T09:02:34.746106050Z 
 86%|████████▌ | 8193/9500 [28:05:04<4:32:24, 12.51s/it]08/04/2024 02:02:34 - INFO - __main__ -   Step: 8193, LR: 2.835272691508087e-06, Loss: 469.2801818847656
2024-08-04T09:02:47.037680187Z 
 86%|████████▋ | 8194/9500 [28:05:16<4:30:48, 12.44s/it]08/04/2024 02:02:47 - INFO - __main__ -   Step: 8194, LR: 2.8331021478208083e-06, Loss: 419.87896728515625
2024-08-04T09:02:59.292940030Z 
 86%|████████▋ | 8195/9500 [28:05:29<4:29:23, 12.39s/it]08/04/2024 02:02:59 - INFO - __main__ -   Step: 8195, LR: 2.8309316041335294e-06, Loss: 328.5281982421875
2024-08-04T09:03:12.226278515Z 
 86%|████████▋ | 8196/9500 [28:05:42<4:32:45, 12.55s/it]08/04/2024 02:03:12 - INFO - __main__ -   Step: 8196, LR: 2.8287610604462505e-06, Loss: 484.68035888671875
2024-08-04T09:03:24.268876642Z 
 86%|████████▋ | 8197/9500 [28:05:54<4:29:14, 12.40s/it]08/04/2024 02:03:24 - INFO - __main__ -   Step: 8197, LR: 2.8265905167589712e-06, Loss: 413.3275146484375
2024-08-04T09:03:36.518146302Z 
 86%|████████▋ | 8198/9500 [28:06:06<4:28:03, 12.35s/it]08/04/2024 02:03:36 - INFO - __main__ -   Step: 8198, LR: 2.8244199730716924e-06, Loss: 406.39031982421875
2024-08-04T09:03:48.985861576Z 
 86%|████████▋ | 8199/9500 [28:06:18<4:28:36, 12.39s/it]08/04/2024 02:03:48 - INFO - __main__ -   Step: 8199, LR: 2.822249429384414e-06, Loss: 337.38372802734375
2024-08-04T09:04:00.915038880Z 
 86%|████████▋ | 8200/9500 [28:06:30<4:25:25, 12.25s/it]08/04/2024 02:04:00 - INFO - __main__ -   Step: 8200, LR: 2.8200788856971347e-06, Loss: 315.49188232421875
2024-08-04T09:04:13.117710340Z 
 86%|████████▋ | 8201/9500 [28:06:43<4:24:54, 12.24s/it]08/04/2024 02:04:13 - INFO - __main__ -   Step: 8201, LR: 2.817908342009856e-06, Loss: 438.07318115234375
2024-08-04T09:04:25.954488934Z 
 86%|████████▋ | 8202/9500 [28:06:55<4:28:36, 12.42s/it]08/04/2024 02:04:25 - INFO - __main__ -   Step: 8202, LR: 2.815737798322577e-06, Loss: 502.175048828125
2024-08-04T09:04:38.102893156Z 
 86%|████████▋ | 8203/9500 [28:07:08<4:26:39, 12.34s/it]08/04/2024 02:04:38 - INFO - __main__ -   Step: 8203, LR: 2.813567254635298e-06, Loss: 480.45343017578125
2024-08-04T09:04:50.366057392Z 
 86%|████████▋ | 8204/9500 [28:07:20<4:25:58, 12.31s/it]08/04/2024 02:04:50 - INFO - __main__ -   Step: 8204, LR: 2.8113967109480188e-06, Loss: 379.397705078125
2024-08-04T09:05:02.936171264Z 
 86%|████████▋ | 8205/9500 [28:07:32<4:27:26, 12.39s/it]08/04/2024 02:05:02 - INFO - __main__ -   Step: 8205, LR: 2.8092261672607403e-06, Loss: 349.2815246582031
2024-08-04T09:05:14.914920715Z 
 86%|████████▋ | 8206/9500 [28:07:44<4:24:33, 12.27s/it]08/04/2024 02:05:14 - INFO - __main__ -   Step: 8206, LR: 2.8070556235734615e-06, Loss: 408.73455810546875
2024-08-04T09:05:26.981333851Z 
 86%|████████▋ | 8207/9500 [28:07:56<4:23:03, 12.21s/it]08/04/2024 02:05:26 - INFO - __main__ -   Step: 8207, LR: 2.804885079886182e-06, Loss: 323.7334289550781
2024-08-04T09:05:39.108949850Z 
 86%|████████▋ | 8208/9500 [28:08:09<4:22:20, 12.18s/it]08/04/2024 02:05:39 - INFO - __main__ -   Step: 8208, LR: 2.8027145361989033e-06, Loss: 364.2377014160156
2024-08-04T09:05:51.833664642Z 
 86%|████████▋ | 8209/9500 [28:08:21<4:25:38, 12.35s/it]08/04/2024 02:05:51 - INFO - __main__ -   Step: 8209, LR: 2.8005439925116245e-06, Loss: 408.27459716796875
2024-08-04T09:06:03.946643405Z 
 86%|████████▋ | 8210/9500 [28:08:33<4:23:55, 12.28s/it]08/04/2024 02:06:03 - INFO - __main__ -   Step: 8210, LR: 2.798373448824346e-06, Loss: 386.099365234375
2024-08-04T09:06:16.049714273Z 
 86%|████████▋ | 8211/9500 [28:08:45<4:22:36, 12.22s/it]08/04/2024 02:06:16 - INFO - __main__ -   Step: 8211, LR: 2.7962029051370663e-06, Loss: 498.9460754394531
2024-08-04T09:06:28.981265498Z 
 86%|████████▋ | 8212/9500 [28:08:58<4:26:57, 12.44s/it]08/04/2024 02:06:28 - INFO - __main__ -   Step: 8212, LR: 2.794032361449788e-06, Loss: 376.0013427734375
2024-08-04T09:06:41.083358699Z 
 86%|████████▋ | 8213/9500 [28:09:11<4:24:36, 12.34s/it]08/04/2024 02:06:41 - INFO - __main__ -   Step: 8213, LR: 2.791861817762509e-06, Loss: 433.9095764160156
2024-08-04T09:06:53.351122138Z 
 86%|████████▋ | 8214/9500 [28:09:23<4:23:57, 12.32s/it]08/04/2024 02:06:53 - INFO - __main__ -   Step: 8214, LR: 2.7896912740752297e-06, Loss: 404.31072998046875
2024-08-04T09:07:05.948241029Z 
 86%|████████▋ | 8215/9500 [28:09:35<4:25:33, 12.40s/it]08/04/2024 02:07:05 - INFO - __main__ -   Step: 8215, LR: 2.787520730387951e-06, Loss: 330.3722839355469
2024-08-04T09:07:18.173492494Z 
 86%|████████▋ | 8216/9500 [28:09:48<4:24:14, 12.35s/it]08/04/2024 02:07:18 - INFO - __main__ -   Step: 8216, LR: 2.785350186700672e-06, Loss: 321.96795654296875
2024-08-04T09:07:30.482610442Z 
 86%|████████▋ | 8217/9500 [28:10:00<4:23:47, 12.34s/it]08/04/2024 02:07:30 - INFO - __main__ -   Step: 8217, LR: 2.7831796430133935e-06, Loss: 571.22900390625
2024-08-04T09:07:42.899356705Z 
 87%|████████▋ | 8218/9500 [28:10:12<4:24:05, 12.36s/it]08/04/2024 02:07:42 - INFO - __main__ -   Step: 8218, LR: 2.7810090993261143e-06, Loss: 339.439453125
2024-08-04T09:07:55.152634134Z 
 87%|████████▋ | 8219/9500 [28:10:25<4:23:12, 12.33s/it]08/04/2024 02:07:55 - INFO - __main__ -   Step: 8219, LR: 2.7788385556388354e-06, Loss: 448.6488342285156
2024-08-04T09:08:07.158498321Z 
 87%|████████▋ | 8220/9500 [28:10:37<4:20:56, 12.23s/it]08/04/2024 02:08:07 - INFO - __main__ -   Step: 8220, LR: 2.7766680119515565e-06, Loss: 371.28240966796875
2024-08-04T09:08:19.557050208Z 
 87%|████████▋ | 8221/9500 [28:10:49<4:21:48, 12.28s/it]08/04/2024 02:08:19 - INFO - __main__ -   Step: 8221, LR: 2.7744974682642772e-06, Loss: 387.15008544921875
2024-08-04T09:08:31.422378014Z 
 87%|████████▋ | 8222/9500 [28:11:01<4:18:56, 12.16s/it]08/04/2024 02:08:31 - INFO - __main__ -   Step: 8222, LR: 2.7723269245769984e-06, Loss: 350.7486572265625
2024-08-04T09:08:43.821679518Z 
 87%|████████▋ | 8223/9500 [28:11:13<4:20:17, 12.23s/it]08/04/2024 02:08:43 - INFO - __main__ -   Step: 8223, LR: 2.7701563808897195e-06, Loss: 383.6289367675781
2024-08-04T09:08:56.226323712Z 
 87%|████████▋ | 8224/9500 [28:11:26<4:21:11, 12.28s/it]08/04/2024 02:08:56 - INFO - __main__ -   Step: 8224, LR: 2.767985837202441e-06, Loss: 321.524658203125
2024-08-04T09:09:08.671633129Z 
 87%|████████▋ | 8225/9500 [28:11:38<4:22:02, 12.33s/it]08/04/2024 02:09:08 - INFO - __main__ -   Step: 8225, LR: 2.7658152935151618e-06, Loss: 498.4693908691406
2024-08-04T09:09:20.783898396Z 
 87%|████████▋ | 8226/9500 [28:11:50<4:20:26, 12.27s/it]08/04/2024 02:09:20 - INFO - __main__ -   Step: 8226, LR: 2.763644749827883e-06, Loss: 399.54248046875
2024-08-04T09:09:33.219663815Z 
 87%|████████▋ | 8227/9500 [28:12:03<4:21:18, 12.32s/it]08/04/2024 02:09:33 - INFO - __main__ -   Step: 8227, LR: 2.761474206140604e-06, Loss: 429.5923156738281
2024-08-04T09:09:45.585196396Z 
 87%|████████▋ | 8228/9500 [28:12:15<4:21:25, 12.33s/it]08/04/2024 02:09:45 - INFO - __main__ -   Step: 8228, LR: 2.7593036624533248e-06, Loss: 354.8238525390625
2024-08-04T09:09:58.230995542Z 
 87%|████████▋ | 8229/9500 [28:12:28<4:23:12, 12.43s/it]08/04/2024 02:09:58 - INFO - __main__ -   Step: 8229, LR: 2.757133118766046e-06, Loss: 490.42681884765625
2024-08-04T09:10:10.709734290Z 
 87%|████████▋ | 8230/9500 [28:12:40<4:23:20, 12.44s/it]08/04/2024 02:10:10 - INFO - __main__ -   Step: 8230, LR: 2.7549625750787675e-06, Loss: 380.72528076171875
2024-08-04T09:10:22.816148237Z 
 87%|████████▋ | 8231/9500 [28:12:52<4:21:00, 12.34s/it]08/04/2024 02:10:22 - INFO - __main__ -   Step: 8231, LR: 2.7527920313914886e-06, Loss: 425.74578857421875
2024-08-04T09:10:34.834326849Z 
 87%|████████▋ | 8232/9500 [28:13:04<4:18:45, 12.24s/it]08/04/2024 02:10:34 - INFO - __main__ -   Step: 8232, LR: 2.7506214877042093e-06, Loss: 420.30780029296875
2024-08-04T09:10:47.532485486Z 
 87%|████████▋ | 8233/9500 [28:13:17<4:21:26, 12.38s/it]08/04/2024 02:10:47 - INFO - __main__ -   Step: 8233, LR: 2.7484509440169304e-06, Loss: 409.1112060546875
2024-08-04T09:10:59.931349964Z 
 87%|████████▋ | 8234/9500 [28:13:29<4:21:20, 12.39s/it]08/04/2024 02:10:59 - INFO - __main__ -   Step: 8234, LR: 2.7462804003296516e-06, Loss: 431.8455810546875
2024-08-04T09:11:12.072501915Z 
 87%|████████▋ | 8235/9500 [28:13:42<4:19:35, 12.31s/it]08/04/2024 02:11:12 - INFO - __main__ -   Step: 8235, LR: 2.7441098566423723e-06, Loss: 311.17926025390625
2024-08-04T09:11:24.490582139Z 
 87%|████████▋ | 8236/9500 [28:13:54<4:20:03, 12.34s/it]08/04/2024 02:11:24 - INFO - __main__ -   Step: 8236, LR: 2.7419393129550934e-06, Loss: 295.2014465332031
2024-08-04T09:11:37.076889579Z 
 87%|████████▋ | 8237/9500 [28:14:07<4:21:22, 12.42s/it]08/04/2024 02:11:37 - INFO - __main__ -   Step: 8237, LR: 2.739768769267815e-06, Loss: 301.6185302734375
2024-08-04T09:11:49.569777520Z 
 87%|████████▋ | 8238/9500 [28:14:19<4:21:38, 12.44s/it]08/04/2024 02:11:49 - INFO - __main__ -   Step: 8238, LR: 2.737598225580536e-06, Loss: 455.662109375
2024-08-04T09:12:02.344955358Z 
 87%|████████▋ | 8239/9500 [28:14:32<4:23:33, 12.54s/it]08/04/2024 02:12:02 - INFO - __main__ -   Step: 8239, LR: 2.735427681893257e-06, Loss: 424.8108215332031
2024-08-04T09:12:14.401369024Z 
 87%|████████▋ | 8240/9500 [28:14:44<4:20:17, 12.40s/it]08/04/2024 02:12:14 - INFO - __main__ -   Step: 8240, LR: 2.733257138205978e-06, Loss: 355.92242431640625
2024-08-04T09:12:26.362523812Z 
 87%|████████▋ | 8241/9500 [28:14:56<4:17:21, 12.27s/it]08/04/2024 02:12:26 - INFO - __main__ -   Step: 8241, LR: 2.731086594518699e-06, Loss: 265.06884765625
2024-08-04T09:12:38.637784790Z 
 87%|████████▋ | 8242/9500 [28:15:08<4:17:13, 12.27s/it]08/04/2024 02:12:38 - INFO - __main__ -   Step: 8242, LR: 2.72891605083142e-06, Loss: 273.9599304199219
2024-08-04T09:12:50.810262231Z 
 87%|████████▋ | 8243/9500 [28:15:20<4:16:24, 12.24s/it]08/04/2024 02:12:50 - INFO - __main__ -   Step: 8243, LR: 2.7267455071441414e-06, Loss: 425.586181640625
2024-08-04T09:13:02.951114727Z 
 87%|████████▋ | 8244/9500 [28:15:32<4:15:35, 12.21s/it]08/04/2024 02:13:02 - INFO - __main__ -   Step: 8244, LR: 2.7245749634568625e-06, Loss: 481.3641662597656
2024-08-04T09:13:15.610437707Z 
 87%|████████▋ | 8245/9500 [28:15:45<4:18:12, 12.34s/it]08/04/2024 02:13:15 - INFO - __main__ -   Step: 8245, LR: 2.7224044197695837e-06, Loss: 475.8248291015625
2024-08-04T09:13:27.870058497Z 
 87%|████████▋ | 8246/9500 [28:15:57<4:17:28, 12.32s/it]08/04/2024 02:13:27 - INFO - __main__ -   Step: 8246, LR: 2.7202338760823044e-06, Loss: 427.3655700683594
2024-08-04T09:13:40.484694193Z 
 87%|████████▋ | 8247/9500 [28:16:10<4:19:06, 12.41s/it]08/04/2024 02:13:40 - INFO - __main__ -   Step: 8247, LR: 2.7180633323950255e-06, Loss: 430.2835693359375
2024-08-04T09:13:53.240671384Z 
 87%|████████▋ | 8248/9500 [28:16:23<4:21:05, 12.51s/it]08/04/2024 02:13:53 - INFO - __main__ -   Step: 8248, LR: 2.715892788707747e-06, Loss: 385.2470703125
2024-08-04T09:14:05.479831328Z 
 87%|████████▋ | 8249/9500 [28:16:35<4:19:09, 12.43s/it]08/04/2024 02:14:05 - INFO - __main__ -   Step: 8249, LR: 2.7137222450204674e-06, Loss: 428.343994140625
2024-08-04T09:14:18.045761855Z 
 87%|████████▋ | 8250/9500 [28:16:47<4:19:48, 12.47s/it]08/04/2024 02:14:18 - INFO - __main__ -   Step: 8250, LR: 2.711551701333189e-06, Loss: 471.4349060058594
2024-08-04T09:14:30.149624661Z 
 87%|████████▋ | 8251/9500 [28:17:00<4:17:18, 12.36s/it]08/04/2024 02:14:30 - INFO - __main__ -   Step: 8251, LR: 2.70938115764591e-06, Loss: 340.6310119628906
2024-08-04T09:14:43.052717590Z 
 87%|████████▋ | 8252/9500 [28:17:12<4:20:29, 12.52s/it]08/04/2024 02:14:43 - INFO - __main__ -   Step: 8252, LR: 2.707210613958631e-06, Loss: 410.3746032714844
2024-08-04T09:14:55.207616571Z 
 87%|████████▋ | 8253/9500 [28:17:25<4:17:58, 12.41s/it]08/04/2024 02:14:55 - INFO - __main__ -   Step: 8253, LR: 2.705040070271352e-06, Loss: 329.9556579589844
2024-08-04T09:15:07.523700511Z 
 87%|████████▋ | 8254/9500 [28:17:37<4:17:10, 12.38s/it]08/04/2024 02:15:07 - INFO - __main__ -   Step: 8254, LR: 2.702869526584073e-06, Loss: 448.02154541015625
2024-08-04T09:15:20.015098949Z 
 87%|████████▋ | 8255/9500 [28:17:49<4:17:38, 12.42s/it]08/04/2024 02:15:20 - INFO - __main__ -   Step: 8255, LR: 2.7006989828967946e-06, Loss: 332.5875244140625
2024-08-04T09:15:32.191579669Z 
 87%|████████▋ | 8256/9500 [28:18:02<4:15:56, 12.34s/it]08/04/2024 02:15:32 - INFO - __main__ -   Step: 8256, LR: 2.6985284392095153e-06, Loss: 412.21746826171875
2024-08-04T09:15:44.169884872Z 
 87%|████████▋ | 8257/9500 [28:18:14<4:13:27, 12.23s/it]08/04/2024 02:15:44 - INFO - __main__ -   Step: 8257, LR: 2.6963578955222364e-06, Loss: 327.7077331542969
2024-08-04T09:15:56.625843418Z 
 87%|████████▋ | 8258/9500 [28:18:26<4:14:37, 12.30s/it]08/04/2024 02:15:56 - INFO - __main__ -   Step: 8258, LR: 2.6941873518349576e-06, Loss: 333.63458251953125
2024-08-04T09:16:08.666628398Z 
 87%|████████▋ | 8259/9500 [28:18:38<4:12:48, 12.22s/it]08/04/2024 02:16:08 - INFO - __main__ -   Step: 8259, LR: 2.6920168081476787e-06, Loss: 428.3819580078125
2024-08-04T09:16:21.018588840Z 
 87%|████████▋ | 8260/9500 [28:18:50<4:13:24, 12.26s/it]08/04/2024 02:16:21 - INFO - __main__ -   Step: 8260, LR: 2.6898462644603994e-06, Loss: 443.20916748046875
2024-08-04T09:16:33.855315398Z 
 87%|████████▋ | 8261/9500 [28:19:03<4:16:45, 12.43s/it]08/04/2024 02:16:33 - INFO - __main__ -   Step: 8261, LR: 2.687675720773121e-06, Loss: 448.43365478515625
2024-08-04T09:16:46.008864727Z 
 87%|████████▋ | 8262/9500 [28:19:15<4:14:49, 12.35s/it]08/04/2024 02:16:46 - INFO - __main__ -   Step: 8262, LR: 2.685505177085842e-06, Loss: 424.4192199707031
2024-08-04T09:16:58.076551755Z 
 87%|████████▋ | 8263/9500 [28:19:28<4:12:52, 12.27s/it]08/04/2024 02:16:58 - INFO - __main__ -   Step: 8263, LR: 2.683334633398563e-06, Loss: 443.5552978515625
2024-08-04T09:17:10.497712761Z 
 87%|████████▋ | 8264/9500 [28:19:40<4:13:37, 12.31s/it]08/04/2024 02:17:10 - INFO - __main__ -   Step: 8264, LR: 2.681164089711284e-06, Loss: 366.88623046875
2024-08-04T09:17:22.545111957Z 
 87%|████████▋ | 8265/9500 [28:19:52<4:11:47, 12.23s/it]08/04/2024 02:17:22 - INFO - __main__ -   Step: 8265, LR: 2.678993546024005e-06, Loss: 391.89947509765625
2024-08-04T09:17:34.872613632Z 
 87%|████████▋ | 8266/9500 [28:20:04<4:12:10, 12.26s/it]08/04/2024 02:17:34 - INFO - __main__ -   Step: 8266, LR: 2.6768230023367262e-06, Loss: 366.9342346191406
2024-08-04T09:17:47.417683976Z 
 87%|████████▋ | 8267/9500 [28:20:17<4:13:43, 12.35s/it]08/04/2024 02:17:47 - INFO - __main__ -   Step: 8267, LR: 2.674652458649447e-06, Loss: 466.2364807128906
2024-08-04T09:17:59.397114897Z 
 87%|████████▋ | 8268/9500 [28:20:29<4:11:15, 12.24s/it]08/04/2024 02:17:59 - INFO - __main__ -   Step: 8268, LR: 2.6724819149621685e-06, Loss: 358.70654296875
2024-08-04T09:18:11.493176887Z 
 87%|████████▋ | 8269/9500 [28:20:41<4:10:10, 12.19s/it]08/04/2024 02:18:11 - INFO - __main__ -   Step: 8269, LR: 2.6703113712748897e-06, Loss: 386.36614990234375
2024-08-04T09:18:24.013448421Z 
 87%|████████▋ | 8270/9500 [28:20:53<4:11:59, 12.29s/it]08/04/2024 02:18:24 - INFO - __main__ -   Step: 8270, LR: 2.6681408275876104e-06, Loss: 356.3056945800781
2024-08-04T09:18:36.261997378Z 
 87%|████████▋ | 8271/9500 [28:21:06<4:11:30, 12.28s/it]08/04/2024 02:18:36 - INFO - __main__ -   Step: 8271, LR: 2.6659702839003315e-06, Loss: 463.6673583984375
2024-08-04T09:18:48.606905275Z 
 87%|████████▋ | 8272/9500 [28:21:18<4:11:42, 12.30s/it]08/04/2024 02:18:48 - INFO - __main__ -   Step: 8272, LR: 2.6637997402130526e-06, Loss: 454.51690673828125
2024-08-04T09:19:01.629514660Z 
 87%|████████▋ | 8273/9500 [28:21:31<4:15:57, 12.52s/it]08/04/2024 02:19:01 - INFO - __main__ -   Step: 8273, LR: 2.6616291965257733e-06, Loss: 426.58709716796875
2024-08-04T09:19:13.735134325Z 
 87%|████████▋ | 8274/9500 [28:21:43<4:13:13, 12.39s/it]08/04/2024 02:19:13 - INFO - __main__ -   Step: 8274, LR: 2.6594586528384945e-06, Loss: 389.27471923828125
2024-08-04T09:19:25.841503228Z 
 87%|████████▋ | 8275/9500 [28:21:55<4:11:15, 12.31s/it]08/04/2024 02:19:25 - INFO - __main__ -   Step: 8275, LR: 2.657288109151216e-06, Loss: 311.22454833984375
2024-08-04T09:19:38.464423671Z 
 87%|████████▋ | 8276/9500 [28:22:08<4:12:59, 12.40s/it]08/04/2024 02:19:38 - INFO - __main__ -   Step: 8276, LR: 2.655117565463937e-06, Loss: 326.0744934082031
2024-08-04T09:19:50.570300415Z 
 87%|████████▋ | 8277/9500 [28:22:20<4:10:58, 12.31s/it]08/04/2024 02:19:50 - INFO - __main__ -   Step: 8277, LR: 2.652947021776658e-06, Loss: 368.2195129394531
2024-08-04T09:20:02.962506901Z 
 87%|████████▋ | 8278/9500 [28:22:32<4:11:15, 12.34s/it]08/04/2024 02:20:02 - INFO - __main__ -   Step: 8278, LR: 2.650776478089379e-06, Loss: 353.7836608886719
2024-08-04T09:20:15.322433726Z 
 87%|████████▋ | 8279/9500 [28:22:45<4:11:11, 12.34s/it]08/04/2024 02:20:15 - INFO - __main__ -   Step: 8279, LR: 2.6486059344021e-06, Loss: 313.24737548828125
2024-08-04T09:20:27.532398576Z 
 87%|████████▋ | 8280/9500 [28:22:57<4:10:10, 12.30s/it]08/04/2024 02:20:27 - INFO - __main__ -   Step: 8280, LR: 2.646435390714821e-06, Loss: 373.61322021484375
2024-08-04T09:20:39.639638025Z 
 87%|████████▋ | 8281/9500 [28:23:09<4:08:46, 12.24s/it]08/04/2024 02:20:39 - INFO - __main__ -   Step: 8281, LR: 2.6442648470275424e-06, Loss: 463.07586669921875
2024-08-04T09:20:52.285541917Z 
 87%|████████▋ | 8282/9500 [28:23:22<4:11:00, 12.37s/it]08/04/2024 02:20:52 - INFO - __main__ -   Step: 8282, LR: 2.6420943033402636e-06, Loss: 395.6960144042969
2024-08-04T09:21:04.500199190Z 
 87%|████████▋ | 8283/9500 [28:23:34<4:09:53, 12.32s/it]08/04/2024 02:21:04 - INFO - __main__ -   Step: 8283, LR: 2.6399237596529847e-06, Loss: 373.20477294921875
2024-08-04T09:21:16.854103860Z 
 87%|████████▋ | 8284/9500 [28:23:46<4:09:53, 12.33s/it]08/04/2024 02:21:16 - INFO - __main__ -   Step: 8284, LR: 2.6377532159657054e-06, Loss: 342.52471923828125
2024-08-04T09:21:29.342356341Z 
 87%|████████▋ | 8285/9500 [28:23:59<4:10:38, 12.38s/it]08/04/2024 02:21:29 - INFO - __main__ -   Step: 8285, LR: 2.6355826722784266e-06, Loss: 468.93017578125
2024-08-04T09:21:41.511373245Z 
 87%|████████▋ | 8286/9500 [28:24:11<4:09:10, 12.32s/it]08/04/2024 02:21:41 - INFO - __main__ -   Step: 8286, LR: 2.633412128591148e-06, Loss: 387.8494873046875
2024-08-04T09:21:53.785794800Z 
 87%|████████▋ | 8287/9500 [28:24:23<4:08:43, 12.30s/it]08/04/2024 02:21:53 - INFO - __main__ -   Step: 8287, LR: 2.6312415849038684e-06, Loss: 491.8910827636719
2024-08-04T09:22:06.659972778Z 
 87%|████████▋ | 8288/9500 [28:24:36<4:11:58, 12.47s/it]08/04/2024 02:22:06 - INFO - __main__ -   Step: 8288, LR: 2.62907104121659e-06, Loss: 487.7044677734375
2024-08-04T09:22:19.137336208Z 
 87%|████████▋ | 8289/9500 [28:24:49<4:11:47, 12.48s/it]08/04/2024 02:22:19 - INFO - __main__ -   Step: 8289, LR: 2.626900497529311e-06, Loss: 465.24725341796875
2024-08-04T09:22:31.259550556Z 
 87%|████████▋ | 8290/9500 [28:25:01<4:09:26, 12.37s/it]08/04/2024 02:22:31 - INFO - __main__ -   Step: 8290, LR: 2.6247299538420322e-06, Loss: 331.713623046875
2024-08-04T09:22:43.686665140Z 
 87%|████████▋ | 8291/9500 [28:25:13<4:09:35, 12.39s/it]08/04/2024 02:22:43 - INFO - __main__ -   Step: 8291, LR: 2.622559410154753e-06, Loss: 346.2488098144531
2024-08-04T09:22:55.724182700Z 
 87%|████████▋ | 8292/9500 [28:25:25<4:07:16, 12.28s/it]08/04/2024 02:22:55 - INFO - __main__ -   Step: 8292, LR: 2.620388866467474e-06, Loss: 424.88458251953125
2024-08-04T09:23:07.987951564Z 
 87%|████████▋ | 8293/9500 [28:25:37<4:06:57, 12.28s/it]08/04/2024 02:23:07 - INFO - __main__ -   Step: 8293, LR: 2.6182183227801956e-06, Loss: 425.23651123046875
2024-08-04T09:23:20.249497528Z 
 87%|████████▋ | 8294/9500 [28:25:50<4:06:40, 12.27s/it]08/04/2024 02:23:20 - INFO - __main__ -   Step: 8294, LR: 2.6160477790929164e-06, Loss: 373.6356201171875
2024-08-04T09:23:32.882127352Z 
 87%|████████▋ | 8295/9500 [28:26:02<4:08:38, 12.38s/it]08/04/2024 02:23:32 - INFO - __main__ -   Step: 8295, LR: 2.6138772354056375e-06, Loss: 445.61016845703125
2024-08-04T09:23:45.044925794Z 
 87%|████████▋ | 8296/9500 [28:26:14<4:07:07, 12.31s/it]08/04/2024 02:23:45 - INFO - __main__ -   Step: 8296, LR: 2.6117066917183586e-06, Loss: 535.7228393554688
2024-08-04T09:23:57.166141460Z 
 87%|████████▋ | 8297/9500 [28:26:27<4:05:44, 12.26s/it]08/04/2024 02:23:57 - INFO - __main__ -   Step: 8297, LR: 2.6095361480310798e-06, Loss: 307.5770568847656
2024-08-04T09:24:09.769466905Z 
 87%|████████▋ | 8298/9500 [28:26:39<4:07:37, 12.36s/it]08/04/2024 02:24:09 - INFO - __main__ -   Step: 8298, LR: 2.6073656043438005e-06, Loss: 271.2335205078125
2024-08-04T09:24:22.021639046Z 
 87%|████████▋ | 8299/9500 [28:26:51<4:06:46, 12.33s/it]08/04/2024 02:24:22 - INFO - __main__ -   Step: 8299, LR: 2.605195060656522e-06, Loss: 443.44781494140625
2024-08-04T09:24:34.288541441Z 
 87%|████████▋ | 8300/9500 [28:27:04<4:06:11, 12.31s/it]08/04/2024 02:24:34 - INFO - __main__ -   Step: 8300, LR: 2.603024516969243e-06, Loss: 442.7509765625
2024-08-04T09:24:46.912170641Z 
 87%|████████▋ | 8301/9500 [28:27:16<4:07:52, 12.40s/it]08/04/2024 02:24:46 - INFO - __main__ -   Step: 8301, LR: 2.600853973281964e-06, Loss: 458.2880859375
2024-08-04T09:24:59.393578906Z 
 87%|████████▋ | 8302/9500 [28:27:29<4:08:07, 12.43s/it]08/04/2024 02:24:59 - INFO - __main__ -   Step: 8302, LR: 2.598683429594685e-06, Loss: 410.216552734375
2024-08-04T09:25:11.711363074Z 
 87%|████████▋ | 8303/9500 [28:27:41<4:07:16, 12.39s/it]08/04/2024 02:25:11 - INFO - __main__ -   Step: 8303, LR: 2.596512885907406e-06, Loss: 275.5364074707031
2024-08-04T09:25:24.210068145Z 
 87%|████████▋ | 8304/9500 [28:27:54<4:07:41, 12.43s/it]08/04/2024 02:25:24 - INFO - __main__ -   Step: 8304, LR: 2.5943423422201273e-06, Loss: 405.50836181640625
2024-08-04T09:25:36.425071094Z 
 87%|████████▋ | 8305/9500 [28:28:06<4:06:13, 12.36s/it]08/04/2024 02:25:36 - INFO - __main__ -   Step: 8305, LR: 2.592171798532848e-06, Loss: 396.86553955078125
2024-08-04T09:25:48.321923112Z 
 87%|████████▋ | 8306/9500 [28:28:18<4:03:13, 12.22s/it]08/04/2024 02:25:48 - INFO - __main__ -   Step: 8306, LR: 2.5900012548455696e-06, Loss: 299.8735656738281
2024-08-04T09:26:00.778810438Z 
 87%|████████▋ | 8307/9500 [28:28:30<4:04:25, 12.29s/it]08/04/2024 02:26:00 - INFO - __main__ -   Step: 8307, LR: 2.5878307111582907e-06, Loss: 427.47308349609375
2024-08-04T09:26:13.087907869Z 
 87%|████████▋ | 8308/9500 [28:28:43<4:04:19, 12.30s/it]08/04/2024 02:26:13 - INFO - __main__ -   Step: 8308, LR: 2.5856601674710114e-06, Loss: 457.9115295410156
2024-08-04T09:26:25.353907754Z 
 87%|████████▋ | 8309/9500 [28:28:55<4:03:55, 12.29s/it]08/04/2024 02:26:25 - INFO - __main__ -   Step: 8309, LR: 2.5834896237837326e-06, Loss: 402.62890625
2024-08-04T09:26:38.014628170Z 
 87%|████████▋ | 8310/9500 [28:29:07<4:05:56, 12.40s/it]08/04/2024 02:26:38 - INFO - __main__ -   Step: 8310, LR: 2.5813190800964537e-06, Loss: 403.8852233886719
2024-08-04T09:26:50.076640623Z 
 87%|████████▋ | 8311/9500 [28:29:20<4:03:43, 12.30s/it]08/04/2024 02:26:50 - INFO - __main__ -   Step: 8311, LR: 2.5791485364091752e-06, Loss: 369.80059814453125
2024-08-04T09:27:02.425684485Z 
 87%|████████▋ | 8312/9500 [28:29:32<4:03:48, 12.31s/it]08/04/2024 02:27:02 - INFO - __main__ -   Step: 8312, LR: 2.576977992721896e-06, Loss: 307.8011474609375
2024-08-04T09:27:15.139166953Z 
 88%|████████▊ | 8313/9500 [28:29:45<4:05:58, 12.43s/it]08/04/2024 02:27:15 - INFO - __main__ -   Step: 8313, LR: 2.574807449034617e-06, Loss: 341.66888427734375
2024-08-04T09:27:27.449240643Z 
 88%|████████▊ | 8314/9500 [28:29:57<4:05:02, 12.40s/it]08/04/2024 02:27:27 - INFO - __main__ -   Step: 8314, LR: 2.5726369053473382e-06, Loss: 430.3603210449219
2024-08-04T09:27:39.838729690Z 
 88%|████████▊ | 8315/9500 [28:30:09<4:04:47, 12.39s/it]08/04/2024 02:27:39 - INFO - __main__ -   Step: 8315, LR: 2.570466361660059e-06, Loss: 327.80389404296875
2024-08-04T09:27:52.377991053Z 
 88%|████████▊ | 8316/9500 [28:30:22<4:05:26, 12.44s/it]08/04/2024 02:27:52 - INFO - __main__ -   Step: 8316, LR: 2.56829581797278e-06, Loss: 406.02569580078125
2024-08-04T09:28:04.416601470Z 
 88%|████████▊ | 8317/9500 [28:30:34<4:02:52, 12.32s/it]08/04/2024 02:28:04 - INFO - __main__ -   Step: 8317, LR: 2.5661252742855012e-06, Loss: 408.8466491699219
2024-08-04T09:28:16.902759796Z 
 88%|████████▊ | 8318/9500 [28:30:46<4:03:39, 12.37s/it]08/04/2024 02:28:16 - INFO - __main__ -   Step: 8318, LR: 2.5639547305982228e-06, Loss: 394.396240234375
2024-08-04T09:28:29.602691696Z 
 88%|████████▊ | 8319/9500 [28:30:59<4:05:24, 12.47s/it]08/04/2024 02:28:29 - INFO - __main__ -   Step: 8319, LR: 2.5617841869109435e-06, Loss: 330.424072265625
2024-08-04T09:28:42.002047292Z 
 88%|████████▊ | 8320/9500 [28:31:11<4:04:47, 12.45s/it]08/04/2024 02:28:42 - INFO - __main__ -   Step: 8320, LR: 2.5596136432236646e-06, Loss: 382.1654357910156
2024-08-04T09:28:54.360083074Z 
 88%|████████▊ | 8321/9500 [28:31:24<4:04:03, 12.42s/it]08/04/2024 02:28:54 - INFO - __main__ -   Step: 8321, LR: 2.5574430995363858e-06, Loss: 468.6813049316406
2024-08-04T09:29:06.901750511Z 
 88%|████████▊ | 8322/9500 [28:31:36<4:04:34, 12.46s/it]08/04/2024 02:29:06 - INFO - __main__ -   Step: 8322, LR: 2.5552725558491065e-06, Loss: 443.37493896484375
2024-08-04T09:29:18.738605460Z 
 88%|████████▊ | 8323/9500 [28:31:48<4:00:42, 12.27s/it]08/04/2024 02:29:18 - INFO - __main__ -   Step: 8323, LR: 2.5531020121618276e-06, Loss: 285.5349426269531
2024-08-04T09:29:30.947671275Z 
 88%|████████▊ | 8324/9500 [28:32:00<4:00:08, 12.25s/it]08/04/2024 02:29:30 - INFO - __main__ -   Step: 8324, LR: 2.550931468474549e-06, Loss: 377.19537353515625
2024-08-04T09:29:43.889370714Z 
 88%|████████▊ | 8325/9500 [28:32:13<4:03:59, 12.46s/it]08/04/2024 02:29:43 - INFO - __main__ -   Step: 8325, LR: 2.5487609247872703e-06, Loss: 420.749267578125
2024-08-04T09:29:56.174989296Z 
 88%|████████▊ | 8326/9500 [28:32:26<4:02:45, 12.41s/it]08/04/2024 02:29:56 - INFO - __main__ -   Step: 8326, LR: 2.546590381099991e-06, Loss: 263.3604736328125
2024-08-04T09:30:08.641411936Z 
 88%|████████▊ | 8327/9500 [28:32:38<4:02:54, 12.42s/it]08/04/2024 02:30:08 - INFO - __main__ -   Step: 8327, LR: 2.544419837412712e-06, Loss: 422.80133056640625
2024-08-04T09:30:21.412873609Z 
 88%|████████▊ | 8328/9500 [28:32:51<4:04:43, 12.53s/it]08/04/2024 02:30:21 - INFO - __main__ -   Step: 8328, LR: 2.5422492937254333e-06, Loss: 463.7548828125
2024-08-04T09:30:33.521100432Z 
 88%|████████▊ | 8329/9500 [28:33:03<4:02:03, 12.40s/it]08/04/2024 02:30:33 - INFO - __main__ -   Step: 8329, LR: 2.540078750038154e-06, Loss: 397.4407958984375
2024-08-04T09:30:45.679770095Z 
 88%|████████▊ | 8330/9500 [28:33:15<4:00:25, 12.33s/it]08/04/2024 02:30:45 - INFO - __main__ -   Step: 8330, LR: 2.537908206350875e-06, Loss: 381.9967041015625
2024-08-04T09:30:58.273659765Z 
 88%|████████▊ | 8331/9500 [28:33:28<4:01:45, 12.41s/it]08/04/2024 02:30:58 - INFO - __main__ -   Step: 8331, LR: 2.5357376626635967e-06, Loss: 368.8722229003906
2024-08-04T09:31:10.735222004Z 
 88%|████████▊ | 8332/9500 [28:33:40<4:01:51, 12.42s/it]08/04/2024 02:31:10 - INFO - __main__ -   Step: 8332, LR: 2.533567118976318e-06, Loss: 508.81146240234375
2024-08-04T09:31:22.739077557Z 
 88%|████████▊ | 8333/9500 [28:33:52<3:59:12, 12.30s/it]08/04/2024 02:31:22 - INFO - __main__ -   Step: 8333, LR: 2.5313965752890385e-06, Loss: 373.03851318359375
2024-08-04T09:31:35.255142248Z 
 88%|████████▊ | 8334/9500 [28:34:05<4:00:16, 12.36s/it]08/04/2024 02:31:35 - INFO - __main__ -   Step: 8334, LR: 2.5292260316017597e-06, Loss: 388.95770263671875
2024-08-04T09:31:47.480292215Z 
 88%|████████▊ | 8335/9500 [28:34:17<3:59:15, 12.32s/it]08/04/2024 02:31:47 - INFO - __main__ -   Step: 8335, LR: 2.527055487914481e-06, Loss: 476.9383544921875
2024-08-04T09:31:59.540017886Z 
 88%|████████▊ | 8336/9500 [28:34:29<3:57:31, 12.24s/it]08/04/2024 02:31:59 - INFO - __main__ -   Step: 8336, LR: 2.5248849442272015e-06, Loss: 337.27740478515625
2024-08-04T09:32:11.623032031Z 
 88%|████████▊ | 8337/9500 [28:34:41<3:56:23, 12.20s/it]08/04/2024 02:32:11 - INFO - __main__ -   Step: 8337, LR: 2.522714400539923e-06, Loss: 358.25164794921875
2024-08-04T09:32:24.852065053Z 
 88%|████████▊ | 8338/9500 [28:34:54<4:02:11, 12.51s/it]08/04/2024 02:32:24 - INFO - __main__ -   Step: 8338, LR: 2.5205438568526442e-06, Loss: 405.5897216796875
2024-08-04T09:32:36.891198350Z 
 88%|████████▊ | 8339/9500 [28:35:06<3:59:16, 12.37s/it]08/04/2024 02:32:36 - INFO - __main__ -   Step: 8339, LR: 2.5183733131653654e-06, Loss: 524.0619506835938
2024-08-04T09:32:48.970204555Z 
 88%|████████▊ | 8340/9500 [28:35:18<3:57:24, 12.28s/it]08/04/2024 02:32:48 - INFO - __main__ -   Step: 8340, LR: 2.516202769478086e-06, Loss: 463.94232177734375
2024-08-04T09:33:01.605797588Z 
 88%|████████▊ | 8341/9500 [28:35:31<3:59:15, 12.39s/it]08/04/2024 02:33:01 - INFO - __main__ -   Step: 8341, LR: 2.514032225790807e-06, Loss: 327.8182678222656
2024-08-04T09:33:13.721112426Z 
 88%|████████▊ | 8342/9500 [28:35:43<3:57:29, 12.31s/it]08/04/2024 02:33:13 - INFO - __main__ -   Step: 8342, LR: 2.5118616821035283e-06, Loss: 393.63323974609375
2024-08-04T09:33:26.013776514Z 
 88%|████████▊ | 8343/9500 [28:35:55<3:57:12, 12.30s/it]08/04/2024 02:33:26 - INFO - __main__ -   Step: 8343, LR: 2.509691138416249e-06, Loss: 442.6154479980469
2024-08-04T09:33:38.582573302Z 
 88%|████████▊ | 8344/9500 [28:36:08<3:58:33, 12.38s/it]08/04/2024 02:33:38 - INFO - __main__ -   Step: 8344, LR: 2.5075205947289706e-06, Loss: 367.5088195800781
2024-08-04T09:33:50.808537554Z 
 88%|████████▊ | 8345/9500 [28:36:20<3:57:26, 12.33s/it]08/04/2024 02:33:50 - INFO - __main__ -   Step: 8345, LR: 2.5053500510416918e-06, Loss: 461.82464599609375
2024-08-04T09:34:03.034084783Z 
 88%|████████▊ | 8346/9500 [28:36:32<3:56:36, 12.30s/it]08/04/2024 02:34:03 - INFO - __main__ -   Step: 8346, LR: 2.503179507354413e-06, Loss: 393.07696533203125
2024-08-04T09:34:15.554785133Z 
 88%|████████▊ | 8347/9500 [28:36:45<3:57:39, 12.37s/it]08/04/2024 02:34:15 - INFO - __main__ -   Step: 8347, LR: 2.5010089636671336e-06, Loss: 449.1828308105469
2024-08-04T09:34:27.885213816Z 
 88%|████████▊ | 8348/9500 [28:36:57<3:57:14, 12.36s/it]08/04/2024 02:34:27 - INFO - __main__ -   Step: 8348, LR: 2.4988384199798547e-06, Loss: 465.2284240722656
2024-08-04T09:34:39.948064064Z 
 88%|████████▊ | 8349/9500 [28:37:09<3:55:20, 12.27s/it]08/04/2024 02:34:39 - INFO - __main__ -   Step: 8349, LR: 2.496667876292576e-06, Loss: 362.9132995605469
2024-08-04T09:34:52.504168195Z 
 88%|████████▊ | 8350/9500 [28:37:22<3:56:47, 12.35s/it]08/04/2024 02:34:52 - INFO - __main__ -   Step: 8350, LR: 2.494497332605297e-06, Loss: 385.3094482421875
2024-08-04T09:35:04.768148760Z 
 88%|████████▊ | 8351/9500 [28:37:34<3:56:04, 12.33s/it]08/04/2024 02:35:04 - INFO - __main__ -   Step: 8351, LR: 2.492326788918018e-06, Loss: 360.78656005859375
2024-08-04T09:35:16.999331775Z 
 88%|████████▊ | 8352/9500 [28:37:46<3:55:18, 12.30s/it]08/04/2024 02:35:16 - INFO - __main__ -   Step: 8352, LR: 2.4901562452307393e-06, Loss: 301.6452941894531
2024-08-04T09:35:29.548975746Z 
 88%|████████▊ | 8353/9500 [28:37:59<3:56:32, 12.37s/it]08/04/2024 02:35:29 - INFO - __main__ -   Step: 8353, LR: 2.4879857015434604e-06, Loss: 337.1360778808594
2024-08-04T09:35:41.850442193Z 
 88%|████████▊ | 8354/9500 [28:38:11<3:55:55, 12.35s/it]08/04/2024 02:35:41 - INFO - __main__ -   Step: 8354, LR: 2.485815157856181e-06, Loss: 493.62603759765625
2024-08-04T09:35:54.143284289Z 
 88%|████████▊ | 8355/9500 [28:38:24<3:55:22, 12.33s/it]08/04/2024 02:35:54 - INFO - __main__ -   Step: 8355, LR: 2.4836446141689023e-06, Loss: 378.1605529785156
2024-08-04T09:36:06.741606040Z 
 88%|████████▊ | 8356/9500 [28:38:36<3:56:41, 12.41s/it]08/04/2024 02:36:06 - INFO - __main__ -   Step: 8356, LR: 2.4814740704816234e-06, Loss: 484.19384765625
2024-08-04T09:36:18.992757414Z 
 88%|████████▊ | 8357/9500 [28:38:48<3:55:33, 12.36s/it]08/04/2024 02:36:18 - INFO - __main__ -   Step: 8357, LR: 2.4793035267943445e-06, Loss: 356.10125732421875
2024-08-04T09:36:31.116781645Z 
 88%|████████▊ | 8358/9500 [28:39:01<3:53:58, 12.29s/it]08/04/2024 02:36:31 - INFO - __main__ -   Step: 8358, LR: 2.4771329831070657e-06, Loss: 367.08233642578125
2024-08-04T09:36:43.649237777Z 
 88%|████████▊ | 8359/9500 [28:39:13<3:55:07, 12.36s/it]08/04/2024 02:36:43 - INFO - __main__ -   Step: 8359, LR: 2.474962439419787e-06, Loss: 371.2039794921875
2024-08-04T09:36:55.760029999Z 
 88%|████████▊ | 8360/9500 [28:39:25<3:53:28, 12.29s/it]08/04/2024 02:36:55 - INFO - __main__ -   Step: 8360, LR: 2.472791895732508e-06, Loss: 386.1683654785156
2024-08-04T09:37:08.000245337Z 
 88%|████████▊ | 8361/9500 [28:39:37<3:53:00, 12.27s/it]08/04/2024 02:37:08 - INFO - __main__ -   Step: 8361, LR: 2.4706213520452287e-06, Loss: 425.729736328125
2024-08-04T09:37:20.859258540Z 
 88%|████████▊ | 8362/9500 [28:39:50<3:56:07, 12.45s/it]08/04/2024 02:37:20 - INFO - __main__ -   Step: 8362, LR: 2.4684508083579502e-06, Loss: 397.51385498046875
2024-08-04T09:37:33.091210975Z 
 88%|████████▊ | 8363/9500 [28:40:03<3:54:40, 12.38s/it]08/04/2024 02:37:33 - INFO - __main__ -   Step: 8363, LR: 2.466280264670671e-06, Loss: 330.8683776855469
2024-08-04T09:37:45.328384718Z 
 88%|████████▊ | 8364/9500 [28:40:15<3:53:38, 12.34s/it]08/04/2024 02:37:45 - INFO - __main__ -   Step: 8364, LR: 2.464109720983392e-06, Loss: 454.56842041015625
2024-08-04T09:37:58.406265387Z 
 88%|████████▊ | 8365/9500 [28:40:28<3:57:36, 12.56s/it]08/04/2024 02:37:58 - INFO - __main__ -   Step: 8365, LR: 2.461939177296113e-06, Loss: 388.0548095703125
2024-08-04T09:38:10.407443506Z 
 88%|████████▊ | 8366/9500 [28:40:40<3:54:14, 12.39s/it]08/04/2024 02:38:10 - INFO - __main__ -   Step: 8366, LR: 2.4597686336088343e-06, Loss: 377.9136962890625
2024-08-04T09:38:23.033417535Z 
 88%|████████▊ | 8367/9500 [28:40:52<3:55:20, 12.46s/it]08/04/2024 02:38:23 - INFO - __main__ -   Step: 8367, LR: 2.4575980899215555e-06, Loss: 387.23516845703125
2024-08-04T09:38:35.640972226Z 
 88%|████████▊ | 8368/9500 [28:41:05<3:55:57, 12.51s/it]08/04/2024 02:38:35 - INFO - __main__ -   Step: 8368, LR: 2.455427546234276e-06, Loss: 399.54132080078125
2024-08-04T09:38:47.520791746Z 
 88%|████████▊ | 8369/9500 [28:41:17<3:52:12, 12.32s/it]08/04/2024 02:38:47 - INFO - __main__ -   Step: 8369, LR: 2.4532570025469977e-06, Loss: 280.74169921875
2024-08-04T09:38:59.919976559Z 
 88%|████████▊ | 8370/9500 [28:41:29<3:52:27, 12.34s/it]08/04/2024 02:38:59 - INFO - __main__ -   Step: 8370, LR: 2.4510864588597185e-06, Loss: 399.89312744140625
2024-08-04T09:39:12.353990009Z 
 88%|████████▊ | 8371/9500 [28:41:42<3:52:45, 12.37s/it]08/04/2024 02:39:12 - INFO - __main__ -   Step: 8371, LR: 2.44891591517244e-06, Loss: 299.22076416015625
2024-08-04T09:39:24.499870009Z 
 88%|████████▊ | 8372/9500 [28:41:54<3:51:17, 12.30s/it]08/04/2024 02:39:24 - INFO - __main__ -   Step: 8372, LR: 2.4467453714851607e-06, Loss: 397.17852783203125
2024-08-04T09:39:37.022251959Z 
 88%|████████▊ | 8373/9500 [28:42:06<3:52:19, 12.37s/it]08/04/2024 02:39:37 - INFO - __main__ -   Step: 8373, LR: 2.444574827797882e-06, Loss: 443.44384765625
2024-08-04T09:39:49.678732298Z 
 88%|████████▊ | 8374/9500 [28:42:19<3:53:44, 12.46s/it]08/04/2024 02:39:49 - INFO - __main__ -   Step: 8374, LR: 2.442404284110603e-06, Loss: 326.73773193359375
2024-08-04T09:40:01.815090358Z 
 88%|████████▊ | 8375/9500 [28:42:31<3:51:44, 12.36s/it]08/04/2024 02:40:01 - INFO - __main__ -   Step: 8375, LR: 2.440233740423324e-06, Loss: 361.9727783203125
2024-08-04T09:40:14.057766179Z 
 88%|████████▊ | 8376/9500 [28:42:43<3:50:52, 12.32s/it]08/04/2024 02:40:14 - INFO - __main__ -   Step: 8376, LR: 2.4380631967360453e-06, Loss: 454.83624267578125
2024-08-04T09:40:26.727666980Z 
 88%|████████▊ | 8377/9500 [28:42:56<3:52:36, 12.43s/it]08/04/2024 02:40:26 - INFO - __main__ -   Step: 8377, LR: 2.435892653048766e-06, Loss: 523.6786499023438
2024-08-04T09:40:38.597411977Z 
 88%|████████▊ | 8378/9500 [28:43:08<3:49:16, 12.26s/it]08/04/2024 02:40:38 - INFO - __main__ -   Step: 8378, LR: 2.4337221093614875e-06, Loss: 421.8984680175781
2024-08-04T09:40:50.690947598Z 
 88%|████████▊ | 8379/9500 [28:43:20<3:48:08, 12.21s/it]08/04/2024 02:40:50 - INFO - __main__ -   Step: 8379, LR: 2.4315515656742083e-06, Loss: 421.67633056640625
2024-08-04T09:41:02.925108651Z 
 88%|████████▊ | 8380/9500 [28:43:32<3:48:03, 12.22s/it]08/04/2024 02:41:02 - INFO - __main__ -   Step: 8380, LR: 2.4293810219869294e-06, Loss: 394.77423095703125
2024-08-04T09:41:15.336021321Z 
 88%|████████▊ | 8381/9500 [28:43:45<3:48:56, 12.28s/it]08/04/2024 02:41:15 - INFO - __main__ -   Step: 8381, LR: 2.4272104782996505e-06, Loss: 389.8085632324219
2024-08-04T09:41:27.541362704Z 
 88%|████████▊ | 8382/9500 [28:43:57<3:48:20, 12.25s/it]08/04/2024 02:41:27 - INFO - __main__ -   Step: 8382, LR: 2.4250399346123717e-06, Loss: 457.608642578125
2024-08-04T09:41:39.598414366Z 
 88%|████████▊ | 8383/9500 [28:44:09<3:47:02, 12.20s/it]08/04/2024 02:41:39 - INFO - __main__ -   Step: 8383, LR: 2.422869390925093e-06, Loss: 375.87628173828125
2024-08-04T09:41:52.463868287Z 
 88%|████████▊ | 8384/9500 [28:44:22<3:50:34, 12.40s/it]08/04/2024 02:41:52 - INFO - __main__ -   Step: 8384, LR: 2.420698847237814e-06, Loss: 330.89306640625
2024-08-04T09:42:04.945885599Z 
 88%|████████▊ | 8385/9500 [28:44:34<3:50:50, 12.42s/it]08/04/2024 02:42:04 - INFO - __main__ -   Step: 8385, LR: 2.418528303550535e-06, Loss: 503.1082763671875
2024-08-04T09:42:17.214998550Z 
 88%|████████▊ | 8386/9500 [28:44:47<3:49:47, 12.38s/it]08/04/2024 02:42:17 - INFO - __main__ -   Step: 8386, LR: 2.416357759863256e-06, Loss: 377.65325927734375
2024-08-04T09:42:29.650777532Z 
 88%|████████▊ | 8387/9500 [28:44:59<3:49:54, 12.39s/it]08/04/2024 02:42:29 - INFO - __main__ -   Step: 8387, LR: 2.414187216175977e-06, Loss: 392.122802734375
2024-08-04T09:42:41.820167513Z 
 88%|████████▊ | 8388/9500 [28:45:11<3:48:27, 12.33s/it]08/04/2024 02:42:41 - INFO - __main__ -   Step: 8388, LR: 2.412016672488698e-06, Loss: 422.39984130859375
2024-08-04T09:42:54.032603195Z 
 88%|████████▊ | 8389/9500 [28:45:23<3:47:36, 12.29s/it]08/04/2024 02:42:54 - INFO - __main__ -   Step: 8389, LR: 2.409846128801419e-06, Loss: 429.4563903808594
2024-08-04T09:43:06.640318697Z 
 88%|████████▊ | 8390/9500 [28:45:36<3:49:09, 12.39s/it]08/04/2024 02:43:06 - INFO - __main__ -   Step: 8390, LR: 2.4076755851141403e-06, Loss: 412.5266418457031
2024-08-04T09:43:18.890244248Z 
 88%|████████▊ | 8391/9500 [28:45:48<3:48:11, 12.35s/it]08/04/2024 02:43:18 - INFO - __main__ -   Step: 8391, LR: 2.4055050414268615e-06, Loss: 428.1717529296875
2024-08-04T09:43:30.953292745Z 
 88%|████████▊ | 8392/9500 [28:46:00<3:46:25, 12.26s/it]08/04/2024 02:43:30 - INFO - __main__ -   Step: 8392, LR: 2.4033344977395826e-06, Loss: 483.6016845703125
2024-08-04T09:43:43.727701492Z 
 88%|████████▊ | 8393/9500 [28:46:13<3:49:03, 12.41s/it]08/04/2024 02:43:43 - INFO - __main__ -   Step: 8393, LR: 2.4011639540523033e-06, Loss: 444.9786071777344
2024-08-04T09:43:56.026913916Z 
 88%|████████▊ | 8394/9500 [28:46:25<3:48:12, 12.38s/it]08/04/2024 02:43:56 - INFO - __main__ -   Step: 8394, LR: 2.3989934103650245e-06, Loss: 531.065673828125
2024-08-04T09:44:08.225222882Z 
 88%|████████▊ | 8395/9500 [28:46:38<3:46:59, 12.33s/it]08/04/2024 02:44:08 - INFO - __main__ -   Step: 8395, LR: 2.3968228666777456e-06, Loss: 311.87664794921875
2024-08-04T09:44:20.889013407Z 
 88%|████████▊ | 8396/9500 [28:46:50<3:48:39, 12.43s/it]08/04/2024 02:44:20 - INFO - __main__ -   Step: 8396, LR: 2.3946523229904667e-06, Loss: 429.13360595703125
2024-08-04T09:44:33.199395193Z 
 88%|████████▊ | 8397/9500 [28:47:03<3:47:48, 12.39s/it]08/04/2024 02:44:33 - INFO - __main__ -   Step: 8397, LR: 2.392481779303188e-06, Loss: 442.51092529296875
2024-08-04T09:44:46.014582291Z 
 88%|████████▊ | 8398/9500 [28:47:15<3:49:55, 12.52s/it]08/04/2024 02:44:46 - INFO - __main__ -   Step: 8398, LR: 2.390311235615909e-06, Loss: 490.0060119628906
2024-08-04T09:44:58.492357375Z 
 88%|████████▊ | 8399/9500 [28:47:28<3:49:29, 12.51s/it]08/04/2024 02:44:58 - INFO - __main__ -   Step: 8399, LR: 2.38814069192863e-06, Loss: 386.73040771484375
2024-08-04T09:45:10.563551839Z 
 88%|████████▊ | 8400/9500 [28:47:40<3:46:53, 12.38s/it]08/04/2024 02:45:10 - INFO - __main__ -   Step: 8400, LR: 2.3859701482413513e-06, Loss: 389.22369384765625
2024-08-04T09:45:22.863603514Z 
 88%|████████▊ | 8401/9500 [28:47:52<3:46:16, 12.35s/it]08/04/2024 02:45:22 - INFO - __main__ -   Step: 8401, LR: 2.383799604554072e-06, Loss: 356.1824951171875
2024-08-04T09:45:35.569574136Z 
 88%|████████▊ | 8402/9500 [28:48:05<3:47:59, 12.46s/it]08/04/2024 02:45:35 - INFO - __main__ -   Step: 8402, LR: 2.381629060866793e-06, Loss: 387.3638916015625
2024-08-04T09:45:47.560055404Z 
 88%|████████▊ | 8403/9500 [28:48:17<3:45:13, 12.32s/it]08/04/2024 02:45:47 - INFO - __main__ -   Step: 8403, LR: 2.3794585171795143e-06, Loss: 455.82159423828125
2024-08-04T09:45:59.943319308Z 
 88%|████████▊ | 8404/9500 [28:48:29<3:45:22, 12.34s/it]08/04/2024 02:45:59 - INFO - __main__ -   Step: 8404, LR: 2.3772879734922354e-06, Loss: 450.70025634765625
2024-08-04T09:46:13.025638302Z 
 88%|████████▊ | 8405/9500 [28:48:42<3:49:14, 12.56s/it]08/04/2024 02:46:13 - INFO - __main__ -   Step: 8405, LR: 2.3751174298049565e-06, Loss: 421.60845947265625
2024-08-04T09:46:24.981869194Z 
 88%|████████▊ | 8406/9500 [28:48:54<3:45:43, 12.38s/it]08/04/2024 02:46:24 - INFO - __main__ -   Step: 8406, LR: 2.3729468861176777e-06, Loss: 317.89447021484375
2024-08-04T09:46:36.876797903Z 
 88%|████████▊ | 8407/9500 [28:49:06<3:42:52, 12.23s/it]08/04/2024 02:46:36 - INFO - __main__ -   Step: 8407, LR: 2.370776342430399e-06, Loss: 370.22723388671875
2024-08-04T09:46:49.728433141Z 
 89%|████████▊ | 8408/9500 [28:49:19<3:46:02, 12.42s/it]08/04/2024 02:46:49 - INFO - __main__ -   Step: 8408, LR: 2.3686057987431195e-06, Loss: 367.2169189453125
2024-08-04T09:47:01.792411751Z 
 89%|████████▊ | 8409/9500 [28:49:31<3:43:53, 12.31s/it]08/04/2024 02:47:01 - INFO - __main__ -   Step: 8409, LR: 2.366435255055841e-06, Loss: 410.79083251953125
2024-08-04T09:47:13.788529662Z 
 89%|████████▊ | 8410/9500 [28:49:43<3:41:57, 12.22s/it]08/04/2024 02:47:13 - INFO - __main__ -   Step: 8410, LR: 2.3642647113685618e-06, Loss: 335.09698486328125
2024-08-04T09:47:26.774521319Z 
 89%|████████▊ | 8411/9500 [28:49:56<3:45:56, 12.45s/it]08/04/2024 02:47:26 - INFO - __main__ -   Step: 8411, LR: 2.362094167681283e-06, Loss: 393.9739990234375
2024-08-04T09:47:39.078842140Z 
 89%|████████▊ | 8412/9500 [28:50:09<3:44:56, 12.41s/it]08/04/2024 02:47:39 - INFO - __main__ -   Step: 8412, LR: 2.359923623994004e-06, Loss: 429.2066650390625
2024-08-04T09:47:51.522246472Z 
 89%|████████▊ | 8413/9500 [28:50:21<3:44:56, 12.42s/it]08/04/2024 02:47:51 - INFO - __main__ -   Step: 8413, LR: 2.357753080306725e-06, Loss: 481.9345703125
2024-08-04T09:48:04.261422135Z 
 89%|████████▊ | 8414/9500 [28:50:34<3:46:29, 12.51s/it]08/04/2024 02:48:04 - INFO - __main__ -   Step: 8414, LR: 2.3555825366194463e-06, Loss: 368.45501708984375
2024-08-04T09:48:16.477606917Z 
 89%|████████▊ | 8415/9500 [28:50:46<3:44:40, 12.42s/it]08/04/2024 02:48:16 - INFO - __main__ -   Step: 8415, LR: 2.353411992932167e-06, Loss: 421.1510314941406
2024-08-04T09:48:28.836113693Z 
 89%|████████▊ | 8416/9500 [28:50:58<3:44:06, 12.40s/it]08/04/2024 02:48:28 - INFO - __main__ -   Step: 8416, LR: 2.3512414492448886e-06, Loss: 515.971435546875
2024-08-04T09:48:41.291516058Z 
 89%|████████▊ | 8417/9500 [28:51:11<3:44:10, 12.42s/it]08/04/2024 02:48:41 - INFO - __main__ -   Step: 8417, LR: 2.3490709055576093e-06, Loss: 347.36578369140625
2024-08-04T09:48:53.314017605Z 
 89%|████████▊ | 8418/9500 [28:51:23<3:41:49, 12.30s/it]08/04/2024 02:48:53 - INFO - __main__ -   Step: 8418, LR: 2.3469003618703304e-06, Loss: 447.0045471191406
2024-08-04T09:49:05.427359220Z 
 89%|████████▊ | 8419/9500 [28:51:35<3:40:36, 12.24s/it]08/04/2024 02:49:05 - INFO - __main__ -   Step: 8419, LR: 2.3447298181830516e-06, Loss: 448.24273681640625
2024-08-04T09:49:17.837577454Z 
 89%|████████▊ | 8420/9500 [28:51:47<3:41:17, 12.29s/it]08/04/2024 02:49:17 - INFO - __main__ -   Step: 8420, LR: 2.3425592744957727e-06, Loss: 354.45782470703125
2024-08-04T09:49:29.994868619Z 
 89%|████████▊ | 8421/9500 [28:51:59<3:40:21, 12.25s/it]08/04/2024 02:49:29 - INFO - __main__ -   Step: 8421, LR: 2.340388730808494e-06, Loss: 364.14837646484375
2024-08-04T09:49:41.994073988Z 
 89%|████████▊ | 8422/9500 [28:52:11<3:38:46, 12.18s/it]08/04/2024 02:49:41 - INFO - __main__ -   Step: 8422, LR: 2.338218187121215e-06, Loss: 332.8310546875
2024-08-04T09:49:54.466494960Z 
 89%|████████▊ | 8423/9500 [28:52:24<3:40:09, 12.27s/it]08/04/2024 02:49:54 - INFO - __main__ -   Step: 8423, LR: 2.336047643433936e-06, Loss: 434.35107421875
2024-08-04T09:50:07.058358360Z 
 89%|████████▊ | 8424/9500 [28:52:36<3:41:43, 12.36s/it]08/04/2024 02:50:07 - INFO - __main__ -   Step: 8424, LR: 2.333877099746657e-06, Loss: 317.25048828125
2024-08-04T09:50:19.258393392Z 
 89%|████████▊ | 8425/9500 [28:52:49<3:40:38, 12.31s/it]08/04/2024 02:50:19 - INFO - __main__ -   Step: 8425, LR: 2.3317065560593784e-06, Loss: 447.5384521484375
2024-08-04T09:50:31.692594232Z 
 89%|████████▊ | 8426/9500 [28:53:01<3:41:04, 12.35s/it]08/04/2024 02:50:31 - INFO - __main__ -   Step: 8426, LR: 2.329536012372099e-06, Loss: 462.1962585449219
2024-08-04T09:50:44.275263342Z 
 89%|████████▊ | 8427/9500 [28:53:14<3:42:06, 12.42s/it]08/04/2024 02:50:44 - INFO - __main__ -   Step: 8427, LR: 2.3273654686848202e-06, Loss: 392.93084716796875
2024-08-04T09:50:56.849304524Z 
 89%|████████▊ | 8428/9500 [28:53:26<3:42:43, 12.47s/it]08/04/2024 02:50:56 - INFO - __main__ -   Step: 8428, LR: 2.3251949249975414e-06, Loss: 319.8452453613281
2024-08-04T09:51:09.360127959Z 
 89%|████████▊ | 8429/9500 [28:53:39<3:42:45, 12.48s/it]08/04/2024 02:51:09 - INFO - __main__ -   Step: 8429, LR: 2.3230243813102625e-06, Loss: 421.36248779296875
2024-08-04T09:51:21.897162429Z 
 89%|████████▊ | 8430/9500 [28:53:51<3:42:51, 12.50s/it]08/04/2024 02:51:21 - INFO - __main__ -   Step: 8430, LR: 2.3208538376229837e-06, Loss: 430.737548828125
2024-08-04T09:51:33.987295251Z 
 89%|████████▊ | 8431/9500 [28:54:03<3:40:28, 12.37s/it]08/04/2024 02:51:33 - INFO - __main__ -   Step: 8431, LR: 2.3186832939357044e-06, Loss: 433.5672912597656
2024-08-04T09:51:45.986444338Z 
 89%|████████▉ | 8432/9500 [28:54:15<3:38:15, 12.26s/it]08/04/2024 02:51:45 - INFO - __main__ -   Step: 8432, LR: 2.316512750248426e-06, Loss: 372.171630859375
2024-08-04T09:51:58.670817467Z 
 89%|████████▉ | 8433/9500 [28:54:28<3:40:18, 12.39s/it]08/04/2024 02:51:58 - INFO - __main__ -   Step: 8433, LR: 2.3143422065611466e-06, Loss: 468.9211120605469
2024-08-04T09:52:10.678488425Z 
 89%|████████▉ | 8434/9500 [28:54:40<3:38:04, 12.27s/it]08/04/2024 02:52:10 - INFO - __main__ -   Step: 8434, LR: 2.3121716628738678e-06, Loss: 347.0201416015625
2024-08-04T09:52:22.943776479Z 
 89%|████████▉ | 8435/9500 [28:54:52<3:37:49, 12.27s/it]08/04/2024 02:52:22 - INFO - __main__ -   Step: 8435, LR: 2.310001119186589e-06, Loss: 495.4979553222656
2024-08-04T09:52:35.356671788Z 
 89%|████████▉ | 8436/9500 [28:55:05<3:38:22, 12.31s/it]08/04/2024 02:52:35 - INFO - __main__ -   Step: 8436, LR: 2.30783057549931e-06, Loss: 381.676025390625
2024-08-04T09:52:47.555152132Z 
 89%|████████▉ | 8437/9500 [28:55:17<3:37:33, 12.28s/it]08/04/2024 02:52:47 - INFO - __main__ -   Step: 8437, LR: 2.305660031812031e-06, Loss: 365.0447692871094
2024-08-04T09:52:59.548371081Z 
 89%|████████▉ | 8438/9500 [28:55:29<3:35:49, 12.19s/it]08/04/2024 02:52:59 - INFO - __main__ -   Step: 8438, LR: 2.3034894881247523e-06, Loss: 376.9042663574219
2024-08-04T09:53:12.242789519Z 
 89%|████████▉ | 8439/9500 [28:55:42<3:38:16, 12.34s/it]08/04/2024 02:53:12 - INFO - __main__ -   Step: 8439, LR: 2.3013189444374735e-06, Loss: 387.5611877441406
2024-08-04T09:53:24.549181050Z 
 89%|████████▉ | 8440/9500 [28:55:54<3:37:52, 12.33s/it]08/04/2024 02:53:24 - INFO - __main__ -   Step: 8440, LR: 2.299148400750194e-06, Loss: 460.55572509765625
2024-08-04T09:53:36.728484151Z 
 89%|████████▉ | 8441/9500 [28:56:06<3:36:51, 12.29s/it]08/04/2024 02:53:36 - INFO - __main__ -   Step: 8441, LR: 2.2969778570629153e-06, Loss: 423.76739501953125
2024-08-04T09:53:49.551141715Z 
 89%|████████▉ | 8442/9500 [28:56:19<3:39:29, 12.45s/it]08/04/2024 02:53:49 - INFO - __main__ -   Step: 8442, LR: 2.2948073133756364e-06, Loss: 364.29107666015625
2024-08-04T09:54:01.904465314Z 
 89%|████████▉ | 8443/9500 [28:56:31<3:38:47, 12.42s/it]08/04/2024 02:54:01 - INFO - __main__ -   Step: 8443, LR: 2.2926367696883576e-06, Loss: 396.51220703125
2024-08-04T09:54:14.090923817Z 
 89%|████████▉ | 8444/9500 [28:56:44<3:37:20, 12.35s/it]08/04/2024 02:54:14 - INFO - __main__ -   Step: 8444, LR: 2.2904662260010787e-06, Loss: 378.7782897949219
2024-08-04T09:54:26.810567459Z 
 89%|████████▉ | 8445/9500 [28:56:56<3:39:05, 12.46s/it]08/04/2024 02:54:26 - INFO - __main__ -   Step: 8445, LR: 2.2882956823138e-06, Loss: 497.68292236328125
2024-08-04T09:54:39.260589334Z 
 89%|████████▉ | 8446/9500 [28:57:09<3:38:50, 12.46s/it]08/04/2024 02:54:39 - INFO - __main__ -   Step: 8446, LR: 2.286125138626521e-06, Loss: 397.0843505859375
2024-08-04T09:54:51.565393771Z 
 89%|████████▉ | 8447/9500 [28:57:21<3:37:49, 12.41s/it]08/04/2024 02:54:51 - INFO - __main__ -   Step: 8447, LR: 2.283954594939242e-06, Loss: 398.8816223144531
2024-08-04T09:55:04.227404459Z 
 89%|████████▉ | 8448/9500 [28:57:34<3:38:56, 12.49s/it]08/04/2024 02:55:04 - INFO - __main__ -   Step: 8448, LR: 2.281784051251963e-06, Loss: 469.23846435546875
2024-08-04T09:55:16.301493817Z 
 89%|████████▉ | 8449/9500 [28:57:46<3:36:33, 12.36s/it]08/04/2024 02:55:16 - INFO - __main__ -   Step: 8449, LR: 2.279613507564684e-06, Loss: 336.4850769042969
2024-08-04T09:55:28.459372081Z 
 89%|████████▉ | 8450/9500 [28:57:58<3:35:16, 12.30s/it]08/04/2024 02:55:28 - INFO - __main__ -   Step: 8450, LR: 2.277442963877405e-06, Loss: 359.0802917480469
2024-08-04T09:55:41.112666376Z 
 89%|████████▉ | 8451/9500 [28:58:11<3:36:54, 12.41s/it]08/04/2024 02:55:41 - INFO - __main__ -   Step: 8451, LR: 2.2752724201901262e-06, Loss: 464.75177001953125
2024-08-04T09:55:53.179263110Z 
 89%|████████▉ | 8452/9500 [28:58:23<3:34:55, 12.30s/it]08/04/2024 02:55:53 - INFO - __main__ -   Step: 8452, LR: 2.2731018765028474e-06, Loss: 449.3597106933594
2024-08-04T09:56:05.657070869Z 
 89%|████████▉ | 8453/9500 [28:58:35<3:35:37, 12.36s/it]08/04/2024 02:56:05 - INFO - __main__ -   Step: 8453, LR: 2.2709313328155685e-06, Loss: 473.2416076660156
2024-08-04T09:56:18.113950959Z 
 89%|████████▉ | 8454/9500 [28:58:48<3:35:56, 12.39s/it]08/04/2024 02:56:18 - INFO - __main__ -   Step: 8454, LR: 2.2687607891282896e-06, Loss: 355.0233154296875
2024-08-04T09:56:30.244422249Z 
 89%|████████▉ | 8455/9500 [28:59:00<3:34:23, 12.31s/it]08/04/2024 02:56:30 - INFO - __main__ -   Step: 8455, LR: 2.2665902454410104e-06, Loss: 365.77337646484375
2024-08-04T09:56:42.255281224Z 
 89%|████████▉ | 8456/9500 [28:59:12<3:32:37, 12.22s/it]08/04/2024 02:56:42 - INFO - __main__ -   Step: 8456, LR: 2.264419701753732e-06, Loss: 333.323486328125
2024-08-04T09:56:54.784515291Z 
 89%|████████▉ | 8457/9500 [28:59:24<3:34:02, 12.31s/it]08/04/2024 02:56:54 - INFO - __main__ -   Step: 8457, LR: 2.2622491580664526e-06, Loss: 467.64178466796875
2024-08-04T09:57:07.254846298Z 
 89%|████████▉ | 8458/9500 [28:59:37<3:34:39, 12.36s/it]08/04/2024 02:57:07 - INFO - __main__ -   Step: 8458, LR: 2.2600786143791738e-06, Loss: 388.1178283691406
2024-08-04T09:57:19.456333291Z 
 89%|████████▉ | 8459/9500 [28:59:49<3:33:37, 12.31s/it]08/04/2024 02:57:19 - INFO - __main__ -   Step: 8459, LR: 2.257908070691895e-06, Loss: 550.974609375
2024-08-04T09:57:32.154009860Z 
 89%|████████▉ | 8460/9500 [29:00:02<3:35:25, 12.43s/it]08/04/2024 02:57:32 - INFO - __main__ -   Step: 8460, LR: 2.255737527004616e-06, Loss: 341.8669738769531
2024-08-04T09:57:44.460041201Z 
 89%|████████▉ | 8461/9500 [29:00:14<3:34:34, 12.39s/it]08/04/2024 02:57:44 - INFO - __main__ -   Step: 8461, LR: 2.253566983317337e-06, Loss: 359.907470703125
2024-08-04T09:57:56.600868577Z 
 89%|████████▉ | 8462/9500 [29:00:26<3:33:04, 12.32s/it]08/04/2024 02:57:56 - INFO - __main__ -   Step: 8462, LR: 2.251396439630058e-06, Loss: 374.80078125
2024-08-04T09:58:09.294653963Z 
 89%|████████▉ | 8463/9500 [29:00:39<3:34:49, 12.43s/it]08/04/2024 02:58:09 - INFO - __main__ -   Step: 8463, LR: 2.2492258959427795e-06, Loss: 461.028564453125
2024-08-04T09:58:21.491159849Z 
 89%|████████▉ | 8464/9500 [29:00:51<3:33:24, 12.36s/it]08/04/2024 02:58:21 - INFO - __main__ -   Step: 8464, LR: 2.2470553522555e-06, Loss: 367.5687561035156
2024-08-04T09:58:33.403674332Z 
 89%|████████▉ | 8465/9500 [29:01:03<3:30:53, 12.23s/it]08/04/2024 02:58:33 - INFO - __main__ -   Step: 8465, LR: 2.2448848085682213e-06, Loss: 367.89923095703125
2024-08-04T09:58:45.888113431Z 
 89%|████████▉ | 8466/9500 [29:01:15<3:32:01, 12.30s/it]08/04/2024 02:58:45 - INFO - __main__ -   Step: 8466, LR: 2.2427142648809424e-06, Loss: 333.2204284667969
2024-08-04T09:58:58.313972701Z 
 89%|████████▉ | 8467/9500 [29:01:28<3:32:27, 12.34s/it]08/04/2024 02:58:58 - INFO - __main__ -   Step: 8467, LR: 2.2405437211936636e-06, Loss: 438.3905334472656
2024-08-04T09:59:10.465548732Z 
 89%|████████▉ | 8468/9500 [29:01:40<3:31:16, 12.28s/it]08/04/2024 02:59:10 - INFO - __main__ -   Step: 8468, LR: 2.2383731775063847e-06, Loss: 526.10791015625
2024-08-04T09:59:22.915355243Z 
 89%|████████▉ | 8469/9500 [29:01:52<3:31:55, 12.33s/it]08/04/2024 02:59:22 - INFO - __main__ -   Step: 8469, LR: 2.2362026338191054e-06, Loss: 473.28582763671875
2024-08-04T09:59:35.550274640Z 
 89%|████████▉ | 8470/9500 [29:02:05<3:33:16, 12.42s/it]08/04/2024 02:59:35 - INFO - __main__ -   Step: 8470, LR: 2.234032090131827e-06, Loss: 401.94403076171875
2024-08-04T09:59:47.503086514Z 
 89%|████████▉ | 8471/9500 [29:02:17<3:30:38, 12.28s/it]08/04/2024 02:59:47 - INFO - __main__ -   Step: 8471, LR: 2.2318615464445477e-06, Loss: 328.9535827636719
2024-08-04T09:59:59.561698582Z 
 89%|████████▉ | 8472/9500 [29:02:29<3:29:17, 12.22s/it]08/04/2024 02:59:59 - INFO - __main__ -   Step: 8472, LR: 2.2296910027572693e-06, Loss: 460.5701904296875
2024-08-04T10:00:12.027806106Z 
 89%|████████▉ | 8473/9500 [29:02:41<3:30:22, 12.29s/it]08/04/2024 03:00:12 - INFO - __main__ -   Step: 8473, LR: 2.22752045906999e-06, Loss: 342.9433288574219
2024-08-04T10:00:24.117740431Z 
 89%|████████▉ | 8474/9500 [29:02:54<3:29:08, 12.23s/it]08/04/2024 03:00:24 - INFO - __main__ -   Step: 8474, LR: 2.225349915382711e-06, Loss: 481.53887939453125
2024-08-04T10:00:36.840006779Z 
 89%|████████▉ | 8475/9500 [29:03:06<3:31:27, 12.38s/it]08/04/2024 03:00:36 - INFO - __main__ -   Step: 8475, LR: 2.2231793716954322e-06, Loss: 432.15826416015625
2024-08-04T10:00:49.383945852Z 
 89%|████████▉ | 8476/9500 [29:03:19<3:32:05, 12.43s/it]08/04/2024 03:00:49 - INFO - __main__ -   Step: 8476, LR: 2.2210088280081534e-06, Loss: 440.1866455078125
2024-08-04T10:01:01.414158313Z 
 89%|████████▉ | 8477/9500 [29:03:31<3:29:51, 12.31s/it]08/04/2024 03:01:01 - INFO - __main__ -   Step: 8477, LR: 2.2188382843208745e-06, Loss: 364.761962890625
2024-08-04T10:01:13.650719937Z 
 89%|████████▉ | 8478/9500 [29:03:43<3:29:17, 12.29s/it]08/04/2024 03:01:13 - INFO - __main__ -   Step: 8478, LR: 2.2166677406335952e-06, Loss: 418.04437255859375
2024-08-04T10:01:26.163798402Z 
 89%|████████▉ | 8479/9500 [29:03:56<3:30:14, 12.35s/it]08/04/2024 03:01:26 - INFO - __main__ -   Step: 8479, LR: 2.2144971969463168e-06, Loss: 361.3362731933594
2024-08-04T10:01:38.304425440Z 
 89%|████████▉ | 8480/9500 [29:04:08<3:28:56, 12.29s/it]08/04/2024 03:01:38 - INFO - __main__ -   Step: 8480, LR: 2.2123266532590375e-06, Loss: 329.993408203125
2024-08-04T10:01:50.382192462Z 
 89%|████████▉ | 8481/9500 [29:04:20<3:27:38, 12.23s/it]08/04/2024 03:01:50 - INFO - __main__ -   Step: 8481, LR: 2.2101561095717586e-06, Loss: 358.20965576171875
2024-08-04T10:02:02.741228868Z 
 89%|████████▉ | 8482/9500 [29:04:32<3:28:07, 12.27s/it]08/04/2024 03:02:02 - INFO - __main__ -   Step: 8482, LR: 2.2079855658844798e-06, Loss: 400.22064208984375
2024-08-04T10:02:15.220669044Z 
 89%|████████▉ | 8483/9500 [29:04:45<3:29:00, 12.33s/it]08/04/2024 03:02:15 - INFO - __main__ -   Step: 8483, LR: 2.205815022197201e-06, Loss: 404.96038818359375
2024-08-04T10:02:27.537921945Z 
 89%|████████▉ | 8484/9500 [29:04:57<3:28:43, 12.33s/it]08/04/2024 03:02:27 - INFO - __main__ -   Step: 8484, LR: 2.203644478509922e-06, Loss: 380.206298828125
2024-08-04T10:02:40.230596324Z 
 89%|████████▉ | 8485/9500 [29:05:10<3:30:22, 12.44s/it]08/04/2024 03:02:40 - INFO - __main__ -   Step: 8485, LR: 2.201473934822643e-06, Loss: 502.43084716796875
2024-08-04T10:02:52.431576427Z 
 89%|████████▉ | 8486/9500 [29:05:22<3:28:58, 12.37s/it]08/04/2024 03:02:52 - INFO - __main__ -   Step: 8486, LR: 2.1993033911353643e-06, Loss: 477.3318176269531
2024-08-04T10:03:04.653031294Z 
 89%|████████▉ | 8487/9500 [29:05:34<3:28:02, 12.32s/it]08/04/2024 03:03:04 - INFO - __main__ -   Step: 8487, LR: 2.197132847448085e-06, Loss: 475.27362060546875
2024-08-04T10:03:17.687519138Z 
 89%|████████▉ | 8488/9500 [29:05:47<3:31:26, 12.54s/it]08/04/2024 03:03:17 - INFO - __main__ -   Step: 8488, LR: 2.194962303760806e-06, Loss: 469.33758544921875
2024-08-04T10:03:29.900490150Z 
 89%|████████▉ | 8489/9500 [29:05:59<3:29:35, 12.44s/it]08/04/2024 03:03:29 - INFO - __main__ -   Step: 8489, LR: 2.1927917600735273e-06, Loss: 286.1722412109375
2024-08-04T10:03:41.921002492Z 
 89%|████████▉ | 8490/9500 [29:06:11<3:27:16, 12.31s/it]08/04/2024 03:03:41 - INFO - __main__ -   Step: 8490, LR: 2.1906212163862484e-06, Loss: 315.2229919433594
2024-08-04T10:03:54.616202632Z 
 89%|████████▉ | 8491/9500 [29:06:24<3:28:59, 12.43s/it]08/04/2024 03:03:54 - INFO - __main__ -   Step: 8491, LR: 2.1884506726989696e-06, Loss: 423.968505859375
2024-08-04T10:04:07.115086100Z 
 89%|████████▉ | 8492/9500 [29:06:37<3:29:08, 12.45s/it]08/04/2024 03:04:07 - INFO - __main__ -   Step: 8492, LR: 2.1862801290116907e-06, Loss: 475.3629150390625
2024-08-04T10:04:19.396021367Z 
 89%|████████▉ | 8493/9500 [29:06:49<3:28:05, 12.40s/it]08/04/2024 03:04:19 - INFO - __main__ -   Step: 8493, LR: 2.184109585324412e-06, Loss: 292.08367919921875
2024-08-04T10:04:31.831371049Z 
 89%|████████▉ | 8494/9500 [29:07:01<3:28:04, 12.41s/it]08/04/2024 03:04:31 - INFO - __main__ -   Step: 8494, LR: 2.181939041637133e-06, Loss: 446.57177734375
2024-08-04T10:04:43.965358414Z 
 89%|████████▉ | 8495/9500 [29:07:13<3:26:28, 12.33s/it]08/04/2024 03:04:43 - INFO - __main__ -   Step: 8495, LR: 2.1797684979498537e-06, Loss: 328.4481201171875
2024-08-04T10:04:56.346181419Z 
 89%|████████▉ | 8496/9500 [29:07:26<3:26:32, 12.34s/it]08/04/2024 03:04:56 - INFO - __main__ -   Step: 8496, LR: 2.177597954262575e-06, Loss: 484.8935852050781
2024-08-04T10:05:08.967123949Z 
 89%|████████▉ | 8497/9500 [29:07:38<3:27:43, 12.43s/it]08/04/2024 03:05:08 - INFO - __main__ -   Step: 8497, LR: 2.175427410575296e-06, Loss: 344.09619140625
2024-08-04T10:05:21.663591149Z 
 89%|████████▉ | 8498/9500 [29:07:51<3:28:52, 12.51s/it]08/04/2024 03:05:21 - INFO - __main__ -   Step: 8498, LR: 2.173256866888017e-06, Loss: 430.8974914550781
2024-08-04T10:05:33.827007358Z 
 89%|████████▉ | 8499/9500 [29:08:03<3:26:56, 12.40s/it]08/04/2024 03:05:33 - INFO - __main__ -   Step: 8499, LR: 2.1710863232007382e-06, Loss: 478.5279235839844
2024-08-04T10:05:46.320607341Z 
 89%|████████▉ | 8500/9500 [29:08:16<3:27:11, 12.43s/it]08/04/2024 03:05:46 - INFO - __main__ -   Step: 8500, LR: 2.168915779513459e-06, Loss: 466.09912109375
2024-08-04T10:05:58.393381506Z 
 89%|████████▉ | 8501/9500 [29:08:28<3:25:11, 12.32s/it]08/04/2024 03:05:58 - INFO - __main__ -   Step: 8501, LR: 2.1667452358261805e-06, Loss: 413.223876953125
2024-08-04T10:06:10.438394007Z 
 89%|████████▉ | 8502/9500 [29:08:40<3:23:35, 12.24s/it]08/04/2024 03:06:10 - INFO - __main__ -   Step: 8502, LR: 2.1645746921389012e-06, Loss: 398.7275390625
2024-08-04T10:06:22.863145208Z 
 90%|████████▉ | 8503/9500 [29:08:52<3:24:18, 12.30s/it]08/04/2024 03:06:22 - INFO - __main__ -   Step: 8503, LR: 2.1624041484516223e-06, Loss: 350.8880615234375
2024-08-04T10:06:34.942199715Z 
 90%|████████▉ | 8504/9500 [29:09:04<3:23:01, 12.23s/it]08/04/2024 03:06:34 - INFO - __main__ -   Step: 8504, LR: 2.1602336047643435e-06, Loss: 275.9541931152344
2024-08-04T10:06:47.197147607Z 
 90%|████████▉ | 8505/9500 [29:09:17<3:22:56, 12.24s/it]08/04/2024 03:06:47 - INFO - __main__ -   Step: 8505, LR: 2.1580630610770646e-06, Loss: 451.8326416015625
2024-08-04T10:06:59.692294031Z 
 90%|████████▉ | 8506/9500 [29:09:29<3:24:01, 12.32s/it]08/04/2024 03:06:59 - INFO - __main__ -   Step: 8506, LR: 2.1558925173897858e-06, Loss: 358.4366149902344
2024-08-04T10:07:11.862460381Z 
 90%|████████▉ | 8507/9500 [29:09:41<3:23:05, 12.27s/it]08/04/2024 03:07:11 - INFO - __main__ -   Step: 8507, LR: 2.153721973702507e-06, Loss: 421.9605712890625
2024-08-04T10:07:24.384549952Z 
 90%|████████▉ | 8508/9500 [29:09:54<3:24:07, 12.35s/it]08/04/2024 03:07:24 - INFO - __main__ -   Step: 8508, LR: 2.151551430015228e-06, Loss: 371.17681884765625
2024-08-04T10:07:36.396414750Z 
 90%|████████▉ | 8509/9500 [29:10:06<3:22:16, 12.25s/it]08/04/2024 03:07:36 - INFO - __main__ -   Step: 8509, LR: 2.1493808863279487e-06, Loss: 389.3983154296875
2024-08-04T10:07:49.046922487Z 
 90%|████████▉ | 8510/9500 [29:10:18<3:24:03, 12.37s/it]08/04/2024 03:07:49 - INFO - __main__ -   Step: 8510, LR: 2.1472103426406703e-06, Loss: 478.482421875
2024-08-04T10:08:01.534923913Z 
 90%|████████▉ | 8511/9500 [29:10:31<3:24:27, 12.40s/it]08/04/2024 03:08:01 - INFO - __main__ -   Step: 8511, LR: 2.145039798953391e-06, Loss: 284.2403564453125
2024-08-04T10:08:14.024045500Z 
 90%|████████▉ | 8512/9500 [29:10:43<3:24:40, 12.43s/it]08/04/2024 03:08:14 - INFO - __main__ -   Step: 8512, LR: 2.142869255266112e-06, Loss: 357.5836181640625
2024-08-04T10:08:26.434548468Z 
 90%|████████▉ | 8513/9500 [29:10:56<3:24:22, 12.42s/it]08/04/2024 03:08:26 - INFO - __main__ -   Step: 8513, LR: 2.1406987115788333e-06, Loss: 291.8233642578125
2024-08-04T10:08:38.984258148Z 
 90%|████████▉ | 8514/9500 [29:11:08<3:24:47, 12.46s/it]08/04/2024 03:08:38 - INFO - __main__ -   Step: 8514, LR: 2.1385281678915544e-06, Loss: 463.53546142578125
2024-08-04T10:08:51.305963832Z 
 90%|████████▉ | 8515/9500 [29:11:21<3:23:53, 12.42s/it]08/04/2024 03:08:51 - INFO - __main__ -   Step: 8515, LR: 2.1363576242042756e-06, Loss: 400.21490478515625
2024-08-04T10:09:03.927075653Z 
 90%|████████▉ | 8516/9500 [29:11:33<3:24:40, 12.48s/it]08/04/2024 03:09:03 - INFO - __main__ -   Step: 8516, LR: 2.1341870805169963e-06, Loss: 414.378662109375
2024-08-04T10:09:16.480052859Z 
 90%|████████▉ | 8517/9500 [29:11:46<3:24:49, 12.50s/it]08/04/2024 03:09:16 - INFO - __main__ -   Step: 8517, LR: 2.132016536829718e-06, Loss: 441.91412353515625
2024-08-04T10:09:28.470349947Z 
 90%|████████▉ | 8518/9500 [29:11:58<3:22:06, 12.35s/it]08/04/2024 03:09:28 - INFO - __main__ -   Step: 8518, LR: 2.1298459931424385e-06, Loss: 351.02374267578125
2024-08-04T10:09:41.305359760Z 
 90%|████████▉ | 8519/9500 [29:12:11<3:24:17, 12.49s/it]08/04/2024 03:09:41 - INFO - __main__ -   Step: 8519, LR: 2.12767544945516e-06, Loss: 563.2239990234375
2024-08-04T10:09:53.649488468Z 
 90%|████████▉ | 8520/9500 [29:12:23<3:23:20, 12.45s/it]08/04/2024 03:09:53 - INFO - __main__ -   Step: 8520, LR: 2.125504905767881e-06, Loss: 312.3760681152344
2024-08-04T10:10:05.769455717Z 
 90%|████████▉ | 8521/9500 [29:12:35<3:21:31, 12.35s/it]08/04/2024 03:10:05 - INFO - __main__ -   Step: 8521, LR: 2.123334362080602e-06, Loss: 360.58074951171875
2024-08-04T10:10:18.527544415Z 
 90%|████████▉ | 8522/9500 [29:12:48<3:23:18, 12.47s/it]08/04/2024 03:10:18 - INFO - __main__ -   Step: 8522, LR: 2.121163818393323e-06, Loss: 389.6340637207031
2024-08-04T10:10:30.749278771Z 
 90%|████████▉ | 8523/9500 [29:13:00<3:21:52, 12.40s/it]08/04/2024 03:10:30 - INFO - __main__ -   Step: 8523, LR: 2.1189932747060442e-06, Loss: 334.3212585449219
2024-08-04T10:10:43.153727403Z 
 90%|████████▉ | 8524/9500 [29:13:13<3:21:41, 12.40s/it]08/04/2024 03:10:43 - INFO - __main__ -   Step: 8524, LR: 2.1168227310187654e-06, Loss: 437.69281005859375
2024-08-04T10:10:55.719424014Z 
 90%|████████▉ | 8525/9500 [29:13:25<3:22:18, 12.45s/it]08/04/2024 03:10:55 - INFO - __main__ -   Step: 8525, LR: 2.114652187331486e-06, Loss: 438.85552978515625
2024-08-04T10:11:07.857668898Z 
 90%|████████▉ | 8526/9500 [29:13:37<3:20:34, 12.36s/it]08/04/2024 03:11:07 - INFO - __main__ -   Step: 8526, LR: 2.1124816436442076e-06, Loss: 412.72564697265625
2024-08-04T10:11:20.161417537Z 
 90%|████████▉ | 8527/9500 [29:13:50<3:20:07, 12.34s/it]08/04/2024 03:11:20 - INFO - __main__ -   Step: 8527, LR: 2.1103110999569283e-06, Loss: 337.99078369140625
2024-08-04T10:11:32.801528368Z 
 90%|████████▉ | 8528/9500 [29:14:02<3:21:21, 12.43s/it]08/04/2024 03:11:32 - INFO - __main__ -   Step: 8528, LR: 2.1081405562696495e-06, Loss: 332.08477783203125
2024-08-04T10:11:45.106899502Z 
 90%|████████▉ | 8529/9500 [29:14:15<3:20:33, 12.39s/it]08/04/2024 03:11:45 - INFO - __main__ -   Step: 8529, LR: 2.1059700125823706e-06, Loss: 318.8251953125
2024-08-04T10:11:57.340620172Z 
 90%|████████▉ | 8530/9500 [29:14:27<3:19:34, 12.35s/it]08/04/2024 03:11:57 - INFO - __main__ -   Step: 8530, LR: 2.1037994688950918e-06, Loss: 327.06005859375
2024-08-04T10:12:10.010101350Z 
 90%|████████▉ | 8531/9500 [29:14:39<3:20:56, 12.44s/it]08/04/2024 03:12:10 - INFO - __main__ -   Step: 8531, LR: 2.101628925207813e-06, Loss: 459.35260009765625
2024-08-04T10:12:22.062835932Z 
 90%|████████▉ | 8532/9500 [29:14:51<3:18:51, 12.33s/it]08/04/2024 03:12:22 - INFO - __main__ -   Step: 8532, LR: 2.099458381520534e-06, Loss: 440.0657958984375
2024-08-04T10:12:34.107551520Z 
 90%|████████▉ | 8533/9500 [29:15:04<3:17:17, 12.24s/it]08/04/2024 03:12:34 - INFO - __main__ -   Step: 8533, LR: 2.097287837833255e-06, Loss: 394.60662841796875
2024-08-04T10:12:46.795739794Z 
 90%|████████▉ | 8534/9500 [29:15:16<3:19:14, 12.38s/it]08/04/2024 03:12:46 - INFO - __main__ -   Step: 8534, LR: 2.095117294145976e-06, Loss: 332.9089050292969
2024-08-04T10:12:58.888611021Z 
 90%|████████▉ | 8535/9500 [29:15:28<3:17:40, 12.29s/it]08/04/2024 03:12:58 - INFO - __main__ -   Step: 8535, LR: 2.092946750458697e-06, Loss: 395.10650634765625
2024-08-04T10:13:11.144342905Z 
 90%|████████▉ | 8536/9500 [29:15:41<3:17:18, 12.28s/it]08/04/2024 03:13:11 - INFO - __main__ -   Step: 8536, LR: 2.090776206771418e-06, Loss: 435.291015625
2024-08-04T10:13:24.052034041Z 
 90%|████████▉ | 8537/9500 [29:15:53<3:20:07, 12.47s/it]08/04/2024 03:13:24 - INFO - __main__ -   Step: 8537, LR: 2.0886056630841393e-06, Loss: 371.859375
2024-08-04T10:13:36.301662964Z 
 90%|████████▉ | 8538/9500 [29:16:06<3:18:51, 12.40s/it]08/04/2024 03:13:36 - INFO - __main__ -   Step: 8538, LR: 2.0864351193968604e-06, Loss: 407.99273681640625
2024-08-04T10:13:48.573187652Z 
 90%|████████▉ | 8539/9500 [29:16:18<3:18:01, 12.36s/it]08/04/2024 03:13:48 - INFO - __main__ -   Step: 8539, LR: 2.0842645757095816e-06, Loss: 413.72589111328125
2024-08-04T10:14:01.061827080Z 
 90%|████████▉ | 8540/9500 [29:16:30<3:18:24, 12.40s/it]08/04/2024 03:14:01 - INFO - __main__ -   Step: 8540, LR: 2.0820940320223023e-06, Loss: 291.0521545410156
2024-08-04T10:14:13.482953739Z 
 90%|████████▉ | 8541/9500 [29:16:43<3:18:18, 12.41s/it]08/04/2024 03:14:13 - INFO - __main__ -   Step: 8541, LR: 2.0799234883350234e-06, Loss: 410.1968688964844
2024-08-04T10:14:25.658829738Z 
 90%|████████▉ | 8542/9500 [29:16:55<3:16:59, 12.34s/it]08/04/2024 03:14:25 - INFO - __main__ -   Step: 8542, LR: 2.0777529446477445e-06, Loss: 370.34307861328125
2024-08-04T10:14:38.422024807Z 
 90%|████████▉ | 8543/9500 [29:17:08<3:18:49, 12.47s/it]08/04/2024 03:14:38 - INFO - __main__ -   Step: 8543, LR: 2.0755824009604657e-06, Loss: 430.220458984375
2024-08-04T10:14:50.597227207Z 
 90%|████████▉ | 8544/9500 [29:17:20<3:17:13, 12.38s/it]08/04/2024 03:14:50 - INFO - __main__ -   Step: 8544, LR: 2.073411857273187e-06, Loss: 481.1292724609375
2024-08-04T10:15:03.103358608Z 
 90%|████████▉ | 8545/9500 [29:17:33<3:17:37, 12.42s/it]08/04/2024 03:15:03 - INFO - __main__ -   Step: 8545, LR: 2.071241313585908e-06, Loss: 433.6153259277344
2024-08-04T10:15:15.684144018Z 
 90%|████████▉ | 8546/9500 [29:17:45<3:18:12, 12.47s/it]08/04/2024 03:15:15 - INFO - __main__ -   Step: 8546, LR: 2.069070769898629e-06, Loss: 390.16485595703125
2024-08-04T10:15:27.830203605Z 
 90%|████████▉ | 8547/9500 [29:17:57<3:16:28, 12.37s/it]08/04/2024 03:15:27 - INFO - __main__ -   Step: 8547, LR: 2.06690022621135e-06, Loss: 360.2139892578125
2024-08-04T10:15:40.043969819Z 
 90%|████████▉ | 8548/9500 [29:18:09<3:15:31, 12.32s/it]08/04/2024 03:15:40 - INFO - __main__ -   Step: 8548, LR: 2.0647296825240714e-06, Loss: 319.3678283691406
2024-08-04T10:15:53.110789586Z 
 90%|████████▉ | 8549/9500 [29:18:23<3:18:51, 12.55s/it]08/04/2024 03:15:53 - INFO - __main__ -   Step: 8549, LR: 2.062559138836792e-06, Loss: 338.34234619140625
2024-08-04T10:16:05.269509976Z 
 90%|█████████ | 8550/9500 [29:18:35<3:16:48, 12.43s/it]08/04/2024 03:16:05 - INFO - __main__ -   Step: 8550, LR: 2.060388595149513e-06, Loss: 454.443115234375
2024-08-04T10:16:17.334965987Z 
 90%|█████████ | 8551/9500 [29:18:47<3:14:52, 12.32s/it]08/04/2024 03:16:17 - INFO - __main__ -   Step: 8551, LR: 2.0582180514622343e-06, Loss: 366.9932556152344
2024-08-04T10:16:29.550928802Z 
 90%|█████████ | 8552/9500 [29:18:59<3:14:10, 12.29s/it]08/04/2024 03:16:29 - INFO - __main__ -   Step: 8552, LR: 2.0560475077749555e-06, Loss: 431.337158203125
2024-08-04T10:16:41.985927779Z 
 90%|█████████ | 8553/9500 [29:19:11<3:14:39, 12.33s/it]08/04/2024 03:16:41 - INFO - __main__ -   Step: 8553, LR: 2.0538769640876766e-06, Loss: 345.9450378417969
2024-08-04T10:16:54.183642683Z 
 90%|█████████ | 8554/9500 [29:19:24<3:13:48, 12.29s/it]08/04/2024 03:16:54 - INFO - __main__ -   Step: 8554, LR: 2.0517064204003973e-06, Loss: 464.0023498535156
2024-08-04T10:17:06.136349728Z 
 90%|█████████ | 8555/9500 [29:19:36<3:12:00, 12.19s/it]08/04/2024 03:17:06 - INFO - __main__ -   Step: 8555, LR: 2.049535876713119e-06, Loss: 396.5327453613281
2024-08-04T10:17:18.717493682Z 
 90%|█████████ | 8556/9500 [29:19:48<3:13:38, 12.31s/it]08/04/2024 03:17:18 - INFO - __main__ -   Step: 8556, LR: 2.0473653330258396e-06, Loss: 339.89794921875
2024-08-04T10:17:30.761792282Z 
 90%|█████████ | 8557/9500 [29:20:00<3:12:11, 12.23s/it]08/04/2024 03:17:30 - INFO - __main__ -   Step: 8557, LR: 2.045194789338561e-06, Loss: 359.62188720703125
2024-08-04T10:17:42.820973409Z 
 90%|█████████ | 8558/9500 [29:20:12<3:11:11, 12.18s/it]08/04/2024 03:17:42 - INFO - __main__ -   Step: 8558, LR: 2.043024245651282e-06, Loss: 394.42486572265625
2024-08-04T10:17:55.742704170Z 
 90%|█████████ | 8559/9500 [29:20:25<3:14:29, 12.40s/it]08/04/2024 03:17:55 - INFO - __main__ -   Step: 8559, LR: 2.040853701964003e-06, Loss: 330.09600830078125
2024-08-04T10:18:07.660430739Z 
 90%|█████████ | 8560/9500 [29:20:37<3:12:00, 12.26s/it]08/04/2024 03:18:07 - INFO - __main__ -   Step: 8560, LR: 2.038683158276724e-06, Loss: 364.55328369140625
2024-08-04T10:18:19.861410729Z 
 90%|█████████ | 8561/9500 [29:20:49<3:11:32, 12.24s/it]08/04/2024 03:18:19 - INFO - __main__ -   Step: 8561, LR: 2.0365126145894453e-06, Loss: 495.1378479003906
2024-08-04T10:18:32.419924847Z 
 90%|█████████ | 8562/9500 [29:21:02<3:12:50, 12.34s/it]08/04/2024 03:18:32 - INFO - __main__ -   Step: 8562, LR: 2.0343420709021664e-06, Loss: 467.2658386230469
2024-08-04T10:18:44.505662441Z 
 90%|█████████ | 8563/9500 [29:21:14<3:11:27, 12.26s/it]08/04/2024 03:18:44 - INFO - __main__ -   Step: 8563, LR: 2.032171527214887e-06, Loss: 393.1214599609375
2024-08-04T10:18:56.811589499Z 
 90%|█████████ | 8564/9500 [29:21:26<3:11:28, 12.27s/it]08/04/2024 03:18:56 - INFO - __main__ -   Step: 8564, LR: 2.0300009835276087e-06, Loss: 398.9569091796875
2024-08-04T10:19:09.666317295Z 
 90%|█████████ | 8565/9500 [29:21:39<3:13:59, 12.45s/it]08/04/2024 03:19:09 - INFO - __main__ -   Step: 8565, LR: 2.0278304398403294e-06, Loss: 401.5401611328125
2024-08-04T10:19:21.776739382Z 
 90%|█████████ | 8566/9500 [29:21:51<3:12:11, 12.35s/it]08/04/2024 03:19:21 - INFO - __main__ -   Step: 8566, LR: 2.025659896153051e-06, Loss: 338.32293701171875
2024-08-04T10:19:33.759102873Z 
 90%|█████████ | 8567/9500 [29:22:03<3:10:17, 12.24s/it]08/04/2024 03:19:33 - INFO - __main__ -   Step: 8567, LR: 2.0234893524657717e-06, Loss: 341.508544921875
2024-08-04T10:19:46.472872288Z 
 90%|█████████ | 8568/9500 [29:22:16<3:12:18, 12.38s/it]08/04/2024 03:19:46 - INFO - __main__ -   Step: 8568, LR: 2.021318808778493e-06, Loss: 331.9422607421875
2024-08-04T10:19:58.724061427Z 
 90%|█████████ | 8569/9500 [29:22:28<3:11:30, 12.34s/it]08/04/2024 03:19:58 - INFO - __main__ -   Step: 8569, LR: 2.019148265091214e-06, Loss: 434.1007995605469
2024-08-04T10:20:11.144833751Z 
 90%|█████████ | 8570/9500 [29:22:41<3:11:39, 12.37s/it]08/04/2024 03:20:11 - INFO - __main__ -   Step: 8570, LR: 2.016977721403935e-06, Loss: 489.12646484375
2024-08-04T10:20:23.473970493Z 
 90%|█████████ | 8571/9500 [29:22:53<3:11:17, 12.35s/it]08/04/2024 03:20:23 - INFO - __main__ -   Step: 8571, LR: 2.014807177716656e-06, Loss: 297.8453674316406
2024-08-04T10:20:35.391502627Z 
 90%|█████████ | 8572/9500 [29:23:05<3:09:03, 12.22s/it]08/04/2024 03:20:35 - INFO - __main__ -   Step: 8572, LR: 2.012636634029377e-06, Loss: 318.4122009277344
2024-08-04T10:20:47.611947437Z 
 90%|█████████ | 8573/9500 [29:23:17<3:08:50, 12.22s/it]08/04/2024 03:20:47 - INFO - __main__ -   Step: 8573, LR: 2.0104660903420985e-06, Loss: 576.5081787109375
2024-08-04T10:21:00.374383149Z 
 90%|█████████ | 8574/9500 [29:23:30<3:11:08, 12.38s/it]08/04/2024 03:21:00 - INFO - __main__ -   Step: 8574, LR: 2.008295546654819e-06, Loss: 429.050537109375
2024-08-04T10:21:12.809659402Z 
 90%|█████████ | 8575/9500 [29:23:42<3:11:09, 12.40s/it]08/04/2024 03:21:12 - INFO - __main__ -   Step: 8575, LR: 2.0061250029675403e-06, Loss: 457.0236511230469
2024-08-04T10:21:25.228834947Z 
 90%|█████████ | 8576/9500 [29:23:55<3:11:02, 12.41s/it]08/04/2024 03:21:25 - INFO - __main__ -   Step: 8576, LR: 2.0039544592802615e-06, Loss: 294.11016845703125
2024-08-04T10:21:38.118713535Z 
 90%|█████████ | 8577/9500 [29:24:08<3:13:04, 12.55s/it]08/04/2024 03:21:38 - INFO - __main__ -   Step: 8577, LR: 2.0017839155929826e-06, Loss: 462.9945983886719
2024-08-04T10:21:50.248301752Z 
 90%|█████████ | 8578/9500 [29:24:20<3:10:55, 12.42s/it]08/04/2024 03:21:50 - INFO - __main__ -   Step: 8578, LR: 1.9996133719057037e-06, Loss: 341.13116455078125
2024-08-04T10:22:02.271462543Z 
 90%|█████████ | 8579/9500 [29:24:32<3:08:52, 12.30s/it]08/04/2024 03:22:02 - INFO - __main__ -   Step: 8579, LR: 1.997442828218425e-06, Loss: 363.83349609375
2024-08-04T10:22:15.126502761Z 
 90%|█████████ | 8580/9500 [29:24:45<3:11:11, 12.47s/it]08/04/2024 03:22:15 - INFO - __main__ -   Step: 8580, LR: 1.9952722845311456e-06, Loss: 370.42510986328125
2024-08-04T10:22:27.155275810Z 
 90%|█████████ | 8581/9500 [29:24:57<3:08:57, 12.34s/it]08/04/2024 03:22:27 - INFO - __main__ -   Step: 8581, LR: 1.9931017408438667e-06, Loss: 338.7084655761719
2024-08-04T10:22:39.401605683Z 
 90%|█████████ | 8582/9500 [29:25:09<3:08:20, 12.31s/it]08/04/2024 03:22:39 - INFO - __main__ -   Step: 8582, LR: 1.990931197156588e-06, Loss: 406.73077392578125
2024-08-04T10:22:52.167704973Z 
 90%|█████████ | 8583/9500 [29:25:22<3:10:13, 12.45s/it]08/04/2024 03:22:52 - INFO - __main__ -   Step: 8583, LR: 1.988760653469309e-06, Loss: 410.03961181640625
2024-08-04T10:23:04.319256315Z 
 90%|█████████ | 8584/9500 [29:25:34<3:08:40, 12.36s/it]08/04/2024 03:23:04 - INFO - __main__ -   Step: 8584, LR: 1.98659010978203e-06, Loss: 407.3354797363281
2024-08-04T10:23:16.285795952Z 
 90%|█████████ | 8585/9500 [29:25:46<3:06:40, 12.24s/it]08/04/2024 03:23:16 - INFO - __main__ -   Step: 8585, LR: 1.9844195660947513e-06, Loss: 337.4508056640625
2024-08-04T10:23:28.765626512Z 
 90%|█████████ | 8586/9500 [29:25:58<3:07:33, 12.31s/it]08/04/2024 03:23:28 - INFO - __main__ -   Step: 8586, LR: 1.9822490224074724e-06, Loss: 326.6383056640625
2024-08-04T10:23:41.178765244Z 
 90%|█████████ | 8587/9500 [29:26:11<3:07:48, 12.34s/it]08/04/2024 03:23:41 - INFO - __main__ -   Step: 8587, LR: 1.980078478720193e-06, Loss: 468.57891845703125
2024-08-04T10:23:53.004020253Z 
 90%|█████████ | 8588/9500 [29:26:22<3:05:14, 12.19s/it]08/04/2024 03:23:53 - INFO - __main__ -   Step: 8588, LR: 1.9779079350329143e-06, Loss: 322.8122253417969
2024-08-04T10:24:05.458860083Z 
 90%|█████████ | 8589/9500 [29:26:35<3:06:15, 12.27s/it]08/04/2024 03:24:05 - INFO - __main__ -   Step: 8589, LR: 1.9757373913456354e-06, Loss: 378.41253662109375
2024-08-04T10:24:17.526350421Z 
 90%|█████████ | 8590/9500 [29:26:47<3:05:08, 12.21s/it]08/04/2024 03:24:17 - INFO - __main__ -   Step: 8590, LR: 1.9735668476583565e-06, Loss: 396.7902526855469
2024-08-04T10:24:29.553641773Z 
 90%|█████████ | 8591/9500 [29:26:59<3:04:07, 12.15s/it]08/04/2024 03:24:29 - INFO - __main__ -   Step: 8591, LR: 1.9713963039710777e-06, Loss: 409.6038818359375
2024-08-04T10:24:42.210294801Z 
 90%|█████████ | 8592/9500 [29:27:12<3:06:12, 12.30s/it]08/04/2024 03:24:42 - INFO - __main__ -   Step: 8592, LR: 1.969225760283799e-06, Loss: 412.0254821777344
2024-08-04T10:24:54.423472229Z 
 90%|█████████ | 8593/9500 [29:27:24<3:05:35, 12.28s/it]08/04/2024 03:24:54 - INFO - __main__ -   Step: 8593, LR: 1.96705521659652e-06, Loss: 534.9182739257812
2024-08-04T10:25:07.122227978Z 
 90%|█████████ | 8594/9500 [29:27:37<3:07:17, 12.40s/it]08/04/2024 03:25:07 - INFO - __main__ -   Step: 8594, LR: 1.9648846729092406e-06, Loss: 528.9760131835938
2024-08-04T10:25:19.194832743Z 
 90%|█████████ | 8595/9500 [29:27:49<3:05:35, 12.30s/it]08/04/2024 03:25:19 - INFO - __main__ -   Step: 8595, LR: 1.962714129221962e-06, Loss: 394.2900390625
2024-08-04T10:25:31.856289084Z 
 90%|█████████ | 8596/9500 [29:28:01<3:07:00, 12.41s/it]08/04/2024 03:25:31 - INFO - __main__ -   Step: 8596, LR: 1.960543585534683e-06, Loss: 357.6754150390625
2024-08-04T10:25:43.984000662Z 
 90%|█████████ | 8597/9500 [29:28:13<3:05:30, 12.33s/it]08/04/2024 03:25:43 - INFO - __main__ -   Step: 8597, LR: 1.958373041847404e-06, Loss: 454.15277099609375
2024-08-04T10:25:56.230984458Z 
 91%|█████████ | 8598/9500 [29:28:26<3:04:56, 12.30s/it]08/04/2024 03:25:56 - INFO - __main__ -   Step: 8598, LR: 1.956202498160125e-06, Loss: 479.1070556640625
2024-08-04T10:26:08.585025053Z 
 91%|█████████ | 8599/9500 [29:28:38<3:04:58, 12.32s/it]08/04/2024 03:26:08 - INFO - __main__ -   Step: 8599, LR: 1.9540319544728463e-06, Loss: 343.96826171875
2024-08-04T10:26:20.555921041Z 
 91%|█████████ | 8600/9500 [29:28:50<3:03:12, 12.21s/it]08/04/2024 03:26:20 - INFO - __main__ -   Step: 8600, LR: 1.9518614107855675e-06, Loss: 348.4745788574219
2024-08-04T10:26:32.756927861Z 
 91%|█████████ | 8601/9500 [29:29:02<3:02:56, 12.21s/it]08/04/2024 03:26:32 - INFO - __main__ -   Step: 8601, LR: 1.949690867098288e-06, Loss: 340.3187561035156
2024-08-04T10:26:45.786300461Z 
 91%|█████████ | 8602/9500 [29:29:15<3:06:25, 12.46s/it]08/04/2024 03:26:45 - INFO - __main__ -   Step: 8602, LR: 1.9475203234110097e-06, Loss: 489.61126708984375
2024-08-04T10:26:57.899458319Z 
 91%|█████████ | 8603/9500 [29:29:27<3:04:40, 12.35s/it]08/04/2024 03:26:57 - INFO - __main__ -   Step: 8603, LR: 1.9453497797237304e-06, Loss: 396.4620361328125
2024-08-04T10:27:10.053035122Z 
 91%|█████████ | 8604/9500 [29:29:39<3:03:34, 12.29s/it]08/04/2024 03:27:10 - INFO - __main__ -   Step: 8604, LR: 1.943179236036452e-06, Loss: 388.4605407714844
2024-08-04T10:27:22.844256998Z 
 91%|█████████ | 8605/9500 [29:29:52<3:05:36, 12.44s/it]08/04/2024 03:27:22 - INFO - __main__ -   Step: 8605, LR: 1.9410086923491727e-06, Loss: 429.31640625
2024-08-04T10:27:35.009929124Z 
 91%|█████████ | 8606/9500 [29:30:04<3:04:09, 12.36s/it]08/04/2024 03:27:35 - INFO - __main__ -   Step: 8606, LR: 1.938838148661894e-06, Loss: 356.87286376953125
2024-08-04T10:27:46.888718748Z 
 91%|█████████ | 8607/9500 [29:30:16<3:01:48, 12.22s/it]08/04/2024 03:27:46 - INFO - __main__ -   Step: 8607, LR: 1.936667604974615e-06, Loss: 426.68896484375
2024-08-04T10:28:00.014532747Z 
 91%|█████████ | 8608/9500 [29:30:29<3:05:39, 12.49s/it]08/04/2024 03:28:00 - INFO - __main__ -   Step: 8608, LR: 1.934497061287336e-06, Loss: 536.1744995117188
2024-08-04T10:28:12.009601284Z 
 91%|█████████ | 8609/9500 [29:30:41<3:03:15, 12.34s/it]08/04/2024 03:28:12 - INFO - __main__ -   Step: 8609, LR: 1.9323265176000573e-06, Loss: 422.76171875
2024-08-04T10:28:24.078418429Z 
 91%|█████████ | 8610/9500 [29:30:54<3:01:50, 12.26s/it]08/04/2024 03:28:24 - INFO - __main__ -   Step: 8610, LR: 1.930155973912778e-06, Loss: 351.88153076171875
2024-08-04T10:28:36.702770528Z 
 91%|█████████ | 8611/9500 [29:31:06<3:03:15, 12.37s/it]08/04/2024 03:28:36 - INFO - __main__ -   Step: 8611, LR: 1.9279854302254995e-06, Loss: 476.4058532714844
2024-08-04T10:28:48.995423107Z 
 91%|█████████ | 8612/9500 [29:31:18<3:02:43, 12.35s/it]08/04/2024 03:28:48 - INFO - __main__ -   Step: 8612, LR: 1.9258148865382202e-06, Loss: 367.47454833984375
2024-08-04T10:29:01.444210692Z 
 91%|█████████ | 8613/9500 [29:31:31<3:02:58, 12.38s/it]08/04/2024 03:29:01 - INFO - __main__ -   Step: 8613, LR: 1.9236443428509414e-06, Loss: 465.0992431640625
2024-08-04T10:29:13.980623396Z 
 91%|█████████ | 8614/9500 [29:31:43<3:03:28, 12.42s/it]08/04/2024 03:29:13 - INFO - __main__ -   Step: 8614, LR: 1.9214737991636625e-06, Loss: 329.0566711425781
2024-08-04T10:29:26.121219194Z 
 91%|█████████ | 8615/9500 [29:31:56<3:02:00, 12.34s/it]08/04/2024 03:29:26 - INFO - __main__ -   Step: 8615, LR: 1.9193032554763837e-06, Loss: 423.65643310546875
2024-08-04T10:29:38.079372575Z 
 91%|█████████ | 8616/9500 [29:32:08<3:00:06, 12.23s/it]08/04/2024 03:29:38 - INFO - __main__ -   Step: 8616, LR: 1.917132711789105e-06, Loss: 343.2427673339844
2024-08-04T10:29:50.558749267Z 
 91%|█████████ | 8617/9500 [29:32:20<3:01:02, 12.30s/it]08/04/2024 03:29:50 - INFO - __main__ -   Step: 8617, LR: 1.914962168101826e-06, Loss: 356.97772216796875
2024-08-04T10:30:02.672971288Z 
 91%|█████████ | 8618/9500 [29:32:32<3:00:00, 12.25s/it]08/04/2024 03:30:02 - INFO - __main__ -   Step: 8618, LR: 1.912791624414547e-06, Loss: 380.1670227050781
2024-08-04T10:30:14.730630644Z 
 91%|█████████ | 8619/9500 [29:32:44<2:58:58, 12.19s/it]08/04/2024 03:30:14 - INFO - __main__ -   Step: 8619, LR: 1.9106210807272678e-06, Loss: 388.7370910644531
2024-08-04T10:30:27.219477068Z 
 91%|█████████ | 8620/9500 [29:32:57<3:00:05, 12.28s/it]08/04/2024 03:30:27 - INFO - __main__ -   Step: 8620, LR: 1.9084505370399893e-06, Loss: 373.75103759765625
2024-08-04T10:30:39.409639955Z 
 91%|█████████ | 8621/9500 [29:33:09<2:59:29, 12.25s/it]08/04/2024 03:30:39 - INFO - __main__ -   Step: 8621, LR: 1.90627999335271e-06, Loss: 347.7017822265625
2024-08-04T10:30:51.639128665Z 
 91%|█████████ | 8622/9500 [29:33:21<2:59:11, 12.25s/it]08/04/2024 03:30:51 - INFO - __main__ -   Step: 8622, LR: 1.9041094496654312e-06, Loss: 401.17864990234375
2024-08-04T10:31:04.107662632Z 
 91%|█████████ | 8623/9500 [29:33:34<2:59:57, 12.31s/it]08/04/2024 03:31:04 - INFO - __main__ -   Step: 8623, LR: 1.9019389059781523e-06, Loss: 401.6959228515625
2024-08-04T10:31:16.415688642Z 
 91%|█████████ | 8624/9500 [29:33:46<2:59:44, 12.31s/it]08/04/2024 03:31:16 - INFO - __main__ -   Step: 8624, LR: 1.8997683622908732e-06, Loss: 426.3031005859375
2024-08-04T10:31:28.746678334Z 
 91%|█████████ | 8625/9500 [29:33:58<2:59:37, 12.32s/it]08/04/2024 03:31:28 - INFO - __main__ -   Step: 8625, LR: 1.8975978186035946e-06, Loss: 408.2300109863281
2024-08-04T10:31:41.413758533Z 
 91%|█████████ | 8626/9500 [29:34:11<3:00:56, 12.42s/it]08/04/2024 03:31:41 - INFO - __main__ -   Step: 8626, LR: 1.8954272749163155e-06, Loss: 283.33160400390625
2024-08-04T10:31:53.514211153Z 
 91%|█████████ | 8627/9500 [29:34:23<2:59:20, 12.33s/it]08/04/2024 03:31:53 - INFO - __main__ -   Step: 8627, LR: 1.8932567312290364e-06, Loss: 452.35406494140625
2024-08-04T10:32:05.690714474Z 
 91%|█████████ | 8628/9500 [29:34:35<2:58:28, 12.28s/it]08/04/2024 03:32:05 - INFO - __main__ -   Step: 8628, LR: 1.8910861875417578e-06, Loss: 338.52520751953125
2024-08-04T10:32:18.077212712Z 
 91%|█████████ | 8629/9500 [29:34:48<2:58:44, 12.31s/it]08/04/2024 03:32:18 - INFO - __main__ -   Step: 8629, LR: 1.8889156438544787e-06, Loss: 370.36822509765625
2024-08-04T10:32:30.339296913Z 
 91%|█████████ | 8630/9500 [29:35:00<2:58:18, 12.30s/it]08/04/2024 03:32:30 - INFO - __main__ -   Step: 8630, LR: 1.8867451001671998e-06, Loss: 364.85821533203125
2024-08-04T10:32:42.414269987Z 
 91%|█████████ | 8631/9500 [29:35:12<2:57:08, 12.23s/it]08/04/2024 03:32:42 - INFO - __main__ -   Step: 8631, LR: 1.884574556479921e-06, Loss: 324.78594970703125
2024-08-04T10:32:55.102799227Z 
 91%|█████████ | 8632/9500 [29:35:25<2:58:55, 12.37s/it]08/04/2024 03:32:55 - INFO - __main__ -   Step: 8632, LR: 1.8824040127926421e-06, Loss: 392.6874694824219
2024-08-04T10:33:07.357471483Z 
 91%|█████████ | 8633/9500 [29:35:37<2:58:13, 12.33s/it]08/04/2024 03:33:07 - INFO - __main__ -   Step: 8633, LR: 1.880233469105363e-06, Loss: 316.85113525390625
2024-08-04T10:33:19.465811874Z 
 91%|█████████ | 8634/9500 [29:35:49<2:57:02, 12.27s/it]08/04/2024 03:33:19 - INFO - __main__ -   Step: 8634, LR: 1.878062925418084e-06, Loss: 325.6207275390625
2024-08-04T10:33:31.829924216Z 
 91%|█████████ | 8635/9500 [29:36:01<2:57:15, 12.30s/it]08/04/2024 03:33:31 - INFO - __main__ -   Step: 8635, LR: 1.8758923817308053e-06, Loss: 456.6246032714844
2024-08-04T10:33:44.017332781Z 
 91%|█████████ | 8636/9500 [29:36:13<2:56:35, 12.26s/it]08/04/2024 03:33:44 - INFO - __main__ -   Step: 8636, LR: 1.8737218380435262e-06, Loss: 371.51458740234375
2024-08-04T10:33:56.103202664Z 
 91%|█████████ | 8637/9500 [29:36:26<2:55:37, 12.21s/it]08/04/2024 03:33:56 - INFO - __main__ -   Step: 8637, LR: 1.8715512943562476e-06, Loss: 473.1524963378906
2024-08-04T10:34:08.049953647Z 
 91%|█████████ | 8638/9500 [29:36:37<2:54:17, 12.13s/it]08/04/2024 03:34:08 - INFO - __main__ -   Step: 8638, LR: 1.8693807506689685e-06, Loss: 317.24798583984375
2024-08-04T10:34:20.624726538Z 
 91%|█████████ | 8639/9500 [29:36:50<2:55:59, 12.26s/it]08/04/2024 03:34:20 - INFO - __main__ -   Step: 8639, LR: 1.8672102069816896e-06, Loss: 373.55731201171875
2024-08-04T10:34:32.736232161Z 
 91%|█████████ | 8640/9500 [29:37:02<2:55:07, 12.22s/it]08/04/2024 03:34:32 - INFO - __main__ -   Step: 8640, LR: 1.8650396632944106e-06, Loss: 437.0618896484375
2024-08-04T10:34:44.694443365Z 
 91%|█████████ | 8641/9500 [29:37:14<2:53:48, 12.14s/it]08/04/2024 03:34:44 - INFO - __main__ -   Step: 8641, LR: 1.8628691196071317e-06, Loss: 301.94879150390625
2024-08-04T10:34:57.289310114Z 
 91%|█████████ | 8642/9500 [29:37:27<2:55:33, 12.28s/it]08/04/2024 03:34:57 - INFO - __main__ -   Step: 8642, LR: 1.8606985759198528e-06, Loss: 337.755126953125
2024-08-04T10:35:09.406229229Z 
 91%|█████████ | 8643/9500 [29:37:39<2:54:40, 12.23s/it]08/04/2024 03:35:09 - INFO - __main__ -   Step: 8643, LR: 1.8585280322325738e-06, Loss: 336.66888427734375
2024-08-04T10:35:21.494486643Z 
 91%|█████████ | 8644/9500 [29:37:51<2:53:51, 12.19s/it]08/04/2024 03:35:21 - INFO - __main__ -   Step: 8644, LR: 1.8563574885452951e-06, Loss: 310.464599609375
2024-08-04T10:35:34.056382486Z 
 91%|█████████ | 8645/9500 [29:38:03<2:55:15, 12.30s/it]08/04/2024 03:35:34 - INFO - __main__ -   Step: 8645, LR: 1.854186944858016e-06, Loss: 409.3334045410156
2024-08-04T10:35:46.235695082Z 
 91%|█████████ | 8646/9500 [29:38:16<2:54:32, 12.26s/it]08/04/2024 03:35:46 - INFO - __main__ -   Step: 8646, LR: 1.8520164011707372e-06, Loss: 355.2110595703125
2024-08-04T10:35:58.308984200Z 
 91%|█████████ | 8647/9500 [29:38:28<2:53:32, 12.21s/it]08/04/2024 03:35:58 - INFO - __main__ -   Step: 8647, LR: 1.8498458574834583e-06, Loss: 501.33837890625
2024-08-04T10:36:11.015385329Z 
 91%|█████████ | 8648/9500 [29:38:40<2:55:27, 12.36s/it]08/04/2024 03:36:11 - INFO - __main__ -   Step: 8648, LR: 1.8476753137961792e-06, Loss: 382.30010986328125
2024-08-04T10:36:23.165926133Z 
 91%|█████████ | 8649/9500 [29:38:53<2:54:22, 12.29s/it]08/04/2024 03:36:23 - INFO - __main__ -   Step: 8649, LR: 1.8455047701089004e-06, Loss: 394.73455810546875
2024-08-04T10:36:35.219709236Z 
 91%|█████████ | 8650/9500 [29:39:05<2:53:08, 12.22s/it]08/04/2024 03:36:35 - INFO - __main__ -   Step: 8650, LR: 1.8433342264216215e-06, Loss: 370.7774658203125
2024-08-04T10:36:47.834419929Z 
 91%|█████████ | 8651/9500 [29:39:17<2:54:36, 12.34s/it]08/04/2024 03:36:47 - INFO - __main__ -   Step: 8651, LR: 1.8411636827343426e-06, Loss: 391.71087646484375
2024-08-04T10:37:00.144819743Z 
 91%|█████████ | 8652/9500 [29:39:30<2:54:16, 12.33s/it]08/04/2024 03:37:00 - INFO - __main__ -   Step: 8652, LR: 1.8389931390470636e-06, Loss: 406.7747802734375
2024-08-04T10:37:12.417692360Z 
 91%|█████████ | 8653/9500 [29:39:42<2:53:49, 12.31s/it]08/04/2024 03:37:12 - INFO - __main__ -   Step: 8653, LR: 1.836822595359785e-06, Loss: 439.8973083496094
2024-08-04T10:37:24.924897748Z 
 91%|█████████ | 8654/9500 [29:39:54<2:54:26, 12.37s/it]08/04/2024 03:37:24 - INFO - __main__ -   Step: 8654, LR: 1.8346520516725058e-06, Loss: 507.5001220703125
2024-08-04T10:37:37.017930690Z 
 91%|█████████ | 8655/9500 [29:40:06<2:53:03, 12.29s/it]08/04/2024 03:37:37 - INFO - __main__ -   Step: 8655, LR: 1.8324815079852268e-06, Loss: 343.8148498535156
2024-08-04T10:37:49.065609581Z 
 91%|█████████ | 8656/9500 [29:40:19<2:51:50, 12.22s/it]08/04/2024 03:37:49 - INFO - __main__ -   Step: 8656, LR: 1.8303109642979481e-06, Loss: 497.7491149902344
2024-08-04T10:38:02.197918335Z 
 91%|█████████ | 8657/9500 [29:40:32<2:55:29, 12.49s/it]08/04/2024 03:38:02 - INFO - __main__ -   Step: 8657, LR: 1.828140420610669e-06, Loss: 405.74688720703125
2024-08-04T10:38:14.643479488Z 
 91%|█████████ | 8658/9500 [29:40:44<2:55:05, 12.48s/it]08/04/2024 03:38:14 - INFO - __main__ -   Step: 8658, LR: 1.8259698769233902e-06, Loss: 439.7037353515625
2024-08-04T10:38:26.669091842Z 
 91%|█████████ | 8659/9500 [29:40:56<2:52:59, 12.34s/it]08/04/2024 03:38:26 - INFO - __main__ -   Step: 8659, LR: 1.823799333236111e-06, Loss: 377.83782958984375
2024-08-04T10:38:39.635692579Z 
 91%|█████████ | 8660/9500 [29:41:09<2:55:24, 12.53s/it]08/04/2024 03:38:39 - INFO - __main__ -   Step: 8660, LR: 1.8216287895488324e-06, Loss: 391.81732177734375
2024-08-04T10:38:51.894171975Z 
 91%|█████████ | 8661/9500 [29:41:21<2:54:03, 12.45s/it]08/04/2024 03:38:51 - INFO - __main__ -   Step: 8661, LR: 1.8194582458615534e-06, Loss: 395.8135986328125
2024-08-04T10:39:04.018754056Z 
 91%|█████████ | 8662/9500 [29:41:33<2:52:30, 12.35s/it]08/04/2024 03:39:04 - INFO - __main__ -   Step: 8662, LR: 1.8172877021742743e-06, Loss: 418.9926452636719
2024-08-04T10:39:16.744442127Z 
 91%|█████████ | 8663/9500 [29:41:46<2:53:51, 12.46s/it]08/04/2024 03:39:16 - INFO - __main__ -   Step: 8663, LR: 1.8151171584869956e-06, Loss: 400.02294921875
2024-08-04T10:39:29.209690088Z 
 91%|█████████ | 8664/9500 [29:41:59<2:53:39, 12.46s/it]08/04/2024 03:39:29 - INFO - __main__ -   Step: 8664, LR: 1.8129466147997166e-06, Loss: 362.5043640136719
2024-08-04T10:39:41.461440721Z 
 91%|█████████ | 8665/9500 [29:42:11<2:52:34, 12.40s/it]08/04/2024 03:39:41 - INFO - __main__ -   Step: 8665, LR: 1.8107760711124377e-06, Loss: 437.29180908203125
2024-08-04T10:39:54.228919417Z 
 91%|█████████ | 8666/9500 [29:42:24<2:53:53, 12.51s/it]08/04/2024 03:39:54 - INFO - __main__ -   Step: 8666, LR: 1.8086055274251588e-06, Loss: 471.8999328613281
2024-08-04T10:40:06.391724926Z 
 91%|█████████ | 8667/9500 [29:42:36<2:52:14, 12.41s/it]08/04/2024 03:40:06 - INFO - __main__ -   Step: 8667, LR: 1.8064349837378798e-06, Loss: 461.4734191894531
2024-08-04T10:40:18.525750676Z 
 91%|█████████ | 8668/9500 [29:42:48<2:50:54, 12.32s/it]08/04/2024 03:40:18 - INFO - __main__ -   Step: 8668, LR: 1.804264440050601e-06, Loss: 271.0987548828125
2024-08-04T10:40:31.166528237Z 
 91%|█████████▏| 8669/9500 [29:43:01<2:52:00, 12.42s/it]08/04/2024 03:40:31 - INFO - __main__ -   Step: 8669, LR: 1.802093896363322e-06, Loss: 453.446533203125
2024-08-04T10:40:43.548565457Z 
 91%|█████████▏| 8670/9500 [29:43:13<2:51:38, 12.41s/it]08/04/2024 03:40:43 - INFO - __main__ -   Step: 8670, LR: 1.7999233526760432e-06, Loss: 407.205322265625
2024-08-04T10:40:55.679588075Z 
 91%|█████████▏| 8671/9500 [29:43:25<2:50:17, 12.33s/it]08/04/2024 03:40:55 - INFO - __main__ -   Step: 8671, LR: 1.797752808988764e-06, Loss: 464.2296142578125
2024-08-04T10:41:08.212415670Z 
 91%|█████████▏| 8672/9500 [29:43:38<2:50:56, 12.39s/it]08/04/2024 03:41:08 - INFO - __main__ -   Step: 8672, LR: 1.7955822653014854e-06, Loss: 497.2946472167969
2024-08-04T10:41:20.408723212Z 
 91%|█████████▏| 8673/9500 [29:43:50<2:49:56, 12.33s/it]08/04/2024 03:41:20 - INFO - __main__ -   Step: 8673, LR: 1.7934117216142064e-06, Loss: 372.1278381347656
2024-08-04T10:41:32.568182419Z 
 91%|█████████▏| 8674/9500 [29:44:02<2:49:02, 12.28s/it]08/04/2024 03:41:32 - INFO - __main__ -   Step: 8674, LR: 1.7912411779269273e-06, Loss: 379.99615478515625
2024-08-04T10:41:45.380140344Z 
 91%|█████████▏| 8675/9500 [29:44:15<2:51:02, 12.44s/it]08/04/2024 03:41:45 - INFO - __main__ -   Step: 8675, LR: 1.7890706342396486e-06, Loss: 454.9067687988281
2024-08-04T10:41:57.651377491Z 
 91%|█████████▏| 8676/9500 [29:44:27<2:50:07, 12.39s/it]08/04/2024 03:41:57 - INFO - __main__ -   Step: 8676, LR: 1.7869000905523696e-06, Loss: 406.31378173828125
2024-08-04T10:42:09.727963899Z 
 91%|█████████▏| 8677/9500 [29:44:39<2:48:38, 12.30s/it]08/04/2024 03:42:09 - INFO - __main__ -   Step: 8677, LR: 1.7847295468650907e-06, Loss: 380.0799560546875
2024-08-04T10:42:22.152104032Z 
 91%|█████████▏| 8678/9500 [29:44:52<2:48:58, 12.33s/it]08/04/2024 03:42:22 - INFO - __main__ -   Step: 8678, LR: 1.7825590031778116e-06, Loss: 358.5305480957031
2024-08-04T10:42:34.463282001Z 
 91%|█████████▏| 8679/9500 [29:45:04<2:48:40, 12.33s/it]08/04/2024 03:42:34 - INFO - __main__ -   Step: 8679, LR: 1.780388459490533e-06, Loss: 318.44866943359375
2024-08-04T10:42:46.539785695Z 
 91%|█████████▏| 8680/9500 [29:45:16<2:47:26, 12.25s/it]08/04/2024 03:42:46 - INFO - __main__ -   Step: 8680, LR: 1.778217915803254e-06, Loss: 378.1468200683594
2024-08-04T10:42:58.565172991Z 
 91%|█████████▏| 8681/9500 [29:45:28<2:46:18, 12.18s/it]08/04/2024 03:42:58 - INFO - __main__ -   Step: 8681, LR: 1.7760473721159748e-06, Loss: 415.7490234375
2024-08-04T10:43:11.252969702Z 
 91%|█████████▏| 8682/9500 [29:45:41<2:48:10, 12.34s/it]08/04/2024 03:43:11 - INFO - __main__ -   Step: 8682, LR: 1.7738768284286962e-06, Loss: 460.819091796875
2024-08-04T10:43:23.351193586Z 
 91%|█████████▏| 8683/9500 [29:45:53<2:46:59, 12.26s/it]08/04/2024 03:43:23 - INFO - __main__ -   Step: 8683, LR: 1.771706284741417e-06, Loss: 417.7357482910156
2024-08-04T10:43:35.320299203Z 
 91%|█████████▏| 8684/9500 [29:46:05<2:45:35, 12.18s/it]08/04/2024 03:43:35 - INFO - __main__ -   Step: 8684, LR: 1.7695357410541384e-06, Loss: 373.8135070800781
2024-08-04T10:43:47.683025013Z 
 91%|█████████▏| 8685/9500 [29:46:17<2:46:08, 12.23s/it]08/04/2024 03:43:47 - INFO - __main__ -   Step: 8685, LR: 1.7673651973668594e-06, Loss: 357.8834228515625
2024-08-04T10:43:59.978354658Z 
 91%|█████████▏| 8686/9500 [29:46:29<2:46:12, 12.25s/it]08/04/2024 03:43:59 - INFO - __main__ -   Step: 8686, LR: 1.7651946536795805e-06, Loss: 427.48809814453125
2024-08-04T10:44:12.123750356Z 
 91%|█████████▏| 8687/9500 [29:46:42<2:45:34, 12.22s/it]08/04/2024 03:44:12 - INFO - __main__ -   Step: 8687, LR: 1.7630241099923014e-06, Loss: 396.3686218261719
2024-08-04T10:44:25.287633836Z 
 91%|█████████▏| 8688/9500 [29:46:55<2:49:12, 12.50s/it]08/04/2024 03:44:25 - INFO - __main__ -   Step: 8688, LR: 1.7608535663050226e-06, Loss: 394.76171875
2024-08-04T10:44:37.513928389Z 
 91%|█████████▏| 8689/9500 [29:47:07<2:47:52, 12.42s/it]08/04/2024 03:44:37 - INFO - __main__ -   Step: 8689, LR: 1.7586830226177437e-06, Loss: 311.8475341796875
2024-08-04T10:44:49.648905502Z 
 91%|█████████▏| 8690/9500 [29:47:19<2:46:30, 12.33s/it]08/04/2024 03:44:49 - INFO - __main__ -   Step: 8690, LR: 1.7565124789304646e-06, Loss: 416.0321350097656
2024-08-04T10:45:02.220025971Z 
 91%|█████████▏| 8691/9500 [29:47:32<2:47:15, 12.41s/it]08/04/2024 03:45:02 - INFO - __main__ -   Step: 8691, LR: 1.754341935243186e-06, Loss: 487.6261291503906
2024-08-04T10:45:14.284962734Z 
 91%|█████████▏| 8692/9500 [29:47:44<2:45:41, 12.30s/it]08/04/2024 03:45:14 - INFO - __main__ -   Step: 8692, LR: 1.7521713915559069e-06, Loss: 435.4024353027344
2024-08-04T10:45:26.297431002Z 
 92%|█████████▏| 8693/9500 [29:47:56<2:44:18, 12.22s/it]08/04/2024 03:45:26 - INFO - __main__ -   Step: 8693, LR: 1.750000847868628e-06, Loss: 381.10540771484375
2024-08-04T10:45:39.118360651Z 
 92%|█████████▏| 8694/9500 [29:48:09<2:46:32, 12.40s/it]08/04/2024 03:45:39 - INFO - __main__ -   Step: 8694, LR: 1.7478303041813492e-06, Loss: 427.90869140625
2024-08-04T10:45:50.884430485Z 
 92%|█████████▏| 8695/9500 [29:48:20<2:43:46, 12.21s/it]08/04/2024 03:45:50 - INFO - __main__ -   Step: 8695, LR: 1.74565976049407e-06, Loss: 375.4808349609375
2024-08-04T10:46:02.755967597Z 
 92%|█████████▏| 8696/9500 [29:48:32<2:42:14, 12.11s/it]08/04/2024 03:46:02 - INFO - __main__ -   Step: 8696, LR: 1.7434892168067912e-06, Loss: 332.3098449707031
2024-08-04T10:46:15.350450562Z 
 92%|█████████▏| 8697/9500 [29:48:45<2:43:59, 12.25s/it]08/04/2024 03:46:15 - INFO - __main__ -   Step: 8697, LR: 1.7413186731195121e-06, Loss: 379.5954284667969
2024-08-04T10:46:27.492699200Z 
 92%|█████████▏| 8698/9500 [29:48:57<2:43:20, 12.22s/it]08/04/2024 03:46:27 - INFO - __main__ -   Step: 8698, LR: 1.7391481294322335e-06, Loss: 412.1134338378906
2024-08-04T10:46:39.986703028Z 
 92%|█████████▏| 8699/9500 [29:49:09<2:44:14, 12.30s/it]08/04/2024 03:46:39 - INFO - __main__ -   Step: 8699, LR: 1.7369775857449544e-06, Loss: 490.7216796875
2024-08-04T10:46:52.452280210Z 
 92%|█████████▏| 8700/9500 [29:49:22<2:44:41, 12.35s/it]08/04/2024 03:46:52 - INFO - __main__ -   Step: 8700, LR: 1.7348070420576758e-06, Loss: 449.6427001953125
2024-08-04T10:47:04.883827256Z 
 92%|█████████▏| 8701/9500 [29:49:34<2:44:47, 12.38s/it]08/04/2024 03:47:04 - INFO - __main__ -   Step: 8701, LR: 1.7326364983703967e-06, Loss: 391.0238952636719
2024-08-04T10:47:16.992310515Z 
 92%|█████████▏| 8702/9500 [29:49:46<2:43:31, 12.30s/it]08/04/2024 03:47:16 - INFO - __main__ -   Step: 8702, LR: 1.7304659546831176e-06, Loss: 412.92864990234375
2024-08-04T10:47:29.685708753Z 
 92%|█████████▏| 8703/9500 [29:49:59<2:44:54, 12.41s/it]08/04/2024 03:47:29 - INFO - __main__ -   Step: 8703, LR: 1.728295410995839e-06, Loss: 399.980712890625
2024-08-04T10:47:42.001588092Z 
 92%|█████████▏| 8704/9500 [29:50:11<2:44:18, 12.38s/it]08/04/2024 03:47:42 - INFO - __main__ -   Step: 8704, LR: 1.7261248673085599e-06, Loss: 428.61431884765625
2024-08-04T10:47:54.259263204Z 
 92%|█████████▏| 8705/9500 [29:50:24<2:43:35, 12.35s/it]08/04/2024 03:47:54 - INFO - __main__ -   Step: 8705, LR: 1.723954323621281e-06, Loss: 386.63848876953125
2024-08-04T10:48:06.822618495Z 
 92%|█████████▏| 8706/9500 [29:50:36<2:44:14, 12.41s/it]08/04/2024 03:48:06 - INFO - __main__ -   Step: 8706, LR: 1.721783779934002e-06, Loss: 388.1731262207031
2024-08-04T10:48:19.219066697Z 
 92%|█████████▏| 8707/9500 [29:50:49<2:43:58, 12.41s/it]08/04/2024 03:48:19 - INFO - __main__ -   Step: 8707, LR: 1.719613236246723e-06, Loss: 412.10882568359375
2024-08-04T10:48:31.326508728Z 
 92%|█████████▏| 8708/9500 [29:51:01<2:42:35, 12.32s/it]08/04/2024 03:48:31 - INFO - __main__ -   Step: 8708, LR: 1.7174426925594442e-06, Loss: 364.60186767578125
2024-08-04T10:48:43.793252766Z 
 92%|█████████▏| 8709/9500 [29:51:13<2:42:58, 12.36s/it]08/04/2024 03:48:43 - INFO - __main__ -   Step: 8709, LR: 1.7152721488721651e-06, Loss: 377.548828125
2024-08-04T10:48:56.469502763Z 
 92%|█████████▏| 8710/9500 [29:51:26<2:44:00, 12.46s/it]08/04/2024 03:48:56 - INFO - __main__ -   Step: 8710, LR: 1.7131016051848865e-06, Loss: 377.152099609375
2024-08-04T10:49:08.753090858Z 
 92%|█████████▏| 8711/9500 [29:51:38<2:43:07, 12.40s/it]08/04/2024 03:49:08 - INFO - __main__ -   Step: 8711, LR: 1.7109310614976074e-06, Loss: 330.334228515625
2024-08-04T10:49:21.454021710Z 
 92%|█████████▏| 8712/9500 [29:51:51<2:44:04, 12.49s/it]08/04/2024 03:49:21 - INFO - __main__ -   Step: 8712, LR: 1.7087605178103286e-06, Loss: 457.03326416015625
2024-08-04T10:49:33.770969995Z 
 92%|█████████▏| 8713/9500 [29:52:03<2:43:10, 12.44s/it]08/04/2024 03:49:33 - INFO - __main__ -   Step: 8713, LR: 1.7065899741230497e-06, Loss: 427.8783874511719
2024-08-04T10:49:45.939894619Z 
 92%|█████████▏| 8714/9500 [29:52:15<2:41:54, 12.36s/it]08/04/2024 03:49:45 - INFO - __main__ -   Step: 8714, LR: 1.7044194304357706e-06, Loss: 389.4095458984375
2024-08-04T10:49:58.387657444Z 
 92%|█████████▏| 8715/9500 [29:52:28<2:42:02, 12.39s/it]08/04/2024 03:49:58 - INFO - __main__ -   Step: 8715, LR: 1.7022488867484917e-06, Loss: 389.9342346191406
2024-08-04T10:50:10.763989882Z 
 92%|█████████▏| 8716/9500 [29:52:40<2:41:48, 12.38s/it]08/04/2024 03:50:10 - INFO - __main__ -   Step: 8716, LR: 1.7000783430612127e-06, Loss: 384.5152893066406
2024-08-04T10:50:22.841337499Z 
 92%|█████████▏| 8717/9500 [29:52:52<2:40:24, 12.29s/it]08/04/2024 03:50:22 - INFO - __main__ -   Step: 8717, LR: 1.697907799373934e-06, Loss: 378.33111572265625
2024-08-04T10:50:35.385461602Z 
 92%|█████████▏| 8718/9500 [29:53:05<2:41:11, 12.37s/it]08/04/2024 03:50:35 - INFO - __main__ -   Step: 8718, LR: 1.695737255686655e-06, Loss: 375.84197998046875
2024-08-04T10:50:47.623316734Z 
 92%|█████████▏| 8719/9500 [29:53:17<2:40:28, 12.33s/it]08/04/2024 03:50:47 - INFO - __main__ -   Step: 8719, LR: 1.6935667119993763e-06, Loss: 404.97454833984375
2024-08-04T10:50:59.771818115Z 
 92%|█████████▏| 8720/9500 [29:53:29<2:39:34, 12.27s/it]08/04/2024 03:50:59 - INFO - __main__ -   Step: 8720, LR: 1.6913961683120972e-06, Loss: 357.17120361328125
2024-08-04T10:51:12.431404412Z 
 92%|█████████▏| 8721/9500 [29:53:42<2:40:51, 12.39s/it]08/04/2024 03:51:12 - INFO - __main__ -   Step: 8721, LR: 1.6892256246248181e-06, Loss: 293.5577392578125
2024-08-04T10:51:24.538424146Z 
 92%|█████████▏| 8722/9500 [29:53:54<2:39:33, 12.31s/it]08/04/2024 03:51:24 - INFO - __main__ -   Step: 8722, LR: 1.6870550809375395e-06, Loss: 350.5685729980469
2024-08-04T10:51:36.638613432Z 
 92%|█████████▏| 8723/9500 [29:54:06<2:38:33, 12.24s/it]08/04/2024 03:51:36 - INFO - __main__ -   Step: 8723, LR: 1.6848845372502604e-06, Loss: 394.20458984375
2024-08-04T10:51:49.122864536Z 
 92%|█████████▏| 8724/9500 [29:54:19<2:39:17, 12.32s/it]08/04/2024 03:51:49 - INFO - __main__ -   Step: 8724, LR: 1.6827139935629816e-06, Loss: 452.58538818359375
2024-08-04T10:52:02.125413276Z 
 92%|█████████▏| 8725/9500 [29:54:32<2:41:44, 12.52s/it]08/04/2024 03:52:02 - INFO - __main__ -   Step: 8725, LR: 1.6805434498757025e-06, Loss: 354.5347900390625
2024-08-04T10:52:14.352616136Z 
 92%|█████████▏| 8726/9500 [29:54:44<2:40:23, 12.43s/it]08/04/2024 03:52:14 - INFO - __main__ -   Step: 8726, LR: 1.6783729061884238e-06, Loss: 440.69940185546875
2024-08-04T10:52:26.522139087Z 
 92%|█████████▏| 8727/9500 [29:54:56<2:39:09, 12.35s/it]08/04/2024 03:52:26 - INFO - __main__ -   Step: 8727, LR: 1.6762023625011447e-06, Loss: 333.0957946777344
2024-08-04T10:52:39.313011831Z 
 92%|█████████▏| 8728/9500 [29:55:09<2:40:38, 12.49s/it]08/04/2024 03:52:39 - INFO - __main__ -   Step: 8728, LR: 1.6740318188138657e-06, Loss: 528.2251586914062
2024-08-04T10:52:51.277521825Z 
 92%|█████████▏| 8729/9500 [29:55:21<2:38:25, 12.33s/it]08/04/2024 03:52:51 - INFO - __main__ -   Step: 8729, LR: 1.671861275126587e-06, Loss: 354.1788330078125
2024-08-04T10:53:03.537080418Z 
 92%|█████████▏| 8730/9500 [29:55:33<2:37:57, 12.31s/it]08/04/2024 03:53:03 - INFO - __main__ -   Step: 8730, LR: 1.669690731439308e-06, Loss: 469.0859375
2024-08-04T10:53:16.169047232Z 
 92%|█████████▏| 8731/9500 [29:55:46<2:38:59, 12.41s/it]08/04/2024 03:53:16 - INFO - __main__ -   Step: 8731, LR: 1.667520187752029e-06, Loss: 449.40924072265625
2024-08-04T10:53:28.117081548Z 
 92%|█████████▏| 8732/9500 [29:55:58<2:37:01, 12.27s/it]08/04/2024 03:53:28 - INFO - __main__ -   Step: 8732, LR: 1.6653496440647502e-06, Loss: 306.26983642578125
2024-08-04T10:53:39.864646708Z 
 92%|█████████▏| 8733/9500 [29:56:09<2:34:49, 12.11s/it]08/04/2024 03:53:39 - INFO - __main__ -   Step: 8733, LR: 1.6631791003774714e-06, Loss: 293.3853759765625
2024-08-04T10:53:52.653354198Z 
 92%|█████████▏| 8734/9500 [29:56:22<2:37:13, 12.32s/it]08/04/2024 03:53:52 - INFO - __main__ -   Step: 8734, LR: 1.6610085566901923e-06, Loss: 467.92327880859375
2024-08-04T10:54:04.703593390Z 
 92%|█████████▏| 8735/9500 [29:56:34<2:36:00, 12.24s/it]08/04/2024 03:54:04 - INFO - __main__ -   Step: 8735, LR: 1.6588380130029134e-06, Loss: 378.5861511230469
2024-08-04T10:54:16.936418302Z 
 92%|█████████▏| 8736/9500 [29:56:46<2:35:47, 12.23s/it]08/04/2024 03:54:16 - INFO - __main__ -   Step: 8736, LR: 1.6566674693156345e-06, Loss: 457.34521484375
2024-08-04T10:54:29.489696124Z 
 92%|█████████▏| 8737/9500 [29:56:59<2:36:48, 12.33s/it]08/04/2024 03:54:29 - INFO - __main__ -   Step: 8737, LR: 1.6544969256283555e-06, Loss: 492.9271240234375
2024-08-04T10:54:41.681963549Z 
 92%|█████████▏| 8738/9500 [29:57:11<2:36:04, 12.29s/it]08/04/2024 03:54:41 - INFO - __main__ -   Step: 8738, LR: 1.6523263819410768e-06, Loss: 318.7796325683594
2024-08-04T10:54:53.761290797Z 
 92%|█████████▏| 8739/9500 [29:57:23<2:35:04, 12.23s/it]08/04/2024 03:54:53 - INFO - __main__ -   Step: 8739, LR: 1.6501558382537977e-06, Loss: 355.1539611816406
2024-08-04T10:55:06.720812902Z 
 92%|█████████▏| 8740/9500 [29:57:36<2:37:38, 12.45s/it]08/04/2024 03:55:06 - INFO - __main__ -   Step: 8740, LR: 1.6479852945665189e-06, Loss: 415.1153259277344
2024-08-04T10:55:19.456507852Z 
 92%|█████████▏| 8741/9500 [29:57:49<2:38:32, 12.53s/it]08/04/2024 03:55:19 - INFO - __main__ -   Step: 8741, LR: 1.64581475087924e-06, Loss: 354.4269104003906
2024-08-04T10:55:31.480367653Z 
 92%|█████████▏| 8742/9500 [29:58:01<2:36:24, 12.38s/it]08/04/2024 03:55:31 - INFO - __main__ -   Step: 8742, LR: 1.643644207191961e-06, Loss: 369.64788818359375
2024-08-04T10:55:43.947144630Z 
 92%|█████████▏| 8743/9500 [29:58:13<2:36:31, 12.41s/it]08/04/2024 03:55:43 - INFO - __main__ -   Step: 8743, LR: 1.641473663504682e-06, Loss: 339.29058837890625
2024-08-04T10:55:56.030322114Z 
 92%|█████████▏| 8744/9500 [29:58:25<2:35:05, 12.31s/it]08/04/2024 03:55:56 - INFO - __main__ -   Step: 8744, LR: 1.639303119817403e-06, Loss: 352.9757995605469
2024-08-04T10:56:08.116477829Z 
 92%|█████████▏| 8745/9500 [29:58:38<2:34:02, 12.24s/it]08/04/2024 03:56:08 - INFO - __main__ -   Step: 8745, LR: 1.6371325761301243e-06, Loss: 347.2745056152344
2024-08-04T10:56:20.725036030Z 
 92%|█████████▏| 8746/9500 [29:58:50<2:35:13, 12.35s/it]08/04/2024 03:56:20 - INFO - __main__ -   Step: 8746, LR: 1.6349620324428453e-06, Loss: 367.57659912109375
2024-08-04T10:56:32.900760453Z 
 92%|█████████▏| 8747/9500 [29:59:02<2:34:21, 12.30s/it]08/04/2024 03:56:32 - INFO - __main__ -   Step: 8747, LR: 1.6327914887555662e-06, Loss: 333.3521728515625
2024-08-04T10:56:45.034807709Z 
 92%|█████████▏| 8748/9500 [29:59:14<2:33:31, 12.25s/it]08/04/2024 03:56:45 - INFO - __main__ -   Step: 8748, LR: 1.6306209450682875e-06, Loss: 409.9747619628906
2024-08-04T10:56:57.779905643Z 
 92%|█████████▏| 8749/9500 [29:59:27<2:35:11, 12.40s/it]08/04/2024 03:56:57 - INFO - __main__ -   Step: 8749, LR: 1.6284504013810085e-06, Loss: 442.46722412109375
2024-08-04T10:57:10.214213377Z 
 92%|█████████▏| 8750/9500 [29:59:40<2:35:06, 12.41s/it]08/04/2024 03:57:10 - INFO - __main__ -   Step: 8750, LR: 1.6262798576937296e-06, Loss: 435.2843017578125
2024-08-04T10:57:22.284402059Z 
 92%|█████████▏| 8751/9500 [29:59:52<2:33:38, 12.31s/it]08/04/2024 03:57:22 - INFO - __main__ -   Step: 8751, LR: 1.6241093140064507e-06, Loss: 433.00164794921875
2024-08-04T10:57:34.907997792Z 
 92%|█████████▏| 8752/9500 [30:00:04<2:34:36, 12.40s/it]08/04/2024 03:57:34 - INFO - __main__ -   Step: 8752, LR: 1.6219387703191719e-06, Loss: 364.5939636230469
2024-08-04T10:57:47.433686985Z 
 92%|█████████▏| 8753/9500 [30:00:17<2:34:52, 12.44s/it]08/04/2024 03:57:47 - INFO - __main__ -   Step: 8753, LR: 1.6197682266318928e-06, Loss: 418.106689453125
2024-08-04T10:57:59.580457287Z 
 92%|█████████▏| 8754/9500 [30:00:29<2:33:34, 12.35s/it]08/04/2024 03:57:59 - INFO - __main__ -   Step: 8754, LR: 1.617597682944614e-06, Loss: 504.3946533203125
2024-08-04T10:58:11.973444784Z 
 92%|█████████▏| 8755/9500 [30:00:41<2:33:31, 12.36s/it]08/04/2024 03:58:11 - INFO - __main__ -   Step: 8755, LR: 1.615427139257335e-06, Loss: 401.4913330078125
2024-08-04T10:58:24.471271348Z 
 92%|█████████▏| 8756/9500 [30:00:54<2:33:48, 12.40s/it]08/04/2024 03:58:24 - INFO - __main__ -   Step: 8756, LR: 1.613256595570056e-06, Loss: 276.1226501464844
2024-08-04T10:58:36.502539301Z 
 92%|█████████▏| 8757/9500 [30:01:06<2:32:13, 12.29s/it]08/04/2024 03:58:36 - INFO - __main__ -   Step: 8757, LR: 1.6110860518827773e-06, Loss: 315.89361572265625
2024-08-04T10:58:48.856812325Z 
 92%|█████████▏| 8758/9500 [30:01:18<2:32:14, 12.31s/it]08/04/2024 03:58:48 - INFO - __main__ -   Step: 8758, LR: 1.6089155081954983e-06, Loss: 459.8091735839844
2024-08-04T10:59:01.407821487Z 
 92%|█████████▏| 8759/9500 [30:01:31<2:32:55, 12.38s/it]08/04/2024 03:59:01 - INFO - __main__ -   Step: 8759, LR: 1.6067449645082194e-06, Loss: 369.89447021484375
2024-08-04T10:59:13.603070275Z 
 92%|█████████▏| 8760/9500 [30:01:43<2:32:01, 12.33s/it]08/04/2024 03:59:13 - INFO - __main__ -   Step: 8760, LR: 1.6045744208209405e-06, Loss: 441.2732238769531
2024-08-04T10:59:26.357826043Z 
 92%|█████████▏| 8761/9500 [30:01:56<2:33:24, 12.46s/it]08/04/2024 03:59:26 - INFO - __main__ -   Step: 8761, LR: 1.6024038771336615e-06, Loss: 436.075439453125
2024-08-04T10:59:38.515991191Z 
 92%|█████████▏| 8762/9500 [30:02:08<2:32:06, 12.37s/it]08/04/2024 03:59:38 - INFO - __main__ -   Step: 8762, LR: 1.6002333334463826e-06, Loss: 388.66632080078125
2024-08-04T10:59:50.680482103Z 
 92%|█████████▏| 8763/9500 [30:02:20<2:31:09, 12.31s/it]08/04/2024 03:59:50 - INFO - __main__ -   Step: 8763, LR: 1.5980627897591035e-06, Loss: 418.256591796875
2024-08-04T11:00:03.522590631Z 
 92%|█████████▏| 8764/9500 [30:02:33<2:32:55, 12.47s/it]08/04/2024 04:00:03 - INFO - __main__ -   Step: 8764, LR: 1.5958922460718249e-06, Loss: 412.74822998046875
2024-08-04T11:00:16.079473150Z 
 92%|█████████▏| 8765/9500 [30:02:46<2:33:02, 12.49s/it]08/04/2024 04:00:16 - INFO - __main__ -   Step: 8765, LR: 1.5937217023845458e-06, Loss: 513.4608764648438
2024-08-04T11:00:28.026372665Z 
 92%|█████████▏| 8766/9500 [30:02:57<2:30:49, 12.33s/it]08/04/2024 04:00:28 - INFO - __main__ -   Step: 8766, LR: 1.5915511586972671e-06, Loss: 484.46185302734375
2024-08-04T11:00:40.532254955Z 
 92%|█████████▏| 8767/9500 [30:03:10<2:31:16, 12.38s/it]08/04/2024 04:00:40 - INFO - __main__ -   Step: 8767, LR: 1.589380615009988e-06, Loss: 462.2154541015625
2024-08-04T11:00:53.298579701Z 
 92%|█████████▏| 8768/9500 [30:03:23<2:32:28, 12.50s/it]08/04/2024 04:00:53 - INFO - __main__ -   Step: 8768, LR: 1.587210071322709e-06, Loss: 406.6759338378906
2024-08-04T11:01:05.424436469Z 
 92%|█████████▏| 8769/9500 [30:03:35<2:30:54, 12.39s/it]08/04/2024 04:01:05 - INFO - __main__ -   Step: 8769, LR: 1.5850395276354301e-06, Loss: 435.2394104003906
2024-08-04T11:01:17.440574578Z 
 92%|█████████▏| 8770/9500 [30:03:47<2:29:20, 12.28s/it]08/04/2024 04:01:17 - INFO - __main__ -   Step: 8770, LR: 1.5828689839481513e-06, Loss: 389.0706481933594
2024-08-04T11:01:30.000612332Z 
 92%|█████████▏| 8771/9500 [30:03:59<2:30:10, 12.36s/it]08/04/2024 04:01:30 - INFO - __main__ -   Step: 8771, LR: 1.5806984402608724e-06, Loss: 390.3706359863281
2024-08-04T11:01:42.078802566Z 
 92%|█████████▏| 8772/9500 [30:04:12<2:28:56, 12.28s/it]08/04/2024 04:01:42 - INFO - __main__ -   Step: 8772, LR: 1.5785278965735933e-06, Loss: 404.6187438964844
2024-08-04T11:01:54.314534271Z 
 92%|█████████▏| 8773/9500 [30:04:24<2:28:35, 12.26s/it]08/04/2024 04:01:54 - INFO - __main__ -   Step: 8773, LR: 1.5763573528863147e-06, Loss: 397.85333251953125
2024-08-04T11:02:06.671328676Z 
 92%|█████████▏| 8774/9500 [30:04:36<2:28:43, 12.29s/it]08/04/2024 04:02:06 - INFO - __main__ -   Step: 8774, LR: 1.5741868091990356e-06, Loss: 326.33447265625
2024-08-04T11:02:19.219174302Z 
 92%|█████████▏| 8775/9500 [30:04:49<2:29:27, 12.37s/it]08/04/2024 04:02:19 - INFO - __main__ -   Step: 8775, LR: 1.5720162655117565e-06, Loss: 472.7491760253906
2024-08-04T11:02:31.459659387Z 
 92%|█████████▏| 8776/9500 [30:05:01<2:28:47, 12.33s/it]08/04/2024 04:02:31 - INFO - __main__ -   Step: 8776, LR: 1.5698457218244779e-06, Loss: 363.6701354980469
2024-08-04T11:02:43.985959434Z 
 92%|█████████▏| 8777/9500 [30:05:13<2:29:17, 12.39s/it]08/04/2024 04:02:43 - INFO - __main__ -   Step: 8777, LR: 1.5676751781371988e-06, Loss: 336.2279052734375
2024-08-04T11:02:56.035173487Z 
 92%|█████████▏| 8778/9500 [30:05:25<2:27:51, 12.29s/it]08/04/2024 04:02:56 - INFO - __main__ -   Step: 8778, LR: 1.56550463444992e-06, Loss: 461.3236083984375
2024-08-04T11:03:08.212727839Z 
 92%|█████████▏| 8779/9500 [30:05:38<2:27:15, 12.25s/it]08/04/2024 04:03:08 - INFO - __main__ -   Step: 8779, LR: 1.563334090762641e-06, Loss: 359.98284912109375
2024-08-04T11:03:20.857385078Z 
 92%|█████████▏| 8780/9500 [30:05:50<2:28:27, 12.37s/it]08/04/2024 04:03:20 - INFO - __main__ -   Step: 8780, LR: 1.5611635470753622e-06, Loss: 403.73773193359375
2024-08-04T11:03:33.153195185Z 
 92%|█████████▏| 8781/9500 [30:06:03<2:27:58, 12.35s/it]08/04/2024 04:03:33 - INFO - __main__ -   Step: 8781, LR: 1.5589930033880831e-06, Loss: 374.736328125
2024-08-04T11:03:45.303068589Z 
 92%|█████████▏| 8782/9500 [30:06:15<2:27:03, 12.29s/it]08/04/2024 04:03:45 - INFO - __main__ -   Step: 8782, LR: 1.556822459700804e-06, Loss: 420.1788635253906
2024-08-04T11:03:57.769227925Z 
 92%|█████████▏| 8783/9500 [30:06:27<2:27:29, 12.34s/it]08/04/2024 04:03:57 - INFO - __main__ -   Step: 8783, LR: 1.5546519160135254e-06, Loss: 397.927001953125
2024-08-04T11:04:10.084730470Z 
 92%|█████████▏| 8784/9500 [30:06:40<2:27:11, 12.33s/it]08/04/2024 04:04:10 - INFO - __main__ -   Step: 8784, LR: 1.5524813723262463e-06, Loss: 360.7191162109375
2024-08-04T11:04:22.275373966Z 
 92%|█████████▏| 8785/9500 [30:06:52<2:26:28, 12.29s/it]08/04/2024 04:04:22 - INFO - __main__ -   Step: 8785, LR: 1.5503108286389677e-06, Loss: 402.22686767578125
2024-08-04T11:04:34.582911941Z 
 92%|█████████▏| 8786/9500 [30:07:04<2:26:19, 12.30s/it]08/04/2024 04:04:34 - INFO - __main__ -   Step: 8786, LR: 1.5481402849516886e-06, Loss: 394.4337463378906
2024-08-04T11:04:46.829773693Z 
 92%|█████████▏| 8787/9500 [30:07:16<2:25:56, 12.28s/it]08/04/2024 04:04:46 - INFO - __main__ -   Step: 8787, LR: 1.5459697412644095e-06, Loss: 349.862548828125
2024-08-04T11:04:59.264477271Z 
 93%|█████████▎| 8788/9500 [30:07:29<2:26:17, 12.33s/it]08/04/2024 04:04:59 - INFO - __main__ -   Step: 8788, LR: 1.5437991975771307e-06, Loss: 363.60693359375
2024-08-04T11:05:11.826939789Z 
 93%|█████████▎| 8789/9500 [30:07:41<2:26:54, 12.40s/it]08/04/2024 04:05:11 - INFO - __main__ -   Step: 8789, LR: 1.5416286538898518e-06, Loss: 326.30108642578125
2024-08-04T11:05:23.912123499Z 
 93%|█████████▎| 8790/9500 [30:07:53<2:25:35, 12.30s/it]08/04/2024 04:05:23 - INFO - __main__ -   Step: 8790, LR: 1.539458110202573e-06, Loss: 311.5821533203125
2024-08-04T11:05:36.308394688Z 
 93%|█████████▎| 8791/9500 [30:08:06<2:25:43, 12.33s/it]08/04/2024 04:05:36 - INFO - __main__ -   Step: 8791, LR: 1.5372875665152939e-06, Loss: 439.71044921875
2024-08-04T11:05:49.041725703Z 
 93%|█████████▎| 8792/9500 [30:08:18<2:26:56, 12.45s/it]08/04/2024 04:05:49 - INFO - __main__ -   Step: 8792, LR: 1.5351170228280152e-06, Loss: 515.386962890625
2024-08-04T11:06:01.374980596Z 
 93%|█████████▎| 8793/9500 [30:08:31<2:26:18, 12.42s/it]08/04/2024 04:06:01 - INFO - __main__ -   Step: 8793, LR: 1.5329464791407361e-06, Loss: 340.22686767578125
2024-08-04T11:06:13.248803287Z 
 93%|█████████▎| 8794/9500 [30:08:43<2:24:11, 12.25s/it]08/04/2024 04:06:13 - INFO - __main__ -   Step: 8794, LR: 1.530775935453457e-06, Loss: 409.80609130859375
2024-08-04T11:06:25.690485222Z 
 93%|█████████▎| 8795/9500 [30:08:55<2:24:38, 12.31s/it]08/04/2024 04:06:25 - INFO - __main__ -   Step: 8795, LR: 1.5286053917661784e-06, Loss: 372.8683776855469
2024-08-04T11:06:37.925887429Z 
 93%|█████████▎| 8796/9500 [30:09:07<2:24:10, 12.29s/it]08/04/2024 04:06:37 - INFO - __main__ -   Step: 8796, LR: 1.5264348480788993e-06, Loss: 377.3721618652344
2024-08-04T11:06:50.037591756Z 
 93%|█████████▎| 8797/9500 [30:09:19<2:23:21, 12.23s/it]08/04/2024 04:06:50 - INFO - __main__ -   Step: 8797, LR: 1.5242643043916205e-06, Loss: 449.4345703125
2024-08-04T11:07:02.960983550Z 
 93%|█████████▎| 8798/9500 [30:09:32<2:25:33, 12.44s/it]08/04/2024 04:07:02 - INFO - __main__ -   Step: 8798, LR: 1.5220937607043416e-06, Loss: 299.2708740234375
2024-08-04T11:07:15.191350101Z 
 93%|█████████▎| 8799/9500 [30:09:45<2:24:37, 12.38s/it]08/04/2024 04:07:15 - INFO - __main__ -   Step: 8799, LR: 1.5199232170170627e-06, Loss: 345.646484375
2024-08-04T11:07:27.397285823Z 
 93%|█████████▎| 8800/9500 [30:09:57<2:23:48, 12.33s/it]08/04/2024 04:07:27 - INFO - __main__ -   Step: 8800, LR: 1.5177526733297837e-06, Loss: 451.00311279296875
2024-08-04T11:07:39.699439235Z 
 93%|█████████▎| 8801/9500 [30:10:09<2:23:31, 12.32s/it]08/04/2024 04:07:39 - INFO - __main__ -   Step: 8801, LR: 1.5155821296425046e-06, Loss: 373.36578369140625
2024-08-04T11:07:51.929238896Z 
 93%|█████████▎| 8802/9500 [30:10:21<2:23:00, 12.29s/it]08/04/2024 04:07:51 - INFO - __main__ -   Step: 8802, LR: 1.513411585955226e-06, Loss: 392.7093505859375
2024-08-04T11:08:04.066342716Z 
 93%|█████████▎| 8803/9500 [30:10:34<2:22:15, 12.25s/it]08/04/2024 04:08:04 - INFO - __main__ -   Step: 8803, LR: 1.5112410422679468e-06, Loss: 349.1726989746094
2024-08-04T11:08:16.614599327Z 
 93%|█████████▎| 8804/9500 [30:10:46<2:23:06, 12.34s/it]08/04/2024 04:08:16 - INFO - __main__ -   Step: 8804, LR: 1.5090704985806682e-06, Loss: 405.68780517578125
2024-08-04T11:08:28.998114745Z 
 93%|█████████▎| 8805/9500 [30:10:58<2:23:03, 12.35s/it]08/04/2024 04:08:28 - INFO - __main__ -   Step: 8805, LR: 1.5068999548933891e-06, Loss: 448.27093505859375
2024-08-04T11:08:41.078588862Z 
 93%|█████████▎| 8806/9500 [30:11:11<2:21:55, 12.27s/it]08/04/2024 04:08:41 - INFO - __main__ -   Step: 8806, LR: 1.5047294112061103e-06, Loss: 364.29669189453125
2024-08-04T11:08:53.597601114Z 
 93%|█████████▎| 8807/9500 [30:11:23<2:22:34, 12.34s/it]08/04/2024 04:08:53 - INFO - __main__ -   Step: 8807, LR: 1.5025588675188314e-06, Loss: 319.4245910644531
2024-08-04T11:09:05.925333877Z 
 93%|█████████▎| 8808/9500 [30:11:35<2:22:18, 12.34s/it]08/04/2024 04:09:05 - INFO - __main__ -   Step: 8808, LR: 1.5003883238315523e-06, Loss: 358.65985107421875
2024-08-04T11:09:18.015032729Z 
 93%|█████████▎| 8809/9500 [30:11:47<2:21:14, 12.26s/it]08/04/2024 04:09:18 - INFO - __main__ -   Step: 8809, LR: 1.4982177801442735e-06, Loss: 403.3895568847656
2024-08-04T11:09:30.438652058Z 
 93%|█████████▎| 8810/9500 [30:12:00<2:21:35, 12.31s/it]08/04/2024 04:09:30 - INFO - __main__ -   Step: 8810, LR: 1.4960472364569944e-06, Loss: 548.6656494140625
2024-08-04T11:09:43.685004223Z 
 93%|█████████▎| 8811/9500 [30:12:13<2:24:36, 12.59s/it]08/04/2024 04:09:43 - INFO - __main__ -   Step: 8811, LR: 1.4938766927697157e-06, Loss: 330.88555908203125
2024-08-04T11:09:55.444184645Z 
 93%|█████████▎| 8812/9500 [30:12:25<2:21:31, 12.34s/it]08/04/2024 04:09:55 - INFO - __main__ -   Step: 8812, LR: 1.4917061490824366e-06, Loss: 309.8913269042969
2024-08-04T11:10:07.607720701Z 
 93%|█████████▎| 8813/9500 [30:12:37<2:20:42, 12.29s/it]08/04/2024 04:10:07 - INFO - __main__ -   Step: 8813, LR: 1.489535605395158e-06, Loss: 398.59918212890625
2024-08-04T11:10:20.215526892Z 
 93%|█████████▎| 8814/9500 [30:12:50<2:21:35, 12.38s/it]08/04/2024 04:10:20 - INFO - __main__ -   Step: 8814, LR: 1.487365061707879e-06, Loss: 398.00482177734375
2024-08-04T11:10:32.418604047Z 
 93%|█████████▎| 8815/9500 [30:13:02<2:20:46, 12.33s/it]08/04/2024 04:10:32 - INFO - __main__ -   Step: 8815, LR: 1.4851945180205998e-06, Loss: 304.0663146972656
2024-08-04T11:10:44.404614626Z 
 93%|█████████▎| 8816/9500 [30:13:14<2:19:23, 12.23s/it]08/04/2024 04:10:44 - INFO - __main__ -   Step: 8816, LR: 1.483023974333321e-06, Loss: 368.425537109375
2024-08-04T11:10:57.172073852Z 
 93%|█████████▎| 8817/9500 [30:13:27<2:21:01, 12.39s/it]08/04/2024 04:10:57 - INFO - __main__ -   Step: 8817, LR: 1.4808534306460421e-06, Loss: 391.1480712890625
2024-08-04T11:11:09.450757153Z 
 93%|█████████▎| 8818/9500 [30:13:39<2:20:26, 12.36s/it]08/04/2024 04:11:09 - INFO - __main__ -   Step: 8818, LR: 1.4786828869587633e-06, Loss: 352.43304443359375
2024-08-04T11:11:21.829327417Z 
 93%|█████████▎| 8819/9500 [30:13:51<2:20:19, 12.36s/it]08/04/2024 04:11:21 - INFO - __main__ -   Step: 8819, LR: 1.4765123432714842e-06, Loss: 476.4674072265625
2024-08-04T11:11:34.723498130Z 
 93%|█████████▎| 8820/9500 [30:14:04<2:21:55, 12.52s/it]08/04/2024 04:11:34 - INFO - __main__ -   Step: 8820, LR: 1.4743417995842055e-06, Loss: 404.72674560546875
2024-08-04T11:11:46.862772049Z 
 93%|█████████▎| 8821/9500 [30:14:16<2:20:24, 12.41s/it]08/04/2024 04:11:46 - INFO - __main__ -   Step: 8821, LR: 1.4721712558969264e-06, Loss: 410.98724365234375
2024-08-04T11:11:59.019563525Z 
 93%|█████████▎| 8822/9500 [30:14:28<2:19:21, 12.33s/it]08/04/2024 04:11:59 - INFO - __main__ -   Step: 8822, LR: 1.4700007122096474e-06, Loss: 381.4614562988281
2024-08-04T11:12:11.519739323Z 
 93%|█████████▎| 8823/9500 [30:14:41<2:19:43, 12.38s/it]08/04/2024 04:12:11 - INFO - __main__ -   Step: 8823, LR: 1.4678301685223687e-06, Loss: 343.0932922363281
2024-08-04T11:12:23.843661410Z 
 93%|█████████▎| 8824/9500 [30:14:53<2:19:18, 12.36s/it]08/04/2024 04:12:23 - INFO - __main__ -   Step: 8824, LR: 1.4656596248350896e-06, Loss: 449.5324401855469
2024-08-04T11:12:35.975606795Z 
 93%|█████████▎| 8825/9500 [30:15:05<2:18:19, 12.30s/it]08/04/2024 04:12:35 - INFO - __main__ -   Step: 8825, LR: 1.4634890811478108e-06, Loss: 378.42962646484375
2024-08-04T11:12:48.339376045Z 
 93%|█████████▎| 8826/9500 [30:15:18<2:18:20, 12.32s/it]08/04/2024 04:12:48 - INFO - __main__ -   Step: 8826, LR: 1.461318537460532e-06, Loss: 414.059814453125
2024-08-04T11:13:01.005954222Z 
 93%|█████████▎| 8827/9500 [30:15:30<2:19:19, 12.42s/it]08/04/2024 04:13:01 - INFO - __main__ -   Step: 8827, LR: 1.4591479937732528e-06, Loss: 332.548583984375
2024-08-04T11:13:12.871727451Z 
 93%|█████████▎| 8828/9500 [30:15:42<2:17:14, 12.25s/it]08/04/2024 04:13:12 - INFO - __main__ -   Step: 8828, LR: 1.456977450085974e-06, Loss: 409.7724609375
2024-08-04T11:13:25.274468299Z 
 93%|█████████▎| 8829/9500 [30:15:55<2:17:32, 12.30s/it]08/04/2024 04:13:25 - INFO - __main__ -   Step: 8829, LR: 1.454806906398695e-06, Loss: 325.5451965332031
2024-08-04T11:13:37.652756843Z 
 93%|█████████▎| 8830/9500 [30:16:07<2:17:36, 12.32s/it]08/04/2024 04:13:37 - INFO - __main__ -   Step: 8830, LR: 1.4526363627114163e-06, Loss: 341.5301818847656
2024-08-04T11:13:49.884942802Z 
 93%|█████████▎| 8831/9500 [30:16:19<2:17:05, 12.30s/it]08/04/2024 04:13:49 - INFO - __main__ -   Step: 8831, LR: 1.4504658190241372e-06, Loss: 428.1621398925781
2024-08-04T11:14:02.436001345Z 
 93%|█████████▎| 8832/9500 [30:16:32<2:17:44, 12.37s/it]08/04/2024 04:14:02 - INFO - __main__ -   Step: 8832, LR: 1.4482952753368585e-06, Loss: 305.7275085449219
2024-08-04T11:14:14.974130683Z 
 93%|█████████▎| 8833/9500 [30:16:44<2:18:05, 12.42s/it]08/04/2024 04:14:14 - INFO - __main__ -   Step: 8833, LR: 1.4461247316495794e-06, Loss: 484.7658996582031
2024-08-04T11:14:27.203617988Z 
 93%|█████████▎| 8834/9500 [30:16:57<2:17:14, 12.36s/it]08/04/2024 04:14:27 - INFO - __main__ -   Step: 8834, LR: 1.4439541879623004e-06, Loss: 486.0975341796875
2024-08-04T11:14:39.806687231Z 
 93%|█████████▎| 8835/9500 [30:17:09<2:17:49, 12.44s/it]08/04/2024 04:14:39 - INFO - __main__ -   Step: 8835, LR: 1.4417836442750215e-06, Loss: 442.9040832519531
2024-08-04T11:14:51.970715301Z 
 93%|█████████▎| 8836/9500 [30:17:21<2:16:43, 12.35s/it]08/04/2024 04:14:51 - INFO - __main__ -   Step: 8836, LR: 1.4396131005877426e-06, Loss: 352.7724304199219
2024-08-04T11:15:04.289382292Z 
 93%|█████████▎| 8837/9500 [30:17:34<2:16:23, 12.34s/it]08/04/2024 04:15:04 - INFO - __main__ -   Step: 8837, LR: 1.4374425569004638e-06, Loss: 607.8421630859375
2024-08-04T11:15:16.950361224Z 
 93%|█████████▎| 8838/9500 [30:17:46<2:17:14, 12.44s/it]08/04/2024 04:15:16 - INFO - __main__ -   Step: 8838, LR: 1.4352720132131847e-06, Loss: 396.58941650390625
2024-08-04T11:15:29.560333592Z 
 93%|█████████▎| 8839/9500 [30:17:59<2:17:35, 12.49s/it]08/04/2024 04:15:29 - INFO - __main__ -   Step: 8839, LR: 1.433101469525906e-06, Loss: 486.7795104980469
2024-08-04T11:15:41.987914476Z 
 93%|█████████▎| 8840/9500 [30:18:11<2:17:11, 12.47s/it]08/04/2024 04:15:41 - INFO - __main__ -   Step: 8840, LR: 1.430930925838627e-06, Loss: 553.482421875
2024-08-04T11:15:54.464469748Z 
 93%|█████████▎| 8841/9500 [30:18:24<2:16:59, 12.47s/it]08/04/2024 04:15:54 - INFO - __main__ -   Step: 8841, LR: 1.428760382151348e-06, Loss: 366.78363037109375
2024-08-04T11:16:06.582721799Z 
 93%|█████████▎| 8842/9500 [30:18:36<2:15:37, 12.37s/it]08/04/2024 04:16:06 - INFO - __main__ -   Step: 8842, LR: 1.4265898384640692e-06, Loss: 306.4254150390625
2024-08-04T11:16:18.694488058Z 
 93%|█████████▎| 8843/9500 [30:18:48<2:14:34, 12.29s/it]08/04/2024 04:16:18 - INFO - __main__ -   Step: 8843, LR: 1.4244192947767902e-06, Loss: 429.178955078125
2024-08-04T11:16:31.099965318Z 
 93%|█████████▎| 8844/9500 [30:19:01<2:14:45, 12.32s/it]08/04/2024 04:16:31 - INFO - __main__ -   Step: 8844, LR: 1.4222487510895113e-06, Loss: 380.0740051269531
2024-08-04T11:16:43.151205120Z 
 93%|█████████▎| 8845/9500 [30:19:13<2:13:38, 12.24s/it]08/04/2024 04:16:43 - INFO - __main__ -   Step: 8845, LR: 1.4200782074022324e-06, Loss: 432.7330322265625
2024-08-04T11:16:55.215476278Z 
 93%|█████████▎| 8846/9500 [30:19:25<2:12:51, 12.19s/it]08/04/2024 04:16:55 - INFO - __main__ -   Step: 8846, LR: 1.4179076637149536e-06, Loss: 371.4810791015625
2024-08-04T11:17:07.808806020Z 
 93%|█████████▎| 8847/9500 [30:19:37<2:13:58, 12.31s/it]08/04/2024 04:17:07 - INFO - __main__ -   Step: 8847, LR: 1.4157371200276745e-06, Loss: 407.1984558105469
2024-08-04T11:17:20.120301267Z 
 93%|█████████▎| 8848/9500 [30:19:50<2:13:46, 12.31s/it]08/04/2024 04:17:20 - INFO - __main__ -   Step: 8848, LR: 1.4135665763403954e-06, Loss: 421.93389892578125
2024-08-04T11:17:32.124126905Z 
 93%|█████████▎| 8849/9500 [30:20:02<2:12:34, 12.22s/it]08/04/2024 04:17:32 - INFO - __main__ -   Step: 8849, LR: 1.4113960326531168e-06, Loss: 377.18170166015625
2024-08-04T11:17:44.606817487Z 
 93%|█████████▎| 8850/9500 [30:20:14<2:13:13, 12.30s/it]08/04/2024 04:17:44 - INFO - __main__ -   Step: 8850, LR: 1.4092254889658377e-06, Loss: 366.5993347167969
2024-08-04T11:17:56.877771929Z 
 93%|█████████▎| 8851/9500 [30:20:26<2:12:56, 12.29s/it]08/04/2024 04:17:56 - INFO - __main__ -   Step: 8851, LR: 1.407054945278559e-06, Loss: 321.6183166503906
2024-08-04T11:18:09.122805948Z 
 93%|█████████▎| 8852/9500 [30:20:39<2:12:34, 12.28s/it]08/04/2024 04:18:09 - INFO - __main__ -   Step: 8852, LR: 1.40488440159128e-06, Loss: 409.1338195800781
2024-08-04T11:18:21.319714950Z 
 93%|█████████▎| 8853/9500 [30:20:51<2:12:07, 12.25s/it]08/04/2024 04:18:21 - INFO - __main__ -   Step: 8853, LR: 1.4027138579040011e-06, Loss: 355.56353759765625
2024-08-04T11:18:34.308542151Z 
 93%|█████████▎| 8854/9500 [30:21:04<2:14:17, 12.47s/it]08/04/2024 04:18:34 - INFO - __main__ -   Step: 8854, LR: 1.400543314216722e-06, Loss: 373.0400390625
2024-08-04T11:18:46.361830708Z 
 93%|█████████▎| 8855/9500 [30:21:16<2:12:44, 12.35s/it]08/04/2024 04:18:46 - INFO - __main__ -   Step: 8855, LR: 1.3983727705294432e-06, Loss: 422.9308776855469
2024-08-04T11:18:58.521359466Z 
 93%|█████████▎| 8856/9500 [30:21:28<2:11:55, 12.29s/it]08/04/2024 04:18:58 - INFO - __main__ -   Step: 8856, LR: 1.3962022268421643e-06, Loss: 338.70318603515625
2024-08-04T11:19:11.125634389Z 
 93%|█████████▎| 8857/9500 [30:21:41<2:12:43, 12.38s/it]08/04/2024 04:19:11 - INFO - __main__ -   Step: 8857, LR: 1.3940316831548852e-06, Loss: 363.115478515625
2024-08-04T11:19:23.235859532Z 
 93%|█████████▎| 8858/9500 [30:21:53<2:11:38, 12.30s/it]08/04/2024 04:19:23 - INFO - __main__ -   Step: 8858, LR: 1.3918611394676066e-06, Loss: 324.75579833984375
2024-08-04T11:19:35.380223323Z 
 93%|█████████▎| 8859/9500 [30:22:05<2:10:55, 12.26s/it]08/04/2024 04:19:35 - INFO - __main__ -   Step: 8859, LR: 1.3896905957803275e-06, Loss: 397.19244384765625
2024-08-04T11:19:47.804104195Z 
 93%|█████████▎| 8860/9500 [30:22:17<2:11:15, 12.31s/it]08/04/2024 04:19:47 - INFO - __main__ -   Step: 8860, LR: 1.3875200520930486e-06, Loss: 369.333251953125
2024-08-04T11:19:59.883525338Z 
 93%|█████████▎| 8861/9500 [30:22:29<2:10:19, 12.24s/it]08/04/2024 04:19:59 - INFO - __main__ -   Step: 8861, LR: 1.3853495084057698e-06, Loss: 390.82763671875
2024-08-04T11:20:12.304690735Z 
 93%|█████████▎| 8862/9500 [30:22:42<2:10:42, 12.29s/it]08/04/2024 04:20:12 - INFO - __main__ -   Step: 8862, LR: 1.3831789647184907e-06, Loss: 435.06103515625
2024-08-04T11:20:24.657349645Z 
 93%|█████████▎| 8863/9500 [30:22:54<2:10:42, 12.31s/it]08/04/2024 04:20:24 - INFO - __main__ -   Step: 8863, LR: 1.3810084210312118e-06, Loss: 421.212646484375
2024-08-04T11:20:36.837780383Z 
 93%|█████████▎| 8864/9500 [30:23:06<2:10:04, 12.27s/it]08/04/2024 04:20:36 - INFO - __main__ -   Step: 8864, LR: 1.378837877343933e-06, Loss: 412.23297119140625
2024-08-04T11:20:48.928685419Z 
 93%|█████████▎| 8865/9500 [30:23:18<2:09:18, 12.22s/it]08/04/2024 04:20:48 - INFO - __main__ -   Step: 8865, LR: 1.376667333656654e-06, Loss: 328.3394775390625
2024-08-04T11:21:01.414870102Z 
 93%|█████████▎| 8866/9500 [30:23:31<2:09:56, 12.30s/it]08/04/2024 04:21:01 - INFO - __main__ -   Step: 8866, LR: 1.374496789969375e-06, Loss: 453.111328125
2024-08-04T11:21:13.935074891Z 
 93%|█████████▎| 8867/9500 [30:23:43<2:10:26, 12.36s/it]08/04/2024 04:21:13 - INFO - __main__ -   Step: 8867, LR: 1.372326246282096e-06, Loss: 337.9208984375
2024-08-04T11:21:26.117967412Z 
 93%|█████████▎| 8868/9500 [30:23:56<2:09:40, 12.31s/it]08/04/2024 04:21:26 - INFO - __main__ -   Step: 8868, LR: 1.3701557025948173e-06, Loss: 450.74713134765625
2024-08-04T11:21:38.729137227Z 
 93%|█████████▎| 8869/9500 [30:24:08<2:10:24, 12.40s/it]08/04/2024 04:21:38 - INFO - __main__ -   Step: 8869, LR: 1.3679851589075382e-06, Loss: 339.3517150878906
2024-08-04T11:21:51.291196279Z 
 93%|█████████▎| 8870/9500 [30:24:21<2:10:42, 12.45s/it]08/04/2024 04:21:51 - INFO - __main__ -   Step: 8870, LR: 1.3658146152202596e-06, Loss: 409.42828369140625
2024-08-04T11:22:03.352772220Z 
 93%|█████████▎| 8871/9500 [30:24:33<2:09:17, 12.33s/it]08/04/2024 04:22:03 - INFO - __main__ -   Step: 8871, LR: 1.3636440715329805e-06, Loss: 340.289306640625
2024-08-04T11:22:15.901770052Z 
 93%|█████████▎| 8872/9500 [30:24:45<2:09:45, 12.40s/it]08/04/2024 04:22:15 - INFO - __main__ -   Step: 8872, LR: 1.3614735278457016e-06, Loss: 370.4313049316406
2024-08-04T11:22:28.101184787Z 
 93%|█████████▎| 8873/9500 [30:24:58<2:08:56, 12.34s/it]08/04/2024 04:22:28 - INFO - __main__ -   Step: 8873, LR: 1.3593029841584226e-06, Loss: 387.1065673828125
2024-08-04T11:22:40.070589689Z 
 93%|█████████▎| 8874/9500 [30:25:10<2:07:34, 12.23s/it]08/04/2024 04:22:40 - INFO - __main__ -   Step: 8874, LR: 1.3571324404711437e-06, Loss: 427.55242919921875
2024-08-04T11:22:52.696787474Z 
 93%|█████████▎| 8875/9500 [30:25:22<2:08:36, 12.35s/it]08/04/2024 04:22:52 - INFO - __main__ -   Step: 8875, LR: 1.3549618967838648e-06, Loss: 368.4694519042969
2024-08-04T11:23:04.747036504Z 
 93%|█████████▎| 8876/9500 [30:25:34<2:07:29, 12.26s/it]08/04/2024 04:23:04 - INFO - __main__ -   Step: 8876, LR: 1.3527913530965858e-06, Loss: 385.49169921875
2024-08-04T11:23:17.042807085Z 
 93%|█████████▎| 8877/9500 [30:25:46<2:07:23, 12.27s/it]08/04/2024 04:23:17 - INFO - __main__ -   Step: 8877, LR: 1.350620809409307e-06, Loss: 474.682373046875
2024-08-04T11:23:29.488955477Z 
 93%|█████████▎| 8878/9500 [30:25:59<2:07:44, 12.32s/it]08/04/2024 04:23:29 - INFO - __main__ -   Step: 8878, LR: 1.348450265722028e-06, Loss: 439.6070556640625
2024-08-04T11:23:41.961116567Z 
 93%|█████████▎| 8879/9500 [30:26:11<2:08:00, 12.37s/it]08/04/2024 04:23:41 - INFO - __main__ -   Step: 8879, LR: 1.3462797220347494e-06, Loss: 396.32415771484375
2024-08-04T11:23:54.323896049Z 
 93%|█████████▎| 8880/9500 [30:26:24<2:07:46, 12.37s/it]08/04/2024 04:23:54 - INFO - __main__ -   Step: 8880, LR: 1.3441091783474703e-06, Loss: 439.7467041015625
2024-08-04T11:24:07.264855333Z 
 93%|█████████▎| 8881/9500 [30:26:37<2:09:21, 12.54s/it]08/04/2024 04:24:07 - INFO - __main__ -   Step: 8881, LR: 1.3419386346601912e-06, Loss: 481.95660400390625
2024-08-04T11:24:19.644682447Z 
 93%|█████████▎| 8882/9500 [30:26:49<2:08:39, 12.49s/it]08/04/2024 04:24:19 - INFO - __main__ -   Step: 8882, LR: 1.3397680909729124e-06, Loss: 428.0054016113281
2024-08-04T11:24:31.889269467Z 
 94%|█████████▎| 8883/9500 [30:27:01<2:07:41, 12.42s/it]08/04/2024 04:24:31 - INFO - __main__ -   Step: 8883, LR: 1.3375975472856335e-06, Loss: 473.27557373046875
2024-08-04T11:24:44.593668577Z 
 94%|█████████▎| 8884/9500 [30:27:14<2:08:21, 12.50s/it]08/04/2024 04:24:44 - INFO - __main__ -   Step: 8884, LR: 1.3354270035983546e-06, Loss: 508.99072265625
2024-08-04T11:24:56.725476226Z 
 94%|█████████▎| 8885/9500 [30:27:26<2:07:00, 12.39s/it]08/04/2024 04:24:56 - INFO - __main__ -   Step: 8885, LR: 1.3332564599110756e-06, Loss: 312.7012634277344
2024-08-04T11:25:08.877011943Z 
 94%|█████████▎| 8886/9500 [30:27:38<2:06:04, 12.32s/it]08/04/2024 04:25:08 - INFO - __main__ -   Step: 8886, LR: 1.331085916223797e-06, Loss: 431.30364990234375
2024-08-04T11:25:21.391895699Z 
 94%|█████████▎| 8887/9500 [30:27:51<2:06:27, 12.38s/it]08/04/2024 04:25:21 - INFO - __main__ -   Step: 8887, LR: 1.3289153725365178e-06, Loss: 251.86119079589844
2024-08-04T11:25:33.884761199Z 
 94%|█████████▎| 8888/9500 [30:28:03<2:06:36, 12.41s/it]08/04/2024 04:25:33 - INFO - __main__ -   Step: 8888, LR: 1.3267448288492388e-06, Loss: 343.4289245605469
2024-08-04T11:25:45.893714999Z 
 94%|█████████▎| 8889/9500 [30:28:15<2:05:10, 12.29s/it]08/04/2024 04:25:45 - INFO - __main__ -   Step: 8889, LR: 1.32457428516196e-06, Loss: 402.2596435546875
2024-08-04T11:25:58.319725920Z 
 94%|█████████▎| 8890/9500 [30:28:28<2:05:22, 12.33s/it]08/04/2024 04:25:58 - INFO - __main__ -   Step: 8890, LR: 1.322403741474681e-06, Loss: 403.09844970703125
2024-08-04T11:26:10.692312149Z 
 94%|█████████▎| 8891/9500 [30:28:40<2:05:17, 12.34s/it]08/04/2024 04:26:10 - INFO - __main__ -   Step: 8891, LR: 1.3202331977874022e-06, Loss: 345.97265625
2024-08-04T11:26:22.624388131Z 
 94%|█████████▎| 8892/9500 [30:28:52<2:03:50, 12.22s/it]08/04/2024 04:26:22 - INFO - __main__ -   Step: 8892, LR: 1.318062654100123e-06, Loss: 411.88653564453125
2024-08-04T11:26:35.199175536Z 
 94%|█████████▎| 8893/9500 [30:29:05<2:04:42, 12.33s/it]08/04/2024 04:26:35 - INFO - __main__ -   Step: 8893, LR: 1.3158921104128444e-06, Loss: 468.54107666015625
2024-08-04T11:26:47.306404149Z 
 94%|█████████▎| 8894/9500 [30:29:17<2:03:50, 12.26s/it]08/04/2024 04:26:47 - INFO - __main__ -   Step: 8894, LR: 1.3137215667255654e-06, Loss: 377.9732360839844
2024-08-04T11:26:59.559871240Z 
 94%|█████████▎| 8895/9500 [30:29:29<2:03:36, 12.26s/it]08/04/2024 04:26:59 - INFO - __main__ -   Step: 8895, LR: 1.3115510230382863e-06, Loss: 455.760009765625
2024-08-04T11:27:11.977438218Z 
 94%|█████████▎| 8896/9500 [30:29:41<2:03:52, 12.31s/it]08/04/2024 04:27:11 - INFO - __main__ -   Step: 8896, LR: 1.3093804793510076e-06, Loss: 493.3341369628906
2024-08-04T11:27:24.798461368Z 
 94%|█████████▎| 8897/9500 [30:29:54<2:05:13, 12.46s/it]08/04/2024 04:27:24 - INFO - __main__ -   Step: 8897, LR: 1.3072099356637286e-06, Loss: 400.6150207519531
2024-08-04T11:27:37.077733431Z 
 94%|█████████▎| 8898/9500 [30:30:07<2:04:28, 12.41s/it]08/04/2024 04:27:37 - INFO - __main__ -   Step: 8898, LR: 1.30503939197645e-06, Loss: 394.12408447265625
2024-08-04T11:27:49.309322203Z 
 94%|█████████▎| 8899/9500 [30:30:19<2:03:44, 12.35s/it]08/04/2024 04:27:49 - INFO - __main__ -   Step: 8899, LR: 1.3028688482891708e-06, Loss: 404.6964416503906
2024-08-04T11:28:01.762170814Z 
 94%|█████████▎| 8900/9500 [30:30:31<2:03:50, 12.38s/it]08/04/2024 04:28:01 - INFO - __main__ -   Step: 8900, LR: 1.300698304601892e-06, Loss: 349.853271484375
2024-08-04T11:28:14.052904900Z 
 94%|█████████▎| 8901/9500 [30:30:43<2:03:21, 12.36s/it]08/04/2024 04:28:14 - INFO - __main__ -   Step: 8901, LR: 1.2985277609146129e-06, Loss: 379.1722717285156
2024-08-04T11:28:26.156313060Z 
 94%|█████████▎| 8902/9500 [30:30:56<2:02:23, 12.28s/it]08/04/2024 04:28:26 - INFO - __main__ -   Step: 8902, LR: 1.296357217227334e-06, Loss: 380.2454528808594
2024-08-04T11:28:38.450489301Z 
 94%|█████████▎| 8903/9500 [30:31:08<2:02:13, 12.28s/it]08/04/2024 04:28:38 - INFO - __main__ -   Step: 8903, LR: 1.2941866735400552e-06, Loss: 341.96875
2024-08-04T11:28:50.891172631Z 
 94%|█████████▎| 8904/9500 [30:31:20<2:02:29, 12.33s/it]08/04/2024 04:28:50 - INFO - __main__ -   Step: 8904, LR: 1.292016129852776e-06, Loss: 426.40533447265625
2024-08-04T11:29:03.022462843Z 
 94%|█████████▎| 8905/9500 [30:31:32<2:01:41, 12.27s/it]08/04/2024 04:29:03 - INFO - __main__ -   Step: 8905, LR: 1.2898455861654974e-06, Loss: 557.7236938476562
2024-08-04T11:29:15.879842464Z 
 94%|█████████▎| 8906/9500 [30:31:45<2:03:13, 12.45s/it]08/04/2024 04:29:15 - INFO - __main__ -   Step: 8906, LR: 1.2876750424782184e-06, Loss: 499.9579162597656
2024-08-04T11:29:27.879382432Z 
 94%|█████████▍| 8907/9500 [30:31:57<2:01:41, 12.31s/it]08/04/2024 04:29:27 - INFO - __main__ -   Step: 8907, LR: 1.2855044987909393e-06, Loss: 353.3414611816406
2024-08-04T11:29:39.955502963Z 
 94%|█████████▍| 8908/9500 [30:32:09<2:00:47, 12.24s/it]08/04/2024 04:29:39 - INFO - __main__ -   Step: 8908, LR: 1.2833339551036606e-06, Loss: 333.3532409667969
2024-08-04T11:29:52.645852770Z 
 94%|█████████▍| 8909/9500 [30:32:22<2:01:54, 12.38s/it]08/04/2024 04:29:52 - INFO - __main__ -   Step: 8909, LR: 1.2811634114163815e-06, Loss: 472.95404052734375
2024-08-04T11:30:04.851509773Z 
 94%|█████████▍| 8910/9500 [30:32:34<2:01:11, 12.33s/it]08/04/2024 04:30:04 - INFO - __main__ -   Step: 8910, LR: 1.2789928677291027e-06, Loss: 402.37115478515625
2024-08-04T11:30:17.016351007Z 
 94%|█████████▍| 8911/9500 [30:32:46<2:00:31, 12.28s/it]08/04/2024 04:30:17 - INFO - __main__ -   Step: 8911, LR: 1.2768223240418236e-06, Loss: 349.0293273925781
2024-08-04T11:30:29.760461835Z 
 94%|█████████▍| 8912/9500 [30:32:59<2:01:41, 12.42s/it]08/04/2024 04:30:29 - INFO - __main__ -   Step: 8912, LR: 1.274651780354545e-06, Loss: 477.17376708984375
2024-08-04T11:30:42.277088051Z 
 94%|█████████▍| 8913/9500 [30:33:12<2:01:46, 12.45s/it]08/04/2024 04:30:42 - INFO - __main__ -   Step: 8913, LR: 1.2724812366672659e-06, Loss: 404.50439453125
2024-08-04T11:30:54.216527584Z 
 94%|█████████▍| 8914/9500 [30:33:24<2:00:04, 12.29s/it]08/04/2024 04:30:54 - INFO - __main__ -   Step: 8914, LR: 1.2703106929799868e-06, Loss: 366.47015380859375
2024-08-04T11:31:07.116037342Z 
 94%|█████████▍| 8915/9500 [30:33:37<2:01:38, 12.48s/it]08/04/2024 04:31:07 - INFO - __main__ -   Step: 8915, LR: 1.2681401492927082e-06, Loss: 500.4864807128906
2024-08-04T11:31:19.144810268Z 
 94%|█████████▍| 8916/9500 [30:33:49<2:00:07, 12.34s/it]08/04/2024 04:31:19 - INFO - __main__ -   Step: 8916, LR: 1.265969605605429e-06, Loss: 312.74859619140625
2024-08-04T11:31:31.342483660Z 
 94%|█████████▍| 8917/9500 [30:34:01<1:59:30, 12.30s/it]08/04/2024 04:31:31 - INFO - __main__ -   Step: 8917, LR: 1.2637990619181504e-06, Loss: 425.18255615234375
2024-08-04T11:31:43.740785057Z 
 94%|█████████▍| 8918/9500 [30:34:13<1:59:35, 12.33s/it]08/04/2024 04:31:43 - INFO - __main__ -   Step: 8918, LR: 1.2616285182308713e-06, Loss: 357.6097106933594
2024-08-04T11:31:55.822385176Z 
 94%|█████████▍| 8919/9500 [30:34:25<1:58:39, 12.25s/it]08/04/2024 04:31:55 - INFO - __main__ -   Step: 8919, LR: 1.2594579745435925e-06, Loss: 318.4201965332031
2024-08-04T11:32:08.126755841Z 
 94%|█████████▍| 8920/9500 [30:34:38<1:58:36, 12.27s/it]08/04/2024 04:32:08 - INFO - __main__ -   Step: 8920, LR: 1.2572874308563134e-06, Loss: 465.08966064453125
2024-08-04T11:32:20.670190513Z 
 94%|█████████▍| 8921/9500 [30:34:50<1:59:11, 12.35s/it]08/04/2024 04:32:20 - INFO - __main__ -   Step: 8921, LR: 1.2551168871690345e-06, Loss: 420.33154296875
2024-08-04T11:32:33.301479113Z 
 94%|█████████▍| 8922/9500 [30:35:03<1:59:47, 12.44s/it]08/04/2024 04:32:33 - INFO - __main__ -   Step: 8922, LR: 1.2529463434817557e-06, Loss: 303.61676025390625
2024-08-04T11:32:45.447430906Z 
 94%|█████████▍| 8923/9500 [30:35:15<1:58:45, 12.35s/it]08/04/2024 04:32:45 - INFO - __main__ -   Step: 8923, LR: 1.2507757997944766e-06, Loss: 415.30853271484375
2024-08-04T11:32:57.800343557Z 
 94%|█████████▍| 8924/9500 [30:35:27<1:58:33, 12.35s/it]08/04/2024 04:32:57 - INFO - __main__ -   Step: 8924, LR: 1.2486052561071977e-06, Loss: 344.1420593261719
2024-08-04T11:33:09.707927925Z 
 94%|█████████▍| 8925/9500 [30:35:39<1:57:04, 12.22s/it]08/04/2024 04:33:09 - INFO - __main__ -   Step: 8925, LR: 1.2464347124199189e-06, Loss: 356.9509582519531
2024-08-04T11:33:21.859315032Z 
 94%|█████████▍| 8926/9500 [30:35:51<1:56:41, 12.20s/it]08/04/2024 04:33:21 - INFO - __main__ -   Step: 8926, LR: 1.24426416873264e-06, Loss: 423.7371826171875
2024-08-04T11:33:34.136500979Z 
 94%|█████████▍| 8927/9500 [30:36:04<1:56:42, 12.22s/it]08/04/2024 04:33:34 - INFO - __main__ -   Step: 8927, LR: 1.2420936250453611e-06, Loss: 288.0721435546875
2024-08-04T11:33:46.167432820Z 
 94%|█████████▍| 8928/9500 [30:36:16<1:55:58, 12.16s/it]08/04/2024 04:33:46 - INFO - __main__ -   Step: 8928, LR: 1.239923081358082e-06, Loss: 499.1575927734375
2024-08-04T11:33:58.274224713Z 
 94%|█████████▍| 8929/9500 [30:36:28<1:55:35, 12.15s/it]08/04/2024 04:33:58 - INFO - __main__ -   Step: 8929, LR: 1.2377525376708032e-06, Loss: 389.17755126953125
2024-08-04T11:34:11.043921176Z 
 94%|█████████▍| 8930/9500 [30:36:40<1:57:10, 12.33s/it]08/04/2024 04:34:11 - INFO - __main__ -   Step: 8930, LR: 1.2355819939835243e-06, Loss: 499.6396789550781
2024-08-04T11:34:23.208888149Z 
 94%|█████████▍| 8931/9500 [30:36:53<1:56:29, 12.28s/it]08/04/2024 04:34:23 - INFO - __main__ -   Step: 8931, LR: 1.2334114502962453e-06, Loss: 465.19915771484375
2024-08-04T11:34:35.554679301Z 
 94%|█████████▍| 8932/9500 [30:37:05<1:56:27, 12.30s/it]08/04/2024 04:34:35 - INFO - __main__ -   Step: 8932, LR: 1.2312409066089664e-06, Loss: 398.8740234375
2024-08-04T11:34:48.305049564Z 
 94%|█████████▍| 8933/9500 [30:37:18<1:57:31, 12.44s/it]08/04/2024 04:34:48 - INFO - __main__ -   Step: 8933, LR: 1.2290703629216875e-06, Loss: 513.0227661132812
2024-08-04T11:35:00.820387901Z 
 94%|█████████▍| 8934/9500 [30:37:30<1:57:32, 12.46s/it]08/04/2024 04:35:00 - INFO - __main__ -   Step: 8934, LR: 1.2268998192344087e-06, Loss: 373.7591857910156
2024-08-04T11:35:12.937878576Z 
 94%|█████████▍| 8935/9500 [30:37:42<1:56:21, 12.36s/it]08/04/2024 04:35:12 - INFO - __main__ -   Step: 8935, LR: 1.2247292755471298e-06, Loss: 380.30670166015625
2024-08-04T11:35:25.621276318Z 
 94%|█████████▍| 8936/9500 [30:37:55<1:57:04, 12.46s/it]08/04/2024 04:35:25 - INFO - __main__ -   Step: 8936, LR: 1.222558731859851e-06, Loss: 393.30047607421875
2024-08-04T11:35:37.800743449Z 
 94%|█████████▍| 8937/9500 [30:38:07<1:56:05, 12.37s/it]08/04/2024 04:35:37 - INFO - __main__ -   Step: 8937, LR: 1.2203881881725719e-06, Loss: 338.24505615234375
2024-08-04T11:35:50.169644798Z 
 94%|█████████▍| 8938/9500 [30:38:20<1:55:52, 12.37s/it]08/04/2024 04:35:50 - INFO - __main__ -   Step: 8938, LR: 1.218217644485293e-06, Loss: 435.3964538574219
2024-08-04T11:36:02.324344194Z 
 94%|█████████▍| 8939/9500 [30:38:32<1:55:03, 12.31s/it]08/04/2024 04:36:02 - INFO - __main__ -   Step: 8939, LR: 1.216047100798014e-06, Loss: 354.8533935546875
2024-08-04T11:36:14.649045826Z 
 94%|█████████▍| 8940/9500 [30:38:44<1:54:54, 12.31s/it]08/04/2024 04:36:14 - INFO - __main__ -   Step: 8940, LR: 1.213876557110735e-06, Loss: 322.0811767578125
2024-08-04T11:36:27.080963218Z 
 94%|█████████▍| 8941/9500 [30:38:57<1:55:02, 12.35s/it]08/04/2024 04:36:27 - INFO - __main__ -   Step: 8941, LR: 1.2117060134234562e-06, Loss: 467.72509765625
2024-08-04T11:36:39.839855780Z 
 94%|█████████▍| 8942/9500 [30:39:09<1:55:58, 12.47s/it]08/04/2024 04:36:39 - INFO - __main__ -   Step: 8942, LR: 1.2095354697361773e-06, Loss: 463.702880859375
2024-08-04T11:36:52.320282354Z 
 94%|█████████▍| 8943/9500 [30:39:22<1:55:48, 12.47s/it]08/04/2024 04:36:52 - INFO - __main__ -   Step: 8943, LR: 1.2073649260488985e-06, Loss: 458.55792236328125
2024-08-04T11:37:04.808044854Z 
 94%|█████████▍| 8944/9500 [30:39:34<1:55:37, 12.48s/it]08/04/2024 04:37:04 - INFO - __main__ -   Step: 8944, LR: 1.2051943823616194e-06, Loss: 423.8740234375
2024-08-04T11:37:17.108095935Z 
 94%|█████████▍| 8945/9500 [30:39:47<1:54:55, 12.42s/it]08/04/2024 04:37:17 - INFO - __main__ -   Step: 8945, LR: 1.2030238386743405e-06, Loss: 447.483642578125
2024-08-04T11:37:29.721968535Z 
 94%|█████████▍| 8946/9500 [30:39:59<1:55:14, 12.48s/it]08/04/2024 04:37:29 - INFO - __main__ -   Step: 8946, LR: 1.2008532949870617e-06, Loss: 275.5512390136719
2024-08-04T11:37:42.095241876Z 
 94%|█████████▍| 8947/9500 [30:40:12<1:54:44, 12.45s/it]08/04/2024 04:37:42 - INFO - __main__ -   Step: 8947, LR: 1.1986827512997826e-06, Loss: 485.0401306152344
2024-08-04T11:37:54.461729868Z 
 94%|█████████▍| 8948/9500 [30:40:24<1:54:18, 12.42s/it]08/04/2024 04:37:54 - INFO - __main__ -   Step: 8948, LR: 1.1965122076125037e-06, Loss: 520.4857788085938
2024-08-04T11:38:07.164640429Z 
 94%|█████████▍| 8949/9500 [30:40:37<1:54:51, 12.51s/it]08/04/2024 04:38:07 - INFO - __main__ -   Step: 8949, LR: 1.1943416639252249e-06, Loss: 425.244873046875
2024-08-04T11:38:19.134569705Z 
 94%|█████████▍| 8950/9500 [30:40:49<1:53:10, 12.35s/it]08/04/2024 04:38:19 - INFO - __main__ -   Step: 8950, LR: 1.192171120237946e-06, Loss: 392.3376159667969
2024-08-04T11:38:31.186359371Z 
 94%|█████████▍| 8951/9500 [30:41:01<1:52:09, 12.26s/it]08/04/2024 04:38:31 - INFO - __main__ -   Step: 8951, LR: 1.190000576550667e-06, Loss: 446.828125
2024-08-04T11:38:43.752775805Z 
 94%|█████████▍| 8952/9500 [30:41:13<1:52:48, 12.35s/it]08/04/2024 04:38:43 - INFO - __main__ -   Step: 8952, LR: 1.187830032863388e-06, Loss: 375.651611328125
2024-08-04T11:38:56.085086608Z 
 94%|█████████▍| 8953/9500 [30:41:26<1:52:32, 12.35s/it]08/04/2024 04:38:56 - INFO - __main__ -   Step: 8953, LR: 1.1856594891761092e-06, Loss: 363.1976318359375
2024-08-04T11:39:08.330935534Z 
 94%|█████████▍| 8954/9500 [30:41:38<1:52:04, 12.32s/it]08/04/2024 04:39:08 - INFO - __main__ -   Step: 8954, LR: 1.1834889454888303e-06, Loss: 398.35888671875
2024-08-04T11:39:20.941754448Z 
 94%|█████████▍| 8955/9500 [30:41:50<1:52:40, 12.40s/it]08/04/2024 04:39:20 - INFO - __main__ -   Step: 8955, LR: 1.1813184018015515e-06, Loss: 402.7587585449219
2024-08-04T11:39:33.087364841Z 
 94%|█████████▍| 8956/9500 [30:42:03<1:51:45, 12.33s/it]08/04/2024 04:39:33 - INFO - __main__ -   Step: 8956, LR: 1.1791478581142724e-06, Loss: 356.0005187988281
2024-08-04T11:39:45.417346028Z 
 94%|█████████▍| 8957/9500 [30:42:15<1:51:33, 12.33s/it]08/04/2024 04:39:45 - INFO - __main__ -   Step: 8957, LR: 1.1769773144269935e-06, Loss: 442.19769287109375
2024-08-04T11:39:57.842743556Z 
 94%|█████████▍| 8958/9500 [30:42:27<1:51:37, 12.36s/it]08/04/2024 04:39:57 - INFO - __main__ -   Step: 8958, LR: 1.1748067707397145e-06, Loss: 390.9805908203125
2024-08-04T11:40:10.309657029Z 
 94%|█████████▍| 8959/9500 [30:42:40<1:51:42, 12.39s/it]08/04/2024 04:40:10 - INFO - __main__ -   Step: 8959, LR: 1.1726362270524356e-06, Loss: 393.3755798339844
2024-08-04T11:40:22.449414645Z 
 94%|█████████▍| 8960/9500 [30:42:52<1:50:50, 12.31s/it]08/04/2024 04:40:22 - INFO - __main__ -   Step: 8960, LR: 1.1704656833651567e-06, Loss: 400.43951416015625
2024-08-04T11:40:34.889427742Z 
 94%|█████████▍| 8961/9500 [30:43:04<1:50:57, 12.35s/it]08/04/2024 04:40:34 - INFO - __main__ -   Step: 8961, LR: 1.1682951396778779e-06, Loss: 400.24798583984375
2024-08-04T11:40:47.040845116Z 
 94%|█████████▍| 8962/9500 [30:43:16<1:50:13, 12.29s/it]08/04/2024 04:40:47 - INFO - __main__ -   Step: 8962, LR: 1.166124595990599e-06, Loss: 520.970458984375
2024-08-04T11:40:59.093729950Z 
 94%|█████████▍| 8963/9500 [30:43:29<1:49:22, 12.22s/it]08/04/2024 04:40:59 - INFO - __main__ -   Step: 8963, LR: 1.1639540523033201e-06, Loss: 278.85076904296875
2024-08-04T11:41:11.529686080Z 
 94%|█████████▍| 8964/9500 [30:43:41<1:49:44, 12.28s/it]08/04/2024 04:41:11 - INFO - __main__ -   Step: 8964, LR: 1.161783508616041e-06, Loss: 309.45697021484375
2024-08-04T11:41:23.606530031Z 
 94%|█████████▍| 8965/9500 [30:43:53<1:48:59, 12.22s/it]08/04/2024 04:41:23 - INFO - __main__ -   Step: 8965, LR: 1.1596129649287622e-06, Loss: 283.77978515625
2024-08-04T11:41:35.810354013Z 
 94%|█████████▍| 8966/9500 [30:44:05<1:48:43, 12.22s/it]08/04/2024 04:41:35 - INFO - __main__ -   Step: 8966, LR: 1.1574424212414833e-06, Loss: 467.8898620605469
2024-08-04T11:41:48.356300188Z 
 94%|█████████▍| 8967/9500 [30:44:18<1:49:24, 12.32s/it]08/04/2024 04:41:48 - INFO - __main__ -   Step: 8967, LR: 1.1552718775542043e-06, Loss: 509.1458740234375
2024-08-04T11:42:01.094365570Z 
 94%|█████████▍| 8968/9500 [30:44:31<1:50:19, 12.44s/it]08/04/2024 04:42:01 - INFO - __main__ -   Step: 8968, LR: 1.1531013338669254e-06, Loss: 397.96075439453125
2024-08-04T11:42:13.251180479Z 
 94%|█████████▍| 8969/9500 [30:44:43<1:49:21, 12.36s/it]08/04/2024 04:42:13 - INFO - __main__ -   Step: 8969, LR: 1.1509307901796465e-06, Loss: 369.4256591796875
2024-08-04T11:42:25.876936435Z 
 94%|█████████▍| 8970/9500 [30:44:55<1:49:51, 12.44s/it]08/04/2024 04:42:25 - INFO - __main__ -   Step: 8970, LR: 1.1487602464923677e-06, Loss: 442.6873779296875
2024-08-04T11:42:37.821686787Z 
 94%|█████████▍| 8971/9500 [30:45:07<1:48:21, 12.29s/it]08/04/2024 04:42:37 - INFO - __main__ -   Step: 8971, LR: 1.1465897028050886e-06, Loss: 347.581787109375
2024-08-04T11:42:50.118664051Z 
 94%|█████████▍| 8972/9500 [30:45:20<1:48:10, 12.29s/it]08/04/2024 04:42:50 - INFO - __main__ -   Step: 8972, LR: 1.1444191591178097e-06, Loss: 444.4443359375
2024-08-04T11:43:02.432857583Z 
 94%|█████████▍| 8973/9500 [30:45:32<1:48:01, 12.30s/it]08/04/2024 04:43:02 - INFO - __main__ -   Step: 8973, LR: 1.1422486154305309e-06, Loss: 416.04241943359375
2024-08-04T11:43:14.467623922Z 
 94%|█████████▍| 8974/9500 [30:45:44<1:47:07, 12.22s/it]08/04/2024 04:43:14 - INFO - __main__ -   Step: 8974, LR: 1.140078071743252e-06, Loss: 305.7710266113281
2024-08-04T11:43:26.595372692Z 
 94%|█████████▍| 8975/9500 [30:45:56<1:46:40, 12.19s/it]08/04/2024 04:43:26 - INFO - __main__ -   Step: 8975, LR: 1.137907528055973e-06, Loss: 414.8862609863281
2024-08-04T11:43:39.196471412Z 
 94%|█████████▍| 8976/9500 [30:46:09<1:47:32, 12.31s/it]08/04/2024 04:43:39 - INFO - __main__ -   Step: 8976, LR: 1.135736984368694e-06, Loss: 297.1334228515625
2024-08-04T11:43:51.291951809Z 
 94%|█████████▍| 8977/9500 [30:46:21<1:46:46, 12.25s/it]08/04/2024 04:43:51 - INFO - __main__ -   Step: 8977, LR: 1.1335664406814152e-06, Loss: 414.8946533203125
2024-08-04T11:44:03.762519924Z 
 95%|█████████▍| 8978/9500 [30:46:33<1:47:08, 12.32s/it]08/04/2024 04:44:03 - INFO - __main__ -   Step: 8978, LR: 1.1313958969941361e-06, Loss: 378.9918212890625
2024-08-04T11:44:16.515925147Z 
 95%|█████████▍| 8979/9500 [30:46:46<1:48:04, 12.45s/it]08/04/2024 04:44:16 - INFO - __main__ -   Step: 8979, LR: 1.1292253533068573e-06, Loss: 342.1760559082031
2024-08-04T11:44:28.922289356Z 
 95%|█████████▍| 8980/9500 [30:46:58<1:47:46, 12.43s/it]08/04/2024 04:44:28 - INFO - __main__ -   Step: 8980, LR: 1.1270548096195784e-06, Loss: 432.9390869140625
2024-08-04T11:44:41.426084367Z 
 95%|█████████▍| 8981/9500 [30:47:11<1:47:44, 12.46s/it]08/04/2024 04:44:41 - INFO - __main__ -   Step: 8981, LR: 1.1248842659322995e-06, Loss: 438.0438537597656
2024-08-04T11:44:53.669583865Z 
 95%|█████████▍| 8982/9500 [30:47:23<1:46:58, 12.39s/it]08/04/2024 04:44:53 - INFO - __main__ -   Step: 8982, LR: 1.1227137222450207e-06, Loss: 360.0867919921875
2024-08-04T11:45:06.147387412Z 
 95%|█████████▍| 8983/9500 [30:47:36<1:46:59, 12.42s/it]08/04/2024 04:45:06 - INFO - __main__ -   Step: 8983, LR: 1.1205431785577416e-06, Loss: 328.75799560546875
2024-08-04T11:45:18.365689288Z 
 95%|█████████▍| 8984/9500 [30:47:48<1:46:16, 12.36s/it]08/04/2024 04:45:18 - INFO - __main__ -   Step: 8984, LR: 1.1183726348704627e-06, Loss: 427.50103759765625
2024-08-04T11:45:30.384714245Z 
 95%|█████████▍| 8985/9500 [30:48:00<1:45:11, 12.26s/it]08/04/2024 04:45:30 - INFO - __main__ -   Step: 8985, LR: 1.1162020911831839e-06, Loss: 426.60125732421875
2024-08-04T11:45:43.082495811Z 
 95%|█████████▍| 8986/9500 [30:48:13<1:46:07, 12.39s/it]08/04/2024 04:45:43 - INFO - __main__ -   Step: 8986, LR: 1.1140315474959048e-06, Loss: 499.13909912109375
2024-08-04T11:45:55.430814073Z 
 95%|█████████▍| 8987/9500 [30:48:25<1:45:49, 12.38s/it]08/04/2024 04:45:55 - INFO - __main__ -   Step: 8987, LR: 1.111861003808626e-06, Loss: 359.6881408691406
2024-08-04T11:46:07.637977155Z 
 95%|█████████▍| 8988/9500 [30:48:37<1:45:10, 12.33s/it]08/04/2024 04:46:07 - INFO - __main__ -   Step: 8988, LR: 1.109690460121347e-06, Loss: 339.608642578125
2024-08-04T11:46:20.163353365Z 
 95%|█████████▍| 8989/9500 [30:48:50<1:45:29, 12.39s/it]08/04/2024 04:46:20 - INFO - __main__ -   Step: 8989, LR: 1.1075199164340682e-06, Loss: 315.21893310546875
2024-08-04T11:46:32.228727708Z 
 95%|█████████▍| 8990/9500 [30:49:02<1:44:27, 12.29s/it]08/04/2024 04:46:32 - INFO - __main__ -   Step: 8990, LR: 1.1053493727467893e-06, Loss: 417.9156799316406
2024-08-04T11:46:44.276525355Z 
 95%|█████████▍| 8991/9500 [30:49:14<1:43:38, 12.22s/it]08/04/2024 04:46:44 - INFO - __main__ -   Step: 8991, LR: 1.1031788290595103e-06, Loss: 297.82916259765625
2024-08-04T11:46:56.872060029Z 
 95%|█████████▍| 8992/9500 [30:49:26<1:44:23, 12.33s/it]08/04/2024 04:46:56 - INFO - __main__ -   Step: 8992, LR: 1.1010082853722314e-06, Loss: 416.68572998046875
2024-08-04T11:47:09.504832831Z 
 95%|█████████▍| 8993/9500 [30:49:39<1:44:57, 12.42s/it]08/04/2024 04:47:09 - INFO - __main__ -   Step: 8993, LR: 1.0988377416849525e-06, Loss: 320.351806640625
2024-08-04T11:47:21.420630171Z 
 95%|█████████▍| 8994/9500 [30:49:51<1:43:28, 12.27s/it]08/04/2024 04:47:21 - INFO - __main__ -   Step: 8994, LR: 1.0966671979976735e-06, Loss: 304.5455017089844
2024-08-04T11:47:33.825301801Z 
 95%|█████████▍| 8995/9500 [30:50:03<1:43:36, 12.31s/it]08/04/2024 04:47:33 - INFO - __main__ -   Step: 8995, LR: 1.0944966543103946e-06, Loss: 349.3740234375
2024-08-04T11:47:45.954188020Z 
 95%|█████████▍| 8996/9500 [30:50:15<1:42:56, 12.26s/it]08/04/2024 04:47:45 - INFO - __main__ -   Step: 8996, LR: 1.0923261106231157e-06, Loss: 399.4034118652344
2024-08-04T11:47:58.169740115Z 
 95%|█████████▍| 8997/9500 [30:50:28<1:42:38, 12.24s/it]08/04/2024 04:47:58 - INFO - __main__ -   Step: 8997, LR: 1.0901555669358369e-06, Loss: 419.773681640625
2024-08-04T11:48:10.605249434Z 
 95%|█████████▍| 8998/9500 [30:50:40<1:42:55, 12.30s/it]08/04/2024 04:48:10 - INFO - __main__ -   Step: 8998, LR: 1.0879850232485578e-06, Loss: 464.92144775390625
2024-08-04T11:48:23.026271626Z 
 95%|█████████▍| 8999/9500 [30:50:52<1:43:00, 12.34s/it]08/04/2024 04:48:23 - INFO - __main__ -   Step: 8999, LR: 1.085814479561279e-06, Loss: 411.55126953125
2024-08-04T11:48:35.381988492Z 
 95%|█████████▍| 9000/9500 [30:51:05<1:42:51, 12.34s/it]08/04/2024 04:48:35 - INFO - __main__ -   Step: 9000, LR: 1.083643935874e-06, Loss: 369.7076416015625
2024-08-04T11:48:47.616566961Z 
 95%|█████████▍| 9001/9500 [30:51:17<1:42:22, 12.31s/it]08/04/2024 04:48:47 - INFO - __main__ -   Step: 9001, LR: 1.0814733921867212e-06, Loss: 372.5330810546875
2024-08-04T11:48:59.955178093Z 
 95%|█████████▍| 9002/9500 [30:51:29<1:42:14, 12.32s/it]08/04/2024 04:48:59 - INFO - __main__ -   Step: 9002, LR: 1.0793028484994423e-06, Loss: 384.73193359375
2024-08-04T11:49:12.026954712Z 
 95%|█████████▍| 9003/9500 [30:51:41<1:41:25, 12.24s/it]08/04/2024 04:49:12 - INFO - __main__ -   Step: 9003, LR: 1.0771323048121633e-06, Loss: 450.43511962890625
2024-08-04T11:49:24.641529022Z 
 95%|█████████▍| 9004/9500 [30:51:54<1:42:08, 12.36s/it]08/04/2024 04:49:24 - INFO - __main__ -   Step: 9004, LR: 1.0749617611248844e-06, Loss: 397.69464111328125
2024-08-04T11:49:37.007954927Z 
 95%|█████████▍| 9005/9500 [30:52:06<1:41:57, 12.36s/it]08/04/2024 04:49:37 - INFO - __main__ -   Step: 9005, LR: 1.0727912174376053e-06, Loss: 376.417236328125
2024-08-04T11:49:49.344718289Z 
 95%|█████████▍| 9006/9500 [30:52:19<1:41:41, 12.35s/it]08/04/2024 04:49:49 - INFO - __main__ -   Step: 9006, LR: 1.0706206737503264e-06, Loss: 482.34576416015625
2024-08-04T11:50:02.300640072Z 
 95%|█████████▍| 9007/9500 [30:52:32<1:42:58, 12.53s/it]08/04/2024 04:50:02 - INFO - __main__ -   Step: 9007, LR: 1.0684501300630476e-06, Loss: 476.303466796875
2024-08-04T11:50:14.548744597Z 
 95%|█████████▍| 9008/9500 [30:52:44<1:42:04, 12.45s/it]08/04/2024 04:50:14 - INFO - __main__ -   Step: 9008, LR: 1.0662795863757687e-06, Loss: 413.2004699707031
2024-08-04T11:50:26.710344807Z 
 95%|█████████▍| 9009/9500 [30:52:56<1:41:09, 12.36s/it]08/04/2024 04:50:26 - INFO - __main__ -   Step: 9009, LR: 1.0641090426884899e-06, Loss: 473.6551818847656
2024-08-04T11:50:39.070138246Z 
 95%|█████████▍| 9010/9500 [30:53:09<1:40:57, 12.36s/it]08/04/2024 04:50:39 - INFO - __main__ -   Step: 9010, LR: 1.061938499001211e-06, Loss: 409.67523193359375
2024-08-04T11:50:51.490594873Z 
 95%|█████████▍| 9011/9500 [30:53:21<1:40:53, 12.38s/it]08/04/2024 04:50:51 - INFO - __main__ -   Step: 9011, LR: 1.059767955313932e-06, Loss: 374.8984375
2024-08-04T11:51:03.480131933Z 
 95%|█████████▍| 9012/9500 [30:53:33<1:39:43, 12.26s/it]08/04/2024 04:51:03 - INFO - __main__ -   Step: 9012, LR: 1.057597411626653e-06, Loss: 296.3704833984375
2024-08-04T11:51:16.140128694Z 
 95%|█████████▍| 9013/9500 [30:53:46<1:40:29, 12.38s/it]08/04/2024 04:51:16 - INFO - __main__ -   Step: 9013, LR: 1.055426867939374e-06, Loss: 449.91827392578125
2024-08-04T11:51:28.663196941Z 
 95%|█████████▍| 9014/9500 [30:53:58<1:40:38, 12.42s/it]08/04/2024 04:51:28 - INFO - __main__ -   Step: 9014, LR: 1.0532563242520951e-06, Loss: 296.966796875
2024-08-04T11:51:41.043854103Z 
 95%|█████████▍| 9015/9500 [30:54:10<1:40:19, 12.41s/it]08/04/2024 04:51:41 - INFO - __main__ -   Step: 9015, LR: 1.0510857805648162e-06, Loss: 352.2103271484375
2024-08-04T11:51:53.505788353Z 
 95%|█████████▍| 9016/9500 [30:54:23<1:40:14, 12.43s/it]08/04/2024 04:51:53 - INFO - __main__ -   Step: 9016, LR: 1.0489152368775374e-06, Loss: 392.1947326660156
2024-08-04T11:52:05.590952131Z 
 95%|█████████▍| 9017/9500 [30:54:35<1:39:12, 12.32s/it]08/04/2024 04:52:05 - INFO - __main__ -   Step: 9017, LR: 1.0467446931902585e-06, Loss: 387.4005126953125
2024-08-04T11:52:17.721235406Z 
 95%|█████████▍| 9018/9500 [30:54:47<1:38:32, 12.27s/it]08/04/2024 04:52:17 - INFO - __main__ -   Step: 9018, LR: 1.0445741495029794e-06, Loss: 419.36669921875
2024-08-04T11:52:30.542136111Z 
 95%|█████████▍| 9019/9500 [30:55:00<1:39:39, 12.43s/it]08/04/2024 04:52:30 - INFO - __main__ -   Step: 9019, LR: 1.0424036058157006e-06, Loss: 437.42584228515625
2024-08-04T11:52:42.478280041Z 
 95%|█████████▍| 9020/9500 [30:55:12<1:38:16, 12.28s/it]08/04/2024 04:52:42 - INFO - __main__ -   Step: 9020, LR: 1.0402330621284217e-06, Loss: 433.95330810546875
2024-08-04T11:52:54.866893748Z 
 95%|█████████▍| 9021/9500 [30:55:24<1:38:18, 12.32s/it]08/04/2024 04:52:54 - INFO - __main__ -   Step: 9021, LR: 1.0380625184411429e-06, Loss: 554.6151123046875
2024-08-04T11:53:07.372052609Z 
 95%|█████████▍| 9022/9500 [30:55:37<1:38:33, 12.37s/it]08/04/2024 04:53:07 - INFO - __main__ -   Step: 9022, LR: 1.0358919747538638e-06, Loss: 406.6705322265625
2024-08-04T11:53:19.413717872Z 
 95%|█████████▍| 9023/9500 [30:55:49<1:37:34, 12.27s/it]08/04/2024 04:53:19 - INFO - __main__ -   Step: 9023, LR: 1.033721431066585e-06, Loss: 353.7583312988281
2024-08-04T11:53:31.726876207Z 
 95%|█████████▍| 9024/9500 [30:56:01<1:37:27, 12.29s/it]08/04/2024 04:53:31 - INFO - __main__ -   Step: 9024, LR: 1.0315508873793058e-06, Loss: 344.6258239746094
2024-08-04T11:53:43.504920243Z 
 95%|█████████▌| 9025/9500 [30:56:13<1:36:03, 12.13s/it]08/04/2024 04:53:43 - INFO - __main__ -   Step: 9025, LR: 1.029380343692027e-06, Loss: 335.4991455078125
2024-08-04T11:53:55.955323283Z 
 95%|█████████▌| 9026/9500 [30:56:25<1:36:36, 12.23s/it]08/04/2024 04:53:55 - INFO - __main__ -   Step: 9026, LR: 1.0272098000047481e-06, Loss: 317.2845458984375
2024-08-04T11:54:08.317285138Z 
 95%|█████████▌| 9027/9500 [30:56:38<1:36:42, 12.27s/it]08/04/2024 04:54:08 - INFO - __main__ -   Step: 9027, LR: 1.0250392563174692e-06, Loss: 388.486572265625
2024-08-04T11:54:20.397569024Z 
 95%|█████████▌| 9028/9500 [30:56:50<1:36:04, 12.21s/it]08/04/2024 04:54:20 - INFO - __main__ -   Step: 9028, LR: 1.0228687126301904e-06, Loss: 355.59332275390625
2024-08-04T11:54:33.048027277Z 
 95%|█████████▌| 9029/9500 [30:57:02<1:36:53, 12.34s/it]08/04/2024 04:54:33 - INFO - __main__ -   Step: 9029, LR: 1.0206981689429115e-06, Loss: 503.49090576171875
2024-08-04T11:54:45.291944551Z 
 95%|█████████▌| 9030/9500 [30:57:15<1:36:27, 12.31s/it]08/04/2024 04:54:45 - INFO - __main__ -   Step: 9030, LR: 1.0185276252556324e-06, Loss: 333.225341796875
2024-08-04T11:54:57.332684984Z 
 95%|█████████▌| 9031/9500 [30:57:27<1:35:36, 12.23s/it]08/04/2024 04:54:57 - INFO - __main__ -   Step: 9031, LR: 1.0163570815683536e-06, Loss: 430.49407958984375
2024-08-04T11:55:09.707529875Z 
 95%|█████████▌| 9032/9500 [30:57:39<1:35:44, 12.27s/it]08/04/2024 04:55:09 - INFO - __main__ -   Step: 9032, LR: 1.0141865378810745e-06, Loss: 380.5838623046875
2024-08-04T11:55:21.978534241Z 
 95%|█████████▌| 9033/9500 [30:57:51<1:35:31, 12.27s/it]08/04/2024 04:55:21 - INFO - __main__ -   Step: 9033, LR: 1.0120159941937956e-06, Loss: 337.52154541015625
2024-08-04T11:55:34.173051098Z 
 95%|█████████▌| 9034/9500 [30:58:04<1:35:08, 12.25s/it]08/04/2024 04:55:34 - INFO - __main__ -   Step: 9034, LR: 1.0098454505065168e-06, Loss: 403.11602783203125
2024-08-04T11:55:46.597777449Z 
 95%|█████████▌| 9035/9500 [30:58:16<1:35:20, 12.30s/it]08/04/2024 04:55:46 - INFO - __main__ -   Step: 9035, LR: 1.007674906819238e-06, Loss: 311.3194885253906
2024-08-04T11:55:59.008515812Z 
 95%|█████████▌| 9036/9500 [30:58:28<1:35:23, 12.33s/it]08/04/2024 04:55:59 - INFO - __main__ -   Step: 9036, LR: 1.005504363131959e-06, Loss: 356.98248291015625
2024-08-04T11:56:10.863636703Z 
 95%|█████████▌| 9037/9500 [30:58:40<1:34:04, 12.19s/it]08/04/2024 04:56:10 - INFO - __main__ -   Step: 9037, LR: 1.0033338194446802e-06, Loss: 324.6769714355469
2024-08-04T11:56:23.270125832Z 
 95%|█████████▌| 9038/9500 [30:58:53<1:34:22, 12.26s/it]08/04/2024 04:56:23 - INFO - __main__ -   Step: 9038, LR: 1.0011632757574011e-06, Loss: 411.992919921875
2024-08-04T11:56:35.372743928Z 
 95%|█████████▌| 9039/9500 [30:59:05<1:33:48, 12.21s/it]08/04/2024 04:56:35 - INFO - __main__ -   Step: 9039, LR: 9.989927320701222e-07, Loss: 374.766845703125
2024-08-04T11:56:47.344064750Z 
 95%|█████████▌| 9040/9500 [30:59:17<1:33:03, 12.14s/it]08/04/2024 04:56:47 - INFO - __main__ -   Step: 9040, LR: 9.968221883828434e-07, Loss: 372.6452331542969
2024-08-04T11:57:00.019200145Z 
 95%|█████████▌| 9041/9500 [30:59:29<1:34:05, 12.30s/it]08/04/2024 04:57:00 - INFO - __main__ -   Step: 9041, LR: 9.946516446955643e-07, Loss: 394.87152099609375
2024-08-04T11:57:12.342586017Z 
 95%|█████████▌| 9042/9500 [30:59:42<1:33:56, 12.31s/it]08/04/2024 04:57:12 - INFO - __main__ -   Step: 9042, LR: 9.924811010082854e-07, Loss: 375.1981201171875
2024-08-04T11:57:24.536472943Z 
 95%|█████████▌| 9043/9500 [30:59:54<1:33:28, 12.27s/it]08/04/2024 04:57:24 - INFO - __main__ -   Step: 9043, LR: 9.903105573210066e-07, Loss: 490.9160461425781
2024-08-04T11:57:36.967939049Z 
 95%|█████████▌| 9044/9500 [31:00:06<1:33:38, 12.32s/it]08/04/2024 04:57:36 - INFO - __main__ -   Step: 9044, LR: 9.881400136337275e-07, Loss: 373.0539855957031
2024-08-04T11:57:49.146200980Z 
 95%|█████████▌| 9045/9500 [31:00:19<1:33:06, 12.28s/it]08/04/2024 04:57:49 - INFO - __main__ -   Step: 9045, LR: 9.859694699464486e-07, Loss: 517.06494140625
2024-08-04T11:58:02.136042449Z 
 95%|█████████▌| 9046/9500 [31:00:32<1:34:31, 12.49s/it]08/04/2024 04:58:02 - INFO - __main__ -   Step: 9046, LR: 9.837989262591698e-07, Loss: 476.76043701171875
2024-08-04T11:58:14.636851763Z 
 95%|█████████▌| 9047/9500 [31:00:44<1:34:19, 12.49s/it]08/04/2024 04:58:14 - INFO - __main__ -   Step: 9047, LR: 9.81628382571891e-07, Loss: 455.5130615234375
2024-08-04T11:58:27.100446215Z 
 95%|█████████▌| 9048/9500 [31:00:57<1:34:03, 12.49s/it]08/04/2024 04:58:27 - INFO - __main__ -   Step: 9048, LR: 9.79457838884612e-07, Loss: 344.08941650390625
2024-08-04T11:58:39.300613669Z 
 95%|█████████▌| 9049/9500 [31:01:09<1:33:12, 12.40s/it]08/04/2024 04:58:39 - INFO - __main__ -   Step: 9049, LR: 9.77287295197333e-07, Loss: 339.65826416015625
2024-08-04T11:58:51.925086531Z 
 95%|█████████▌| 9050/9500 [31:01:21<1:33:30, 12.47s/it]08/04/2024 04:58:51 - INFO - __main__ -   Step: 9050, LR: 9.75116751510054e-07, Loss: 304.8016662597656
2024-08-04T11:59:04.064262688Z 
 95%|█████████▌| 9051/9500 [31:01:34<1:32:33, 12.37s/it]08/04/2024 04:59:04 - INFO - __main__ -   Step: 9051, LR: 9.72946207822775e-07, Loss: 434.22589111328125
2024-08-04T11:59:16.291154673Z 
 95%|█████████▌| 9052/9500 [31:01:46<1:32:02, 12.33s/it]08/04/2024 04:59:16 - INFO - __main__ -   Step: 9052, LR: 9.707756641354962e-07, Loss: 327.9342041015625
2024-08-04T11:59:28.731150951Z 
 95%|█████████▌| 9053/9500 [31:01:58<1:32:05, 12.36s/it]08/04/2024 04:59:28 - INFO - __main__ -   Step: 9053, LR: 9.686051204482173e-07, Loss: 413.3887939453125
2024-08-04T11:59:40.594776409Z 
 95%|█████████▌| 9054/9500 [31:02:10<1:30:46, 12.21s/it]08/04/2024 04:59:40 - INFO - __main__ -   Step: 9054, LR: 9.664345767609384e-07, Loss: 316.2676696777344
2024-08-04T11:59:53.510597174Z 
 95%|█████████▌| 9055/9500 [31:02:23<1:32:08, 12.42s/it]08/04/2024 04:59:53 - INFO - __main__ -   Step: 9055, LR: 9.642640330736596e-07, Loss: 366.892822265625
2024-08-04T12:00:06.219164413Z 
 95%|█████████▌| 9056/9500 [31:02:36<1:32:33, 12.51s/it]08/04/2024 05:00:06 - INFO - __main__ -   Step: 9056, LR: 9.620934893863807e-07, Loss: 486.59197998046875
2024-08-04T12:00:18.488701547Z 
 95%|█████████▌| 9057/9500 [31:02:48<1:31:49, 12.44s/it]08/04/2024 05:00:18 - INFO - __main__ -   Step: 9057, LR: 9.599229456991018e-07, Loss: 402.97003173828125
2024-08-04T12:00:31.016440281Z 
 95%|█████████▌| 9058/9500 [31:03:00<1:31:49, 12.46s/it]08/04/2024 05:00:31 - INFO - __main__ -   Step: 9058, LR: 9.577524020118228e-07, Loss: 505.84307861328125
2024-08-04T12:00:43.577450030Z 
 95%|█████████▌| 9059/9500 [31:03:13<1:31:49, 12.49s/it]08/04/2024 05:00:43 - INFO - __main__ -   Step: 9059, LR: 9.55581858324544e-07, Loss: 341.9083251953125
2024-08-04T12:00:55.650799533Z 
 95%|█████████▌| 9060/9500 [31:03:25<1:30:41, 12.37s/it]08/04/2024 05:00:55 - INFO - __main__ -   Step: 9060, LR: 9.534113146372649e-07, Loss: 407.2685241699219
2024-08-04T12:01:08.213934269Z 
 95%|█████████▌| 9061/9500 [31:03:38<1:30:55, 12.43s/it]08/04/2024 05:01:08 - INFO - __main__ -   Step: 9061, LR: 9.51240770949986e-07, Loss: 425.33905029296875
2024-08-04T12:01:20.530754152Z 
 95%|█████████▌| 9062/9500 [31:03:50<1:30:28, 12.39s/it]08/04/2024 05:01:20 - INFO - __main__ -   Step: 9062, LR: 9.490702272627071e-07, Loss: 339.32598876953125
2024-08-04T12:01:32.747524712Z 
 95%|█████████▌| 9063/9500 [31:04:02<1:29:52, 12.34s/it]08/04/2024 05:01:32 - INFO - __main__ -   Step: 9063, LR: 9.468996835754282e-07, Loss: 390.1420593261719
2024-08-04T12:01:45.148573489Z 
 95%|█████████▌| 9064/9500 [31:04:15<1:29:48, 12.36s/it]08/04/2024 05:01:45 - INFO - __main__ -   Step: 9064, LR: 9.447291398881492e-07, Loss: 335.21966552734375
2024-08-04T12:01:57.590663396Z 
 95%|█████████▌| 9065/9500 [31:04:27<1:29:46, 12.38s/it]08/04/2024 05:01:57 - INFO - __main__ -   Step: 9065, LR: 9.425585962008703e-07, Loss: 429.4519958496094
2024-08-04T12:02:09.841502291Z 
 95%|█████████▌| 9066/9500 [31:04:39<1:29:17, 12.34s/it]08/04/2024 05:02:09 - INFO - __main__ -   Step: 9066, LR: 9.403880525135914e-07, Loss: 459.37982177734375
2024-08-04T12:02:21.885258058Z 
 95%|█████████▌| 9067/9500 [31:04:51<1:28:25, 12.25s/it]08/04/2024 05:02:21 - INFO - __main__ -   Step: 9067, LR: 9.382175088263125e-07, Loss: 365.89892578125
2024-08-04T12:02:33.903089141Z 
 95%|█████████▌| 9068/9500 [31:05:03<1:27:43, 12.18s/it]08/04/2024 05:02:33 - INFO - __main__ -   Step: 9068, LR: 9.360469651390336e-07, Loss: 344.02154541015625
2024-08-04T12:02:46.143442779Z 
 95%|█████████▌| 9069/9500 [31:05:16<1:27:38, 12.20s/it]08/04/2024 05:02:46 - INFO - __main__ -   Step: 9069, LR: 9.338764214517547e-07, Loss: 378.00213623046875
2024-08-04T12:02:58.332604532Z 
 95%|█████████▌| 9070/9500 [31:05:28<1:27:24, 12.20s/it]08/04/2024 05:02:58 - INFO - __main__ -   Step: 9070, LR: 9.317058777644758e-07, Loss: 335.9197998046875
2024-08-04T12:03:10.456089005Z 
 95%|█████████▌| 9071/9500 [31:05:40<1:27:03, 12.17s/it]08/04/2024 05:03:10 - INFO - __main__ -   Step: 9071, LR: 9.295353340771968e-07, Loss: 380.83905029296875
2024-08-04T12:03:23.044859888Z 
 95%|█████████▌| 9072/9500 [31:05:52<1:27:43, 12.30s/it]08/04/2024 05:03:23 - INFO - __main__ -   Step: 9072, LR: 9.273647903899178e-07, Loss: 423.1916198730469
2024-08-04T12:03:35.337627374Z 
 96%|█████████▌| 9073/9500 [31:06:05<1:27:30, 12.30s/it]08/04/2024 05:03:35 - INFO - __main__ -   Step: 9073, LR: 9.25194246702639e-07, Loss: 453.9910583496094
2024-08-04T12:03:47.072889812Z 
 96%|█████████▌| 9074/9500 [31:06:17<1:26:06, 12.13s/it]08/04/2024 05:03:47 - INFO - __main__ -   Step: 9074, LR: 9.230237030153601e-07, Loss: 334.05279541015625
2024-08-04T12:03:59.648206998Z 
 96%|█████████▌| 9075/9500 [31:06:29<1:26:51, 12.26s/it]08/04/2024 05:03:59 - INFO - __main__ -   Step: 9075, LR: 9.208531593280811e-07, Loss: 435.30487060546875
2024-08-04T12:04:11.970675267Z 
 96%|█████████▌| 9076/9500 [31:06:41<1:26:47, 12.28s/it]08/04/2024 05:04:11 - INFO - __main__ -   Step: 9076, LR: 9.186826156408023e-07, Loss: 400.66778564453125
2024-08-04T12:04:24.263990097Z 
 96%|█████████▌| 9077/9500 [31:06:54<1:26:36, 12.28s/it]08/04/2024 05:04:24 - INFO - __main__ -   Step: 9077, LR: 9.165120719535234e-07, Loss: 373.30462646484375
2024-08-04T12:04:36.623474604Z 
 96%|█████████▌| 9078/9500 [31:07:06<1:26:33, 12.31s/it]08/04/2024 05:04:36 - INFO - __main__ -   Step: 9078, LR: 9.143415282662443e-07, Loss: 381.08013916015625
2024-08-04T12:04:48.771880392Z 
 96%|█████████▌| 9079/9500 [31:07:18<1:26:01, 12.26s/it]08/04/2024 05:04:48 - INFO - __main__ -   Step: 9079, LR: 9.121709845789655e-07, Loss: 342.44647216796875
2024-08-04T12:05:00.911105682Z 
 96%|█████████▌| 9080/9500 [31:07:30<1:25:33, 12.22s/it]08/04/2024 05:05:00 - INFO - __main__ -   Step: 9080, LR: 9.100004408916866e-07, Loss: 403.7653503417969
2024-08-04T12:05:13.690953571Z 
 96%|█████████▌| 9081/9500 [31:07:43<1:26:31, 12.39s/it]08/04/2024 05:05:13 - INFO - __main__ -   Step: 9081, LR: 9.078298972044076e-07, Loss: 391.0162353515625
2024-08-04T12:05:25.880431258Z 
 96%|█████████▌| 9082/9500 [31:07:55<1:25:53, 12.33s/it]08/04/2024 05:05:25 - INFO - __main__ -   Step: 9082, LR: 9.056593535171288e-07, Loss: 367.96044921875
2024-08-04T12:05:38.255612117Z 
 96%|█████████▌| 9083/9500 [31:08:08<1:25:47, 12.34s/it]08/04/2024 05:05:38 - INFO - __main__ -   Step: 9083, LR: 9.034888098298499e-07, Loss: 372.22955322265625
2024-08-04T12:05:50.834451981Z 
 96%|█████████▌| 9084/9500 [31:08:20<1:26:04, 12.41s/it]08/04/2024 05:05:50 - INFO - __main__ -   Step: 9084, LR: 9.013182661425708e-07, Loss: 309.0634765625
2024-08-04T12:06:02.859616066Z 
 96%|█████████▌| 9085/9500 [31:08:32<1:25:03, 12.30s/it]08/04/2024 05:06:02 - INFO - __main__ -   Step: 9085, LR: 8.99147722455292e-07, Loss: 390.1683349609375
2024-08-04T12:06:15.218060516Z 
 96%|█████████▌| 9086/9500 [31:08:45<1:24:58, 12.32s/it]08/04/2024 05:06:15 - INFO - __main__ -   Step: 9086, LR: 8.96977178768013e-07, Loss: 492.2680358886719
2024-08-04T12:06:28.290387841Z 
 96%|█████████▌| 9087/9500 [31:08:58<1:26:20, 12.54s/it]08/04/2024 05:06:28 - INFO - __main__ -   Step: 9087, LR: 8.948066350807341e-07, Loss: 400.734375
2024-08-04T12:06:40.433588512Z 
 96%|█████████▌| 9088/9500 [31:09:10<1:25:18, 12.42s/it]08/04/2024 05:06:40 - INFO - __main__ -   Step: 9088, LR: 8.926360913934553e-07, Loss: 390.3450622558594
2024-08-04T12:06:52.712809050Z 
 96%|█████████▌| 9089/9500 [31:09:22<1:24:48, 12.38s/it]08/04/2024 05:06:52 - INFO - __main__ -   Step: 9089, LR: 8.904655477061763e-07, Loss: 394.41705322265625
2024-08-04T12:07:05.308092817Z 
 96%|█████████▌| 9090/9500 [31:09:35<1:25:02, 12.44s/it]08/04/2024 05:07:05 - INFO - __main__ -   Step: 9090, LR: 8.882950040188974e-07, Loss: 532.644775390625
2024-08-04T12:07:17.657925120Z 
 96%|█████████▌| 9091/9500 [31:09:47<1:24:38, 12.42s/it]08/04/2024 05:07:17 - INFO - __main__ -   Step: 9091, LR: 8.861244603316184e-07, Loss: 382.60162353515625
2024-08-04T12:07:30.008334629Z 
 96%|█████████▌| 9092/9500 [31:09:59<1:24:17, 12.40s/it]08/04/2024 05:07:30 - INFO - __main__ -   Step: 9092, LR: 8.839539166443395e-07, Loss: 405.12078857421875
2024-08-04T12:07:42.394882898Z 
 96%|█████████▌| 9093/9500 [31:10:12<1:24:04, 12.39s/it]08/04/2024 05:07:42 - INFO - __main__ -   Step: 9093, LR: 8.817833729570606e-07, Loss: 305.24505615234375
2024-08-04T12:07:54.499408219Z 
 96%|█████████▌| 9094/9500 [31:10:24<1:23:16, 12.31s/it]08/04/2024 05:07:54 - INFO - __main__ -   Step: 9094, LR: 8.796128292697817e-07, Loss: 468.8190612792969
2024-08-04T12:08:06.946393077Z 
 96%|█████████▌| 9095/9500 [31:10:36<1:23:21, 12.35s/it]08/04/2024 05:08:06 - INFO - __main__ -   Step: 9095, LR: 8.774422855825028e-07, Loss: 439.60205078125
2024-08-04T12:08:19.437062847Z 
 96%|█████████▌| 9096/9500 [31:10:49<1:23:26, 12.39s/it]08/04/2024 05:08:19 - INFO - __main__ -   Step: 9096, LR: 8.752717418952239e-07, Loss: 371.43701171875
2024-08-04T12:08:31.645041649Z 
 96%|█████████▌| 9097/9500 [31:11:01<1:22:51, 12.34s/it]08/04/2024 05:08:31 - INFO - __main__ -   Step: 9097, LR: 8.73101198207945e-07, Loss: 381.21588134765625
2024-08-04T12:08:43.805186046Z 
 96%|█████████▌| 9098/9500 [31:11:13<1:22:17, 12.28s/it]08/04/2024 05:08:43 - INFO - __main__ -   Step: 9098, LR: 8.70930654520666e-07, Loss: 359.97210693359375
2024-08-04T12:08:56.416792549Z 
 96%|█████████▌| 9099/9500 [31:11:26<1:22:45, 12.38s/it]08/04/2024 05:08:56 - INFO - __main__ -   Step: 9099, LR: 8.687601108333871e-07, Loss: 478.00799560546875
2024-08-04T12:09:08.683517491Z 
 96%|█████████▌| 9100/9500 [31:11:38<1:22:18, 12.35s/it]08/04/2024 05:09:08 - INFO - __main__ -   Step: 9100, LR: 8.665895671461082e-07, Loss: 411.7088928222656
2024-08-04T12:09:20.880163096Z 
 96%|█████████▌| 9101/9500 [31:11:50<1:21:48, 12.30s/it]08/04/2024 05:09:20 - INFO - __main__ -   Step: 9101, LR: 8.644190234588293e-07, Loss: 355.059326171875
2024-08-04T12:09:33.526637046Z 
 96%|█████████▌| 9102/9500 [31:12:03<1:22:17, 12.41s/it]08/04/2024 05:09:33 - INFO - __main__ -   Step: 9102, LR: 8.622484797715504e-07, Loss: 481.5868225097656
2024-08-04T12:09:45.597093558Z 
 96%|█████████▌| 9103/9500 [31:12:15<1:21:25, 12.30s/it]08/04/2024 05:09:45 - INFO - __main__ -   Step: 9103, LR: 8.600779360842715e-07, Loss: 328.8923645019531
2024-08-04T12:09:58.076196554Z 
 96%|█████████▌| 9104/9500 [31:12:28<1:21:33, 12.36s/it]08/04/2024 05:09:58 - INFO - __main__ -   Step: 9104, LR: 8.579073923969925e-07, Loss: 373.6483154296875
2024-08-04T12:10:10.498367178Z 
 96%|█████████▌| 9105/9500 [31:12:40<1:21:28, 12.38s/it]08/04/2024 05:10:10 - INFO - __main__ -   Step: 9105, LR: 8.557368487097135e-07, Loss: 487.2726135253906
2024-08-04T12:10:22.719140727Z 
 96%|█████████▌| 9106/9500 [31:12:52<1:20:58, 12.33s/it]08/04/2024 05:10:22 - INFO - __main__ -   Step: 9106, LR: 8.535663050224346e-07, Loss: 467.2593688964844
2024-08-04T12:10:34.779274717Z 
 96%|█████████▌| 9107/9500 [31:13:04<1:20:13, 12.25s/it]08/04/2024 05:10:34 - INFO - __main__ -   Step: 9107, LR: 8.513957613351558e-07, Loss: 301.0791931152344
2024-08-04T12:10:47.085798630Z 
 96%|█████████▌| 9108/9500 [31:13:17<1:20:08, 12.27s/it]08/04/2024 05:10:47 - INFO - __main__ -   Step: 9108, LR: 8.492252176478768e-07, Loss: 316.58636474609375
2024-08-04T12:10:59.392224392Z 
 96%|█████████▌| 9109/9500 [31:13:29<1:20:00, 12.28s/it]08/04/2024 05:10:59 - INFO - __main__ -   Step: 9109, LR: 8.47054673960598e-07, Loss: 361.12615966796875
2024-08-04T12:11:11.539073857Z 
 96%|█████████▌| 9110/9500 [31:13:41<1:19:33, 12.24s/it]08/04/2024 05:11:11 - INFO - __main__ -   Step: 9110, LR: 8.448841302733191e-07, Loss: 336.9718322753906
2024-08-04T12:11:23.693651210Z 
 96%|█████████▌| 9111/9500 [31:13:53<1:19:11, 12.21s/it]08/04/2024 05:11:23 - INFO - __main__ -   Step: 9111, LR: 8.4271358658604e-07, Loss: 427.68450927734375
2024-08-04T12:11:36.189434037Z 
 96%|█████████▌| 9112/9500 [31:14:06<1:19:31, 12.30s/it]08/04/2024 05:11:36 - INFO - __main__ -   Step: 9112, LR: 8.405430428987611e-07, Loss: 282.0148010253906
2024-08-04T12:11:48.548784719Z 
 96%|█████████▌| 9113/9500 [31:14:18<1:19:26, 12.32s/it]08/04/2024 05:11:48 - INFO - __main__ -   Step: 9113, LR: 8.383724992114822e-07, Loss: 416.9951171875
2024-08-04T12:12:00.517464396Z 
 96%|█████████▌| 9114/9500 [31:14:30<1:18:33, 12.21s/it]08/04/2024 05:12:00 - INFO - __main__ -   Step: 9114, LR: 8.362019555242033e-07, Loss: 422.27008056640625
2024-08-04T12:12:13.665423398Z 
 96%|█████████▌| 9115/9500 [31:14:43<1:20:09, 12.49s/it]08/04/2024 05:12:13 - INFO - __main__ -   Step: 9115, LR: 8.340314118369245e-07, Loss: 525.03515625
2024-08-04T12:12:25.879730689Z 
 96%|█████████▌| 9116/9500 [31:14:55<1:19:25, 12.41s/it]08/04/2024 05:12:25 - INFO - __main__ -   Step: 9116, LR: 8.318608681496456e-07, Loss: 358.46343994140625
2024-08-04T12:12:37.975613953Z 
 96%|█████████▌| 9117/9500 [31:15:07<1:18:36, 12.32s/it]08/04/2024 05:12:37 - INFO - __main__ -   Step: 9117, LR: 8.296903244623666e-07, Loss: 500.36761474609375
2024-08-04T12:12:50.703250020Z 
 96%|█████████▌| 9118/9500 [31:15:20<1:19:11, 12.44s/it]08/04/2024 05:12:50 - INFO - __main__ -   Step: 9118, LR: 8.275197807750876e-07, Loss: 450.7327880859375
2024-08-04T12:13:03.097668114Z 
 96%|█████████▌| 9119/9500 [31:15:33<1:18:54, 12.43s/it]08/04/2024 05:13:03 - INFO - __main__ -   Step: 9119, LR: 8.253492370878087e-07, Loss: 405.5869140625
2024-08-04T12:13:15.446470181Z 
 96%|█████████▌| 9120/9500 [31:15:45<1:18:32, 12.40s/it]08/04/2024 05:13:15 - INFO - __main__ -   Step: 9120, LR: 8.231786934005298e-07, Loss: 497.5155944824219
2024-08-04T12:13:27.755441039Z 
 96%|█████████▌| 9121/9500 [31:15:57<1:18:09, 12.37s/it]08/04/2024 05:13:27 - INFO - __main__ -   Step: 9121, LR: 8.21008149713251e-07, Loss: 312.60833740234375
2024-08-04T12:13:39.768878160Z 
 96%|█████████▌| 9122/9500 [31:16:09<1:17:16, 12.27s/it]08/04/2024 05:13:39 - INFO - __main__ -   Step: 9122, LR: 8.18837606025972e-07, Loss: 604.5254516601562
2024-08-04T12:13:52.202147120Z 
 96%|█████████▌| 9123/9500 [31:16:22<1:17:23, 12.32s/it]08/04/2024 05:13:52 - INFO - __main__ -   Step: 9123, LR: 8.166670623386931e-07, Loss: 399.84356689453125
2024-08-04T12:14:04.767786367Z 
 96%|█████████▌| 9124/9500 [31:16:34<1:17:39, 12.39s/it]08/04/2024 05:14:04 - INFO - __main__ -   Step: 9124, LR: 8.14496518651414e-07, Loss: 408.8846435546875
2024-08-04T12:14:16.802475255Z 
 96%|█████████▌| 9125/9500 [31:16:46<1:16:46, 12.28s/it]08/04/2024 05:14:16 - INFO - __main__ -   Step: 9125, LR: 8.123259749641352e-07, Loss: 379.41851806640625
2024-08-04T12:14:28.984778515Z 
 96%|█████████▌| 9126/9500 [31:16:58<1:16:22, 12.25s/it]08/04/2024 05:14:28 - INFO - __main__ -   Step: 9126, LR: 8.101554312768563e-07, Loss: 331.51580810546875
2024-08-04T12:14:41.426683929Z 
 96%|█████████▌| 9127/9500 [31:17:11<1:16:31, 12.31s/it]08/04/2024 05:14:41 - INFO - __main__ -   Step: 9127, LR: 8.079848875895773e-07, Loss: 379.5836486816406
2024-08-04T12:14:53.603240207Z 
 96%|█████████▌| 9128/9500 [31:17:23<1:16:04, 12.27s/it]08/04/2024 05:14:53 - INFO - __main__ -   Step: 9128, LR: 8.058143439022985e-07, Loss: 440.64276123046875
2024-08-04T12:15:05.641814441Z 
 96%|█████████▌| 9129/9500 [31:17:35<1:15:26, 12.20s/it]08/04/2024 05:15:05 - INFO - __main__ -   Step: 9129, LR: 8.036438002150196e-07, Loss: 299.63409423828125
2024-08-04T12:15:18.146224811Z 
 96%|█████████▌| 9130/9500 [31:17:48<1:15:47, 12.29s/it]08/04/2024 05:15:18 - INFO - __main__ -   Step: 9130, LR: 8.014732565277406e-07, Loss: 399.752685546875
2024-08-04T12:15:30.349020359Z 
 96%|█████████▌| 9131/9500 [31:18:00<1:15:25, 12.26s/it]08/04/2024 05:15:30 - INFO - __main__ -   Step: 9131, LR: 7.993027128404617e-07, Loss: 396.82171630859375
2024-08-04T12:15:42.308040665Z 
 96%|█████████▌| 9132/9500 [31:18:12<1:14:39, 12.17s/it]08/04/2024 05:15:42 - INFO - __main__ -   Step: 9132, LR: 7.971321691531827e-07, Loss: 421.4659423828125
2024-08-04T12:15:54.588151376Z 
 96%|█████████▌| 9133/9500 [31:18:24<1:14:39, 12.21s/it]08/04/2024 05:15:54 - INFO - __main__ -   Step: 9133, LR: 7.949616254659038e-07, Loss: 320.8471374511719
2024-08-04T12:16:06.696366735Z 
 96%|█████████▌| 9134/9500 [31:18:36<1:14:16, 12.18s/it]08/04/2024 05:16:06 - INFO - __main__ -   Step: 9134, LR: 7.92791081778625e-07, Loss: 407.58355712890625
2024-08-04T12:16:19.230345233Z 
 96%|█████████▌| 9135/9500 [31:18:49<1:14:43, 12.28s/it]08/04/2024 05:16:19 - INFO - __main__ -   Step: 9135, LR: 7.906205380913461e-07, Loss: 433.052001953125
2024-08-04T12:16:31.422997433Z 
 96%|█████████▌| 9136/9500 [31:19:01<1:14:21, 12.26s/it]08/04/2024 05:16:31 - INFO - __main__ -   Step: 9136, LR: 7.884499944040671e-07, Loss: 261.4794921875
2024-08-04T12:16:43.548276549Z 
 96%|█████████▌| 9137/9500 [31:19:13<1:13:54, 12.22s/it]08/04/2024 05:16:43 - INFO - __main__ -   Step: 9137, LR: 7.862794507167883e-07, Loss: 329.7137756347656
2024-08-04T12:16:55.834595147Z 
 96%|█████████▌| 9138/9500 [31:19:25<1:13:50, 12.24s/it]08/04/2024 05:16:55 - INFO - __main__ -   Step: 9138, LR: 7.841089070295092e-07, Loss: 427.83856201171875
2024-08-04T12:17:08.317701329Z 
 96%|█████████▌| 9139/9500 [31:19:38<1:14:04, 12.31s/it]08/04/2024 05:17:08 - INFO - __main__ -   Step: 9139, LR: 7.819383633422303e-07, Loss: 376.26959228515625
2024-08-04T12:17:20.177789427Z 
 96%|█████████▌| 9140/9500 [31:19:50<1:13:03, 12.18s/it]08/04/2024 05:17:20 - INFO - __main__ -   Step: 9140, LR: 7.797678196549515e-07, Loss: 381.53765869140625
2024-08-04T12:17:32.753713655Z 
 96%|█████████▌| 9141/9500 [31:20:02<1:13:34, 12.30s/it]08/04/2024 05:17:32 - INFO - __main__ -   Step: 9141, LR: 7.775972759676725e-07, Loss: 425.58001708984375
2024-08-04T12:17:45.368594751Z 
 96%|█████████▌| 9142/9500 [31:20:15<1:13:56, 12.39s/it]08/04/2024 05:17:45 - INFO - __main__ -   Step: 9142, LR: 7.754267322803936e-07, Loss: 404.47210693359375
2024-08-04T12:17:57.372456018Z 
 96%|█████████▌| 9143/9500 [31:20:27<1:13:02, 12.28s/it]08/04/2024 05:17:57 - INFO - __main__ -   Step: 9143, LR: 7.732561885931148e-07, Loss: 399.8188171386719
2024-08-04T12:18:09.662287630Z 
 96%|█████████▋| 9144/9500 [31:20:39<1:12:51, 12.28s/it]08/04/2024 05:18:09 - INFO - __main__ -   Step: 9144, LR: 7.710856449058357e-07, Loss: 388.0839538574219
2024-08-04T12:18:22.203957420Z 
 96%|█████████▋| 9145/9500 [31:20:52<1:13:07, 12.36s/it]08/04/2024 05:18:22 - INFO - __main__ -   Step: 9145, LR: 7.689151012185568e-07, Loss: 378.83685302734375
2024-08-04T12:18:34.505861674Z 
 96%|█████████▋| 9146/9500 [31:21:04<1:12:48, 12.34s/it]08/04/2024 05:18:34 - INFO - __main__ -   Step: 9146, LR: 7.667445575312779e-07, Loss: 443.0653381347656
2024-08-04T12:18:46.569690442Z 
 96%|█████████▋| 9147/9500 [31:21:16<1:12:07, 12.26s/it]08/04/2024 05:18:46 - INFO - __main__ -   Step: 9147, LR: 7.64574013843999e-07, Loss: 289.6649169921875
2024-08-04T12:18:59.179297121Z 
 96%|█████████▋| 9148/9500 [31:21:29<1:12:31, 12.36s/it]08/04/2024 05:18:59 - INFO - __main__ -   Step: 9148, LR: 7.624034701567201e-07, Loss: 460.49700927734375
2024-08-04T12:19:11.209712643Z 
 96%|█████████▋| 9149/9500 [31:21:41<1:11:44, 12.26s/it]08/04/2024 05:19:11 - INFO - __main__ -   Step: 9149, LR: 7.602329264694412e-07, Loss: 535.892578125
2024-08-04T12:19:23.638933620Z 
 96%|█████████▋| 9150/9500 [31:21:53<1:11:49, 12.31s/it]08/04/2024 05:19:23 - INFO - __main__ -   Step: 9150, LR: 7.580623827821623e-07, Loss: 433.7396240234375
2024-08-04T12:19:36.113585228Z 
 96%|█████████▋| 9151/9500 [31:22:06<1:11:54, 12.36s/it]08/04/2024 05:19:36 - INFO - __main__ -   Step: 9151, LR: 7.558918390948833e-07, Loss: 410.614013671875
2024-08-04T12:19:48.129866909Z 
 96%|█████████▋| 9152/9500 [31:22:18<1:11:05, 12.26s/it]08/04/2024 05:19:48 - INFO - __main__ -   Step: 9152, LR: 7.537212954076044e-07, Loss: 312.4049072265625
2024-08-04T12:20:00.290482046Z 
 96%|█████████▋| 9153/9500 [31:22:30<1:10:43, 12.23s/it]08/04/2024 05:20:00 - INFO - __main__ -   Step: 9153, LR: 7.515507517203255e-07, Loss: 388.3316955566406
2024-08-04T12:20:12.404142309Z 
 96%|█████████▋| 9154/9500 [31:22:42<1:10:19, 12.19s/it]08/04/2024 05:20:12 - INFO - __main__ -   Step: 9154, LR: 7.493802080330466e-07, Loss: 449.79974365234375
2024-08-04T12:20:24.784466772Z 
 96%|█████████▋| 9155/9500 [31:22:54<1:10:26, 12.25s/it]08/04/2024 05:20:24 - INFO - __main__ -   Step: 9155, LR: 7.472096643457677e-07, Loss: 447.50286865234375
2024-08-04T12:20:37.116456452Z 
 96%|█████████▋| 9156/9500 [31:23:07<1:10:22, 12.27s/it]08/04/2024 05:20:37 - INFO - __main__ -   Step: 9156, LR: 7.450391206584888e-07, Loss: 440.1596984863281
2024-08-04T12:20:49.586826673Z 
 96%|█████████▋| 9157/9500 [31:23:19<1:10:30, 12.33s/it]08/04/2024 05:20:49 - INFO - __main__ -   Step: 9157, LR: 7.428685769712099e-07, Loss: 454.6277160644531
2024-08-04T12:21:02.160937784Z 
 96%|█████████▋| 9158/9500 [31:23:32<1:10:42, 12.41s/it]08/04/2024 05:21:02 - INFO - __main__ -   Step: 9158, LR: 7.406980332839309e-07, Loss: 433.38720703125
2024-08-04T12:21:14.864699916Z 
 96%|█████████▋| 9159/9500 [31:23:44<1:11:00, 12.49s/it]08/04/2024 05:21:14 - INFO - __main__ -   Step: 9159, LR: 7.38527489596652e-07, Loss: 433.6322326660156
2024-08-04T12:21:27.001662189Z 
 96%|█████████▋| 9160/9500 [31:23:56<1:10:11, 12.39s/it]08/04/2024 05:21:27 - INFO - __main__ -   Step: 9160, LR: 7.36356945909373e-07, Loss: 481.8348083496094
2024-08-04T12:21:39.590592159Z 
 96%|█████████▋| 9161/9500 [31:24:09<1:10:19, 12.45s/it]08/04/2024 05:21:39 - INFO - __main__ -   Step: 9161, LR: 7.341864022220942e-07, Loss: 360.67822265625
2024-08-04T12:21:51.690257447Z 
 96%|█████████▋| 9162/9500 [31:24:21<1:09:32, 12.34s/it]08/04/2024 05:21:51 - INFO - __main__ -   Step: 9162, LR: 7.320158585348153e-07, Loss: 376.34405517578125
2024-08-04T12:22:03.859212715Z 
 96%|█████████▋| 9163/9500 [31:24:33<1:09:02, 12.29s/it]08/04/2024 05:22:03 - INFO - __main__ -   Step: 9163, LR: 7.298453148475363e-07, Loss: 344.7367248535156
2024-08-04T12:22:16.149082714Z 
 96%|█████████▋| 9164/9500 [31:24:46<1:08:49, 12.29s/it]08/04/2024 05:22:16 - INFO - __main__ -   Step: 9164, LR: 7.276747711602574e-07, Loss: 394.0
2024-08-04T12:22:28.257021336Z 
 96%|█████████▋| 9165/9500 [31:24:58<1:08:19, 12.24s/it]08/04/2024 05:22:28 - INFO - __main__ -   Step: 9165, LR: 7.255042274729784e-07, Loss: 359.9671936035156
2024-08-04T12:22:40.651049898Z 
 96%|█████████▋| 9166/9500 [31:25:10<1:08:22, 12.28s/it]08/04/2024 05:22:40 - INFO - __main__ -   Step: 9166, LR: 7.233336837856995e-07, Loss: 375.0750427246094
2024-08-04T12:22:53.492973419Z 
 96%|█████████▋| 9167/9500 [31:25:23<1:09:06, 12.45s/it]08/04/2024 05:22:53 - INFO - __main__ -   Step: 9167, LR: 7.211631400984207e-07, Loss: 333.5802001953125
2024-08-04T12:23:05.738840545Z 
 97%|█████████▋| 9168/9500 [31:25:35<1:08:33, 12.39s/it]08/04/2024 05:23:05 - INFO - __main__ -   Step: 9168, LR: 7.189925964111417e-07, Loss: 442.60089111328125
2024-08-04T12:23:18.003909259Z 
 97%|█████████▋| 9169/9500 [31:25:47<1:08:08, 12.35s/it]08/04/2024 05:23:18 - INFO - __main__ -   Step: 9169, LR: 7.168220527238628e-07, Loss: 518.6397705078125
2024-08-04T12:23:30.397693845Z 
 97%|█████████▋| 9170/9500 [31:26:00<1:08:00, 12.36s/it]08/04/2024 05:23:30 - INFO - __main__ -   Step: 9170, LR: 7.14651509036584e-07, Loss: 443.7641906738281
2024-08-04T12:23:42.575681958Z 
 97%|█████████▋| 9171/9500 [31:26:12<1:07:29, 12.31s/it]08/04/2024 05:23:42 - INFO - __main__ -   Step: 9171, LR: 7.124809653493049e-07, Loss: 327.65576171875
2024-08-04T12:23:55.025950545Z 
 97%|█████████▋| 9172/9500 [31:26:24<1:07:31, 12.35s/it]08/04/2024 05:23:55 - INFO - __main__ -   Step: 9172, LR: 7.10310421662026e-07, Loss: 433.63153076171875
2024-08-04T12:24:07.621509849Z 
 97%|█████████▋| 9173/9500 [31:26:37<1:07:42, 12.42s/it]08/04/2024 05:24:07 - INFO - __main__ -   Step: 9173, LR: 7.081398779747472e-07, Loss: 337.80816650390625
2024-08-04T12:24:19.843230411Z 
 97%|█████████▋| 9174/9500 [31:26:49<1:07:10, 12.36s/it]08/04/2024 05:24:19 - INFO - __main__ -   Step: 9174, LR: 7.059693342874682e-07, Loss: 294.2655029296875
2024-08-04T12:24:32.471693056Z 
 97%|█████████▋| 9175/9500 [31:27:02<1:07:23, 12.44s/it]08/04/2024 05:24:32 - INFO - __main__ -   Step: 9175, LR: 7.037987906001893e-07, Loss: 407.8603820800781
2024-08-04T12:24:45.040721590Z 
 97%|█████████▋| 9176/9500 [31:27:14<1:07:23, 12.48s/it]08/04/2024 05:24:45 - INFO - __main__ -   Step: 9176, LR: 7.016282469129105e-07, Loss: 368.88885498046875
2024-08-04T12:24:57.354253076Z 
 97%|█████████▋| 9177/9500 [31:27:27<1:06:55, 12.43s/it]08/04/2024 05:24:57 - INFO - __main__ -   Step: 9177, LR: 6.994577032256315e-07, Loss: 397.0690612792969
2024-08-04T12:25:09.680995779Z 
 97%|█████████▋| 9178/9500 [31:27:39<1:06:32, 12.40s/it]08/04/2024 05:25:09 - INFO - __main__ -   Step: 9178, LR: 6.972871595383525e-07, Loss: 399.60296630859375
2024-08-04T12:25:22.099834689Z 
 97%|█████████▋| 9179/9500 [31:27:52<1:06:22, 12.41s/it]08/04/2024 05:25:22 - INFO - __main__ -   Step: 9179, LR: 6.951166158510736e-07, Loss: 380.246337890625
2024-08-04T12:25:34.176401749Z 
 97%|█████████▋| 9180/9500 [31:28:04<1:05:38, 12.31s/it]08/04/2024 05:25:34 - INFO - __main__ -   Step: 9180, LR: 6.929460721637947e-07, Loss: 291.246337890625
2024-08-04T12:25:46.489343089Z 
 97%|█████████▋| 9181/9500 [31:28:16<1:05:26, 12.31s/it]08/04/2024 05:25:46 - INFO - __main__ -   Step: 9181, LR: 6.907755284765158e-07, Loss: 360.4056396484375
2024-08-04T12:25:59.052158297Z 
 97%|█████████▋| 9182/9500 [31:28:28<1:05:38, 12.38s/it]08/04/2024 05:25:59 - INFO - __main__ -   Step: 9182, LR: 6.886049847892369e-07, Loss: 361.1730041503906
2024-08-04T12:26:11.104756520Z 
 97%|█████████▋| 9183/9500 [31:28:41<1:04:54, 12.29s/it]08/04/2024 05:26:11 - INFO - __main__ -   Step: 9183, LR: 6.86434441101958e-07, Loss: 404.8515319824219
2024-08-04T12:26:23.103121554Z 
 97%|█████████▋| 9184/9500 [31:28:53<1:04:14, 12.20s/it]08/04/2024 05:26:23 - INFO - __main__ -   Step: 9184, LR: 6.842638974146789e-07, Loss: 408.2230224609375
2024-08-04T12:26:35.981771762Z 
 97%|█████████▋| 9185/9500 [31:29:05<1:05:06, 12.40s/it]08/04/2024 05:26:35 - INFO - __main__ -   Step: 9185, LR: 6.820933537274001e-07, Loss: 455.63421630859375
2024-08-04T12:26:48.232500516Z 
 97%|█████████▋| 9186/9500 [31:29:18<1:04:40, 12.36s/it]08/04/2024 05:26:48 - INFO - __main__ -   Step: 9186, LR: 6.799228100401212e-07, Loss: 338.9750061035156
2024-08-04T12:27:00.336724796Z 
 97%|█████████▋| 9187/9500 [31:29:30<1:04:04, 12.28s/it]08/04/2024 05:27:00 - INFO - __main__ -   Step: 9187, LR: 6.777522663528423e-07, Loss: 349.0824890136719
2024-08-04T12:27:12.818984421Z 
 97%|█████████▋| 9188/9500 [31:29:42<1:04:10, 12.34s/it]08/04/2024 05:27:12 - INFO - __main__ -   Step: 9188, LR: 6.755817226655634e-07, Loss: 257.94793701171875
2024-08-04T12:27:24.967379507Z 
 97%|█████████▋| 9189/9500 [31:29:54<1:03:40, 12.28s/it]08/04/2024 05:27:24 - INFO - __main__ -   Step: 9189, LR: 6.734111789782845e-07, Loss: 419.28302001953125
2024-08-04T12:27:37.146939950Z 
 97%|█████████▋| 9190/9500 [31:30:07<1:03:18, 12.25s/it]08/04/2024 05:27:37 - INFO - __main__ -   Step: 9190, LR: 6.712406352910056e-07, Loss: 411.6783142089844
2024-08-04T12:27:50.133303632Z 
 97%|█████████▋| 9191/9500 [31:30:20<1:04:14, 12.47s/it]08/04/2024 05:27:50 - INFO - __main__ -   Step: 9191, LR: 6.690700916037266e-07, Loss: 436.09881591796875
2024-08-04T12:28:02.341607888Z 
 97%|█████████▋| 9192/9500 [31:30:32<1:03:37, 12.39s/it]08/04/2024 05:28:02 - INFO - __main__ -   Step: 9192, LR: 6.668995479164477e-07, Loss: 535.2288818359375
2024-08-04T12:28:14.358311619Z 
 97%|█████████▋| 9193/9500 [31:30:44<1:02:50, 12.28s/it]08/04/2024 05:28:14 - INFO - __main__ -   Step: 9193, LR: 6.647290042291687e-07, Loss: 423.89544677734375
2024-08-04T12:28:27.073565120Z 
 97%|█████████▋| 9194/9500 [31:30:57<1:03:17, 12.41s/it]08/04/2024 05:28:27 - INFO - __main__ -   Step: 9194, LR: 6.625584605418899e-07, Loss: 391.73822021484375
2024-08-04T12:28:39.258667895Z 
 97%|█████████▋| 9195/9500 [31:31:09<1:02:44, 12.34s/it]08/04/2024 05:28:39 - INFO - __main__ -   Step: 9195, LR: 6.60387916854611e-07, Loss: 417.05352783203125
2024-08-04T12:28:52.026714740Z 
 97%|█████████▋| 9196/9500 [31:31:21<1:03:11, 12.47s/it]08/04/2024 05:28:52 - INFO - __main__ -   Step: 9196, LR: 6.58217373167332e-07, Loss: 329.80657958984375
2024-08-04T12:29:04.353650523Z 
 97%|█████████▋| 9197/9500 [31:31:34<1:02:45, 12.43s/it]08/04/2024 05:29:04 - INFO - __main__ -   Step: 9197, LR: 6.560468294800532e-07, Loss: 393.73236083984375
2024-08-04T12:29:16.986135886Z 
 97%|█████████▋| 9198/9500 [31:31:46<1:02:51, 12.49s/it]08/04/2024 05:29:16 - INFO - __main__ -   Step: 9198, LR: 6.538762857927741e-07, Loss: 470.60491943359375
2024-08-04T12:29:29.225417276Z 
 97%|█████████▋| 9199/9500 [31:31:59<1:02:16, 12.41s/it]08/04/2024 05:29:29 - INFO - __main__ -   Step: 9199, LR: 6.517057421054952e-07, Loss: 459.0302429199219
2024-08-04T12:29:41.356551605Z 
 97%|█████████▋| 9200/9500 [31:32:11<1:01:38, 12.33s/it]08/04/2024 05:29:41 - INFO - __main__ -   Step: 9200, LR: 6.495351984182164e-07, Loss: 444.9114990234375
2024-08-04T12:29:54.001948847Z 
 97%|█████████▋| 9201/9500 [31:32:23<1:01:54, 12.42s/it]08/04/2024 05:29:54 - INFO - __main__ -   Step: 9201, LR: 6.473646547309374e-07, Loss: 383.49285888671875
2024-08-04T12:30:06.015249844Z 
 97%|█████████▋| 9202/9500 [31:32:35<1:01:05, 12.30s/it]08/04/2024 05:30:06 - INFO - __main__ -   Step: 9202, LR: 6.451941110436585e-07, Loss: 451.6141357421875
2024-08-04T12:30:18.051691019Z 
 97%|█████████▋| 9203/9500 [31:32:47<1:00:29, 12.22s/it]08/04/2024 05:30:18 - INFO - __main__ -   Step: 9203, LR: 6.430235673563797e-07, Loss: 533.1165161132812
2024-08-04T12:30:30.341015617Z 
 97%|█████████▋| 9204/9500 [31:33:00<1:00:23, 12.24s/it]08/04/2024 05:30:30 - INFO - __main__ -   Step: 9204, LR: 6.408530236691006e-07, Loss: 374.7066345214844
2024-08-04T12:30:42.854360215Z 
 97%|█████████▋| 9205/9500 [31:33:12<1:00:35, 12.32s/it]08/04/2024 05:30:42 - INFO - __main__ -   Step: 9205, LR: 6.386824799818217e-07, Loss: 301.00994873046875
2024-08-04T12:30:55.150679724Z 
 97%|█████████▋| 9206/9500 [31:33:25<1:00:20, 12.31s/it]08/04/2024 05:30:55 - INFO - __main__ -   Step: 9206, LR: 6.365119362945429e-07, Loss: 538.9325561523438
2024-08-04T12:31:07.572196505Z 
 97%|█████████▋| 9207/9500 [31:33:37<1:00:17, 12.35s/it]08/04/2024 05:31:07 - INFO - __main__ -   Step: 9207, LR: 6.343413926072639e-07, Loss: 365.03826904296875
2024-08-04T12:31:19.512522958Z 
 97%|█████████▋| 9208/9500 [31:33:49<59:29, 12.23s/it]  08/04/2024 05:31:19 - INFO - __main__ -   Step: 9208, LR: 6.32170848919985e-07, Loss: 504.6247253417969
2024-08-04T12:31:31.715032315Z 
 97%|█████████▋| 9209/9500 [31:34:01<59:15, 12.22s/it]08/04/2024 05:31:31 - INFO - __main__ -   Step: 9209, LR: 6.300003052327062e-07, Loss: 411.37762451171875
2024-08-04T12:31:44.610184308Z 
 97%|█████████▋| 9210/9500 [31:34:14<1:00:02, 12.42s/it]08/04/2024 05:31:44 - INFO - __main__ -   Step: 9210, LR: 6.278297615454272e-07, Loss: 370.943603515625
2024-08-04T12:31:57.042297281Z 
 97%|█████████▋| 9211/9500 [31:34:26<59:50, 12.42s/it]  08/04/2024 05:31:57 - INFO - __main__ -   Step: 9211, LR: 6.256592178581482e-07, Loss: 435.9720458984375
2024-08-04T12:32:09.342825682Z 
 97%|█████████▋| 9212/9500 [31:34:39<59:27, 12.39s/it]08/04/2024 05:32:09 - INFO - __main__ -   Step: 9212, LR: 6.234886741708693e-07, Loss: 403.9471130371094
2024-08-04T12:32:21.814152781Z 
 97%|█████████▋| 9213/9500 [31:34:51<59:22, 12.41s/it]08/04/2024 05:32:21 - INFO - __main__ -   Step: 9213, LR: 6.213181304835904e-07, Loss: 278.2637939453125
2024-08-04T12:32:33.941678597Z 
 97%|█████████▋| 9214/9500 [31:35:03<58:45, 12.33s/it]08/04/2024 05:32:33 - INFO - __main__ -   Step: 9214, LR: 6.191475867963115e-07, Loss: 349.1349792480469
2024-08-04T12:32:45.960159396Z 
 97%|█████████▋| 9215/9500 [31:35:15<58:06, 12.23s/it]08/04/2024 05:32:45 - INFO - __main__ -   Step: 9215, LR: 6.169770431090325e-07, Loss: 363.01019287109375
2024-08-04T12:32:58.914634611Z 
 97%|█████████▋| 9216/9500 [31:35:28<58:55, 12.45s/it]08/04/2024 05:32:58 - INFO - __main__ -   Step: 9216, LR: 6.148064994217536e-07, Loss: 428.32415771484375
2024-08-04T12:33:11.008362695Z 
 97%|█████████▋| 9217/9500 [31:35:40<58:13, 12.34s/it]08/04/2024 05:33:11 - INFO - __main__ -   Step: 9217, LR: 6.126359557344747e-07, Loss: 380.14013671875
2024-08-04T12:33:23.561362652Z 
 97%|█████████▋| 9218/9500 [31:35:53<58:18, 12.41s/it]08/04/2024 05:33:23 - INFO - __main__ -   Step: 9218, LR: 6.104654120471958e-07, Loss: 527.447265625
2024-08-04T12:33:35.991115120Z 
 97%|█████████▋| 9219/9500 [31:36:05<58:08, 12.41s/it]08/04/2024 05:33:35 - INFO - __main__ -   Step: 9219, LR: 6.082948683599169e-07, Loss: 450.99755859375
2024-08-04T12:33:48.396348436Z 
 97%|█████████▋| 9220/9500 [31:36:18<57:55, 12.41s/it]08/04/2024 05:33:48 - INFO - __main__ -   Step: 9220, LR: 6.061243246726379e-07, Loss: 377.0074768066406
2024-08-04T12:34:00.650499392Z 
 97%|█████████▋| 9221/9500 [31:36:30<57:29, 12.36s/it]08/04/2024 05:34:00 - INFO - __main__ -   Step: 9221, LR: 6.03953780985359e-07, Loss: 360.6246032714844
2024-08-04T12:34:13.510995497Z 
 97%|█████████▋| 9222/9500 [31:36:43<57:58, 12.51s/it]08/04/2024 05:34:13 - INFO - __main__ -   Step: 9222, LR: 6.017832372980802e-07, Loss: 453.77069091796875
2024-08-04T12:34:25.955352409Z 
 97%|█████████▋| 9223/9500 [31:36:55<57:40, 12.49s/it]08/04/2024 05:34:25 - INFO - __main__ -   Step: 9223, LR: 5.996126936108012e-07, Loss: 389.90576171875
2024-08-04T12:34:38.081715410Z 
 97%|█████████▋| 9224/9500 [31:37:08<56:57, 12.38s/it]08/04/2024 05:34:38 - INFO - __main__ -   Step: 9224, LR: 5.974421499235223e-07, Loss: 400.0167236328125
2024-08-04T12:34:50.580760373Z 
 97%|█████████▋| 9225/9500 [31:37:20<56:54, 12.42s/it]08/04/2024 05:34:50 - INFO - __main__ -   Step: 9225, LR: 5.952716062362434e-07, Loss: 324.8617248535156
2024-08-04T12:35:02.674673502Z 
 97%|█████████▋| 9226/9500 [31:37:32<56:15, 12.32s/it]08/04/2024 05:35:02 - INFO - __main__ -   Step: 9226, LR: 5.931010625489644e-07, Loss: 363.7344970703125
2024-08-04T12:35:15.274628441Z 
 97%|█████████▋| 9227/9500 [31:37:45<56:26, 12.40s/it]08/04/2024 05:35:15 - INFO - __main__ -   Step: 9227, LR: 5.909305188616855e-07, Loss: 462.79638671875
2024-08-04T12:35:27.814966201Z 
 97%|█████████▋| 9228/9500 [31:37:57<56:25, 12.45s/it]08/04/2024 05:35:27 - INFO - __main__ -   Step: 9228, LR: 5.887599751744067e-07, Loss: 409.2476806640625
2024-08-04T12:35:40.060119705Z 
 97%|█████████▋| 9229/9500 [31:38:09<55:56, 12.39s/it]08/04/2024 05:35:40 - INFO - __main__ -   Step: 9229, LR: 5.865894314871277e-07, Loss: 479.00933837890625
2024-08-04T12:35:52.440271424Z 
 97%|█████████▋| 9230/9500 [31:38:22<55:43, 12.38s/it]08/04/2024 05:35:52 - INFO - __main__ -   Step: 9230, LR: 5.844188877998487e-07, Loss: 423.5318908691406
2024-08-04T12:36:04.846801631Z 
 97%|█████████▋| 9231/9500 [31:38:34<55:33, 12.39s/it]08/04/2024 05:36:04 - INFO - __main__ -   Step: 9231, LR: 5.822483441125699e-07, Loss: 400.29107666015625
2024-08-04T12:36:17.226752710Z 
 97%|█████████▋| 9232/9500 [31:38:47<55:19, 12.39s/it]08/04/2024 05:36:17 - INFO - __main__ -   Step: 9232, LR: 5.80077800425291e-07, Loss: 418.58197021484375
2024-08-04T12:36:29.498251409Z 
 97%|█████████▋| 9233/9500 [31:38:59<54:58, 12.35s/it]08/04/2024 05:36:29 - INFO - __main__ -   Step: 9233, LR: 5.77907256738012e-07, Loss: 483.49078369140625
2024-08-04T12:36:42.025499308Z 
 97%|█████████▋| 9234/9500 [31:39:11<54:59, 12.40s/it]08/04/2024 05:36:42 - INFO - __main__ -   Step: 9234, LR: 5.757367130507331e-07, Loss: 410.5164794921875
2024-08-04T12:36:54.419092943Z 
 97%|█████████▋| 9235/9500 [31:39:24<54:46, 12.40s/it]08/04/2024 05:36:54 - INFO - __main__ -   Step: 9235, LR: 5.735661693634542e-07, Loss: 389.10748291015625
2024-08-04T12:37:06.812504138Z 
 97%|█████████▋| 9236/9500 [31:39:36<54:33, 12.40s/it]08/04/2024 05:37:06 - INFO - __main__ -   Step: 9236, LR: 5.713956256761752e-07, Loss: 459.0499267578125
2024-08-04T12:37:19.475397169Z 
 97%|█████████▋| 9237/9500 [31:39:49<54:41, 12.48s/it]08/04/2024 05:37:19 - INFO - __main__ -   Step: 9237, LR: 5.692250819888964e-07, Loss: 348.309326171875
2024-08-04T12:37:31.798738078Z 
 97%|█████████▋| 9238/9500 [31:40:01<54:17, 12.43s/it]08/04/2024 05:37:31 - INFO - __main__ -   Step: 9238, LR: 5.670545383016174e-07, Loss: 402.73126220703125
2024-08-04T12:37:43.907169028Z 
 97%|█████████▋| 9239/9500 [31:40:13<53:39, 12.33s/it]08/04/2024 05:37:43 - INFO - __main__ -   Step: 9239, LR: 5.648839946143385e-07, Loss: 393.1446533203125
2024-08-04T12:37:56.194351165Z 
 97%|█████████▋| 9240/9500 [31:40:26<53:23, 12.32s/it]08/04/2024 05:37:56 - INFO - __main__ -   Step: 9240, LR: 5.627134509270596e-07, Loss: 459.37237548828125
2024-08-04T12:38:09.011569811Z 
 97%|█████████▋| 9241/9500 [31:40:38<53:49, 12.47s/it]08/04/2024 05:38:09 - INFO - __main__ -   Step: 9241, LR: 5.605429072397807e-07, Loss: 448.13775634765625
2024-08-04T12:38:21.272151015Z 
 97%|█████████▋| 9242/9500 [31:40:51<53:20, 12.41s/it]08/04/2024 05:38:21 - INFO - __main__ -   Step: 9242, LR: 5.583723635525018e-07, Loss: 348.5767822265625
2024-08-04T12:38:33.447547225Z 
 97%|█████████▋| 9243/9500 [31:41:03<52:50, 12.34s/it]08/04/2024 05:38:33 - INFO - __main__ -   Step: 9243, LR: 5.562018198652229e-07, Loss: 440.9313659667969
2024-08-04T12:38:45.999583382Z 
 97%|█████████▋| 9244/9500 [31:41:15<52:54, 12.40s/it]08/04/2024 05:38:45 - INFO - __main__ -   Step: 9244, LR: 5.540312761779439e-07, Loss: 435.1265869140625
2024-08-04T12:38:58.065239155Z 
 97%|█████████▋| 9245/9500 [31:41:28<52:16, 12.30s/it]08/04/2024 05:38:58 - INFO - __main__ -   Step: 9245, LR: 5.51860732490665e-07, Loss: 399.6907958984375
2024-08-04T12:39:10.410618950Z 
 97%|█████████▋| 9246/9500 [31:41:40<52:07, 12.31s/it]08/04/2024 05:39:10 - INFO - __main__ -   Step: 9246, LR: 5.496901888033861e-07, Loss: 335.6025085449219
2024-08-04T12:39:23.058634127Z 
 97%|█████████▋| 9247/9500 [31:41:52<52:20, 12.41s/it]08/04/2024 05:39:23 - INFO - __main__ -   Step: 9247, LR: 5.475196451161072e-07, Loss: 376.0292663574219
2024-08-04T12:39:35.400658540Z 
 97%|█████████▋| 9248/9500 [31:42:05<52:02, 12.39s/it]08/04/2024 05:39:35 - INFO - __main__ -   Step: 9248, LR: 5.453491014288282e-07, Loss: 469.92425537109375
2024-08-04T12:39:47.715928925Z 
 97%|█████████▋| 9249/9500 [31:42:17<51:44, 12.37s/it]08/04/2024 05:39:47 - INFO - __main__ -   Step: 9249, LR: 5.431785577415493e-07, Loss: 346.88458251953125
2024-08-04T12:40:00.540744016Z 
 97%|█████████▋| 9250/9500 [31:42:30<52:06, 12.51s/it]08/04/2024 05:40:00 - INFO - __main__ -   Step: 9250, LR: 5.410080140542704e-07, Loss: 410.9801025390625
2024-08-04T12:40:12.769070368Z 
 97%|█████████▋| 9251/9500 [31:42:42<51:33, 12.42s/it]08/04/2024 05:40:12 - INFO - __main__ -   Step: 9251, LR: 5.388374703669915e-07, Loss: 407.87811279296875
2024-08-04T12:40:24.958078115Z 
 97%|█████████▋| 9252/9500 [31:42:54<51:03, 12.35s/it]08/04/2024 05:40:24 - INFO - __main__ -   Step: 9252, LR: 5.366669266797126e-07, Loss: 461.7837829589844
2024-08-04T12:40:37.550617049Z 
 97%|█████████▋| 9253/9500 [31:43:07<51:08, 12.42s/it]08/04/2024 05:40:37 - INFO - __main__ -   Step: 9253, LR: 5.344963829924336e-07, Loss: 462.41595458984375
2024-08-04T12:40:49.571857255Z 
 97%|█████████▋| 9254/9500 [31:43:19<50:26, 12.30s/it]08/04/2024 05:40:49 - INFO - __main__ -   Step: 9254, LR: 5.323258393051547e-07, Loss: 368.85125732421875
2024-08-04T12:41:01.693297754Z 
 97%|█████████▋| 9255/9500 [31:43:31<50:00, 12.25s/it]08/04/2024 05:41:01 - INFO - __main__ -   Step: 9255, LR: 5.301552956178759e-07, Loss: 353.4144592285156
2024-08-04T12:41:14.403192174Z 
 97%|█████████▋| 9256/9500 [31:43:44<50:22, 12.39s/it]08/04/2024 05:41:14 - INFO - __main__ -   Step: 9256, LR: 5.279847519305969e-07, Loss: 416.3118896484375
2024-08-04T12:41:26.714434970Z 
 97%|█████████▋| 9257/9500 [31:43:56<50:04, 12.36s/it]08/04/2024 05:41:26 - INFO - __main__ -   Step: 9257, LR: 5.25814208243318e-07, Loss: 403.9329833984375
2024-08-04T12:41:39.074801429Z 
 97%|█████████▋| 9258/9500 [31:44:09<49:51, 12.36s/it]08/04/2024 05:41:39 - INFO - __main__ -   Step: 9258, LR: 5.236436645560391e-07, Loss: 404.9554443359375
2024-08-04T12:41:51.839532478Z 
 97%|█████████▋| 9259/9500 [31:44:21<50:08, 12.48s/it]08/04/2024 05:41:51 - INFO - __main__ -   Step: 9259, LR: 5.214731208687602e-07, Loss: 361.80181884765625
2024-08-04T12:42:03.811439288Z 
 97%|█████████▋| 9260/9500 [31:44:33<49:19, 12.33s/it]08/04/2024 05:42:03 - INFO - __main__ -   Step: 9260, LR: 5.193025771814812e-07, Loss: 369.81396484375
2024-08-04T12:42:16.460595248Z 
 97%|█████████▋| 9261/9500 [31:44:46<49:29, 12.43s/it]08/04/2024 05:42:16 - INFO - __main__ -   Step: 9261, LR: 5.171320334942024e-07, Loss: 407.8068542480469
2024-08-04T12:42:29.139203435Z 
 97%|█████████▋| 9262/9500 [31:44:59<49:35, 12.50s/it]08/04/2024 05:42:29 - INFO - __main__ -   Step: 9262, LR: 5.149614898069234e-07, Loss: 426.9917297363281
2024-08-04T12:42:41.158471448Z 
 98%|█████████▊| 9263/9500 [31:45:11<48:48, 12.36s/it]08/04/2024 05:42:41 - INFO - __main__ -   Step: 9263, LR: 5.127909461196444e-07, Loss: 472.6517333984375
2024-08-04T12:42:53.589430764Z 
 98%|█████████▊| 9264/9500 [31:45:23<48:41, 12.38s/it]08/04/2024 05:42:53 - INFO - __main__ -   Step: 9264, LR: 5.106204024323656e-07, Loss: 386.22930908203125
2024-08-04T12:43:06.081311566Z 
 98%|█████████▊| 9265/9500 [31:45:36<48:37, 12.41s/it]08/04/2024 05:43:06 - INFO - __main__ -   Step: 9265, LR: 5.084498587450867e-07, Loss: 526.5409545898438
2024-08-04T12:43:18.401896349Z 
 98%|█████████▊| 9266/9500 [31:45:48<48:18, 12.39s/it]08/04/2024 05:43:18 - INFO - __main__ -   Step: 9266, LR: 5.062793150578077e-07, Loss: 486.4971923828125
2024-08-04T12:43:31.021702638Z 
 98%|█████████▊| 9267/9500 [31:46:00<48:22, 12.46s/it]08/04/2024 05:43:31 - INFO - __main__ -   Step: 9267, LR: 5.041087713705288e-07, Loss: 471.7972412109375
2024-08-04T12:43:43.633288367Z 
 98%|█████████▊| 9268/9500 [31:46:13<48:20, 12.50s/it]08/04/2024 05:43:43 - INFO - __main__ -   Step: 9268, LR: 5.019382276832499e-07, Loss: 377.1998291015625
2024-08-04T12:43:55.722937765Z 
 98%|█████████▊| 9269/9500 [31:46:25<47:39, 12.38s/it]08/04/2024 05:43:55 - INFO - __main__ -   Step: 9269, LR: 4.99767683995971e-07, Loss: 372.5799255371094
2024-08-04T12:44:07.980902494Z 
 98%|█████████▊| 9270/9500 [31:46:37<47:18, 12.34s/it]08/04/2024 05:44:07 - INFO - __main__ -   Step: 9270, LR: 4.975971403086921e-07, Loss: 442.3176574707031
2024-08-04T12:44:20.912804582Z 
 98%|█████████▊| 9271/9500 [31:46:50<47:46, 12.52s/it]08/04/2024 05:44:20 - INFO - __main__ -   Step: 9271, LR: 4.954265966214131e-07, Loss: 395.81048583984375
2024-08-04T12:44:33.396706795Z 
 98%|█████████▊| 9272/9500 [31:47:03<47:31, 12.51s/it]08/04/2024 05:44:33 - INFO - __main__ -   Step: 9272, LR: 4.932560529341342e-07, Loss: 436.2275390625
2024-08-04T12:44:45.663969766Z 
 98%|█████████▊| 9273/9500 [31:47:15<47:03, 12.44s/it]08/04/2024 05:44:45 - INFO - __main__ -   Step: 9273, LR: 4.910855092468553e-07, Loss: 397.0479431152344
2024-08-04T12:44:58.449309927Z 
 98%|█████████▊| 9274/9500 [31:47:28<47:14, 12.54s/it]08/04/2024 05:44:58 - INFO - __main__ -   Step: 9274, LR: 4.889149655595764e-07, Loss: 417.6488037109375
2024-08-04T12:45:10.408558914Z 
 98%|█████████▊| 9275/9500 [31:47:40<46:22, 12.37s/it]08/04/2024 05:45:10 - INFO - __main__ -   Step: 9275, LR: 4.867444218722975e-07, Loss: 390.3406982421875
2024-08-04T12:45:22.489057686Z 
 98%|█████████▊| 9276/9500 [31:47:52<45:50, 12.28s/it]08/04/2024 05:45:22 - INFO - __main__ -   Step: 9276, LR: 4.845738781850186e-07, Loss: 351.2457275390625
2024-08-04T12:45:34.917361000Z 
 98%|█████████▊| 9277/9500 [31:48:04<45:48, 12.33s/it]08/04/2024 05:45:34 - INFO - __main__ -   Step: 9277, LR: 4.824033344977396e-07, Loss: 403.87744140625
2024-08-04T12:45:47.177498115Z 
 98%|█████████▊| 9278/9500 [31:48:17<45:31, 12.31s/it]08/04/2024 05:45:47 - INFO - __main__ -   Step: 9278, LR: 4.802327908104607e-07, Loss: 415.18060302734375
2024-08-04T12:45:59.048407372Z 
 98%|█████████▊| 9279/9500 [31:48:28<44:50, 12.18s/it]08/04/2024 05:45:59 - INFO - __main__ -   Step: 9279, LR: 4.780622471231819e-07, Loss: 332.19049072265625
2024-08-04T12:46:11.623984659Z 
 98%|█████████▊| 9280/9500 [31:48:41<45:04, 12.30s/it]08/04/2024 05:46:11 - INFO - __main__ -   Step: 9280, LR: 4.7589170343590284e-07, Loss: 359.4311218261719
2024-08-04T12:46:23.772154869Z 
 98%|█████████▊| 9281/9500 [31:48:53<44:42, 12.25s/it]08/04/2024 05:46:23 - INFO - __main__ -   Step: 9281, LR: 4.73721159748624e-07, Loss: 467.61181640625
2024-08-04T12:46:35.787135883Z 
 98%|█████████▊| 9282/9500 [31:49:05<44:15, 12.18s/it]08/04/2024 05:46:35 - INFO - __main__ -   Step: 9282, LR: 4.7155061606134506e-07, Loss: 327.87652587890625
2024-08-04T12:46:47.869313484Z 
 98%|█████████▊| 9283/9500 [31:49:17<43:56, 12.15s/it]08/04/2024 05:46:47 - INFO - __main__ -   Step: 9283, LR: 4.693800723740661e-07, Loss: 348.517822265625
2024-08-04T12:47:00.682360388Z 
 98%|█████████▊| 9284/9500 [31:49:30<44:27, 12.35s/it]08/04/2024 05:47:00 - INFO - __main__ -   Step: 9284, LR: 4.6720952868678717e-07, Loss: 440.28460693359375
2024-08-04T12:47:12.922839456Z 
 98%|█████████▊| 9285/9500 [31:49:42<44:08, 12.32s/it]08/04/2024 05:47:12 - INFO - __main__ -   Step: 9285, LR: 4.650389849995083e-07, Loss: 360.94219970703125
2024-08-04T12:47:25.283096922Z 
 98%|█████████▊| 9286/9500 [31:49:55<43:58, 12.33s/it]08/04/2024 05:47:25 - INFO - __main__ -   Step: 9286, LR: 4.6286844131222934e-07, Loss: 351.26031494140625
2024-08-04T12:47:38.043897264Z 
 98%|█████████▊| 9287/9500 [31:50:07<44:13, 12.46s/it]08/04/2024 05:47:38 - INFO - __main__ -   Step: 9287, LR: 4.606978976249504e-07, Loss: 427.90985107421875
2024-08-04T12:47:50.188727635Z 
 98%|█████████▊| 9288/9500 [31:50:20<43:41, 12.36s/it]08/04/2024 05:47:50 - INFO - __main__ -   Step: 9288, LR: 4.5852735393767156e-07, Loss: 296.291748046875
2024-08-04T12:48:02.605777357Z 
 98%|█████████▊| 9289/9500 [31:50:32<43:32, 12.38s/it]08/04/2024 05:48:02 - INFO - __main__ -   Step: 9289, LR: 4.5635681025039264e-07, Loss: 391.777099609375
2024-08-04T12:48:14.775377983Z 
 98%|█████████▊| 9290/9500 [31:50:44<43:06, 12.32s/it]08/04/2024 05:48:14 - INFO - __main__ -   Step: 9290, LR: 4.5418626656311367e-07, Loss: 315.552490234375
2024-08-04T12:48:26.648886549Z 
 98%|█████████▊| 9291/9500 [31:50:56<42:26, 12.18s/it]08/04/2024 05:48:26 - INFO - __main__ -   Step: 9291, LR: 4.5201572287583475e-07, Loss: 340.3504333496094
2024-08-04T12:48:38.856110331Z 
 98%|█████████▊| 9292/9500 [31:51:08<42:15, 12.19s/it]08/04/2024 05:48:38 - INFO - __main__ -   Step: 9292, LR: 4.498451791885559e-07, Loss: 396.32635498046875
2024-08-04T12:48:51.670579184Z 
 98%|█████████▊| 9293/9500 [31:51:21<42:42, 12.38s/it]08/04/2024 05:48:51 - INFO - __main__ -   Step: 9293, LR: 4.476746355012769e-07, Loss: 511.11041259765625
2024-08-04T12:49:03.428506525Z 
 98%|█████████▊| 9294/9500 [31:51:33<41:51, 12.19s/it]08/04/2024 05:49:03 - INFO - __main__ -   Step: 9294, LR: 4.45504091813998e-07, Loss: 368.0018310546875
2024-08-04T12:49:15.721222029Z 
 98%|█████████▊| 9295/9500 [31:51:45<41:45, 12.22s/it]08/04/2024 05:49:15 - INFO - __main__ -   Step: 9295, LR: 4.433335481267191e-07, Loss: 400.1829528808594
2024-08-04T12:49:28.216979422Z 
 98%|█████████▊| 9296/9500 [31:51:58<41:50, 12.30s/it]08/04/2024 05:49:28 - INFO - __main__ -   Step: 9296, LR: 4.411630044394401e-07, Loss: 449.89971923828125
2024-08-04T12:49:40.409536735Z 
 98%|█████████▊| 9297/9500 [31:52:10<41:30, 12.27s/it]08/04/2024 05:49:40 - INFO - __main__ -   Step: 9297, LR: 4.3899246075216125e-07, Loss: 389.9813232421875
2024-08-04T12:49:52.885423351Z 
 98%|█████████▊| 9298/9500 [31:52:22<41:31, 12.33s/it]08/04/2024 05:49:52 - INFO - __main__ -   Step: 9298, LR: 4.3682191706488234e-07, Loss: 399.9979248046875
2024-08-04T12:50:05.820315956Z 
 98%|█████████▊| 9299/9500 [31:52:35<41:55, 12.51s/it]08/04/2024 05:50:05 - INFO - __main__ -   Step: 9299, LR: 4.3465137337760347e-07, Loss: 387.7615966796875
2024-08-04T12:50:18.188077612Z 
 98%|█████████▊| 9300/9500 [31:52:48<41:33, 12.47s/it]08/04/2024 05:50:18 - INFO - __main__ -   Step: 9300, LR: 4.324808296903245e-07, Loss: 471.47064208984375
2024-08-04T12:50:30.370843776Z 
 98%|█████████▊| 9301/9500 [31:53:00<41:04, 12.38s/it]08/04/2024 05:50:30 - INFO - __main__ -   Step: 9301, LR: 4.303102860030456e-07, Loss: 525.3387451171875
2024-08-04T12:50:42.939966945Z 
 98%|█████████▊| 9302/9500 [31:53:12<41:02, 12.44s/it]08/04/2024 05:50:42 - INFO - __main__ -   Step: 9302, LR: 4.2813974231576667e-07, Loss: 324.9855651855469
2024-08-04T12:50:55.618605044Z 
 98%|█████████▊| 9303/9500 [31:53:25<41:04, 12.51s/it]08/04/2024 05:50:55 - INFO - __main__ -   Step: 9303, LR: 4.259691986284877e-07, Loss: 427.49432373046875
2024-08-04T12:51:08.037274400Z 
 98%|█████████▊| 9304/9500 [31:53:37<40:46, 12.48s/it]08/04/2024 05:51:08 - INFO - __main__ -   Step: 9304, LR: 4.2379865494120883e-07, Loss: 395.73443603515625
2024-08-04T12:51:20.856668552Z 
 98%|█████████▊| 9305/9500 [31:53:50<40:53, 12.58s/it]08/04/2024 05:51:20 - INFO - __main__ -   Step: 9305, LR: 4.216281112539299e-07, Loss: 421.6004943847656
2024-08-04T12:51:32.864083096Z 
 98%|█████████▊| 9306/9500 [31:54:02<40:07, 12.41s/it]08/04/2024 05:51:32 - INFO - __main__ -   Step: 9306, LR: 4.1945756756665095e-07, Loss: 330.6829528808594
2024-08-04T12:51:44.882097479Z 
 98%|█████████▊| 9307/9500 [31:54:14<39:32, 12.29s/it]08/04/2024 05:51:44 - INFO - __main__ -   Step: 9307, LR: 4.172870238793721e-07, Loss: 340.1131286621094
2024-08-04T12:51:57.539203668Z 
 98%|█████████▊| 9308/9500 [31:54:27<39:41, 12.40s/it]08/04/2024 05:51:57 - INFO - __main__ -   Step: 9308, LR: 4.1511648019209317e-07, Loss: 330.02783203125
2024-08-04T12:52:09.592913861Z 
 98%|█████████▊| 9309/9500 [31:54:39<39:08, 12.30s/it]08/04/2024 05:52:09 - INFO - __main__ -   Step: 9309, LR: 4.1294593650481425e-07, Loss: 409.028564453125
2024-08-04T12:52:21.939766344Z 
 98%|█████████▊| 9310/9500 [31:54:51<38:59, 12.31s/it]08/04/2024 05:52:21 - INFO - __main__ -   Step: 9310, LR: 4.107753928175353e-07, Loss: 387.467529296875
2024-08-04T12:52:34.391631531Z 
 98%|█████████▊| 9311/9500 [31:55:04<38:54, 12.35s/it]08/04/2024 05:52:34 - INFO - __main__ -   Step: 9311, LR: 4.086048491302564e-07, Loss: 361.31585693359375
2024-08-04T12:52:46.713079888Z 
 98%|█████████▊| 9312/9500 [31:55:16<38:40, 12.34s/it]08/04/2024 05:52:46 - INFO - __main__ -   Step: 9312, LR: 4.064343054429775e-07, Loss: 365.93170166015625
2024-08-04T12:52:58.971550624Z 
 98%|█████████▊| 9313/9500 [31:55:28<38:23, 12.32s/it]08/04/2024 05:52:58 - INFO - __main__ -   Step: 9313, LR: 4.0426376175569853e-07, Loss: 400.17083740234375
2024-08-04T12:53:11.560562698Z 
 98%|█████████▊| 9314/9500 [31:55:41<38:26, 12.40s/it]08/04/2024 05:53:11 - INFO - __main__ -   Step: 9314, LR: 4.0209321806841966e-07, Loss: 433.22064208984375
2024-08-04T12:53:23.829182619Z 
 98%|█████████▊| 9315/9500 [31:55:53<38:06, 12.36s/it]08/04/2024 05:53:23 - INFO - __main__ -   Step: 9315, LR: 3.9992267438114075e-07, Loss: 354.191650390625
2024-08-04T12:53:36.260627650Z 
 98%|█████████▊| 9316/9500 [31:56:06<37:58, 12.38s/it]08/04/2024 05:53:36 - INFO - __main__ -   Step: 9316, LR: 3.977521306938618e-07, Loss: 495.91705322265625
2024-08-04T12:53:48.955337227Z 
 98%|█████████▊| 9317/9500 [31:56:18<38:03, 12.48s/it]08/04/2024 05:53:48 - INFO - __main__ -   Step: 9317, LR: 3.9558158700658286e-07, Loss: 424.4874572753906
2024-08-04T12:54:01.326292504Z 
 98%|█████████▊| 9318/9500 [31:56:31<37:44, 12.44s/it]08/04/2024 05:54:01 - INFO - __main__ -   Step: 9318, LR: 3.93411043319304e-07, Loss: 448.156982421875
2024-08-04T12:54:13.688898980Z 
 98%|█████████▊| 9319/9500 [31:56:43<37:27, 12.42s/it]08/04/2024 05:54:13 - INFO - __main__ -   Step: 9319, LR: 3.912404996320251e-07, Loss: 371.3393249511719
2024-08-04T12:54:26.241278821Z 
 98%|█████████▊| 9320/9500 [31:56:56<37:22, 12.46s/it]08/04/2024 05:54:26 - INFO - __main__ -   Step: 9320, LR: 3.890699559447461e-07, Loss: 346.8170166015625
2024-08-04T12:54:38.666947853Z 
 98%|█████████▊| 9321/9500 [31:57:08<37:08, 12.45s/it]08/04/2024 05:54:38 - INFO - __main__ -   Step: 9321, LR: 3.868994122574672e-07, Loss: 369.38751220703125
2024-08-04T12:54:50.969046714Z 
 98%|█████████▊| 9322/9500 [31:57:20<36:48, 12.41s/it]08/04/2024 05:54:50 - INFO - __main__ -   Step: 9322, LR: 3.8472886857018833e-07, Loss: 427.1194763183594
2024-08-04T12:55:03.482826149Z 
 98%|█████████▊| 9323/9500 [31:57:33<36:41, 12.44s/it]08/04/2024 05:55:03 - INFO - __main__ -   Step: 9323, LR: 3.8255832488290936e-07, Loss: 459.77484130859375
2024-08-04T12:55:15.896252507Z 
 98%|█████████▊| 9324/9500 [31:57:45<36:27, 12.43s/it]08/04/2024 05:55:15 - INFO - __main__ -   Step: 9324, LR: 3.8038778119563044e-07, Loss: 378.81219482421875
2024-08-04T12:55:28.115301833Z 
 98%|█████████▊| 9325/9500 [31:57:58<36:04, 12.37s/it]08/04/2024 05:55:28 - INFO - __main__ -   Step: 9325, LR: 3.782172375083516e-07, Loss: 410.09539794921875
2024-08-04T12:55:40.270585663Z 
 98%|█████████▊| 9326/9500 [31:58:10<35:40, 12.30s/it]08/04/2024 05:55:40 - INFO - __main__ -   Step: 9326, LR: 3.760466938210726e-07, Loss: 355.50250244140625
2024-08-04T12:55:52.621255442Z 
 98%|█████████▊| 9327/9500 [31:58:22<35:30, 12.32s/it]08/04/2024 05:55:52 - INFO - __main__ -   Step: 9327, LR: 3.738761501337937e-07, Loss: 228.74407958984375
2024-08-04T12:56:04.885098453Z 
 98%|█████████▊| 9328/9500 [31:58:34<35:15, 12.30s/it]08/04/2024 05:56:04 - INFO - __main__ -   Step: 9328, LR: 3.717056064465148e-07, Loss: 454.6563720703125
2024-08-04T12:56:17.154463602Z 
 98%|█████████▊| 9329/9500 [31:58:47<35:01, 12.29s/it]08/04/2024 05:56:17 - INFO - __main__ -   Step: 9329, LR: 3.695350627592359e-07, Loss: 327.29473876953125
2024-08-04T12:56:29.997738586Z 
 98%|█████████▊| 9330/9500 [31:58:59<35:17, 12.46s/it]08/04/2024 05:56:29 - INFO - __main__ -   Step: 9330, LR: 3.6736451907195694e-07, Loss: 465.3044128417969
2024-08-04T12:56:41.993981925Z 
 98%|█████████▊| 9331/9500 [31:59:11<34:41, 12.32s/it]08/04/2024 05:56:41 - INFO - __main__ -   Step: 9331, LR: 3.65193975384678e-07, Loss: 376.849365234375
2024-08-04T12:56:54.320935230Z 
 98%|█████████▊| 9332/9500 [31:59:24<34:29, 12.32s/it]08/04/2024 05:56:54 - INFO - __main__ -   Step: 9332, LR: 3.6302343169739916e-07, Loss: 416.5650634765625
2024-08-04T12:57:06.545285610Z 
 98%|█████████▊| 9333/9500 [31:59:36<34:12, 12.29s/it]08/04/2024 05:57:06 - INFO - __main__ -   Step: 9333, LR: 3.608528880101202e-07, Loss: 326.83831787109375
2024-08-04T12:57:18.655848700Z 
 98%|█████████▊| 9334/9500 [31:59:48<33:51, 12.24s/it]08/04/2024 05:57:18 - INFO - __main__ -   Step: 9334, LR: 3.5868234432284127e-07, Loss: 440.4454345703125
2024-08-04T12:57:31.210900460Z 
 98%|█████████▊| 9335/9500 [32:00:01<33:54, 12.33s/it]08/04/2024 05:57:31 - INFO - __main__ -   Step: 9335, LR: 3.5651180063556236e-07, Loss: 393.0052490234375
2024-08-04T12:57:43.523901117Z 
 98%|█████████▊| 9336/9500 [32:00:13<33:41, 12.33s/it]08/04/2024 05:57:43 - INFO - __main__ -   Step: 9336, LR: 3.543412569482834e-07, Loss: 234.52554321289062
2024-08-04T12:57:55.576013894Z 
 98%|█████████▊| 9337/9500 [32:00:25<33:15, 12.24s/it]08/04/2024 05:57:55 - INFO - __main__ -   Step: 9337, LR: 3.521707132610045e-07, Loss: 424.3532409667969
2024-08-04T12:58:07.714053321Z 
 98%|█████████▊| 9338/9500 [32:00:37<32:58, 12.21s/it]08/04/2024 05:58:07 - INFO - __main__ -   Step: 9338, LR: 3.500001695737256e-07, Loss: 454.96331787109375
2024-08-04T12:58:20.332401387Z 
 98%|█████████▊| 9339/9500 [32:00:50<33:05, 12.33s/it]08/04/2024 05:58:20 - INFO - __main__ -   Step: 9339, LR: 3.478296258864467e-07, Loss: 407.0538024902344
2024-08-04T12:58:32.663665388Z 
 98%|█████████▊| 9340/9500 [32:01:02<32:53, 12.33s/it]08/04/2024 05:58:32 - INFO - __main__ -   Step: 9340, LR: 3.456590821991677e-07, Loss: 345.4796142578125
2024-08-04T12:58:44.608566112Z 
 98%|█████████▊| 9341/9500 [32:01:14<32:22, 12.22s/it]08/04/2024 05:58:44 - INFO - __main__ -   Step: 9341, LR: 3.4348853851188885e-07, Loss: 315.213134765625
2024-08-04T12:58:57.289258486Z 
 98%|█████████▊| 9342/9500 [32:01:27<32:32, 12.36s/it]08/04/2024 05:58:57 - INFO - __main__ -   Step: 9342, LR: 3.4131799482460994e-07, Loss: 308.4410095214844
2024-08-04T12:59:09.311157667Z 
 98%|█████████▊| 9343/9500 [32:01:39<32:04, 12.26s/it]08/04/2024 05:59:09 - INFO - __main__ -   Step: 9343, LR: 3.3914745113733097e-07, Loss: 334.64141845703125
2024-08-04T12:59:21.686171450Z 
 98%|█████████▊| 9344/9500 [32:01:51<31:57, 12.29s/it]08/04/2024 05:59:21 - INFO - __main__ -   Step: 9344, LR: 3.369769074500521e-07, Loss: 422.4571533203125
2024-08-04T12:59:34.087046976Z 
 98%|█████████▊| 9345/9500 [32:02:04<31:50, 12.32s/it]08/04/2024 05:59:34 - INFO - __main__ -   Step: 9345, LR: 3.348063637627732e-07, Loss: 388.85308837890625
2024-08-04T12:59:46.547045275Z 
 98%|█████████▊| 9346/9500 [32:02:16<31:44, 12.37s/it]08/04/2024 05:59:46 - INFO - __main__ -   Step: 9346, LR: 3.326358200754942e-07, Loss: 337.9453125
2024-08-04T12:59:58.539304798Z 
 98%|█████████▊| 9347/9500 [32:02:28<31:14, 12.25s/it]08/04/2024 05:59:58 - INFO - __main__ -   Step: 9347, LR: 3.304652763882153e-07, Loss: 372.0475158691406
2024-08-04T13:00:11.123817557Z 
 98%|█████████▊| 9348/9500 [32:02:41<31:17, 12.35s/it]08/04/2024 06:00:11 - INFO - __main__ -   Step: 9348, LR: 3.2829473270093644e-07, Loss: 463.28839111328125
2024-08-04T13:00:23.081236564Z 
 98%|█████████▊| 9349/9500 [32:02:53<30:47, 12.23s/it]08/04/2024 06:00:23 - INFO - __main__ -   Step: 9349, LR: 3.261241890136575e-07, Loss: 374.4065246582031
2024-08-04T13:00:35.063323063Z 
 98%|█████████▊| 9350/9500 [32:03:05<30:23, 12.16s/it]08/04/2024 06:00:35 - INFO - __main__ -   Step: 9350, LR: 3.2395364532637855e-07, Loss: 368.8919372558594
2024-08-04T13:00:47.386550422Z 
 98%|█████████▊| 9351/9500 [32:03:17<30:18, 12.21s/it]08/04/2024 06:00:47 - INFO - __main__ -   Step: 9351, LR: 3.217831016390997e-07, Loss: 359.2042236328125
2024-08-04T13:00:59.796046928Z 
 98%|█████████▊| 9352/9500 [32:03:29<30:15, 12.27s/it]08/04/2024 06:00:59 - INFO - __main__ -   Step: 9352, LR: 3.1961255795182077e-07, Loss: 459.8378601074219
2024-08-04T13:01:11.799841422Z 
 98%|█████████▊| 9353/9500 [32:03:41<29:51, 12.19s/it]08/04/2024 06:01:11 - INFO - __main__ -   Step: 9353, LR: 3.174420142645418e-07, Loss: 435.757080078125
2024-08-04T13:01:24.631146314Z 
 98%|█████████▊| 9354/9500 [32:03:54<30:07, 12.38s/it]08/04/2024 06:01:24 - INFO - __main__ -   Step: 9354, LR: 3.152714705772629e-07, Loss: 470.2771911621094
2024-08-04T13:01:36.933950552Z 
 98%|█████████▊| 9355/9500 [32:04:06<29:51, 12.36s/it]08/04/2024 06:01:36 - INFO - __main__ -   Step: 9355, LR: 3.13100926889984e-07, Loss: 487.4404296875
2024-08-04T13:01:48.861759161Z 
 98%|█████████▊| 9356/9500 [32:04:18<29:20, 12.23s/it]08/04/2024 06:01:48 - INFO - __main__ -   Step: 9356, LR: 3.1093038320270505e-07, Loss: 334.5075378417969
2024-08-04T13:02:01.712417196Z 
 98%|█████████▊| 9357/9500 [32:04:31<29:35, 12.42s/it]08/04/2024 06:02:01 - INFO - __main__ -   Step: 9357, LR: 3.0875983951542613e-07, Loss: 405.45611572265625
2024-08-04T13:02:13.674233535Z 
 99%|█████████▊| 9358/9500 [32:04:43<29:03, 12.28s/it]08/04/2024 06:02:13 - INFO - __main__ -   Step: 9358, LR: 3.065892958281472e-07, Loss: 304.2440185546875
2024-08-04T13:02:25.729085856Z 
 99%|█████████▊| 9359/9500 [32:04:55<28:41, 12.21s/it]08/04/2024 06:02:25 - INFO - __main__ -   Step: 9359, LR: 3.044187521408683e-07, Loss: 389.914306640625
2024-08-04T13:02:38.204393056Z 
 99%|█████████▊| 9360/9500 [32:05:08<28:40, 12.29s/it]08/04/2024 06:02:38 - INFO - __main__ -   Step: 9360, LR: 3.022482084535894e-07, Loss: 390.45428466796875
2024-08-04T13:02:50.793690610Z 
 99%|█████████▊| 9361/9500 [32:05:20<28:40, 12.38s/it]08/04/2024 06:02:50 - INFO - __main__ -   Step: 9361, LR: 3.0007766476631046e-07, Loss: 407.512939453125
2024-08-04T13:03:03.007843221Z 
 99%|█████████▊| 9362/9500 [32:05:32<28:21, 12.33s/it]08/04/2024 06:03:03 - INFO - __main__ -   Step: 9362, LR: 2.9790712107903155e-07, Loss: 357.8170166015625
2024-08-04T13:03:15.460177471Z 
 99%|█████████▊| 9363/9500 [32:05:45<28:14, 12.37s/it]08/04/2024 06:03:15 - INFO - __main__ -   Step: 9363, LR: 2.9573657739175263e-07, Loss: 413.3302001953125
2024-08-04T13:03:27.587936291Z 
 99%|█████████▊| 9364/9500 [32:05:57<27:52, 12.30s/it]08/04/2024 06:03:27 - INFO - __main__ -   Step: 9364, LR: 2.935660337044737e-07, Loss: 325.5274353027344
2024-08-04T13:03:39.726711698Z 
 99%|█████████▊| 9365/9500 [32:06:09<27:33, 12.25s/it]08/04/2024 06:03:39 - INFO - __main__ -   Step: 9365, LR: 2.913954900171948e-07, Loss: 372.83642578125
2024-08-04T13:03:52.221609730Z 
 99%|█████████▊| 9366/9500 [32:06:22<27:31, 12.32s/it]08/04/2024 06:03:52 - INFO - __main__ -   Step: 9366, LR: 2.892249463299159e-07, Loss: 429.17156982421875
2024-08-04T13:04:04.439968584Z 
 99%|█████████▊| 9367/9500 [32:06:34<27:14, 12.29s/it]08/04/2024 06:04:04 - INFO - __main__ -   Step: 9367, LR: 2.8705440264263696e-07, Loss: 373.07403564453125
2024-08-04T13:04:16.509329257Z 
 99%|█████████▊| 9368/9500 [32:06:46<26:53, 12.22s/it]08/04/2024 06:04:16 - INFO - __main__ -   Step: 9368, LR: 2.8488385895535804e-07, Loss: 380.9656677246094
2024-08-04T13:04:28.932139045Z 
 99%|█████████▊| 9369/9500 [32:06:58<26:49, 12.28s/it]08/04/2024 06:04:28 - INFO - __main__ -   Step: 9369, LR: 2.8271331526807913e-07, Loss: 449.124755859375
2024-08-04T13:04:41.473851076Z 
 99%|█████████▊| 9370/9500 [32:07:11<26:46, 12.36s/it]08/04/2024 06:04:41 - INFO - __main__ -   Step: 9370, LR: 2.805427715808002e-07, Loss: 453.63677978515625
2024-08-04T13:04:53.704380397Z 
 99%|█████████▊| 9371/9500 [32:07:23<26:29, 12.32s/it]08/04/2024 06:04:53 - INFO - __main__ -   Step: 9371, LR: 2.783722278935213e-07, Loss: 342.8057556152344
2024-08-04T13:05:06.088338832Z 
 99%|█████████▊| 9372/9500 [32:07:36<26:19, 12.34s/it]08/04/2024 06:05:06 - INFO - __main__ -   Step: 9372, LR: 2.762016842062424e-07, Loss: 563.890869140625
2024-08-04T13:05:18.705838230Z 
 99%|█████████▊| 9373/9500 [32:07:48<26:17, 12.42s/it]08/04/2024 06:05:18 - INFO - __main__ -   Step: 9373, LR: 2.7403114051896346e-07, Loss: 333.0992431640625
2024-08-04T13:05:31.260650837Z 
 99%|█████████▊| 9374/9500 [32:08:01<26:10, 12.46s/it]08/04/2024 06:05:31 - INFO - __main__ -   Step: 9374, LR: 2.7186059683168454e-07, Loss: 495.33447265625
2024-08-04T13:05:43.366900494Z 
 99%|█████████▊| 9375/9500 [32:08:13<25:44, 12.36s/it]08/04/2024 06:05:43 - INFO - __main__ -   Step: 9375, LR: 2.6969005314440557e-07, Loss: 354.9578857421875
2024-08-04T13:05:55.982114218Z 
 99%|█████████▊| 9376/9500 [32:08:25<25:41, 12.43s/it]08/04/2024 06:05:55 - INFO - __main__ -   Step: 9376, LR: 2.675195094571267e-07, Loss: 529.747314453125
2024-08-04T13:06:07.945038086Z 
 99%|█████████▊| 9377/9500 [32:08:37<25:11, 12.29s/it]08/04/2024 06:06:07 - INFO - __main__ -   Step: 9377, LR: 2.653489657698478e-07, Loss: 283.29541015625
2024-08-04T13:06:20.591379398Z 
 99%|█████████▊| 9378/9500 [32:08:50<25:12, 12.40s/it]08/04/2024 06:06:20 - INFO - __main__ -   Step: 9378, LR: 2.631784220825689e-07, Loss: 375.50469970703125
2024-08-04T13:06:33.216188149Z 
 99%|█████████▊| 9379/9500 [32:09:03<25:08, 12.47s/it]08/04/2024 06:06:33 - INFO - __main__ -   Step: 9379, LR: 2.6100787839528996e-07, Loss: 330.85284423828125
2024-08-04T13:06:45.683034162Z 
 99%|█████████▊| 9380/9500 [32:09:15<24:56, 12.47s/it]08/04/2024 06:06:45 - INFO - __main__ -   Step: 9380, LR: 2.58837334708011e-07, Loss: 345.845947265625
2024-08-04T13:06:57.896792559Z 
 99%|█████████▊| 9381/9500 [32:09:27<24:34, 12.39s/it]08/04/2024 06:06:57 - INFO - __main__ -   Step: 9381, LR: 2.566667910207321e-07, Loss: 449.3207702636719
2024-08-04T13:07:10.304387254Z 
 99%|█████████▉| 9382/9500 [32:09:40<24:22, 12.40s/it]08/04/2024 06:07:10 - INFO - __main__ -   Step: 9382, LR: 2.5449624733345316e-07, Loss: 345.3967590332031
2024-08-04T13:07:22.369255616Z 
 99%|█████████▉| 9383/9500 [32:09:52<23:58, 12.30s/it]08/04/2024 06:07:22 - INFO - __main__ -   Step: 9383, LR: 2.523257036461743e-07, Loss: 384.610595703125
2024-08-04T13:07:34.641420093Z 
 99%|█████████▉| 9384/9500 [32:10:04<23:45, 12.29s/it]08/04/2024 06:07:34 - INFO - __main__ -   Step: 9384, LR: 2.501551599588953e-07, Loss: 434.1494140625
2024-08-04T13:07:47.039261788Z 
 99%|█████████▉| 9385/9500 [32:10:16<23:37, 12.32s/it]08/04/2024 06:07:47 - INFO - __main__ -   Step: 9385, LR: 2.479846162716164e-07, Loss: 385.24462890625
2024-08-04T13:07:59.230530909Z 
 99%|█████████▉| 9386/9500 [32:10:29<23:20, 12.28s/it]08/04/2024 06:07:59 - INFO - __main__ -   Step: 9386, LR: 2.4581407258433754e-07, Loss: 385.37542724609375
2024-08-04T13:08:11.526901108Z 
 99%|█████████▉| 9387/9500 [32:10:41<23:08, 12.29s/it]08/04/2024 06:08:11 - INFO - __main__ -   Step: 9387, LR: 2.4364352889705857e-07, Loss: 409.57733154296875
2024-08-04T13:08:24.064058667Z 
 99%|█████████▉| 9388/9500 [32:10:54<23:04, 12.36s/it]08/04/2024 06:08:24 - INFO - __main__ -   Step: 9388, LR: 2.414729852097797e-07, Loss: 439.52117919921875
2024-08-04T13:08:36.383474731Z 
 99%|█████████▉| 9389/9500 [32:11:06<22:50, 12.35s/it]08/04/2024 06:08:36 - INFO - __main__ -   Step: 9389, LR: 2.3930244152250074e-07, Loss: 387.21710205078125
2024-08-04T13:08:48.387881021Z 
 99%|█████████▉| 9390/9500 [32:11:18<22:27, 12.25s/it]08/04/2024 06:08:48 - INFO - __main__ -   Step: 9390, LR: 2.3713189783522182e-07, Loss: 352.75494384765625
2024-08-04T13:09:01.256068101Z 
 99%|█████████▉| 9391/9500 [32:11:31<22:35, 12.43s/it]08/04/2024 06:09:01 - INFO - __main__ -   Step: 9391, LR: 2.3496135414794293e-07, Loss: 389.2654724121094
2024-08-04T13:09:13.257888277Z 
 99%|█████████▉| 9392/9500 [32:11:43<22:08, 12.30s/it]08/04/2024 06:09:13 - INFO - __main__ -   Step: 9392, LR: 2.3279081046066399e-07, Loss: 381.93548583984375
2024-08-04T13:09:25.772845077Z 
 99%|█████████▉| 9393/9500 [32:11:55<22:03, 12.37s/it]08/04/2024 06:09:25 - INFO - __main__ -   Step: 9393, LR: 2.306202667733851e-07, Loss: 434.95623779296875
2024-08-04T13:09:38.473348451Z 
 99%|█████████▉| 9394/9500 [32:12:08<22:01, 12.47s/it]08/04/2024 06:09:38 - INFO - __main__ -   Step: 9394, LR: 2.2844972308610615e-07, Loss: 388.56817626953125
2024-08-04T13:09:50.628149594Z 
 99%|█████████▉| 9395/9500 [32:12:20<21:39, 12.37s/it]08/04/2024 06:09:50 - INFO - __main__ -   Step: 9395, LR: 2.2627917939882724e-07, Loss: 338.02020263671875
2024-08-04T13:10:02.751826586Z 
 99%|█████████▉| 9396/9500 [32:12:32<21:19, 12.30s/it]08/04/2024 06:10:02 - INFO - __main__ -   Step: 9396, LR: 2.2410863571154834e-07, Loss: 485.89447021484375
2024-08-04T13:10:15.243679657Z 
 99%|█████████▉| 9397/9500 [32:12:45<21:12, 12.36s/it]08/04/2024 06:10:15 - INFO - __main__ -   Step: 9397, LR: 2.219380920242694e-07, Loss: 324.1754150390625
2024-08-04T13:10:27.533844183Z 
 99%|█████████▉| 9398/9500 [32:12:57<20:58, 12.34s/it]08/04/2024 06:10:27 - INFO - __main__ -   Step: 9398, LR: 2.197675483369905e-07, Loss: 412.92578125
2024-08-04T13:10:39.553734440Z 
 99%|█████████▉| 9399/9500 [32:13:09<20:36, 12.24s/it]08/04/2024 06:10:39 - INFO - __main__ -   Step: 9399, LR: 2.1759700464971157e-07, Loss: 392.03326416015625
2024-08-04T13:10:52.336002514Z 
 99%|█████████▉| 9400/9500 [32:13:22<20:40, 12.40s/it]08/04/2024 06:10:52 - INFO - __main__ -   Step: 9400, LR: 2.1542646096243265e-07, Loss: 432.5421447753906
2024-08-04T13:11:04.737659145Z 
 99%|█████████▉| 9401/9500 [32:13:34<20:27, 12.40s/it]08/04/2024 06:11:04 - INFO - __main__ -   Step: 9401, LR: 2.1325591727515373e-07, Loss: 405.54779052734375
2024-08-04T13:11:17.063436956Z 
 99%|█████████▉| 9402/9500 [32:13:47<20:13, 12.38s/it]08/04/2024 06:11:17 - INFO - __main__ -   Step: 9402, LR: 2.1108537358787482e-07, Loss: 427.04534912109375
2024-08-04T13:11:29.501748912Z 
 99%|█████████▉| 9403/9500 [32:13:59<20:02, 12.40s/it]08/04/2024 06:11:29 - INFO - __main__ -   Step: 9403, LR: 2.089148299005959e-07, Loss: 399.99859619140625
2024-08-04T13:11:41.581657833Z 
 99%|█████████▉| 9404/9500 [32:14:11<19:41, 12.30s/it]08/04/2024 06:11:41 - INFO - __main__ -   Step: 9404, LR: 2.0674428621331698e-07, Loss: 406.9013671875
2024-08-04T13:11:53.793044908Z 
 99%|█████████▉| 9405/9500 [32:14:23<19:26, 12.27s/it]08/04/2024 06:11:53 - INFO - __main__ -   Step: 9405, LR: 2.0457374252603804e-07, Loss: 411.78857421875
2024-08-04T13:12:06.415163566Z 
 99%|█████████▉| 9406/9500 [32:14:36<19:23, 12.38s/it]08/04/2024 06:12:06 - INFO - __main__ -   Step: 9406, LR: 2.0240319883875915e-07, Loss: 358.19744873046875
2024-08-04T13:12:18.542918578Z 
 99%|█████████▉| 9407/9500 [32:14:48<19:04, 12.30s/it]08/04/2024 06:12:18 - INFO - __main__ -   Step: 9407, LR: 2.002326551514802e-07, Loss: 399.9468994140625
2024-08-04T13:12:30.698617114Z 
 99%|█████████▉| 9408/9500 [32:15:00<18:47, 12.26s/it]08/04/2024 06:12:30 - INFO - __main__ -   Step: 9408, LR: 1.9806211146420131e-07, Loss: 377.87921142578125
2024-08-04T13:12:43.455317799Z 
 99%|█████████▉| 9409/9500 [32:15:13<18:49, 12.41s/it]08/04/2024 06:12:43 - INFO - __main__ -   Step: 9409, LR: 1.958915677769224e-07, Loss: 391.0641784667969
2024-08-04T13:12:55.507942091Z 
 99%|█████████▉| 9410/9500 [32:15:25<18:27, 12.30s/it]08/04/2024 06:12:55 - INFO - __main__ -   Step: 9410, LR: 1.9372102408964345e-07, Loss: 440.3427734375
2024-08-04T13:13:07.722846570Z 
 99%|█████████▉| 9411/9500 [32:15:37<18:12, 12.28s/it]08/04/2024 06:13:07 - INFO - __main__ -   Step: 9411, LR: 1.9155048040236456e-07, Loss: 474.5142517089844
2024-08-04T13:13:19.863388598Z 
 99%|█████████▉| 9412/9500 [32:15:49<17:56, 12.24s/it]08/04/2024 06:13:19 - INFO - __main__ -   Step: 9412, LR: 1.8937993671508562e-07, Loss: 562.5687866210938
2024-08-04T13:13:32.592327241Z 
 99%|█████████▉| 9413/9500 [32:16:02<17:57, 12.38s/it]08/04/2024 06:13:32 - INFO - __main__ -   Step: 9413, LR: 1.8720939302780673e-07, Loss: 389.4329833984375
2024-08-04T13:13:44.883166955Z 
 99%|█████████▉| 9414/9500 [32:16:14<17:42, 12.36s/it]08/04/2024 06:13:44 - INFO - __main__ -   Step: 9414, LR: 1.850388493405278e-07, Loss: 368.5371398925781
2024-08-04T13:13:57.281239641Z 
 99%|█████████▉| 9415/9500 [32:16:27<17:31, 12.37s/it]08/04/2024 06:13:57 - INFO - __main__ -   Step: 9415, LR: 1.8286830565324887e-07, Loss: 395.74237060546875
2024-08-04T13:14:09.698673150Z 
 99%|█████████▉| 9416/9500 [32:16:39<17:20, 12.38s/it]08/04/2024 06:14:09 - INFO - __main__ -   Step: 9416, LR: 1.8069776196596995e-07, Loss: 392.82806396484375
2024-08-04T13:14:21.975782604Z 
 99%|█████████▉| 9417/9500 [32:16:51<17:05, 12.35s/it]08/04/2024 06:14:21 - INFO - __main__ -   Step: 9417, LR: 1.7852721827869104e-07, Loss: 522.7412719726562
2024-08-04T13:14:34.187079049Z 
 99%|█████████▉| 9418/9500 [32:17:04<16:49, 12.31s/it]08/04/2024 06:14:34 - INFO - __main__ -   Step: 9418, LR: 1.7635667459141215e-07, Loss: 437.8087463378906
2024-08-04T13:14:46.585986914Z 
 99%|█████████▉| 9419/9500 [32:17:16<16:39, 12.34s/it]08/04/2024 06:14:46 - INFO - __main__ -   Step: 9419, LR: 1.741861309041332e-07, Loss: 401.5710144042969
2024-08-04T13:14:58.427262244Z 
 99%|█████████▉| 9420/9500 [32:17:28<16:15, 12.19s/it]08/04/2024 06:14:58 - INFO - __main__ -   Step: 9420, LR: 1.720155872168543e-07, Loss: 393.7139587402344
2024-08-04T13:15:10.865143141Z 
 99%|█████████▉| 9421/9500 [32:17:40<16:08, 12.26s/it]08/04/2024 06:15:10 - INFO - __main__ -   Step: 9421, LR: 1.6984504352957537e-07, Loss: 383.76922607421875
2024-08-04T13:15:23.319562250Z 
 99%|█████████▉| 9422/9500 [32:17:53<16:00, 12.32s/it]08/04/2024 06:15:23 - INFO - __main__ -   Step: 9422, LR: 1.6767449984229645e-07, Loss: 362.4490966796875
2024-08-04T13:15:35.361769286Z 
 99%|█████████▉| 9423/9500 [32:18:05<15:42, 12.24s/it]08/04/2024 06:15:35 - INFO - __main__ -   Step: 9423, LR: 1.6550395615501753e-07, Loss: 312.0091552734375
2024-08-04T13:15:48.137684222Z 
 99%|█████████▉| 9424/9500 [32:18:18<15:42, 12.40s/it]08/04/2024 06:15:48 - INFO - __main__ -   Step: 9424, LR: 1.6333341246773862e-07, Loss: 440.26904296875
2024-08-04T13:16:00.655537954Z 
 99%|█████████▉| 9425/9500 [32:18:30<15:32, 12.43s/it]08/04/2024 06:16:00 - INFO - __main__ -   Step: 9425, LR: 1.611628687804597e-07, Loss: 276.60052490234375
2024-08-04T13:16:12.507075398Z 
 99%|█████████▉| 9426/9500 [32:18:42<15:07, 12.26s/it]08/04/2024 06:16:12 - INFO - __main__ -   Step: 9426, LR: 1.5899232509318078e-07, Loss: 369.74151611328125
2024-08-04T13:16:24.435670260Z 
 99%|█████████▉| 9427/9500 [32:18:54<14:47, 12.16s/it]08/04/2024 06:16:24 - INFO - __main__ -   Step: 9427, LR: 1.5682178140590184e-07, Loss: 399.4595642089844
2024-08-04T13:16:37.112219910Z 
 99%|█████████▉| 9428/9500 [32:19:07<14:46, 12.32s/it]08/04/2024 06:16:37 - INFO - __main__ -   Step: 9428, LR: 1.5465123771862292e-07, Loss: 394.7283630371094
2024-08-04T13:16:49.160324005Z 
 99%|█████████▉| 9429/9500 [32:19:19<14:28, 12.24s/it]08/04/2024 06:16:49 - INFO - __main__ -   Step: 9429, LR: 1.52480694031344e-07, Loss: 403.84381103515625
2024-08-04T13:17:01.431297427Z 
 99%|█████████▉| 9430/9500 [32:19:31<14:17, 12.25s/it]08/04/2024 06:17:01 - INFO - __main__ -   Step: 9430, LR: 1.503101503440651e-07, Loss: 387.02630615234375
2024-08-04T13:17:13.936310763Z 
 99%|█████████▉| 9431/9500 [32:19:43<14:10, 12.32s/it]08/04/2024 06:17:13 - INFO - __main__ -   Step: 9431, LR: 1.481396066567862e-07, Loss: 420.328125
2024-08-04T13:17:26.168042631Z 
 99%|█████████▉| 9432/9500 [32:19:56<13:56, 12.30s/it]08/04/2024 06:17:26 - INFO - __main__ -   Step: 9432, LR: 1.4596906296950728e-07, Loss: 454.939453125
2024-08-04T13:17:38.161229752Z 
 99%|█████████▉| 9433/9500 [32:20:08<13:37, 12.21s/it]08/04/2024 06:17:38 - INFO - __main__ -   Step: 9433, LR: 1.4379851928222834e-07, Loss: 295.0157470703125
2024-08-04T13:17:50.263226787Z 
 99%|█████████▉| 9434/9500 [32:20:20<13:23, 12.17s/it]08/04/2024 06:17:50 - INFO - __main__ -   Step: 9434, LR: 1.4162797559494942e-07, Loss: 274.9273681640625
2024-08-04T13:18:02.568563219Z 
 99%|█████████▉| 9435/9500 [32:20:32<13:13, 12.21s/it]08/04/2024 06:18:02 - INFO - __main__ -   Step: 9435, LR: 1.394574319076705e-07, Loss: 483.151611328125
2024-08-04T13:18:14.686344555Z 
 99%|█████████▉| 9436/9500 [32:20:44<12:59, 12.18s/it]08/04/2024 06:18:14 - INFO - __main__ -   Step: 9436, LR: 1.372868882203916e-07, Loss: 487.572509765625
2024-08-04T13:18:27.588832222Z 
 99%|█████████▉| 9437/9500 [32:20:57<13:01, 12.40s/it]08/04/2024 06:18:27 - INFO - __main__ -   Step: 9437, LR: 1.3511634453311267e-07, Loss: 515.46630859375
2024-08-04T13:18:40.037055501Z 
 99%|█████████▉| 9438/9500 [32:21:09<12:49, 12.41s/it]08/04/2024 06:18:40 - INFO - __main__ -   Step: 9438, LR: 1.3294580084583375e-07, Loss: 372.72021484375
2024-08-04T13:18:52.558695611Z 
 99%|█████████▉| 9439/9500 [32:21:22<12:39, 12.45s/it]08/04/2024 06:18:52 - INFO - __main__ -   Step: 9439, LR: 1.3077525715855484e-07, Loss: 491.6430969238281
2024-08-04T13:19:05.434231460Z 
 99%|█████████▉| 9440/9500 [32:21:35<12:34, 12.58s/it]08/04/2024 06:19:05 - INFO - __main__ -   Step: 9440, LR: 1.2860471347127592e-07, Loss: 449.4534912109375
2024-08-04T13:19:17.784770182Z 
 99%|█████████▉| 9441/9500 [32:21:47<12:17, 12.51s/it]08/04/2024 06:19:17 - INFO - __main__ -   Step: 9441, LR: 1.26434169783997e-07, Loss: 416.41162109375
2024-08-04T13:19:29.879944596Z 
 99%|█████████▉| 9442/9500 [32:21:59<11:58, 12.38s/it]08/04/2024 06:19:29 - INFO - __main__ -   Step: 9442, LR: 1.2426362609671809e-07, Loss: 358.08154296875
2024-08-04T13:19:42.446860894Z 
 99%|█████████▉| 9443/9500 [32:22:12<11:49, 12.44s/it]08/04/2024 06:19:42 - INFO - __main__ -   Step: 9443, LR: 1.2209308240943914e-07, Loss: 387.5068054199219
2024-08-04T13:19:54.886117226Z 
 99%|█████████▉| 9444/9500 [32:22:24<11:36, 12.44s/it]08/04/2024 06:19:54 - INFO - __main__ -   Step: 9444, LR: 1.1992253872216023e-07, Loss: 491.2166748046875
2024-08-04T13:20:07.304840024Z 
 99%|█████████▉| 9445/9500 [32:22:37<11:23, 12.43s/it]08/04/2024 06:20:07 - INFO - __main__ -   Step: 9445, LR: 1.1775199503488132e-07, Loss: 464.79168701171875
2024-08-04T13:20:19.741480279Z 
 99%|█████████▉| 9446/9500 [32:22:49<11:11, 12.43s/it]08/04/2024 06:20:19 - INFO - __main__ -   Step: 9446, LR: 1.155814513476024e-07, Loss: 553.48876953125
2024-08-04T13:20:31.897135441Z 
 99%|█████████▉| 9447/9500 [32:23:01<10:54, 12.35s/it]08/04/2024 06:20:31 - INFO - __main__ -   Step: 9447, LR: 1.1341090766032349e-07, Loss: 299.3611755371094
2024-08-04T13:20:43.882932650Z 
 99%|█████████▉| 9448/9500 [32:23:13<10:36, 12.24s/it]08/04/2024 06:20:43 - INFO - __main__ -   Step: 9448, LR: 1.1124036397304456e-07, Loss: 334.53997802734375
2024-08-04T13:20:56.617047349Z 
 99%|█████████▉| 9449/9500 [32:23:26<10:31, 12.39s/it]08/04/2024 06:20:56 - INFO - __main__ -   Step: 9449, LR: 1.0906982028576564e-07, Loss: 284.06378173828125
2024-08-04T13:21:08.939758037Z 
 99%|█████████▉| 9450/9500 [32:23:38<10:18, 12.37s/it]08/04/2024 06:21:08 - INFO - __main__ -   Step: 9450, LR: 1.0689927659848674e-07, Loss: 432.3226623535156
2024-08-04T13:21:21.156931954Z 
 99%|█████████▉| 9451/9500 [32:23:51<10:03, 12.32s/it]08/04/2024 06:21:21 - INFO - __main__ -   Step: 9451, LR: 1.0472873291120782e-07, Loss: 397.22119140625
2024-08-04T13:21:34.001700868Z 
 99%|█████████▉| 9452/9500 [32:24:03<09:59, 12.48s/it]08/04/2024 06:21:34 - INFO - __main__ -   Step: 9452, LR: 1.025581892239289e-07, Loss: 429.7436218261719
2024-08-04T13:21:46.058550143Z 
100%|█████████▉| 9453/9500 [32:24:15<09:40, 12.35s/it]08/04/2024 06:21:46 - INFO - __main__ -   Step: 9453, LR: 1.0038764553664997e-07, Loss: 398.7459716796875
2024-08-04T13:21:58.408399682Z 
100%|█████████▉| 9454/9500 [32:24:28<09:28, 12.35s/it]08/04/2024 06:21:58 - INFO - __main__ -   Step: 9454, LR: 9.821710184937106e-08, Loss: 407.478515625
2024-08-04T13:22:10.547208874Z 
100%|█████████▉| 9455/9500 [32:24:40<09:12, 12.29s/it]08/04/2024 06:22:10 - INFO - __main__ -   Step: 9455, LR: 9.604655816209214e-08, Loss: 443.8425598144531
2024-08-04T13:22:23.418112402Z 
100%|█████████▉| 9456/9500 [32:24:53<09:08, 12.46s/it]08/04/2024 06:22:23 - INFO - __main__ -   Step: 9456, LR: 9.387601447481322e-08, Loss: 429.7275390625
2024-08-04T13:22:35.608368160Z 
100%|█████████▉| 9457/9500 [32:25:05<08:52, 12.38s/it]08/04/2024 06:22:35 - INFO - __main__ -   Step: 9457, LR: 9.17054707875343e-08, Loss: 335.2783203125
2024-08-04T13:22:47.677145086Z 
100%|█████████▉| 9458/9500 [32:25:17<08:36, 12.29s/it]08/04/2024 06:22:47 - INFO - __main__ -   Step: 9458, LR: 8.953492710025538e-08, Loss: 318.8800048828125
2024-08-04T13:23:00.390168506Z 
100%|█████████▉| 9459/9500 [32:25:30<08:29, 12.42s/it]08/04/2024 06:23:00 - INFO - __main__ -   Step: 9459, LR: 8.736438341297646e-08, Loss: 432.99163818359375
2024-08-04T13:23:12.449584428Z 
100%|█████████▉| 9460/9500 [32:25:42<08:12, 12.31s/it]08/04/2024 06:23:12 - INFO - __main__ -   Step: 9460, LR: 8.519383972569754e-08, Loss: 357.04052734375
2024-08-04T13:23:24.419009176Z 
100%|█████████▉| 9461/9500 [32:25:54<07:56, 12.21s/it]08/04/2024 06:23:24 - INFO - __main__ -   Step: 9461, LR: 8.302329603841864e-08, Loss: 311.7744140625
2024-08-04T13:23:37.484380717Z 
100%|█████████▉| 9462/9500 [32:26:07<07:53, 12.46s/it]08/04/2024 06:23:37 - INFO - __main__ -   Step: 9462, LR: 8.085275235113972e-08, Loss: 395.7887268066406
2024-08-04T13:23:49.903695678Z 
100%|█████████▉| 9463/9500 [32:26:19<07:40, 12.45s/it]08/04/2024 06:23:49 - INFO - __main__ -   Step: 9463, LR: 7.868220866386079e-08, Loss: 412.3625183105469
2024-08-04T13:24:01.969512354Z 
100%|█████████▉| 9464/9500 [32:26:31<07:24, 12.34s/it]08/04/2024 06:24:01 - INFO - __main__ -   Step: 9464, LR: 7.651166497658187e-08, Loss: 477.6239318847656
2024-08-04T13:24:14.427556149Z 
100%|█████████▉| 9465/9500 [32:26:44<07:13, 12.37s/it]08/04/2024 06:24:14 - INFO - __main__ -   Step: 9465, LR: 7.434112128930296e-08, Loss: 395.4300537109375
2024-08-04T13:24:26.931496322Z 
100%|█████████▉| 9466/9500 [32:26:56<07:01, 12.41s/it]08/04/2024 06:24:26 - INFO - __main__ -   Step: 9466, LR: 7.217057760202404e-08, Loss: 423.57086181640625
2024-08-04T13:24:39.312912464Z 
100%|█████████▉| 9467/9500 [32:27:09<06:49, 12.40s/it]08/04/2024 06:24:39 - INFO - __main__ -   Step: 9467, LR: 7.000003391474511e-08, Loss: 371.7978515625
2024-08-04T13:24:52.297322482Z 
100%|█████████▉| 9468/9500 [32:27:22<06:42, 12.58s/it]08/04/2024 06:24:52 - INFO - __main__ -   Step: 9468, LR: 6.782949022746621e-08, Loss: 469.18603515625
2024-08-04T13:25:04.553734035Z 
100%|█████████▉| 9469/9500 [32:27:34<06:26, 12.48s/it]08/04/2024 06:25:04 - INFO - __main__ -   Step: 9469, LR: 6.565894654018729e-08, Loss: 559.4447021484375
2024-08-04T13:25:16.754875888Z 
100%|█████████▉| 9470/9500 [32:27:46<06:11, 12.40s/it]08/04/2024 06:25:16 - INFO - __main__ -   Step: 9470, LR: 6.348840285290836e-08, Loss: 445.026611328125
2024-08-04T13:25:29.547631213Z 
100%|█████████▉| 9471/9500 [32:27:59<06:02, 12.52s/it]08/04/2024 06:25:29 - INFO - __main__ -   Step: 9471, LR: 6.131785916562944e-08, Loss: 397.7405700683594
2024-08-04T13:25:41.709498655Z 
100%|█████████▉| 9472/9500 [32:28:11<05:47, 12.41s/it]08/04/2024 06:25:41 - INFO - __main__ -   Step: 9472, LR: 5.9147315478350526e-08, Loss: 399.66143798828125
2024-08-04T13:25:54.053951763Z 
100%|█████████▉| 9473/9500 [32:28:23<05:34, 12.39s/it]08/04/2024 06:25:54 - INFO - __main__ -   Step: 9473, LR: 5.697677179107161e-08, Loss: 427.7442626953125
2024-08-04T13:26:06.289307579Z 
100%|█████████▉| 9474/9500 [32:28:36<05:20, 12.34s/it]08/04/2024 06:26:06 - INFO - __main__ -   Step: 9474, LR: 5.480622810379269e-08, Loss: 338.9400329589844
2024-08-04T13:26:18.304703625Z 
100%|█████████▉| 9475/9500 [32:28:48<05:06, 12.25s/it]08/04/2024 06:26:18 - INFO - __main__ -   Step: 9475, LR: 5.263568441651377e-08, Loss: 375.51141357421875
2024-08-04T13:26:30.553239127Z 
100%|█████████▉| 9476/9500 [32:29:00<04:53, 12.25s/it]08/04/2024 06:26:30 - INFO - __main__ -   Step: 9476, LR: 5.046514072923486e-08, Loss: 386.18621826171875
2024-08-04T13:26:43.218786102Z 
100%|█████████▉| 9477/9500 [32:29:13<04:44, 12.37s/it]08/04/2024 06:26:43 - INFO - __main__ -   Step: 9477, LR: 4.8294597041955935e-08, Loss: 393.82275390625
2024-08-04T13:26:55.403715106Z 
100%|█████████▉| 9478/9500 [32:29:25<04:30, 12.32s/it]08/04/2024 06:26:55 - INFO - __main__ -   Step: 9478, LR: 4.612405335467702e-08, Loss: 370.6756591796875
2024-08-04T13:27:07.767845089Z 
100%|█████████▉| 9479/9500 [32:29:37<04:18, 12.33s/it]08/04/2024 06:27:07 - INFO - __main__ -   Step: 9479, LR: 4.39535096673981e-08, Loss: 372.2326354980469
2024-08-04T13:27:20.430111231Z 
100%|█████████▉| 9480/9500 [32:29:50<04:08, 12.43s/it]08/04/2024 06:27:20 - INFO - __main__ -   Step: 9480, LR: 4.178296598011918e-08, Loss: 435.0133056640625
2024-08-04T13:27:32.265349716Z 
100%|█████████▉| 9481/9500 [32:30:02<03:52, 12.25s/it]08/04/2024 06:27:32 - INFO - __main__ -   Step: 9481, LR: 3.961242229284026e-08, Loss: 347.57659912109375
2024-08-04T13:27:44.219447227Z 
100%|█████████▉| 9482/9500 [32:30:14<03:38, 12.16s/it]08/04/2024 06:27:44 - INFO - __main__ -   Step: 9482, LR: 3.7441878605561343e-08, Loss: 367.3328857421875
2024-08-04T13:27:56.684317372Z 
100%|█████████▉| 9483/9500 [32:30:26<03:28, 12.25s/it]08/04/2024 06:27:56 - INFO - __main__ -   Step: 9483, LR: 3.5271334918282426e-08, Loss: 310.42083740234375
2024-08-04T13:28:08.666408053Z 
100%|█████████▉| 9484/9500 [32:30:38<03:14, 12.17s/it]08/04/2024 06:28:08 - INFO - __main__ -   Step: 9484, LR: 3.31007912310035e-08, Loss: 359.81243896484375
2024-08-04T13:28:21.070456928Z 
100%|█████████▉| 9485/9500 [32:30:51<03:03, 12.24s/it]08/04/2024 06:28:21 - INFO - __main__ -   Step: 9485, LR: 3.0930247543724586e-08, Loss: 466.7673645019531
2024-08-04T13:28:33.698042576Z 
100%|█████████▉| 9486/9500 [32:31:03<02:53, 12.36s/it]08/04/2024 06:28:33 - INFO - __main__ -   Step: 9486, LR: 2.8759703856445672e-08, Loss: 313.26104736328125
2024-08-04T13:28:46.123862680Z 
100%|█████████▉| 9487/9500 [32:31:16<02:40, 12.38s/it]08/04/2024 06:28:46 - INFO - __main__ -   Step: 9487, LR: 2.6589160169166752e-08, Loss: 477.51763916015625
2024-08-04T13:28:58.036341850Z 
100%|█████████▉| 9488/9500 [32:31:27<02:26, 12.24s/it]08/04/2024 06:28:58 - INFO - __main__ -   Step: 9488, LR: 2.4418616481887832e-08, Loss: 331.9729309082031
2024-08-04T13:29:10.369852995Z 
100%|█████████▉| 9489/9500 [32:31:40<02:14, 12.27s/it]08/04/2024 06:29:10 - INFO - __main__ -   Step: 9489, LR: 2.2248072794608912e-08, Loss: 367.2017822265625
2024-08-04T13:29:22.461451426Z 
100%|█████████▉| 9490/9500 [32:31:52<02:02, 12.21s/it]08/04/2024 06:29:22 - INFO - __main__ -   Step: 9490, LR: 2.0077529107329995e-08, Loss: 423.41827392578125
2024-08-04T13:29:34.379099661Z 
100%|█████████▉| 9491/9500 [32:32:04<01:49, 12.13s/it]08/04/2024 06:29:34 - INFO - __main__ -   Step: 9491, LR: 1.7906985420051078e-08, Loss: 306.1011657714844
2024-08-04T13:29:47.201904482Z 
100%|█████████▉| 9492/9500 [32:32:17<01:38, 12.33s/it]08/04/2024 06:29:47 - INFO - __main__ -   Step: 9492, LR: 1.573644173277216e-08, Loss: 398.4042053222656
2024-08-04T13:29:59.132711653Z 
100%|█████████▉| 9493/9500 [32:32:29<01:25, 12.21s/it]08/04/2024 06:29:59 - INFO - __main__ -   Step: 9493, LR: 1.356589804549324e-08, Loss: 453.6568603515625
2024-08-04T13:30:11.116447064Z 
100%|█████████▉| 9494/9500 [32:32:41<01:12, 12.14s/it]08/04/2024 06:30:11 - INFO - __main__ -   Step: 9494, LR: 1.139535435821432e-08, Loss: 248.87783813476562
2024-08-04T13:30:23.408544786Z 
100%|█████████▉| 9495/9500 [32:32:53<01:00, 12.19s/it]08/04/2024 06:30:23 - INFO - __main__ -   Step: 9495, LR: 9.224810670935404e-09, Loss: 324.77618408203125
2024-08-04T13:30:35.831103153Z 
100%|█████████▉| 9496/9500 [32:33:05<00:49, 12.26s/it]08/04/2024 06:30:35 - INFO - __main__ -   Step: 9496, LR: 7.054266983656484e-09, Loss: 381.31060791015625
2024-08-04T13:30:47.796869562Z 
100%|█████████▉| 9497/9500 [32:33:17<00:36, 12.17s/it]08/04/2024 06:30:47 - INFO - __main__ -   Step: 9497, LR: 4.883723296377566e-09, Loss: 417.565185546875
2024-08-04T13:30:59.819650191Z 
100%|█████████▉| 9498/9500 [32:33:29<00:24, 12.13s/it]08/04/2024 06:30:59 - INFO - __main__ -   Step: 9498, LR: 2.7131796090986482e-09, Loss: 458.7153625488281
2024-08-04T13:31:12.596539367Z 
100%|█████████▉| 9499/9500 [32:33:42<00:12, 12.32s/it]08/04/2024 06:31:12 - INFO - __main__ -   Step: 9499, LR: 5.426359218197296e-10, Loss: 440.3268127441406
2024-08-04T13:31:24.954592980Z 
100%|██████████| 9500/9500 [32:33:54<00:00, 12.33s/it]08/04/2024 06:31:24 - INFO - __main__ -   Step: 9500, LR: 0.0, Loss: 407.36669921875
2024-08-04T13:31:24.956203622Z tokenizer config file saved in /output/tokenizer_config.json
2024-08-04T13:31:24.956229766Z Special tokens file saved in /output/special_tokens_map.json
2024-08-04T13:31:34.402890759Z Configuration saved in /output/config.json
2024-08-04T13:31:34.403031427Z Configuration saved in /output/generation_config.json
2024-08-04T13:31:44.761176505Z The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 3 checkpoint shards. You can find where each parameters has been saved in the index located at /output/pytorch_model.bin.index.json.
2024-08-04T13:31:44.803667837Z 
100%|██████████| 9500/9500 [32:34:14<00:00, 12.34s/it]
2024-08-04T13:31:50.506393296Z [WARNING] async_io requires the dev libaio .so object and headers but these were not found.
2024-08-04T13:31:50.506418337Z [WARNING] async_io: please install the libaio-dev package with apt
2024-08-04T13:31:50.506420213Z [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
2024-08-04T13:31:50.506422046Z [WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
2024-08-04T13:31:50.506423431Z [WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
2024-08-04T13:31:50.506434472Z [WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
