2025-08-07 04:13:58,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc0-halfcheetah/ExtremeClogL1U23-bpql-mem24
2025-08-07 04:13:58,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc0-halfcheetah/ExtremeClogL1U23-bpql-mem24
2025-08-07 04:13:58,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14ec534b3b90>}
2025-08-07 04:13:58,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1111 [DEBUG]: using device: cuda
2025-08-07 04:13:58,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1133 [INFO]: Creating new trainer
2025-08-07 04:13:58,791 baseline-bpql-noiseperc0-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=161, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 04:13:58,791 baseline-bpql-noiseperc0-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 04:14:00,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1194 [DEBUG]: Starting training session...
2025-08-07 04:14:00,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 1/100
2025-08-07 04:15:36,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:15:49,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: -453.41595 ± 21.155
2025-08-07 04:15:49,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [-486.82883, -470.0746, -437.91064, -471.82495, -432.3357, -431.96884, -442.80606, -463.87137, -473.98657, -422.55212]
2025-08-07 04:15:49,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:15:49,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (-453.42) for latency ExtremeClogL1U23
2025-08-07 04:15:49,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 34 seconds)
2025-08-07 04:17:31,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:17:44,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: -322.16620 ± 39.932
2025-08-07 04:17:44,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [-355.44342, -336.19443, -296.97153, -339.09232, -369.18765, -311.35718, -361.36996, -322.87888, -224.43927, -304.72742]
2025-08-07 04:17:44,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:17:44,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (-322.17) for latency ExtremeClogL1U23
2025-08-07 04:17:44,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 3 minutes, 15 seconds)
2025-08-07 04:19:26,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:19:39,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: -155.83205 ± 95.086
2025-08-07 04:19:39,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [-296.25317, -195.45457, -4.935567, -188.20076, -55.30501, -10.380567, -206.53552, -201.8411, -252.18736, -147.22679]
2025-08-07 04:19:39,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:19:39,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (-155.83) for latency ExtremeClogL1U23
2025-08-07 04:19:39,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 2 minutes, 55 seconds)
2025-08-07 04:21:21,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:21:34,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: -3.15363 ± 77.118
2025-08-07 04:21:34,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [130.02966, 38.388657, -28.965282, 24.818995, -104.56027, -2.7545187, -69.71133, 68.51647, -131.48569, 44.187023]
2025-08-07 04:21:34,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:21:34,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (-3.15) for latency ExtremeClogL1U23
2025-08-07 04:21:34,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 1 minute, 48 seconds)
2025-08-07 04:23:16,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:23:29,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 13.93728 ± 74.235
2025-08-07 04:23:29,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [50.24048, 184.64719, -15.264767, -69.30803, -18.646627, 45.589214, 84.20324, -62.073124, -40.63592, -19.378824]
2025-08-07 04:23:29,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:23:29,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (13.94) for latency ExtremeClogL1U23
2025-08-07 04:23:29,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 21 seconds)
2025-08-07 04:25:11,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:25:24,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 127.98806 ± 126.282
2025-08-07 04:25:24,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [-28.142979, 128.27603, 405.94366, 212.11195, 42.860943, 0.79735786, 43.77058, 102.89951, 266.16446, 105.19912]
2025-08-07 04:25:24,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:25:24,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (127.99) for latency ExtremeClogL1U23
2025-08-07 04:25:24,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 13 seconds)
2025-08-07 04:27:06,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:27:19,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 250.36743 ± 196.657
2025-08-07 04:27:19,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [37.823975, 496.06085, 148.08151, 421.94785, 269.61526, 518.0142, 39.670113, 67.84661, 38.32225, 466.2915]
2025-08-07 04:27:19,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:27:19,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (250.37) for latency ExtremeClogL1U23
2025-08-07 04:27:19,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 58 minutes, 19 seconds)
2025-08-07 04:29:01,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:29:14,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 467.04852 ± 203.751
2025-08-07 04:29:14,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [414.86914, 297.3669, 104.04141, 197.89223, 567.5702, 431.73532, 674.1983, 750.9951, 605.79785, 626.01855]
2025-08-07 04:29:14,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:29:14,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (467.05) for latency ExtremeClogL1U23
2025-08-07 04:29:14,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 56 minutes, 27 seconds)
2025-08-07 04:30:57,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:31:10,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 513.18127 ± 178.552
2025-08-07 04:31:10,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [450.83804, 749.6061, 660.9724, 856.85315, 441.9602, 385.67917, 520.5415, 472.11807, 263.40048, 329.84366]
2025-08-07 04:31:10,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:31:10,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (513.18) for latency ExtremeClogL1U23
2025-08-07 04:31:10,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 54 minutes, 39 seconds)
2025-08-07 04:32:52,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:33:05,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 774.66101 ± 107.707
2025-08-07 04:33:05,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [962.25085, 635.2834, 868.08704, 662.1788, 914.4154, 805.17334, 685.52545, 697.27356, 808.57404, 707.8481]
2025-08-07 04:33:05,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:33:05,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (774.66) for latency ExtremeClogL1U23
2025-08-07 04:33:05,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 52 minutes, 47 seconds)
2025-08-07 04:34:47,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:35:00,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1173.13159 ± 254.106
2025-08-07 04:35:00,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [990.00726, 926.0128, 1260.508, 972.3962, 1110.6039, 1606.2302, 1238.7955, 1124.5125, 868.47565, 1633.774]
2025-08-07 04:35:00,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:35:00,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (1173.13) for latency ExtremeClogL1U23
2025-08-07 04:35:00,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 50 minutes, 54 seconds)
2025-08-07 04:36:42,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:36:55,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1138.00867 ± 150.827
2025-08-07 04:36:55,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1023.3041, 1122.8787, 1034.0597, 998.6414, 1251.2834, 1126.8473, 1048.4718, 1536.3405, 1077.1766, 1161.083]
2025-08-07 04:36:55,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:36:55,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 48 minutes, 57 seconds)
2025-08-07 04:38:37,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:38:50,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1099.55493 ± 346.747
2025-08-07 04:38:50,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1112.8688, 1454.1703, 1536.0354, 243.46149, 1140.1123, 937.06055, 1054.6106, 1426.3813, 973.8445, 1117.004]
2025-08-07 04:38:50,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:38:50,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 46 minutes, 54 seconds)
2025-08-07 04:40:32,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:40:45,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1201.86902 ± 204.578
2025-08-07 04:40:45,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [999.50336, 1091.97, 1189.3234, 1656.2842, 1395.8835, 999.40094, 1410.6666, 1105.2645, 1090.3967, 1079.9968]
2025-08-07 04:40:45,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:40:45,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (1201.87) for latency ExtremeClogL1U23
2025-08-07 04:40:45,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 44 minutes, 55 seconds)
2025-08-07 04:42:27,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:42:40,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1196.37183 ± 171.725
2025-08-07 04:42:40,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1420.8362, 1137.6306, 1246.185, 1469.2925, 1042.5859, 1007.9646, 1032.4249, 1252.6036, 1369.6597, 984.5347]
2025-08-07 04:42:40,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:42:40,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 42 minutes, 59 seconds)
2025-08-07 04:44:22,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:44:35,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1382.03638 ± 271.981
2025-08-07 04:44:35,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1222.197, 1820.0873, 1316.4098, 1320.1869, 1066.75, 1398.4719, 1936.2296, 1279.0878, 1389.8728, 1071.0714]
2025-08-07 04:44:35,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:44:35,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (1382.04) for latency ExtremeClogL1U23
2025-08-07 04:44:35,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 40 minutes, 59 seconds)
2025-08-07 04:46:17,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:46:30,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1646.43237 ± 431.939
2025-08-07 04:46:30,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2290.8325, 1998.84, 1894.7343, 1056.1523, 2147.736, 1295.7734, 1148.0562, 1268.4735, 1954.0582, 1409.6688]
2025-08-07 04:46:30,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:46:30,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (1646.43) for latency ExtremeClogL1U23
2025-08-07 04:46:30,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 39 minutes, 7 seconds)
2025-08-07 04:48:12,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:48:26,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1407.76465 ± 268.792
2025-08-07 04:48:26,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1597.1924, 1020.459, 1425.9799, 1767.7664, 1166.2479, 1529.1775, 1423.9683, 1844.7249, 1078.2167, 1223.9144]
2025-08-07 04:48:26,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:48:26,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 37 minutes, 19 seconds)
2025-08-07 04:50:07,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:50:21,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1491.39844 ± 256.937
2025-08-07 04:50:21,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1364.2043, 1895.0393, 1274.9142, 1702.247, 1447.1733, 1640.1207, 1257.2002, 1299.9298, 1885.9603, 1147.1964]
2025-08-07 04:50:21,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:50:21,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 35 minutes, 20 seconds)
2025-08-07 04:52:02,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:52:16,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1616.11304 ± 333.129
2025-08-07 04:52:16,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2192.1028, 1515.87, 1486.9314, 1268.2659, 1240.392, 2066.3352, 1655.8948, 1222.6007, 1543.5239, 1969.2145]
2025-08-07 04:52:16,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:52:16,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 33 minutes, 24 seconds)
2025-08-07 04:53:58,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:54:11,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1487.79907 ± 179.056
2025-08-07 04:54:11,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1383.7369, 1162.3182, 1628.0944, 1643.7336, 1613.6606, 1616.3143, 1385.3746, 1436.6962, 1742.6279, 1265.4324]
2025-08-07 04:54:11,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:54:11,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 31 minutes, 35 seconds)
2025-08-07 04:55:53,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:56:06,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1529.64758 ± 524.141
2025-08-07 04:56:06,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1515.5319, 1379.2882, 1227.5024, 1279.6604, 2482.415, 1144.366, 1433.1411, 1267.8535, 974.703, 2592.0146]
2025-08-07 04:56:06,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:56:06,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 29 minutes, 37 seconds)
2025-08-07 04:57:48,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:58:01,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1260.76660 ± 537.281
2025-08-07 04:58:01,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1216.6118, 167.49431, 1365.4663, 1436.6809, 1640.7471, 1371.1783, 1324.4598, 1178.511, 614.7782, 2291.738]
2025-08-07 04:58:01,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:58:01,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 27 minutes, 43 seconds)
2025-08-07 04:59:43,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:59:56,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1692.83521 ± 343.191
2025-08-07 04:59:56,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2335.5415, 1501.9082, 1258.1359, 1891.8004, 1893.6538, 1302.2566, 1380.5854, 1813.8575, 2076.2202, 1474.393]
2025-08-07 04:59:56,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:59:56,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (1692.84) for latency ExtremeClogL1U23
2025-08-07 04:59:56,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 25 minutes, 48 seconds)
2025-08-07 05:01:38,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:01:51,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1602.01733 ± 396.475
2025-08-07 05:01:51,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1533.7755, 1390.0917, 1340.0494, 1433.2289, 1344.9325, 2274.969, 1158.7742, 2354.8003, 1327.3564, 1862.1964]
2025-08-07 05:01:51,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:01:51,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 23 minutes, 53 seconds)
2025-08-07 05:03:33,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:03:46,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1616.04578 ± 301.470
2025-08-07 05:03:46,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1566.6787, 1885.8368, 1649.3358, 1294.502, 1412.4762, 1259.4805, 1895.9706, 1782.3838, 1238.9191, 2174.8752]
2025-08-07 05:03:46,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:03:46,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 21 minutes, 58 seconds)
2025-08-07 05:05:29,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:05:42,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1282.62708 ± 488.221
2025-08-07 05:05:42,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [908.8155, 1421.6428, 1482.9746, 1394.6598, 1356.031, 1420.2617, 1934.4854, 1436.2631, -6.2019715, 1477.3391]
2025-08-07 05:05:42,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:05:42,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 20 minutes, 8 seconds)
2025-08-07 05:07:24,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:07:37,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1575.56421 ± 727.038
2025-08-07 05:07:37,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1230.5687, 1946.1315, 2600.841, 1315.0573, 1279.8105, 1373.9855, 1646.5717, 2863.8542, 135.99774, 1362.8243]
2025-08-07 05:07:37,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:07:37,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 18 minutes, 12 seconds)
2025-08-07 05:09:19,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:09:32,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1756.97717 ± 434.969
2025-08-07 05:09:32,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1832.0432, 1848.8923, 1676.6617, 1348.9115, 1261.3656, 2905.0486, 1867.4795, 1688.1781, 1736.142, 1405.0502]
2025-08-07 05:09:32,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:09:32,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (1756.98) for latency ExtremeClogL1U23
2025-08-07 05:09:32,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 16 minutes, 22 seconds)
2025-08-07 05:11:14,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:11:27,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2015.65234 ± 499.322
2025-08-07 05:11:27,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2166.7024, 2319.45, 1280.1097, 1935.3584, 2512.208, 1529.3668, 1455.3855, 1675.566, 2380.9434, 2901.433]
2025-08-07 05:11:27,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:11:27,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (2015.65) for latency ExtremeClogL1U23
2025-08-07 05:11:27,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 14 minutes, 25 seconds)
2025-08-07 05:13:09,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:13:22,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1741.29626 ± 269.764
2025-08-07 05:13:22,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1412.2607, 1617.4636, 1913.1527, 1911.8387, 1319.9232, 1806.5592, 1824.8723, 2208.9297, 1439.7069, 1958.2554]
2025-08-07 05:13:22,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:13:22,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 12 minutes, 28 seconds)
2025-08-07 05:15:04,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:15:17,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1784.33240 ± 586.685
2025-08-07 05:15:17,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1291.8481, 1360.9452, 1726.5381, 1631.0314, 3227.5515, 2332.3997, 1252.9485, 1998.8049, 1767.4323, 1253.8246]
2025-08-07 05:15:17,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:15:17,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 10 minutes, 29 seconds)
2025-08-07 05:16:59,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:17:13,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2540.66943 ± 818.435
2025-08-07 05:17:13,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3773.4507, 2645.6108, 3671.64, 1363.903, 2113.0493, 1946.6292, 2140.0166, 2339.109, 3631.7754, 1781.5094]
2025-08-07 05:17:13,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:17:13,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (2540.67) for latency ExtremeClogL1U23
2025-08-07 05:17:13,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 8 minutes, 34 seconds)
2025-08-07 05:18:55,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:19:08,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1417.98608 ± 408.669
2025-08-07 05:19:08,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1832.0382, 1217.9438, 1190.4481, 1537.2092, 1197.8602, 570.874, 1394.3938, 1546.3383, 1498.1416, 2194.6143]
2025-08-07 05:19:08,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:19:08,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 6 minutes, 35 seconds)
2025-08-07 05:20:50,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:21:03,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1823.19763 ± 645.654
2025-08-07 05:21:03,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2622.5317, 1618.8173, 1296.7986, 1310.82, 1327.059, 2792.2207, 1317.0538, 2852.9111, 1163.4706, 1930.2943]
2025-08-07 05:21:03,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:21:03,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 4 minutes, 38 seconds)
2025-08-07 05:22:44,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:22:57,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1946.84058 ± 475.248
2025-08-07 05:22:57,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2948.0242, 1814.6766, 1263.4772, 1663.9364, 2452.9272, 1920.7789, 2144.588, 1736.8466, 2137.809, 1385.3423]
2025-08-07 05:22:57,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:22:57,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 2 minutes, 40 seconds)
2025-08-07 05:24:39,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:24:53,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2322.28442 ± 889.672
2025-08-07 05:24:53,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1843.7661, 3645.5237, 1401.8744, 1377.569, 1175.5919, 3221.5188, 3204.8008, 2841.8323, 1543.0469, 2967.3193]
2025-08-07 05:24:53,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:24:53,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 46 seconds)
2025-08-07 05:26:35,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:26:48,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2027.79468 ± 718.522
2025-08-07 05:26:48,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1343.9119, 2417.5427, 1285.917, 2217.5227, 2248.6428, 3568.6697, 2757.0737, 1728.8759, 1413.6361, 1296.1542]
2025-08-07 05:26:48,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:26:48,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 58 minutes, 51 seconds)
2025-08-07 05:28:30,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:28:43,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2164.65356 ± 533.566
2025-08-07 05:28:43,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1393.5969, 1804.7124, 1592.3193, 2822.416, 3096.4312, 2140.115, 2606.0693, 2486.429, 1842.2638, 1862.1816]
2025-08-07 05:28:43,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:28:43,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 56 minutes, 59 seconds)
2025-08-07 05:30:25,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:30:38,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1924.93433 ± 513.069
2025-08-07 05:30:38,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1552.0857, 1641.9384, 1586.9268, 1842.825, 2948.557, 1371.8936, 1623.8893, 1939.69, 1896.2913, 2845.2473]
2025-08-07 05:30:38,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:30:38,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 55 minutes, 8 seconds)
2025-08-07 05:32:20,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:32:33,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1778.18066 ± 364.702
2025-08-07 05:32:33,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1440.3657, 1586.8785, 1990.7471, 1975.9172, 2014.2814, 2504.6511, 1818.5764, 1298.9454, 1890.5974, 1260.8453]
2025-08-07 05:32:33,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:32:33,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 53 minutes, 10 seconds)
2025-08-07 05:34:15,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:34:28,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2390.96069 ± 1042.772
2025-08-07 05:34:28,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2180.7834, 258.3487, 3628.2969, 2703.9377, 1907.8185, 2204.5874, 1549.2955, 3403.0415, 3990.0615, 2083.4358]
2025-08-07 05:34:28,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:34:28,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 51 minutes, 16 seconds)
2025-08-07 05:36:10,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:36:23,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1674.16040 ± 871.714
2025-08-07 05:36:23,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1577.1072, 1530.4363, 2540.495, 1369.4484, 98.74143, 2565.25, 3338.129, 980.55475, 1458.4227, 1283.018]
2025-08-07 05:36:23,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:36:23,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 49 minutes, 15 seconds)
2025-08-07 05:38:05,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:38:18,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1920.75781 ± 649.374
2025-08-07 05:38:18,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3455.0173, 1349.3937, 1870.2628, 1543.7158, 1479.5319, 1280.1204, 2241.95, 1875.5487, 1488.1921, 2623.8467]
2025-08-07 05:38:18,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:38:18,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 47 minutes, 16 seconds)
2025-08-07 05:40:00,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:40:13,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3234.18994 ± 880.141
2025-08-07 05:40:13,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3928.5032, 3923.1794, 3716.8835, 3577.4531, 3639.2312, 3831.3726, 2499.513, 1810.5421, 3890.2148, 1525.0051]
2025-08-07 05:40:13,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:40:13,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (3234.19) for latency ExtremeClogL1U23
2025-08-07 05:40:13,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 45 minutes, 21 seconds)
2025-08-07 05:41:55,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:42:08,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2121.03857 ± 821.780
2025-08-07 05:42:08,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1265.355, 3679.342, 1365.9097, 1234.0732, 2477.2507, 2642.5103, 2885.1528, 2667.1128, 1200.482, 1793.1948]
2025-08-07 05:42:08,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:42:08,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 43 minutes, 30 seconds)
2025-08-07 05:43:49,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:44:02,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1972.69666 ± 561.094
2025-08-07 05:44:02,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1683.5591, 2428.7148, 1545.3596, 3063.841, 1364.3955, 2616.1, 1468.7463, 1482.4651, 2350.713, 1723.072]
2025-08-07 05:44:02,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:44:02,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 41 minutes, 29 seconds)
2025-08-07 05:45:44,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:45:57,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1733.91077 ± 625.261
2025-08-07 05:45:57,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [495.52032, 1199.4293, 2092.7349, 1717.4601, 1394.6245, 2250.4407, 2922.3252, 2033.9883, 1773.201, 1459.3829]
2025-08-07 05:45:57,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:45:57,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 39 minutes, 30 seconds)
2025-08-07 05:47:38,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:47:51,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2529.89600 ± 760.783
2025-08-07 05:47:51,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3717.7395, 2255.8728, 2313.0818, 1300.466, 2994.4172, 3281.5125, 1476.3025, 2965.3396, 1922.3019, 3071.9255]
2025-08-07 05:47:51,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:47:51,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 37 minutes, 28 seconds)
2025-08-07 05:49:32,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:49:45,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2298.58545 ± 704.207
2025-08-07 05:49:45,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3061.803, 3094.5938, 2214.753, 3407.8235, 2197.0205, 2310.2625, 1647.4563, 1230.501, 2471.875, 1349.766]
2025-08-07 05:49:45,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:49:45,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 35 minutes, 25 seconds)
2025-08-07 05:51:27,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:51:40,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1281.63074 ± 403.675
2025-08-07 05:51:40,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1396.2798, 1407.354, 1229.2817, 1281.6947, 1285.3856, 147.99594, 1285.6619, 1626.2418, 1483.3002, 1673.1124]
2025-08-07 05:51:40,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:51:40,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 33 minutes, 23 seconds)
2025-08-07 05:53:21,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:53:34,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1954.54077 ± 606.500
2025-08-07 05:53:34,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1274.3372, 1841.1737, 1344.562, 2902.8035, 3167.1877, 1719.549, 1626.5856, 1907.8284, 1503.0421, 2258.3398]
2025-08-07 05:53:34,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:53:34,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 31 minutes, 24 seconds)
2025-08-07 05:55:15,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:55:28,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1873.29138 ± 701.565
2025-08-07 05:55:28,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1379.8674, 1974.7999, 1419.3945, 3839.5398, 1736.7373, 1990.3792, 1423.0347, 1275.617, 1812.8674, 1880.6785]
2025-08-07 05:55:28,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:55:28,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 29 minutes, 26 seconds)
2025-08-07 05:57:09,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:57:22,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2617.56030 ± 1052.385
2025-08-07 05:57:22,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4190.1826, 1252.153, 2369.3674, 1583.4202, 2188.3162, 4179.6367, 3486.0305, 2145.1387, 3365.8481, 1415.5101]
2025-08-07 05:57:22,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:57:22,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 27 minutes, 31 seconds)
2025-08-07 05:59:03,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:59:16,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1896.02441 ± 486.109
2025-08-07 05:59:16,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2435.5034, 2058.1807, 1552.7382, 1024.744, 2763.5117, 1948.9224, 1777.9814, 1416.9819, 2266.5022, 1715.1771]
2025-08-07 05:59:16,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:59:16,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 25 minutes, 38 seconds)
2025-08-07 06:00:58,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:01:11,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1581.51611 ± 687.477
2025-08-07 06:01:11,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1491.7231, 1815.2244, 1724.1542, 2352.609, 305.772, 1534.1897, 1547.591, 604.7969, 1649.6547, 2789.4463]
2025-08-07 06:01:11,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:01:11,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 23 minutes, 47 seconds)
2025-08-07 06:02:52,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:03:05,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2020.61719 ± 733.624
2025-08-07 06:03:05,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2738.9744, 3577.0142, 1393.8052, 1282.1696, 2314.7769, 2017.0374, 1502.809, 1373.7119, 2607.772, 1398.1016]
2025-08-07 06:03:05,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:03:05,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 21 minutes, 55 seconds)
2025-08-07 06:04:47,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:05:00,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2414.02075 ± 758.314
2025-08-07 06:05:00,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3500.7522, 2902.7095, 1603.6368, 3034.213, 3633.3113, 1915.5574, 2143.7825, 2288.726, 1593.609, 1523.9093]
2025-08-07 06:05:00,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:05:00,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 20 minutes, 4 seconds)
2025-08-07 06:06:41,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:06:54,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2237.85571 ± 1032.766
2025-08-07 06:06:54,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2856.6987, 2631.1836, 2863.8171, 1373.9452, 363.1409, 4391.1895, 2477.3464, 1420.2021, 2011.3157, 1989.7174]
2025-08-07 06:06:54,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:06:54,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 18 minutes, 9 seconds)
2025-08-07 06:08:35,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:08:48,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1699.40662 ± 1147.433
2025-08-07 06:08:48,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1852.5632, 1739.7944, 606.1388, 3813.268, 1349.9662, 1835.6113, -346.80832, 1509.5054, 1232.8788, 3401.1492]
2025-08-07 06:08:48,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:08:48,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 16 minutes, 15 seconds)
2025-08-07 06:10:30,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:10:43,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2521.49121 ± 952.252
2025-08-07 06:10:43,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3905.2031, 1446.4929, 3813.5146, 2312.8613, 3048.6577, 1902.0416, 1583.5757, 3708.4526, 1487.23, 2006.8816]
2025-08-07 06:10:43,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:10:43,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 14 minutes, 17 seconds)
2025-08-07 06:12:24,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:12:37,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1827.37830 ± 701.505
2025-08-07 06:12:37,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2713.9177, 534.4198, 3075.329, 1339.5823, 2104.3013, 1467.569, 1559.6202, 2108.548, 1317.305, 2053.1921]
2025-08-07 06:12:37,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:12:37,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 12 minutes, 23 seconds)
2025-08-07 06:14:18,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:14:31,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2222.20605 ± 632.794
2025-08-07 06:14:31,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1793.8076, 1362.3328, 2030.0402, 1412.1085, 2109.6123, 2869.7559, 2045.4395, 3077.7087, 3314.371, 2206.8843]
2025-08-07 06:14:31,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:14:31,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 10 minutes, 27 seconds)
2025-08-07 06:16:13,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:16:26,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1985.49829 ± 675.658
2025-08-07 06:16:26,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1599.1267, 2861.8367, 3403.9233, 1340.3967, 1365.8508, 1846.3137, 1613.0929, 2329.016, 1275.1218, 2220.3052]
2025-08-07 06:16:26,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:16:26,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 8 minutes, 36 seconds)
2025-08-07 06:18:06,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:18:19,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1986.52502 ± 967.622
2025-08-07 06:18:19,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1983.7483, 4119.2266, 1410.7101, 1851.1715, 1506.9457, 1731.9763, 476.98923, 2346.6199, 1309.6923, 3128.17]
2025-08-07 06:18:19,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:18:19,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 6 minutes, 37 seconds)
2025-08-07 06:20:00,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:20:13,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2020.40588 ± 906.550
2025-08-07 06:20:13,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3387.258, 1504.1803, 1453.9017, 1505.0485, 1443.6177, 1563.9554, 1151.1306, 1606.9764, 2667.3481, 3920.6426]
2025-08-07 06:20:13,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:20:13,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 4 minutes, 40 seconds)
2025-08-07 06:21:54,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:22:07,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2147.70044 ± 760.053
2025-08-07 06:22:07,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1272.6938, 1833.7102, 2960.9756, 1429.4998, 2679.9634, 1864.3353, 1993.0383, 3764.267, 2360.9258, 1317.5934]
2025-08-07 06:22:07,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:22:07,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 2 minutes, 42 seconds)
2025-08-07 06:23:47,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:24:00,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2837.69458 ± 720.054
2025-08-07 06:24:00,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2218.2588, 3453.6577, 1952.7185, 2424.8792, 3138.3845, 3812.1816, 1841.7415, 3842.7104, 3307.675, 2384.738]
2025-08-07 06:24:00,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:24:00,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 42 seconds)
2025-08-07 06:25:41,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:25:54,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2822.48364 ± 981.727
2025-08-07 06:25:54,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1704.0431, 3713.6846, 3848.273, 3302.934, 1389.796, 1479.847, 2187.578, 2950.2446, 3942.5762, 3705.8584]
2025-08-07 06:25:54,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:25:54,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 58 minutes, 43 seconds)
2025-08-07 06:27:35,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:27:48,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2296.29248 ± 748.898
2025-08-07 06:27:48,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4125.0117, 2087.0925, 1968.2874, 2674.3171, 1544.2758, 1291.5293, 2222.9126, 2847.3562, 2170.5325, 2031.6069]
2025-08-07 06:27:48,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:27:48,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 56 minutes, 50 seconds)
2025-08-07 06:29:28,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:29:41,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2229.30713 ± 994.681
2025-08-07 06:29:41,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3790.3777, 619.99243, 1411.2158, 2283.1123, 1616.2405, 2522.2766, 2429.111, 2014.7728, 1580.4785, 4025.4954]
2025-08-07 06:29:41,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:29:41,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 54 minutes, 55 seconds)
2025-08-07 06:31:22,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:31:35,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3311.57666 ± 1071.968
2025-08-07 06:31:35,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3883.2075, 4441.209, 4163.54, 1996.0717, 3894.969, 3688.815, 1323.6636, 3628.7566, 4222.0063, 1873.5277]
2025-08-07 06:31:35,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:31:35,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (3311.58) for latency ExtremeClogL1U23
2025-08-07 06:31:35,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 53 minutes, 1 second)
2025-08-07 06:33:15,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:33:28,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2755.97070 ± 1166.528
2025-08-07 06:33:28,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4525.23, 1889.2118, 1751.398, 1555.4673, 2810.6375, 4258.523, 4494.2856, 1593.205, 1955.57, 2726.1794]
2025-08-07 06:33:28,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:33:28,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 51 minutes, 6 seconds)
2025-08-07 06:35:09,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:35:22,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2313.36084 ± 999.626
2025-08-07 06:35:22,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4024.023, 1565.2202, 4151.0273, 2810.038, 932.44073, 1955.2241, 1780.0115, 2324.2166, 1679.472, 1911.9333]
2025-08-07 06:35:22,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:35:22,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 49 minutes, 13 seconds)
2025-08-07 06:37:02,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:37:15,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2497.00244 ± 487.737
2025-08-07 06:37:15,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2313.738, 2110.5647, 3065.1672, 2790.5225, 2139.8013, 3193.8599, 2511.5513, 3106.171, 1784.0847, 1954.5652]
2025-08-07 06:37:15,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:37:15,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 47 minutes, 17 seconds)
2025-08-07 06:38:56,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:39:09,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2438.70044 ± 760.278
2025-08-07 06:39:09,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2042.9946, 3280.8054, 3518.7449, 1363.7897, 2296.987, 2037.8533, 2074.836, 1514.6714, 2640.8857, 3615.437]
2025-08-07 06:39:09,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:39:09,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 45 minutes, 22 seconds)
2025-08-07 06:40:49,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:41:02,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2849.83350 ± 1156.208
2025-08-07 06:41:02,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1329.9316, 3019.166, 4087.111, 3044.7717, 1285.5975, 3993.4436, 2235.1433, 4046.828, 1337.9143, 4118.4272]
2025-08-07 06:41:02,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:41:02,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 43 minutes, 28 seconds)
2025-08-07 06:42:42,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:42:55,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2116.62354 ± 797.849
2025-08-07 06:42:55,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2855.0479, 3189.3748, 1703.5256, 1302.1691, 2200.4307, 3464.2034, 2472.9014, 1312.9259, 1258.4813, 1407.1777]
2025-08-07 06:42:55,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:42:55,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 41 minutes, 35 seconds)
2025-08-07 06:44:36,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:44:49,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2577.62866 ± 995.173
2025-08-07 06:44:49,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2577.7312, 2888.5857, 2814.7034, 1642.8776, 1518.684, 3618.1, 1314.9679, 1470.0765, 3943.5286, 3987.0315]
2025-08-07 06:44:49,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:44:49,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 39 minutes, 42 seconds)
2025-08-07 06:46:30,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:46:43,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3877.47900 ± 882.262
2025-08-07 06:46:43,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3128.1375, 4467.71, 3982.2, 4365.4834, 4354.4146, 1476.1383, 4363.9985, 4229.245, 4033.9773, 4373.4854]
2025-08-07 06:46:43,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:46:43,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (3877.48) for latency ExtremeClogL1U23
2025-08-07 06:46:43,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 37 minutes, 50 seconds)
2025-08-07 06:48:23,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:48:36,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3160.05542 ± 1150.223
2025-08-07 06:48:36,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1948.0408, 3847.2637, 4463.556, 4154.0864, 1380.0323, 1601.8944, 4354.559, 2484.2832, 3147.6912, 4219.1465]
2025-08-07 06:48:36,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:48:36,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 35 minutes, 57 seconds)
2025-08-07 06:50:17,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:50:30,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3044.96680 ± 1194.739
2025-08-07 06:50:30,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4000.7966, 4515.986, 1575.7189, 3100.3872, 1519.7363, 3187.2776, 4319.5874, 4399.3203, 2497.455, 1333.402]
2025-08-07 06:50:30,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:50:30,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 34 minutes, 5 seconds)
2025-08-07 06:52:11,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:52:24,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2176.30078 ± 886.493
2025-08-07 06:52:24,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1704.0742, 3664.0776, 2889.9026, 2222.7197, 1777.522, 1144.5061, 1755.3103, 3730.814, 1421.8988, 1452.1844]
2025-08-07 06:52:24,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:52:24,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 32 minutes, 11 seconds)
2025-08-07 06:54:04,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:54:17,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3784.51758 ± 1020.040
2025-08-07 06:54:17,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4616.6196, 4492.701, 1344.0685, 4275.2695, 4456.556, 4279.5376, 2937.2468, 4403.6147, 2786.2039, 4253.359]
2025-08-07 06:54:17,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:54:17,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 30 minutes, 18 seconds)
2025-08-07 06:55:58,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:56:11,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3247.62134 ± 1179.924
2025-08-07 06:56:11,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3483.7324, 4317.96, 2238.7917, 4412.585, 1734.0553, 4502.29, 1405.4436, 4570.5503, 3647.9934, 2162.808]
2025-08-07 06:56:11,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:56:11,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 28 minutes, 23 seconds)
2025-08-07 06:57:51,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:58:04,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4113.18896 ± 373.378
2025-08-07 06:58:04,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4433.5493, 4195.347, 4481.3115, 3109.4958, 4355.893, 3992.9314, 3975.0928, 4083.551, 4182.4863, 4322.236]
2025-08-07 06:58:04,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:58:04,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (4113.19) for latency ExtremeClogL1U23
2025-08-07 06:58:04,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 26 minutes, 30 seconds)
2025-08-07 06:59:45,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:59:58,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2940.40674 ± 1274.491
2025-08-07 06:59:58,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1273.6674, 4222.708, 1311.5685, 1542.4303, 3820.145, 3490.8054, 3020.1685, 4902.413, 1842.6191, 3977.543]
2025-08-07 06:59:58,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:59:58,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 24 minutes, 35 seconds)
2025-08-07 07:01:38,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:01:52,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2220.68506 ± 894.745
2025-08-07 07:01:52,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1316.6046, 3457.3376, 1339.9325, 1326.8981, 2163.801, 2800.7124, 3078.2607, 1310.5197, 3630.8137, 1781.9679]
2025-08-07 07:01:52,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:01:52,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 22 minutes, 42 seconds)
2025-08-07 07:03:32,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:03:45,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1903.94592 ± 331.314
2025-08-07 07:03:45,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2334.2112, 1934.2241, 2353.0452, 1920.9233, 1757.3997, 1543.4672, 1545.731, 2327.4888, 1403.6459, 1919.3219]
2025-08-07 07:03:45,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:03:45,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 20 minutes, 48 seconds)
2025-08-07 07:05:26,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:05:39,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3928.19092 ± 953.056
2025-08-07 07:05:39,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4352.0796, 2018.0934, 4597.054, 2141.7744, 3750.5464, 4363.019, 4593.4424, 4419.789, 4566.3525, 4479.7573]
2025-08-07 07:05:39,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:05:39,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 18 minutes, 56 seconds)
2025-08-07 07:07:19,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:07:32,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2460.80615 ± 868.092
2025-08-07 07:07:32,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2734.2312, 1475.784, 2907.6465, 2356.123, 1812.5223, 3591.3704, 1881.6359, 2250.6638, 1389.5514, 4208.5347]
2025-08-07 07:07:32,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:07:32,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 2 seconds)
2025-08-07 07:09:13,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:09:26,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1420.36536 ± 470.971
2025-08-07 07:09:26,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1337.8192, 2354.6135, 1354.6631, 1323.3823, 1420.935, 1331.2065, 1690.219, 1239.8778, 1773.1523, 377.784]
2025-08-07 07:09:26,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:09:26,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 9 seconds)
2025-08-07 07:11:06,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:11:20,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3079.48096 ± 1244.073
2025-08-07 07:11:20,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4541.066, 2372.3064, 1564.537, 4311.202, 4666.0264, 3233.485, 1831.8063, 2294.8052, 1523.6135, 4455.96]
2025-08-07 07:11:20,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:11:20,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 15 seconds)
2025-08-07 07:13:00,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:13:13,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3802.22803 ± 1358.480
2025-08-07 07:13:13,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4878.21, 2400.6108, 1423.5925, 4798.9004, 4046.9104, 4736.4214, 4852.2227, 4515.671, 4824.1997, 1545.539]
2025-08-07 07:13:13,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:13:13,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 21 seconds)
2025-08-07 07:14:53,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:15:06,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4176.15869 ± 897.011
2025-08-07 07:15:06,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4465.913, 4504.4717, 4536.812, 4485.8965, 4415.576, 4390.0195, 4531.109, 1488.354, 4487.3286, 4456.1045]
2025-08-07 07:15:06,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:15:06,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (4176.16) for latency ExtremeClogL1U23
2025-08-07 07:15:06,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 27 seconds)
2025-08-07 07:16:46,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:16:59,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3655.71826 ± 1186.700
2025-08-07 07:16:59,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4110.9707, 4498.647, 1730.2754, 4441.1733, 4548.84, 4389.3516, 4625.5796, 4372.0615, 2035.0338, 1805.2482]
2025-08-07 07:16:59,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:16:59,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 33 seconds)
2025-08-07 07:18:39,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:18:52,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4002.71021 ± 1267.864
2025-08-07 07:18:52,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4621.0806, 4472.3784, 1424.4785, 4507.1865, 4707.795, 1522.3121, 4667.641, 4625.1064, 4688.9497, 4790.1714]
2025-08-07 07:18:52,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:18:52,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 39 seconds)
2025-08-07 07:20:31,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:20:44,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3372.27026 ± 1115.162
2025-08-07 07:20:44,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2713.9824, 1911.9133, 4230.2383, 2005.1383, 1615.9069, 3826.871, 4428.5654, 4514.0283, 4403.5596, 4072.5002]
2025-08-07 07:20:44,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:20:44,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 45 seconds)
2025-08-07 07:22:23,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:22:36,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2443.99341 ± 1047.871
2025-08-07 07:22:36,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1364.1508, 1322.9398, 1339.6023, 2633.7573, 2027.4963, 2540.237, 4687.3228, 2321.905, 3878.3235, 2324.1992]
2025-08-07 07:22:36,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:22:36,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 52 seconds)
2025-08-07 07:24:14,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:24:27,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3391.19775 ± 1359.287
2025-08-07 07:24:27,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1385.6298, 4743.3784, 4690.936, 4772.5894, 1868.6853, 4687.7354, 4660.8994, 2080.3955, 2237.9292, 2783.797]
2025-08-07 07:24:27,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:24:27,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1251 [DEBUG]: Training session finished
