2025-08-07 04:16:28,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc5-halfcheetah/ExtremeClogL1U23-bpql-mem24
2025-08-07 04:16:28,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc5-halfcheetah/ExtremeClogL1U23-bpql-mem24
2025-08-07 04:16:28,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1501fd53b010>}
2025-08-07 04:16:28,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1111 [DEBUG]: using device: cuda
2025-08-07 04:16:28,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1133 [INFO]: Creating new trainer
2025-08-07 04:16:28,341 baseline-bpql-noiseperc5-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=161, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 04:16:28,341 baseline-bpql-noiseperc5-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 04:16:29,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1194 [DEBUG]: Starting training session...
2025-08-07 04:16:29,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 1/100
2025-08-07 04:18:10,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:18:23,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: -451.91757 ± 53.374
2025-08-07 04:18:24,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [-382.85022, -492.11594, -347.21613, -489.1393, -438.6793, -448.4088, -419.76056, -527.12537, -472.3919, -501.48798]
2025-08-07 04:18:24,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:18:24,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (-451.92) for latency ExtremeClogL1U23
2025-08-07 04:18:24,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 8 minutes, 54 seconds)
2025-08-07 04:20:12,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:20:25,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: -232.05791 ± 36.636
2025-08-07 04:20:25,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [-146.59732, -243.2149, -181.66226, -235.39534, -249.628, -243.68054, -230.94669, -270.4896, -262.40146, -256.56287]
2025-08-07 04:20:25,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:20:25,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (-232.06) for latency ExtremeClogL1U23
2025-08-07 04:20:25,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 12 minutes, 35 seconds)
2025-08-07 04:22:13,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:22:26,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: -131.62376 ± 84.545
2025-08-07 04:22:26,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [-128.75066, -201.94846, -71.037605, -124.2724, -176.6854, -169.47559, 26.58062, -12.860284, -203.58696, -254.20108]
2025-08-07 04:22:26,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:22:26,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (-131.62) for latency ExtremeClogL1U23
2025-08-07 04:22:26,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 12 minutes, 11 seconds)
2025-08-07 04:24:14,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:24:27,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: -191.78880 ± 101.695
2025-08-07 04:24:27,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [-121.40932, -135.98521, -270.97464, -118.96415, -464.41327, -143.25465, -195.33623, -170.3874, -186.61502, -110.548195]
2025-08-07 04:24:27,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:24:27,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 11 minutes, 2 seconds)
2025-08-07 04:26:15,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:26:29,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 29.75850 ± 149.284
2025-08-07 04:26:29,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [-48.36512, -227.77315, 45.1462, 265.03146, -100.55893, 62.72584, 26.571419, -87.739105, 273.5806, 88.9658]
2025-08-07 04:26:29,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:26:29,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (29.76) for latency ExtremeClogL1U23
2025-08-07 04:26:29,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 9 minutes, 50 seconds)
2025-08-07 04:28:16,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:28:29,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 179.96298 ± 141.707
2025-08-07 04:28:29,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [18.110125, 1.5588384, 195.36597, 330.86664, 315.5572, 296.74133, 124.808304, 226.66396, 345.07877, -55.121353]
2025-08-07 04:28:29,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:28:29,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (179.96) for latency ExtremeClogL1U23
2025-08-07 04:28:29,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 9 minutes, 40 seconds)
2025-08-07 04:30:18,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:30:31,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 252.01094 ± 177.300
2025-08-07 04:30:31,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [337.18146, 261.1206, 660.5633, 94.44966, 123.10381, 434.1529, 163.95192, 217.17323, 210.47557, 17.93694]
2025-08-07 04:30:31,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:30:31,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (252.01) for latency ExtremeClogL1U23
2025-08-07 04:30:31,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 7 minutes, 58 seconds)
2025-08-07 04:32:22,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:32:35,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 294.09943 ± 367.606
2025-08-07 04:32:35,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [-731.5846, 410.59274, 295.20093, 273.31894, 339.8879, 394.6226, 297.5739, 531.88135, 377.5986, 751.9017]
2025-08-07 04:32:35,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:32:35,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (294.10) for latency ExtremeClogL1U23
2025-08-07 04:32:35,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 6 minutes, 47 seconds)
2025-08-07 04:34:26,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:34:39,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 656.23108 ± 163.970
2025-08-07 04:34:39,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [538.1054, 761.7442, 604.5807, 653.88403, 1104.9099, 553.2724, 637.7753, 633.3356, 542.85706, 531.84656]
2025-08-07 04:34:39,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:34:39,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (656.23) for latency ExtremeClogL1U23
2025-08-07 04:34:39,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 5 minutes, 50 seconds)
2025-08-07 04:36:31,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:36:44,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 876.78772 ± 109.387
2025-08-07 04:36:44,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [917.2055, 873.34076, 838.43384, 860.61383, 672.8533, 768.13666, 921.5707, 1110.4183, 952.4338, 852.8703]
2025-08-07 04:36:44,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:36:44,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (876.79) for latency ExtremeClogL1U23
2025-08-07 04:36:44,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 4 minutes, 33 seconds)
2025-08-07 04:38:34,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:38:47,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1033.10999 ± 143.274
2025-08-07 04:38:47,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [812.7296, 1296.0844, 1074.4584, 903.0581, 1188.9303, 935.3164, 1077.7747, 1056.774, 871.93164, 1114.0421]
2025-08-07 04:38:47,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:38:47,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1033.11) for latency ExtremeClogL1U23
2025-08-07 04:38:47,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 3 minutes, 28 seconds)
2025-08-07 04:40:39,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:40:54,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 943.81659 ± 162.422
2025-08-07 04:40:54,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1074.7493, 1305.216, 804.62244, 940.9607, 679.202, 990.32635, 1009.22125, 943.65497, 826.06586, 864.14703]
2025-08-07 04:40:54,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:40:54,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 3 hours, 2 minutes, 36 seconds)
2025-08-07 04:42:45,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:42:58,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1046.74146 ± 110.032
2025-08-07 04:42:58,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1016.7938, 1021.22034, 1018.1873, 1153.2614, 970.7657, 998.65656, 980.18677, 923.3927, 1056.1766, 1328.7736]
2025-08-07 04:42:58,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:42:58,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1046.74) for latency ExtremeClogL1U23
2025-08-07 04:42:58,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 3 hours, 49 seconds)
2025-08-07 04:44:48,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:45:02,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 964.22900 ± 76.680
2025-08-07 04:45:02,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [974.6291, 825.02094, 1082.1981, 876.3761, 1085.612, 944.4378, 961.3613, 927.8525, 965.6761, 999.1257]
2025-08-07 04:45:02,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:45:02,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 58 minutes, 24 seconds)
2025-08-07 04:46:52,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:47:05,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 917.77600 ± 330.409
2025-08-07 04:47:05,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1123.5347, 932.8343, 932.0812, 1009.3324, 1121.1085, 934.5029, 939.0354, 1128.7758, -42.044476, 1098.599]
2025-08-07 04:47:05,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:47:05,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 56 minutes, 9 seconds)
2025-08-07 04:48:56,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:49:09,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 993.71716 ± 69.373
2025-08-07 04:49:09,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [932.47784, 1098.4012, 989.05676, 892.23254, 1035.2052, 963.4016, 1105.2856, 914.53125, 1031.1691, 975.40967]
2025-08-07 04:49:09,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:49:09,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 53 minutes, 57 seconds)
2025-08-07 04:51:00,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:51:14,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1151.20874 ± 148.591
2025-08-07 04:51:14,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1044.5171, 1174.9626, 1055.3115, 1171.3396, 1211.4413, 1070.9395, 973.4289, 1269.7207, 1030.2161, 1510.2107]
2025-08-07 04:51:14,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:51:14,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1151.21) for latency ExtremeClogL1U23
2025-08-07 04:51:14,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 51 minutes, 41 seconds)
2025-08-07 04:53:05,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:53:18,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1043.95081 ± 360.466
2025-08-07 04:53:18,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [969.8396, 1812.8733, 1090.4985, 1219.8136, 243.21959, 1037.2998, 929.1283, 981.9686, 1144.697, 1010.1697]
2025-08-07 04:53:18,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:53:18,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 49 minutes, 24 seconds)
2025-08-07 04:55:08,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:55:21,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1054.48840 ± 96.913
2025-08-07 04:55:21,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [978.5866, 921.8069, 1031.2244, 1228.7037, 1010.3334, 987.12006, 1085.8507, 1055.7429, 1229.6892, 1015.8254]
2025-08-07 04:55:21,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:55:21,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 47 minutes, 21 seconds)
2025-08-07 04:57:13,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:57:27,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1181.05859 ± 169.842
2025-08-07 04:57:27,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1609.7372, 1300.8864, 1140.8512, 1261.439, 1088.1965, 1082.4402, 1122.1619, 997.4737, 1017.76227, 1189.6376]
2025-08-07 04:57:27,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:57:27,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1181.06) for latency ExtremeClogL1U23
2025-08-07 04:57:27,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 45 minutes, 48 seconds)
2025-08-07 04:59:19,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:59:32,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1221.40015 ± 224.031
2025-08-07 04:59:32,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [980.3207, 1612.2872, 949.4593, 1511.1543, 1084.0089, 1182.7076, 1205.7482, 1107.173, 1078.6263, 1502.5164]
2025-08-07 04:59:32,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:59:32,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1221.40) for latency ExtremeClogL1U23
2025-08-07 04:59:32,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 44 minutes, 11 seconds)
2025-08-07 05:01:23,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:01:38,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1140.75317 ± 150.202
2025-08-07 05:01:38,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1266.0212, 1124.0432, 1105.6373, 1049.1466, 990.29877, 1522.6711, 1068.5503, 989.73303, 1104.2267, 1187.2032]
2025-08-07 05:01:38,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:01:38,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 42 minutes, 2 seconds)
2025-08-07 05:03:29,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:03:42,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1168.01611 ± 391.990
2025-08-07 05:03:42,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [106.11566, 1420.643, 1261.0614, 1472.8031, 1064.3048, 1475.0752, 1222.833, 1478.8542, 974.17346, 1204.2969]
2025-08-07 05:03:42,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:03:42,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 40 minutes, 7 seconds)
2025-08-07 05:05:32,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:05:45,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1449.42749 ± 381.771
2025-08-07 05:05:45,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1072.7524, 1333.6759, 1769.9984, 1128.5896, 1801.7714, 1328.3373, 992.4479, 2299.178, 1479.0795, 1288.4465]
2025-08-07 05:05:45,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:05:45,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1449.43) for latency ExtremeClogL1U23
2025-08-07 05:05:45,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 38 minutes, 3 seconds)
2025-08-07 05:07:36,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:07:51,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1356.24915 ± 222.626
2025-08-07 05:07:51,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1251.9229, 1407.6958, 1737.5592, 1194.0819, 1097.2511, 1401.09, 984.00305, 1662.2274, 1449.3993, 1377.2611]
2025-08-07 05:07:51,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:07:51,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 35 minutes, 54 seconds)
2025-08-07 05:09:42,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:09:55,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1477.58740 ± 346.178
2025-08-07 05:09:55,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1834.8623, 1404.7872, 1146.1202, 979.53046, 1513.8383, 2038.4928, 1204.4303, 1628.8646, 1894.3452, 1130.6027]
2025-08-07 05:09:55,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:09:55,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1477.59) for latency ExtremeClogL1U23
2025-08-07 05:09:55,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 33 minutes, 43 seconds)
2025-08-07 05:11:46,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:11:59,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1314.76880 ± 155.753
2025-08-07 05:11:59,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1093.1221, 1359.8379, 1251.5919, 1404.2355, 1315.4305, 1651.6261, 1413.9437, 1196.7522, 1112.2065, 1348.9427]
2025-08-07 05:11:59,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:11:59,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 31 minutes, 11 seconds)
2025-08-07 05:13:49,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:14:02,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1236.50635 ± 386.594
2025-08-07 05:14:02,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1417.3441, 1156.4114, 1762.6991, 1670.2478, 1007.0339, 322.08054, 1493.0083, 1049.05, 1260.2239, 1226.9648]
2025-08-07 05:14:02,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:14:02,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 28 minutes, 52 seconds)
2025-08-07 05:15:52,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:16:05,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1333.18030 ± 231.558
2025-08-07 05:16:05,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1144.5735, 1198.7039, 1089.355, 1804.2966, 1532.8666, 1567.5615, 1371.3666, 1012.9241, 1286.0658, 1324.09]
2025-08-07 05:16:05,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:16:05,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 26 minutes, 38 seconds)
2025-08-07 05:17:55,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:18:08,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1288.69751 ± 257.266
2025-08-07 05:18:08,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1077.966, 1066.4268, 1358.4203, 1047.0795, 1497.3943, 1456.593, 1893.225, 1113.5698, 1279.5806, 1096.7195]
2025-08-07 05:18:08,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:18:08,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 23 minutes, 56 seconds)
2025-08-07 05:19:58,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:20:12,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1457.99670 ± 393.886
2025-08-07 05:20:12,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1025.5845, 1270.1792, 1286.2003, 1396.6587, 2019.6564, 1324.4515, 1644.3894, 1038.2721, 1270.1519, 2304.4219]
2025-08-07 05:20:12,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:20:12,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 21 minutes, 55 seconds)
2025-08-07 05:22:03,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:22:16,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1393.14429 ± 120.246
2025-08-07 05:22:16,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1416.4874, 1336.2872, 1315.2205, 1222.7302, 1291.4658, 1531.6224, 1379.3359, 1665.2523, 1355.9824, 1417.0582]
2025-08-07 05:22:16,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:22:16,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 19 minutes, 59 seconds)
2025-08-07 05:24:07,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:24:20,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1463.18469 ± 489.976
2025-08-07 05:24:20,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1767.944, 1367.2542, 1231.01, 2476.278, 1221.2845, 962.2975, 1108.0527, 2215.114, 1069.5764, 1213.0352]
2025-08-07 05:24:20,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:24:20,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 17 minutes, 57 seconds)
2025-08-07 05:26:11,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:26:24,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1742.01440 ± 317.142
2025-08-07 05:26:24,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1919.922, 2311.4146, 1181.7715, 1596.8086, 1807.4543, 1441.7079, 1905.6959, 1427.5386, 2047.547, 1780.2854]
2025-08-07 05:26:24,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:26:24,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1742.01) for latency ExtremeClogL1U23
2025-08-07 05:26:24,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 16 minutes, 17 seconds)
2025-08-07 05:28:15,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:28:28,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1679.86682 ± 456.042
2025-08-07 05:28:28,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2324.474, 2009.8981, 1335.7513, 1514.693, 1647.7828, 1729.1418, 794.78705, 1408.3601, 2418.0767, 1615.7034]
2025-08-07 05:28:28,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:28:28,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 14 minutes, 25 seconds)
2025-08-07 05:30:19,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:30:34,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1563.89294 ± 332.810
2025-08-07 05:30:34,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1295.5289, 1547.0305, 1500.1691, 1892.9558, 1524.1017, 2352.2195, 1117.663, 1322.0947, 1687.6698, 1399.4961]
2025-08-07 05:30:34,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:30:34,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 12 minutes, 30 seconds)
2025-08-07 05:32:23,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:32:37,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1354.88611 ± 332.646
2025-08-07 05:32:37,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2304.1592, 1257.7579, 1189.8209, 1180.055, 1361.8448, 1441.0275, 1141.3752, 1149.2495, 1374.6869, 1148.885]
2025-08-07 05:32:37,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:32:37,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 10 minutes, 12 seconds)
2025-08-07 05:34:28,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:34:41,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1416.37415 ± 163.379
2025-08-07 05:34:41,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1448.004, 1618.0199, 1331.9067, 1409.1954, 1111.9932, 1266.7247, 1443.7896, 1382.0272, 1734.3435, 1417.7377]
2025-08-07 05:34:41,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:34:41,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 8 minutes, 21 seconds)
2025-08-07 05:36:31,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:36:46,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1656.13208 ± 343.667
2025-08-07 05:36:46,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1687.4747, 1182.0312, 1343.5542, 1297.703, 2252.2217, 1341.7781, 2031.4834, 1919.8829, 1904.1501, 1601.0404]
2025-08-07 05:36:46,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:36:46,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 6 minutes, 17 seconds)
2025-08-07 05:38:36,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:38:49,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1399.88843 ± 174.618
2025-08-07 05:38:49,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1269.395, 1361.9133, 1394.4829, 1280.6232, 1706.6857, 1206.7577, 1542.3866, 1244.7073, 1693.3457, 1298.5867]
2025-08-07 05:38:49,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:38:49,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 4 minutes, 14 seconds)
2025-08-07 05:40:40,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:40:54,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1508.72937 ± 375.204
2025-08-07 05:40:54,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1692.1301, 1357.2094, 1227.6749, 1261.3965, 1131.5392, 1272.4886, 1915.3491, 1258.2605, 1579.4158, 2391.829]
2025-08-07 05:40:54,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:40:54,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 2 minutes, 6 seconds)
2025-08-07 05:42:46,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:42:59,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1459.56299 ± 251.351
2025-08-07 05:42:59,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1458.6345, 1217.4231, 1787.9265, 1518.7236, 1985.9645, 1415.409, 1319.1527, 1155.3302, 1204.3304, 1532.7351]
2025-08-07 05:42:59,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:42:59,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 18 seconds)
2025-08-07 05:44:50,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:45:03,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1522.49585 ± 274.059
2025-08-07 05:45:03,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1332.8195, 1464.8918, 1747.5812, 1588.7399, 1538.0483, 1152.0786, 2056.331, 1218.8759, 1826.3014, 1299.291]
2025-08-07 05:45:03,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:45:03,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 58 minutes, 12 seconds)
2025-08-07 05:46:54,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:47:07,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1453.37964 ± 221.782
2025-08-07 05:47:07,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1642.7417, 1515.3905, 1263.2874, 1613.9917, 1280.8196, 1403.8143, 1384.6064, 1348.6777, 1944.725, 1135.7435]
2025-08-07 05:47:07,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:47:07,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 55 minutes, 58 seconds)
2025-08-07 05:48:58,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:49:11,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1555.15662 ± 664.384
2025-08-07 05:49:11,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [98.02749, 1479.6107, 2548.0276, 1604.509, 1268.1761, 1167.6292, 2277.7441, 1784.8842, 2144.622, 1178.3359]
2025-08-07 05:49:11,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:49:11,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 54 minutes, 1 second)
2025-08-07 05:51:03,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:51:16,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1676.87268 ± 470.632
2025-08-07 05:51:16,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1229.9425, 1541.1559, 2729.522, 1700.9796, 1187.4818, 1290.0013, 1440.2211, 1560.642, 2326.6145, 1762.1674]
2025-08-07 05:51:16,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:51:16,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 51 minutes, 50 seconds)
2025-08-07 05:53:06,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:53:21,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1447.98364 ± 371.407
2025-08-07 05:53:21,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1241.2244, 1188.6343, 1384.5753, 1775.7502, 2315.3494, 1810.0499, 1259.072, 1154.1589, 1130.4987, 1220.5221]
2025-08-07 05:53:21,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:53:21,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 49 minutes, 55 seconds)
2025-08-07 05:55:12,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:55:25,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1794.41992 ± 582.334
2025-08-07 05:55:25,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2569.2615, 2738.6836, 2002.266, 1404.5334, 1321.9191, 2548.3105, 1507.1058, 1141.791, 1449.8438, 1260.4851]
2025-08-07 05:55:25,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:55:25,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1794.42) for latency ExtremeClogL1U23
2025-08-07 05:55:25,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 47 minutes, 52 seconds)
2025-08-07 05:57:17,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:57:30,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1684.95251 ± 399.888
2025-08-07 05:57:30,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1367.6089, 1513.2296, 1572.2809, 1229.4943, 2296.5984, 1314.5111, 1541.1221, 1505.7927, 2120.2725, 2388.6155]
2025-08-07 05:57:30,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:57:30,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 45 minutes, 55 seconds)
2025-08-07 05:59:20,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:59:33,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1535.13611 ± 460.846
2025-08-07 05:59:33,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1230.7897, 1256.9137, 2038.3801, 1976.2422, 2259.9414, 1521.4631, 1462.3555, 1177.2914, 647.9603, 1780.0245]
2025-08-07 05:59:33,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:59:33,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 43 minutes, 39 seconds)
2025-08-07 06:01:24,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:01:37,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1713.38086 ± 437.816
2025-08-07 06:01:37,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1396.0496, 2413.0889, 2283.7998, 2278.4731, 1300.2239, 1895.9888, 1347.426, 1569.6161, 1222.4674, 1426.6757]
2025-08-07 06:01:37,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:01:37,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 41 minutes, 32 seconds)
2025-08-07 06:03:27,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:03:40,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1533.68579 ± 310.729
2025-08-07 06:03:40,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1233.2625, 1160.5985, 1971.7595, 2011.0858, 1564.0, 1606.3491, 1129.9535, 1846.6375, 1314.9019, 1498.3094]
2025-08-07 06:03:40,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:03:40,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 39 minutes, 3 seconds)
2025-08-07 06:05:31,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:05:44,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1826.65625 ± 531.198
2025-08-07 06:05:44,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1792.9725, 2232.2566, 1565.4468, 1631.1295, 3092.9482, 1397.3414, 1395.3334, 1290.6453, 1570.0293, 2298.4602]
2025-08-07 06:05:44,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:05:44,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1826.66) for latency ExtremeClogL1U23
2025-08-07 06:05:44,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 36 minutes, 54 seconds)
2025-08-07 06:07:35,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:07:48,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1792.06274 ± 447.544
2025-08-07 06:07:48,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1690.6749, 1730.6007, 2659.6206, 1803.4696, 1729.4484, 1911.2386, 1163.2242, 1309.511, 2477.3335, 1445.5032]
2025-08-07 06:07:48,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:07:48,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 34 minutes, 49 seconds)
2025-08-07 06:09:38,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:09:53,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1740.26404 ± 549.694
2025-08-07 06:09:53,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2483.8628, 2772.8733, 1474.5511, 1287.5205, 2274.795, 1724.1268, 1739.9507, 1165.1118, 1171.492, 1308.3545]
2025-08-07 06:09:53,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:09:53,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 32 minutes, 52 seconds)
2025-08-07 06:11:42,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:11:55,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1582.75562 ± 356.384
2025-08-07 06:11:55,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1660.3562, 1299.4528, 1596.2059, 1881.8994, 1192.0127, 1468.8331, 1797.9647, 1197.8386, 2401.7512, 1331.2418]
2025-08-07 06:11:55,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:11:55,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 30 minutes, 38 seconds)
2025-08-07 06:13:46,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:14:00,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1628.69897 ± 431.238
2025-08-07 06:14:00,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1412.7415, 1130.3375, 1938.0115, 1519.6648, 2631.9446, 2076.134, 1265.2804, 1514.1191, 1402.6678, 1396.0885]
2025-08-07 06:14:00,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:14:00,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 28 minutes, 53 seconds)
2025-08-07 06:15:52,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:16:05,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1556.93323 ± 621.107
2025-08-07 06:16:05,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1549.7307, 1301.9751, 2389.2888, 2309.2751, 1421.1106, 1261.6173, 1443.0223, 152.2536, 1520.2421, 2220.817]
2025-08-07 06:16:05,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:16:05,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 26 minutes, 54 seconds)
2025-08-07 06:17:55,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:18:10,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1921.18164 ± 653.221
2025-08-07 06:18:10,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2772.5398, 1931.2893, 1842.4825, 1693.6115, 1637.8779, 1409.3109, 3472.3044, 1801.0828, 1247.8585, 1403.4579]
2025-08-07 06:18:10,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:18:10,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1921.18) for latency ExtremeClogL1U23
2025-08-07 06:18:10,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 24 minutes, 56 seconds)
2025-08-07 06:20:00,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:20:13,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1858.33472 ± 496.323
2025-08-07 06:20:13,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2218.3884, 1251.5189, 1684.5571, 1899.4672, 1305.7771, 1546.6545, 3023.8494, 2196.3882, 1839.4098, 1617.3384]
2025-08-07 06:20:13,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:20:13,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 22 minutes, 46 seconds)
2025-08-07 06:22:04,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:22:17,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1521.34009 ± 620.348
2025-08-07 06:22:17,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1353.4485, 1180.6362, 1158.6461, 1275.7125, 1507.8691, 1660.9878, 1247.0424, 3328.4985, 1202.968, 1297.5906]
2025-08-07 06:22:17,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:22:17,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 20 minutes, 47 seconds)
2025-08-07 06:24:09,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:24:22,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2160.87549 ± 910.763
2025-08-07 06:24:22,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2119.029, 895.5979, 2568.6365, 1544.5922, 3485.2756, 2252.122, 3585.0752, 2804.608, 1225.209, 1128.6097]
2025-08-07 06:24:22,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:24:22,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (2160.88) for latency ExtremeClogL1U23
2025-08-07 06:24:22,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 18 minutes, 43 seconds)
2025-08-07 06:26:13,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:26:26,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1988.21777 ± 596.988
2025-08-07 06:26:26,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1320.6084, 1478.2817, 2231.8608, 1477.8357, 2668.2708, 3051.29, 2477.249, 1646.9634, 1269.7179, 2260.1003]
2025-08-07 06:26:26,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:26:26,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 16 minutes, 34 seconds)
2025-08-07 06:28:17,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:28:30,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1801.21753 ± 433.624
2025-08-07 06:28:30,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2066.0234, 2847.9067, 1481.5514, 1729.3273, 2182.4685, 1715.1233, 1712.6172, 1446.4149, 1308.1166, 1522.6277]
2025-08-07 06:28:30,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:28:30,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 14 minutes, 25 seconds)
2025-08-07 06:30:20,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:30:33,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2044.68494 ± 744.034
2025-08-07 06:30:33,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1390.7983, 3186.3916, 1900.9684, 1434.737, 1987.4374, 3351.8625, 1254.3431, 1767.3436, 1381.1205, 2791.847]
2025-08-07 06:30:33,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:30:33,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 12 minutes, 19 seconds)
2025-08-07 06:32:24,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:32:37,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2319.31421 ± 718.582
2025-08-07 06:32:37,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2711.1946, 1579.1456, 3201.5496, 2863.8474, 2929.7173, 3171.8765, 1492.6384, 1279.173, 1597.924, 2366.076]
2025-08-07 06:32:37,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:32:37,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (2319.31) for latency ExtremeClogL1U23
2025-08-07 06:32:37,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 10 minutes, 18 seconds)
2025-08-07 06:34:29,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:34:42,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2064.27148 ± 562.868
2025-08-07 06:34:42,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1783.1655, 2326.1438, 1622.7908, 2930.3472, 1407.755, 2299.3562, 2127.719, 3051.6316, 1353.1451, 1740.66]
2025-08-07 06:34:42,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:34:42,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 8 minutes, 13 seconds)
2025-08-07 06:36:33,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:36:46,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1421.95361 ± 671.211
2025-08-07 06:36:46,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1618.9248, 2458.7217, 1580.2402, 2109.9705, -97.60816, 810.9292, 1691.6346, 1184.042, 1229.8335, 1632.8469]
2025-08-07 06:36:46,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:36:46,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 6 minutes, 12 seconds)
2025-08-07 06:38:38,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:38:51,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1890.97034 ± 784.806
2025-08-07 06:38:51,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3181.7603, 1424.3467, 1289.3754, 1469.6837, 1936.9608, 1407.9197, 1178.7072, 3323.3682, 1164.0172, 2533.5635]
2025-08-07 06:38:51,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:38:51,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 4 minutes, 9 seconds)
2025-08-07 06:40:42,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:40:56,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2127.83911 ± 735.793
2025-08-07 06:40:56,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3112.8892, 1219.5155, 1711.3745, 1632.4409, 3106.4792, 2740.1206, 1136.2684, 2664.3613, 1452.7003, 2502.24]
2025-08-07 06:40:56,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:40:56,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 2 minutes, 14 seconds)
2025-08-07 06:42:46,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:42:59,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2439.40649 ± 767.001
2025-08-07 06:42:59,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1259.3638, 3500.8926, 3093.2083, 2280.8481, 3198.6917, 2325.9395, 1266.991, 3159.0056, 2519.9517, 1789.1732]
2025-08-07 06:42:59,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:42:59,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (2439.41) for latency ExtremeClogL1U23
2025-08-07 06:42:59,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 7 seconds)
2025-08-07 06:44:50,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:45:04,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1982.35999 ± 563.593
2025-08-07 06:45:04,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1886.017, 2477.6074, 2566.6667, 1307.325, 2376.1567, 2865.9912, 1567.2319, 1280.8077, 2203.0881, 1292.7063]
2025-08-07 06:45:04,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:45:04,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 58 minutes, 4 seconds)
2025-08-07 06:46:56,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:47:11,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2691.40869 ± 441.815
2025-08-07 06:47:11,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2184.1094, 2721.496, 2763.9446, 2733.0667, 3259.712, 3336.2822, 2276.9949, 3009.8723, 2756.0862, 1872.5219]
2025-08-07 06:47:11,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:47:11,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (2691.41) for latency ExtremeClogL1U23
2025-08-07 06:47:11,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 56 minutes, 10 seconds)
2025-08-07 06:49:00,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:49:13,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1872.61890 ± 396.238
2025-08-07 06:49:13,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1476.0183, 2176.972, 2137.7952, 1637.0479, 1286.948, 2551.2095, 2258.2178, 1475.3219, 2068.401, 1658.257]
2025-08-07 06:49:13,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:49:13,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 53 minutes, 56 seconds)
2025-08-07 06:51:04,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:51:17,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1990.84204 ± 497.789
2025-08-07 06:51:17,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1907.2428, 1936.5635, 2376.7302, 1171.2921, 3015.3357, 2416.5376, 1856.9337, 1366.1964, 1905.3187, 1956.2704]
2025-08-07 06:51:17,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:51:17,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 51 minutes, 47 seconds)
2025-08-07 06:53:08,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:53:21,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2542.66211 ± 993.617
2025-08-07 06:53:21,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1430.2761, 3572.802, 1632.7898, 2336.4438, 3636.759, 1569.4253, 3957.0845, 2139.5178, 3669.8865, 1481.6372]
2025-08-07 06:53:21,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:53:21,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 49 minutes, 43 seconds)
2025-08-07 06:55:12,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:55:25,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2410.83130 ± 906.806
2025-08-07 06:55:25,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2892.3608, 1163.6656, 1289.011, 2937.807, 1237.1702, 3028.0547, 2034.4315, 2728.765, 2710.3303, 4086.716]
2025-08-07 06:55:25,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:55:25,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 47 minutes, 34 seconds)
2025-08-07 06:57:16,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:57:29,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2951.68457 ± 972.855
2025-08-07 06:57:29,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3766.7104, 3702.4185, 3632.0742, 3863.051, 1974.8333, 1358.8447, 1227.6696, 3265.4028, 3509.263, 3216.5786]
2025-08-07 06:57:29,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:57:29,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (2951.68) for latency ExtremeClogL1U23
2025-08-07 06:57:29,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 45 minutes, 19 seconds)
2025-08-07 06:59:18,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:59:31,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1905.98865 ± 765.004
2025-08-07 06:59:31,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1352.013, 1880.2333, 1962.3625, 1166.6157, 1815.6383, 1676.4211, 1180.7875, 3315.7957, 1350.687, 3359.331]
2025-08-07 06:59:31,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:59:31,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 43 minutes, 13 seconds)
2025-08-07 07:01:22,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:01:37,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2517.25977 ± 828.814
2025-08-07 07:01:37,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3444.998, 2742.7058, 1425.3295, 3443.026, 3574.975, 2384.4412, 3027.6428, 2218.9434, 1166.3889, 1744.1464]
2025-08-07 07:01:37,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:01:37,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 41 minutes, 18 seconds)
2025-08-07 07:03:28,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:03:41,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2039.61523 ± 642.042
2025-08-07 07:03:41,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1285.7413, 1554.8556, 2031.706, 2838.5613, 1882.0117, 1878.8086, 1546.4739, 3388.2466, 1463.9375, 2525.8088]
2025-08-07 07:03:41,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:03:41,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 39 minutes, 15 seconds)
2025-08-07 07:05:31,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:05:44,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2044.78809 ± 881.511
2025-08-07 07:05:44,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1351.645, 1574.0643, 1456.0433, 1873.9701, 2136.7239, 3599.466, 1584.28, 1381.7739, 3892.823, 1597.0923]
2025-08-07 07:05:44,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:05:44,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 37 minutes, 9 seconds)
2025-08-07 07:07:35,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:07:48,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1847.74438 ± 421.533
2025-08-07 07:07:48,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2029.8464, 1883.391, 1417.0723, 2611.8333, 1566.3661, 1937.5441, 1383.2853, 1218.0764, 2170.6301, 2259.399]
2025-08-07 07:07:48,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:07:48,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 35 minutes, 5 seconds)
2025-08-07 07:09:39,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:09:52,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2194.31128 ± 966.471
2025-08-07 07:09:52,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1190.231, 3886.1401, 1378.8276, 2275.4497, 2233.7493, 1317.14, 3754.2275, 1414.9902, 1545.6235, 2946.7341]
2025-08-07 07:09:52,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:09:52,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 33 minutes, 6 seconds)
2025-08-07 07:11:44,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:11:57,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2289.18896 ± 975.705
2025-08-07 07:11:57,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1032.8444, 3924.949, 2297.531, 2094.5251, 1257.401, 3871.918, 1570.6962, 2388.2717, 2976.7114, 1477.043]
2025-08-07 07:11:57,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:11:57,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 31 minutes, 2 seconds)
2025-08-07 07:13:48,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:14:01,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1748.55688 ± 516.093
2025-08-07 07:14:01,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3079.356, 1439.5175, 1861.6794, 1637.8497, 1826.9205, 2114.5645, 1237.3545, 1327.2377, 1637.6074, 1323.4801]
2025-08-07 07:14:01,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:14:01,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 28 minutes, 55 seconds)
2025-08-07 07:15:51,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:16:04,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2726.28564 ± 1011.293
2025-08-07 07:16:04,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3949.4395, 3260.3562, 1371.2396, 1557.3358, 1456.6748, 2616.945, 3964.145, 2103.6724, 2977.6084, 4005.4402]
2025-08-07 07:16:04,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:16:04,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 26 minutes, 51 seconds)
2025-08-07 07:17:53,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:18:06,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2920.94824 ± 1082.394
2025-08-07 07:18:06,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1776.1113, 1997.2814, 3609.215, 1261.4836, 4017.4807, 3970.7075, 2556.1777, 1856.9567, 4253.3145, 3910.7544]
2025-08-07 07:18:06,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:18:06,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 24 minutes, 42 seconds)
2025-08-07 07:19:54,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:20:07,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3487.61182 ± 1001.380
2025-08-07 07:20:07,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1294.9379, 1774.7174, 3901.614, 4135.8877, 4043.6492, 3908.7092, 4233.646, 3469.931, 4014.9216, 4098.0996]
2025-08-07 07:20:07,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:20:07,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (3487.61) for latency ExtremeClogL1U23
2025-08-07 07:20:07,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 34 seconds)
2025-08-07 07:21:58,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:22:11,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2997.04248 ± 1093.108
2025-08-07 07:22:11,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4168.681, 4098.3584, 4363.7676, 1486.0018, 4099.013, 1404.6808, 2275.9795, 3355.1484, 2378.474, 2340.3186]
2025-08-07 07:22:11,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:22:11,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes, 27 seconds)
2025-08-07 07:23:59,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:24:12,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2951.76392 ± 1128.865
2025-08-07 07:24:12,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4146.458, 2223.784, 4305.7554, 1754.429, 4069.8489, 2298.7864, 3900.333, 1423.9923, 1583.7607, 3810.4895]
2025-08-07 07:24:12,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:24:12,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 20 seconds)
2025-08-07 07:26:00,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:26:13,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1914.80737 ± 1009.272
2025-08-07 07:26:13,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2229.9744, 1331.6129, 1866.8434, 1680.5626, 3331.349, 1227.635, 1474.858, 3900.7869, 161.96527, 1942.4875]
2025-08-07 07:26:13,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:26:13,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 14 seconds)
2025-08-07 07:28:00,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:28:13,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2643.71411 ± 1252.176
2025-08-07 07:28:13,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1594.7979, 3625.7102, 1450.6177, 4067.2617, 1548.0751, 2631.932, 1386.602, 4376.641, 1416.34, 4339.165]
2025-08-07 07:28:13,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:28:13,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 10 seconds)
2025-08-07 07:30:01,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:30:14,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1864.82300 ± 725.036
2025-08-07 07:30:14,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3675.6257, 1639.6534, 1259.3002, 1881.4163, 2648.3735, 1326.0485, 1973.8164, 1568.1991, 1357.0801, 1318.7168]
2025-08-07 07:30:14,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:30:14,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 7 seconds)
2025-08-07 07:32:02,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:32:16,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2759.07031 ± 1289.710
2025-08-07 07:32:16,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4433.566, 1225.1711, 4000.564, 4368.7354, 4185.344, 2638.328, 1221.1833, 1411.5973, 2296.0442, 1810.1703]
2025-08-07 07:32:16,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:32:16,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 5 seconds)
2025-08-07 07:34:05,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:34:17,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2643.57349 ± 1092.235
2025-08-07 07:34:17,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1435.3519, 2275.5332, 3647.9238, 1712.623, 4286.301, 4213.277, 2762.0767, 1379.1609, 3234.9458, 1488.5422]
2025-08-07 07:34:17,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:34:17,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 4 seconds)
2025-08-07 07:36:08,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:36:22,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3041.35693 ± 1201.323
2025-08-07 07:36:22,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4142.067, 4001.555, 1186.9554, 1435.0134, 3659.3252, 4116.81, 4011.9556, 2704.934, 1301.7456, 3853.2063]
2025-08-07 07:36:22,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:36:22,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 5 seconds)
2025-08-07 07:38:11,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:38:24,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3488.61060 ± 1055.041
2025-08-07 07:38:24,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2320.4558, 1495.4198, 4432.19, 3747.6545, 4369.3906, 4443.6714, 4072.6743, 4358.674, 2060.0593, 3585.9172]
2025-08-07 07:38:24,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:38:24,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (3488.61) for latency ExtremeClogL1U23
2025-08-07 07:38:24,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 4 seconds)
2025-08-07 07:40:11,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:40:24,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2941.50195 ± 1103.400
2025-08-07 07:40:24,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4161.309, 2803.1665, 1235.2924, 2124.31, 4152.493, 2503.844, 2891.9526, 3983.0325, 1315.0005, 4244.6187]
2025-08-07 07:40:24,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:40:24,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 1 second)
2025-08-07 07:42:12,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:42:25,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4246.23145 ± 711.698
2025-08-07 07:42:25,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4366.3267, 4560.0996, 2146.958, 4394.938, 4501.2812, 4333.7085, 4741.595, 4295.157, 4514.5415, 4607.7065]
2025-08-07 07:42:25,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:42:25,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (4246.23) for latency ExtremeClogL1U23
2025-08-07 07:42:25,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1251 [DEBUG]: Training session finished
