2025-08-07 05:42:27,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc20-halfcheetah/ExtremeClogL1U23-bpql-mem24
2025-08-07 05:42:27,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc20-halfcheetah/ExtremeClogL1U23-bpql-mem24
2025-08-07 05:42:27,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1458e4691650>}
2025-08-07 05:42:27,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1111 [DEBUG]: using device: cuda
2025-08-07 05:42:27,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1133 [INFO]: Creating new trainer
2025-08-07 05:42:27,706 baseline-bpql-noiseperc20-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=161, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 05:42:27,706 baseline-bpql-noiseperc20-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 05:42:28,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1194 [DEBUG]: Starting training session...
2025-08-07 05:42:28,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 1/100
2025-08-07 05:44:07,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:44:20,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -324.99283 ± 49.539
2025-08-07 05:44:20,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-306.6788, -284.04214, -321.09235, -300.91498, -353.2842, -459.91367, -302.81058, -295.04242, -288.01303, -338.13608]
2025-08-07 05:44:20,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:44:20,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (-324.99) for latency ExtremeClogL1U23
2025-08-07 05:44:20,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 3 minutes, 33 seconds)
2025-08-07 05:46:03,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:46:16,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -283.63153 ± 60.212
2025-08-07 05:46:16,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-320.49835, -345.92285, -236.7937, -418.67612, -277.10037, -263.04022, -218.6243, -216.29617, -247.6778, -291.6855]
2025-08-07 05:46:16,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:46:16,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (-283.63) for latency ExtremeClogL1U23
2025-08-07 05:46:16,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 6 minutes, 9 seconds)
2025-08-07 05:48:00,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:48:13,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -224.49594 ± 31.015
2025-08-07 05:48:13,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-222.8925, -212.33023, -277.94534, -201.02216, -175.93237, -256.33176, -230.07072, -264.20245, -195.8226, -208.40938]
2025-08-07 05:48:13,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:48:13,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (-224.50) for latency ExtremeClogL1U23
2025-08-07 05:48:13,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 5 minutes, 49 seconds)
2025-08-07 05:49:57,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:50:10,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -63.62868 ± 137.704
2025-08-07 05:50:10,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-67.965065, -90.128784, 134.47998, 50.138878, -61.26655, -354.06195, 90.51503, 5.34916, -156.0208, -187.32677]
2025-08-07 05:50:10,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:50:10,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (-63.63) for latency ExtremeClogL1U23
2025-08-07 05:50:10,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 4 minutes, 39 seconds)
2025-08-07 05:51:54,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:52:07,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 177.81535 ± 72.974
2025-08-07 05:52:07,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [138.53128, 136.10023, 191.96112, 41.348515, 208.76945, 312.96762, 267.40744, 158.56317, 122.67951, 199.82524]
2025-08-07 05:52:07,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:52:07,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (177.82) for latency ExtremeClogL1U23
2025-08-07 05:52:07,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 3 minutes, 13 seconds)
2025-08-07 05:53:51,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:54:04,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 172.33583 ± 84.148
2025-08-07 05:54:04,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [207.81863, 15.561955, 224.74963, 262.30383, 180.5833, 198.428, 256.04968, 76.00468, 240.86575, 60.99285]
2025-08-07 05:54:04,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:54:04,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 3 minutes, 2 seconds)
2025-08-07 05:55:47,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:56:00,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 266.96841 ± 128.616
2025-08-07 05:56:00,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [40.71011, 371.5542, 321.35825, 273.7203, 444.13303, 320.78854, 154.22188, 61.45793, 349.03882, 332.70105]
2025-08-07 05:56:00,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:56:00,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (266.97) for latency ExtremeClogL1U23
2025-08-07 05:56:01,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 1 minute, 5 seconds)
2025-08-07 05:57:44,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:57:57,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 324.75165 ± 264.567
2025-08-07 05:57:57,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [493.51312, 574.6683, 584.81366, 425.2941, 11.255405, 218.78073, 108.01534, 501.82236, 546.9001, -217.54666]
2025-08-07 05:57:57,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:57:57,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (324.75) for latency ExtremeClogL1U23
2025-08-07 05:57:57,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 59 minutes, 7 seconds)
2025-08-07 05:59:41,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:59:54,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 440.27875 ± 207.433
2025-08-07 05:59:54,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [240.16188, 739.5557, 598.9788, 285.2113, 198.41644, 505.96222, 715.7367, 586.6065, 389.86838, 142.28946]
2025-08-07 05:59:54,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:59:54,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (440.28) for latency ExtremeClogL1U23
2025-08-07 05:59:54,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 57 minutes, 12 seconds)
2025-08-07 06:01:38,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:01:51,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 350.50626 ± 241.069
2025-08-07 06:01:51,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-10.753236, 446.312, 526.1306, 681.0638, 270.64148, -127.63092, 540.85443, 402.5349, 277.03452, 498.87488]
2025-08-07 06:01:51,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:01:51,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 55 minutes, 11 seconds)
2025-08-07 06:03:35,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:03:48,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 604.99176 ± 197.078
2025-08-07 06:03:48,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [706.3156, 519.1746, 379.5714, 308.7311, 922.8712, 577.29333, 466.70554, 535.575, 908.12476, 725.5553]
2025-08-07 06:03:48,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:03:48,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (604.99) for latency ExtremeClogL1U23
2025-08-07 06:03:48,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 53 minutes, 18 seconds)
2025-08-07 06:05:32,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:05:45,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 568.83331 ± 55.655
2025-08-07 06:05:45,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [504.2662, 578.8868, 523.0654, 564.13275, 551.8925, 598.4273, 525.4255, 707.3117, 533.1004, 601.82416]
2025-08-07 06:05:45,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:05:45,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 51 minutes, 23 seconds)
2025-08-07 06:07:29,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:07:41,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 624.19171 ± 154.546
2025-08-07 06:07:41,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [571.53265, 637.70233, 796.8554, 721.4667, 669.3285, 321.7855, 862.3689, 429.54883, 550.98645, 680.34155]
2025-08-07 06:07:41,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:07:41,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (624.19) for latency ExtremeClogL1U23
2025-08-07 06:07:41,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 49 minutes, 24 seconds)
2025-08-07 06:09:25,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:09:38,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 782.71265 ± 125.067
2025-08-07 06:09:38,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [554.0249, 776.8831, 675.1067, 623.3874, 821.5868, 932.4017, 816.943, 966.9393, 872.5594, 787.29407]
2025-08-07 06:09:38,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:09:38,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (782.71) for latency ExtremeClogL1U23
2025-08-07 06:09:38,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 47 minutes, 29 seconds)
2025-08-07 06:11:22,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:11:35,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 703.22278 ± 187.767
2025-08-07 06:11:35,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [906.02814, 541.88794, 850.0124, 774.01953, 295.5225, 845.0349, 484.26944, 743.09827, 742.7779, 849.57715]
2025-08-07 06:11:35,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:11:35,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 45 minutes, 34 seconds)
2025-08-07 06:13:19,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:13:32,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 600.25500 ± 355.391
2025-08-07 06:13:32,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [873.3695, 708.30005, 831.74677, 754.2851, 813.913, 774.2022, 819.4855, -197.14981, 606.87897, 17.518343]
2025-08-07 06:13:32,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:13:32,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 43 minutes, 32 seconds)
2025-08-07 06:15:16,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:15:29,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 799.47961 ± 225.654
2025-08-07 06:15:29,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [934.1846, 999.53345, 723.11017, 812.02094, 1059.4458, 721.34283, 945.7688, 238.21599, 911.3926, 649.7806]
2025-08-07 06:15:29,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:15:29,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (799.48) for latency ExtremeClogL1U23
2025-08-07 06:15:29,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 41 minutes, 36 seconds)
2025-08-07 06:17:12,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:17:26,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 714.25519 ± 176.203
2025-08-07 06:17:26,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [570.0575, 917.75195, 687.07715, 586.42444, 739.7154, 663.8945, 810.4882, 814.4019, 997.18665, 355.55396]
2025-08-07 06:17:26,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:17:26,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 39 minutes, 39 seconds)
2025-08-07 06:19:08,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:19:22,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 899.95056 ± 294.621
2025-08-07 06:19:22,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1206.7006, 758.7533, 1020.33984, 976.8303, 998.8814, 77.394844, 1046.9948, 1028.9496, 890.65765, 994.00336]
2025-08-07 06:19:22,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:19:22,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (899.95) for latency ExtremeClogL1U23
2025-08-07 06:19:22,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 37 minutes, 26 seconds)
2025-08-07 06:21:04,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:21:18,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 715.40002 ± 115.378
2025-08-07 06:21:18,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [875.49365, 888.30414, 857.6511, 628.21747, 658.38617, 726.89856, 665.53894, 594.51306, 714.7016, 544.2952]
2025-08-07 06:21:18,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:21:18,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 35 minutes, 17 seconds)
2025-08-07 06:23:01,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:23:14,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 904.62268 ± 129.364
2025-08-07 06:23:14,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [670.7961, 1060.1688, 1029.8708, 990.94745, 854.26306, 831.38904, 866.0597, 1056.0042, 730.3716, 956.3559]
2025-08-07 06:23:14,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:23:14,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (904.62) for latency ExtremeClogL1U23
2025-08-07 06:23:14,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 33 minutes, 8 seconds)
2025-08-07 06:24:57,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:25:10,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1085.65784 ± 223.159
2025-08-07 06:25:10,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [932.96, 980.0568, 988.82306, 1176.6414, 909.0749, 1217.4349, 1130.9955, 810.32355, 1059.9833, 1650.2856]
2025-08-07 06:25:10,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:25:10,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1085.66) for latency ExtremeClogL1U23
2025-08-07 06:25:10,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 31 minutes, 2 seconds)
2025-08-07 06:26:53,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:27:06,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1035.92407 ± 338.348
2025-08-07 06:27:06,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1350.0549, 1018.4617, 101.08261, 1178.5079, 1143.589, 1212.618, 1104.2332, 1331.4945, 1009.8895, 909.309]
2025-08-07 06:27:06,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:27:06,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 28 minutes, 55 seconds)
2025-08-07 06:28:49,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:29:02,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 836.17737 ± 361.540
2025-08-07 06:29:02,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [969.8242, 967.37427, 968.6009, 895.7008, -218.93137, 1164.2142, 996.07513, 889.06866, 858.7267, 871.1199]
2025-08-07 06:29:02,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:29:02,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 26 minutes, 58 seconds)
2025-08-07 06:30:45,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:30:58,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1127.07080 ± 149.704
2025-08-07 06:30:58,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1122.5917, 1277.1312, 956.21295, 917.48737, 1128.5406, 1245.639, 1093.8922, 1126.2161, 975.4284, 1427.5686]
2025-08-07 06:30:58,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:30:58,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1127.07) for latency ExtremeClogL1U23
2025-08-07 06:30:58,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 24 minutes, 59 seconds)
2025-08-07 06:32:40,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:32:54,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 943.91095 ± 162.950
2025-08-07 06:32:54,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [998.12335, 893.2162, 1054.9293, 1071.8303, 790.25134, 917.1375, 1224.9698, 729.19745, 1071.6486, 687.8052]
2025-08-07 06:32:54,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:32:54,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 23 minutes, 3 seconds)
2025-08-07 06:34:36,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:34:49,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1019.75232 ± 126.987
2025-08-07 06:34:49,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1075.4948, 898.0623, 937.2048, 1028.011, 882.5685, 1208.8217, 1283.482, 979.00214, 980.8579, 924.01984]
2025-08-07 06:34:49,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:34:49,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 21 minutes, 2 seconds)
2025-08-07 06:36:32,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:36:46,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 956.77441 ± 374.555
2025-08-07 06:36:46,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1108.7797, 1021.4767, 1105.6378, -58.281086, 1392.2324, 843.40753, 782.9347, 1074.1057, 1187.2738, 1110.1774]
2025-08-07 06:36:46,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:36:46,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 19 minutes, 8 seconds)
2025-08-07 06:38:28,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:38:42,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1088.47485 ± 147.514
2025-08-07 06:38:42,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [910.96106, 1157.3138, 1250.703, 1214.3933, 1081.5095, 1119.4877, 961.23553, 782.44, 1172.2837, 1234.4204]
2025-08-07 06:38:42,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:38:42,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 17 minutes, 13 seconds)
2025-08-07 06:40:24,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:40:38,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1104.77075 ± 78.821
2025-08-07 06:40:38,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1065.3014, 1011.3413, 1124.1797, 975.1987, 1081.3899, 1172.5624, 1267.2877, 1125.1599, 1148.5717, 1076.716]
2025-08-07 06:40:38,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:40:38,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 15 minutes, 18 seconds)
2025-08-07 06:42:21,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:42:34,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1137.38599 ± 176.581
2025-08-07 06:42:34,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1339.5851, 991.08203, 1132.7871, 1210.8022, 1081.8961, 1314.1625, 1062.4218, 728.7554, 1323.0984, 1189.2694]
2025-08-07 06:42:34,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:42:34,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1137.39) for latency ExtremeClogL1U23
2025-08-07 06:42:34,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 13 minutes, 23 seconds)
2025-08-07 06:44:17,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:44:30,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 973.51495 ± 173.035
2025-08-07 06:44:30,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [976.00256, 996.76685, 886.6, 1184.388, 1194.3082, 978.4428, 730.7714, 1155.3662, 648.00806, 984.49506]
2025-08-07 06:44:30,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:44:30,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 11 minutes, 31 seconds)
2025-08-07 06:46:13,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:46:26,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1002.40674 ± 153.554
2025-08-07 06:46:26,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1060.428, 929.6956, 742.07855, 911.2909, 1132.9572, 1012.2695, 1204.1025, 802.1718, 1232.861, 996.2121]
2025-08-07 06:46:26,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:46:26,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 9 minutes, 33 seconds)
2025-08-07 06:48:09,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:48:22,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 775.68799 ± 115.317
2025-08-07 06:48:22,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [860.9642, 687.3857, 923.48566, 804.95355, 514.81903, 674.01825, 827.31885, 857.8942, 857.3511, 748.6895]
2025-08-07 06:48:22,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:48:22,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 7 minutes, 38 seconds)
2025-08-07 06:50:05,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:50:18,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 977.70813 ± 226.700
2025-08-07 06:50:18,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1076.9597, 1211.1183, 1178.5317, 582.06433, 1019.79065, 1110.4989, 1008.75995, 1155.0913, 536.7161, 897.5504]
2025-08-07 06:50:18,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:50:18,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 5 minutes, 42 seconds)
2025-08-07 06:52:01,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:52:14,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 987.03632 ± 232.657
2025-08-07 06:52:14,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [762.19965, 969.24, 957.5269, 1315.8037, 957.1217, 1186.3405, 1289.9883, 490.6935, 1003.75024, 937.6995]
2025-08-07 06:52:14,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:52:14,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 3 minutes, 43 seconds)
2025-08-07 06:53:56,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:54:09,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1060.47339 ± 192.923
2025-08-07 06:54:09,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [749.4472, 1046.6375, 1161.4857, 1030.1896, 1192.0972, 1087.9507, 683.1927, 1078.0555, 1284.4043, 1291.273]
2025-08-07 06:54:09,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:54:09,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 1 minute, 44 seconds)
2025-08-07 06:55:52,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:56:05,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1264.46289 ± 124.176
2025-08-07 06:56:05,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1200.2632, 1535.3324, 1294.2571, 1174.4385, 1138.8591, 1230.8939, 1133.3783, 1183.5651, 1407.1063, 1346.5355]
2025-08-07 06:56:05,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:56:05,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1264.46) for latency ExtremeClogL1U23
2025-08-07 06:56:05,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 59 minutes, 49 seconds)
2025-08-07 06:57:48,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:58:01,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1279.94971 ± 144.850
2025-08-07 06:58:01,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1424.5374, 1477.954, 1256.5892, 1317.9768, 1361.1974, 1239.9836, 1207.5314, 1074.8796, 1013.40344, 1425.444]
2025-08-07 06:58:01,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:58:01,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1279.95) for latency ExtremeClogL1U23
2025-08-07 06:58:01,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 57 minutes, 52 seconds)
2025-08-07 06:59:44,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:59:58,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1272.19702 ± 211.940
2025-08-07 06:59:58,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1126.2559, 1178.277, 1146.6799, 1049.5603, 1270.206, 1566.7013, 1670.0321, 1458.0515, 1255.9033, 1000.3027]
2025-08-07 06:59:58,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:59:58,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 55 minutes, 57 seconds)
2025-08-07 07:01:41,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:01:54,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1259.87915 ± 191.225
2025-08-07 07:01:54,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1395.9843, 1561.5294, 1290.2003, 1129.7085, 837.66473, 1251.6621, 1348.4531, 1144.96, 1193.1395, 1445.4891]
2025-08-07 07:01:54,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:01:54,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 54 minutes, 4 seconds)
2025-08-07 07:03:36,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:03:50,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1168.26917 ± 108.226
2025-08-07 07:03:50,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1199.5542, 1219.2572, 1231.2888, 989.04504, 1163.3593, 1274.0936, 939.35785, 1191.5461, 1194.7013, 1280.4879]
2025-08-07 07:03:50,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:03:50,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 52 minutes, 8 seconds)
2025-08-07 07:05:32,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:05:45,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1189.94250 ± 182.930
2025-08-07 07:05:45,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1344.6937, 977.46344, 809.07324, 1344.4878, 1062.8489, 1260.3718, 1283.6871, 1109.6335, 1295.7716, 1411.3937]
2025-08-07 07:05:45,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:05:45,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 50 minutes, 11 seconds)
2025-08-07 07:07:28,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:07:41,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1289.11035 ± 215.610
2025-08-07 07:07:41,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1209.8584, 1295.9812, 1489.667, 1273.4604, 1268.5706, 1339.157, 729.8149, 1539.2388, 1486.2775, 1259.0784]
2025-08-07 07:07:41,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:07:41,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1289.11) for latency ExtremeClogL1U23
2025-08-07 07:07:41,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 48 minutes, 13 seconds)
2025-08-07 07:09:24,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:09:37,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1136.46912 ± 218.451
2025-08-07 07:09:37,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1220.2941, 1089.5974, 687.11053, 1319.0009, 1236.2264, 1431.038, 823.413, 1258.088, 1256.8821, 1043.0411]
2025-08-07 07:09:37,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:09:37,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 46 minutes, 15 seconds)
2025-08-07 07:11:20,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:11:33,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1328.89600 ± 83.507
2025-08-07 07:11:33,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1485.2446, 1231.874, 1331.2253, 1244.5911, 1234.6234, 1390.5608, 1423.7488, 1371.1633, 1319.096, 1256.8329]
2025-08-07 07:11:33,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:11:33,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1328.90) for latency ExtremeClogL1U23
2025-08-07 07:11:33,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 44 minutes, 17 seconds)
2025-08-07 07:13:16,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:13:29,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1303.21533 ± 134.347
2025-08-07 07:13:29,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1388.2136, 1018.14716, 1249.4478, 1277.8575, 1463.8638, 1225.9481, 1530.1678, 1281.2814, 1250.3431, 1346.8829]
2025-08-07 07:13:29,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:13:29,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 42 minutes, 21 seconds)
2025-08-07 07:15:12,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:15:25,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1300.48474 ± 83.116
2025-08-07 07:15:25,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1426.9946, 1177.4323, 1362.6409, 1260.7086, 1192.8584, 1394.1936, 1367.8706, 1213.8661, 1298.8458, 1309.436]
2025-08-07 07:15:25,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:15:25,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 40 minutes, 25 seconds)
2025-08-07 07:17:07,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:17:21,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1199.41187 ± 103.843
2025-08-07 07:17:21,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1308.4679, 1191.3499, 1142.4285, 1184.9869, 1047.7258, 1219.2292, 1295.1741, 1078.0299, 1127.4663, 1399.2598]
2025-08-07 07:17:21,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:17:21,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 38 minutes, 28 seconds)
2025-08-07 07:19:04,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:19:17,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1291.52026 ± 82.362
2025-08-07 07:19:17,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1250.6943, 1269.6375, 1314.2365, 1139.2239, 1211.8472, 1265.0739, 1432.0001, 1385.1445, 1370.3828, 1276.9618]
2025-08-07 07:19:17,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:19:17,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 36 minutes, 35 seconds)
2025-08-07 07:21:00,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:21:13,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1273.08484 ± 218.141
2025-08-07 07:21:13,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1281.3066, 941.36475, 1254.854, 1468.1174, 863.1191, 1340.1213, 1484.2804, 1483.8661, 1485.0845, 1128.7357]
2025-08-07 07:21:13,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:21:13,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 34 minutes, 41 seconds)
2025-08-07 07:22:56,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:23:09,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1245.98657 ± 129.707
2025-08-07 07:23:09,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1085.3351, 1266.1625, 1190.839, 1396.5563, 1440.5452, 1055.8545, 1187.5558, 1337.6677, 1373.0874, 1126.2625]
2025-08-07 07:23:09,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:23:09,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 32 minutes, 46 seconds)
2025-08-07 07:24:52,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:25:05,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1262.23840 ± 135.780
2025-08-07 07:25:05,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1382.2572, 1309.655, 1410.0233, 1247.7812, 1054.9923, 1121.1943, 1098.8303, 1195.4836, 1482.0149, 1320.1527]
2025-08-07 07:25:05,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:25:05,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 30 minutes, 50 seconds)
2025-08-07 07:26:47,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:27:00,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1280.72974 ± 276.891
2025-08-07 07:27:00,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1671.1713, 1006.5103, 1517.8593, 1105.2045, 1031.716, 1508.834, 1478.3707, 842.3137, 1095.5238, 1549.7935]
2025-08-07 07:27:00,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:27:00,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 28 minutes, 52 seconds)
2025-08-07 07:28:42,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:28:55,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1196.49780 ± 114.900
2025-08-07 07:28:55,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1310.2896, 1296.6669, 1038.9014, 1274.9421, 1041.5135, 1091.8007, 1363.7118, 1278.6589, 1151.6333, 1116.8618]
2025-08-07 07:28:55,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:28:55,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 26 minutes, 43 seconds)
2025-08-07 07:30:37,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:30:50,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1394.96362 ± 174.774
2025-08-07 07:30:50,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1192.4978, 1364.459, 1345.9375, 1493.6783, 1308.0332, 1395.072, 1549.7148, 1515.7438, 1071.8746, 1712.6254]
2025-08-07 07:30:50,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:30:50,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1394.96) for latency ExtremeClogL1U23
2025-08-07 07:30:50,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 24 minutes, 35 seconds)
2025-08-07 07:32:31,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:32:44,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1363.23657 ± 73.446
2025-08-07 07:32:44,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1287.2626, 1381.4692, 1473.4658, 1261.2435, 1277.3687, 1461.6029, 1309.6345, 1380.6047, 1365.3696, 1434.3441]
2025-08-07 07:32:44,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:32:44,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 22 minutes, 23 seconds)
2025-08-07 07:34:25,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:34:37,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1317.89099 ± 152.811
2025-08-07 07:34:37,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1251.5696, 1334.7534, 1067.2932, 1263.9032, 1360.959, 1710.5641, 1322.0684, 1293.9231, 1334.6945, 1239.182]
2025-08-07 07:34:37,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:34:37,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 20 minutes, 10 seconds)
2025-08-07 07:36:18,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:36:31,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1278.13489 ± 205.110
2025-08-07 07:36:31,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1281.8574, 1042.0673, 1523.3536, 1242.5403, 1381.6421, 816.0248, 1330.6432, 1550.9491, 1289.6143, 1322.6562]
2025-08-07 07:36:31,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:36:31,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 18 minutes, 3 seconds)
2025-08-07 07:38:12,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:38:25,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1265.11450 ± 97.929
2025-08-07 07:38:25,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1172.6165, 1279.7062, 1389.2408, 1277.532, 1194.0585, 1278.2632, 1139.5228, 1282.503, 1468.01, 1169.6909]
2025-08-07 07:38:25,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:38:25,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 16 minutes, 1 second)
2025-08-07 07:40:06,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:40:19,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1241.83423 ± 128.528
2025-08-07 07:40:19,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1196.3164, 1311.8821, 1206.9203, 1295.8536, 1329.248, 1213.1725, 1208.8032, 1196.2188, 1496.5754, 963.3525]
2025-08-07 07:40:19,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:40:19,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 14 minutes, 2 seconds)
2025-08-07 07:42:00,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:42:13,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1352.31763 ± 97.548
2025-08-07 07:42:13,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1371.0612, 1518.8044, 1191.753, 1272.0244, 1402.5951, 1244.5685, 1476.3893, 1289.3009, 1379.5027, 1377.1763]
2025-08-07 07:42:13,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:42:13,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 12 minutes, 6 seconds)
2025-08-07 07:43:54,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:44:07,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1445.20532 ± 197.796
2025-08-07 07:44:07,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1376.0131, 1373.4337, 1839.4724, 1495.1455, 1541.8068, 1148.806, 1195.0443, 1585.6882, 1596.3635, 1300.2804]
2025-08-07 07:44:07,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:44:07,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1445.21) for latency ExtremeClogL1U23
2025-08-07 07:44:07,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 10 minutes, 13 seconds)
2025-08-07 07:45:48,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:46:01,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1330.62451 ± 136.215
2025-08-07 07:46:01,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1452.3806, 1279.051, 1323.6552, 1024.5771, 1392.347, 1253.1476, 1382.9661, 1576.5276, 1327.2506, 1294.3419]
2025-08-07 07:46:01,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:46:01,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 8 minutes, 20 seconds)
2025-08-07 07:47:42,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:47:54,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1447.28943 ± 193.825
2025-08-07 07:47:54,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1346.2234, 1676.7642, 1513.0033, 1146.8086, 1609.1237, 1680.51, 1183.5703, 1584.2582, 1500.8492, 1231.7834]
2025-08-07 07:47:54,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:47:54,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1447.29) for latency ExtremeClogL1U23
2025-08-07 07:47:54,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 6 minutes, 25 seconds)
2025-08-07 07:49:35,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:49:48,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1322.18347 ± 143.440
2025-08-07 07:49:48,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1335.2211, 1204.1526, 1295.1932, 1503.3678, 1392.2112, 1457.4159, 1470.8055, 992.00836, 1309.6039, 1261.8564]
2025-08-07 07:49:48,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:49:48,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 4 minutes, 29 seconds)
2025-08-07 07:51:29,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:51:42,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1315.67871 ± 95.288
2025-08-07 07:51:42,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1336.4325, 1467.0386, 1143.6631, 1165.5339, 1298.9756, 1407.423, 1352.642, 1384.7826, 1298.65, 1301.6456]
2025-08-07 07:51:42,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:51:42,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 2 minutes, 33 seconds)
2025-08-07 07:53:22,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:53:35,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1495.17810 ± 128.464
2025-08-07 07:53:35,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1436.0485, 1459.853, 1656.0784, 1601.493, 1684.9026, 1510.4602, 1462.625, 1222.4597, 1392.588, 1525.273]
2025-08-07 07:53:35,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:53:35,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1495.18) for latency ExtremeClogL1U23
2025-08-07 07:53:35,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 35 seconds)
2025-08-07 07:55:15,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:55:28,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1470.69775 ± 117.791
2025-08-07 07:55:28,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1686.2913, 1465.475, 1567.0784, 1346.6816, 1483.7133, 1499.2317, 1514.6743, 1217.5999, 1447.6461, 1478.586]
2025-08-07 07:55:28,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:55:28,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 58 minutes, 37 seconds)
2025-08-07 07:57:09,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:57:21,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1482.06177 ± 132.199
2025-08-07 07:57:21,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1353.8807, 1470.0717, 1597.8264, 1361.0491, 1651.3035, 1229.3406, 1509.2972, 1466.7079, 1671.5339, 1509.6052]
2025-08-07 07:57:21,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:57:21,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 56 minutes, 41 seconds)
2025-08-07 07:59:02,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:59:15,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1407.01453 ± 115.369
2025-08-07 07:59:15,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1512.0166, 1482.3975, 1388.891, 1577.4119, 1378.9785, 1249.8665, 1267.2981, 1520.8029, 1242.0986, 1450.3839]
2025-08-07 07:59:15,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:59:15,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 54 minutes, 47 seconds)
2025-08-07 08:00:56,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:01:08,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1477.19922 ± 140.442
2025-08-07 08:01:08,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1398.7633, 1373.8436, 1173.0763, 1733.4447, 1479.0004, 1594.7295, 1558.8885, 1504.174, 1455.3048, 1500.7678]
2025-08-07 08:01:08,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:01:08,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 52 minutes, 54 seconds)
2025-08-07 08:02:49,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:03:02,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1493.34827 ± 92.102
2025-08-07 08:03:02,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1482.1227, 1558.1691, 1579.3202, 1281.2799, 1576.9792, 1531.3041, 1557.8143, 1422.2306, 1541.3638, 1402.8982]
2025-08-07 08:03:02,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:03:02,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 51 minutes, 1 second)
2025-08-07 08:04:42,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:04:55,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1588.28247 ± 129.386
2025-08-07 08:04:55,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1514.2781, 1660.193, 1833.326, 1457.6161, 1645.799, 1618.4257, 1613.6826, 1410.335, 1709.48, 1419.69]
2025-08-07 08:04:55,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:04:55,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1588.28) for latency ExtremeClogL1U23
2025-08-07 08:04:55,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 49 minutes, 8 seconds)
2025-08-07 08:06:36,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:06:49,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1550.10364 ± 205.950
2025-08-07 08:06:49,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1025.164, 1602.1926, 1414.5487, 1632.5411, 1642.8328, 1659.4109, 1524.1914, 1863.3436, 1587.1345, 1549.6771]
2025-08-07 08:06:49,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:06:49,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 47 minutes, 15 seconds)
2025-08-07 08:08:29,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:08:42,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1573.64783 ± 132.479
2025-08-07 08:08:42,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1672.0681, 1775.1713, 1703.2976, 1512.9844, 1590.0438, 1425.5662, 1613.7833, 1342.4783, 1665.4459, 1435.6392]
2025-08-07 08:08:42,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:08:42,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 45 minutes, 21 seconds)
2025-08-07 08:10:22,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:10:35,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1494.53882 ± 92.077
2025-08-07 08:10:35,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1685.8054, 1366.1925, 1367.2034, 1490.496, 1526.2178, 1523.2118, 1546.238, 1417.8007, 1558.8517, 1463.3699]
2025-08-07 08:10:35,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:10:35,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 43 minutes, 26 seconds)
2025-08-07 08:12:16,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:12:29,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1581.09814 ± 168.010
2025-08-07 08:12:29,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1639.21, 1319.3594, 1598.973, 1645.2721, 1502.8732, 1523.1144, 1930.5531, 1755.4092, 1513.658, 1382.5586]
2025-08-07 08:12:29,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:12:29,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 41 minutes, 33 seconds)
2025-08-07 08:14:09,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:14:22,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1531.67200 ± 150.633
2025-08-07 08:14:22,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1380.0238, 1597.451, 1337.6074, 1360.6682, 1653.241, 1576.9899, 1363.4153, 1607.4106, 1646.1471, 1793.7656]
2025-08-07 08:14:22,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:14:22,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 39 minutes, 40 seconds)
2025-08-07 08:16:02,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:16:15,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1531.01880 ± 133.847
2025-08-07 08:16:15,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1553.4935, 1506.7904, 1368.7316, 1678.3337, 1500.4622, 1449.2239, 1443.0688, 1661.0598, 1357.7556, 1791.268]
2025-08-07 08:16:15,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:16:15,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 37 minutes, 46 seconds)
2025-08-07 08:17:56,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:18:08,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1585.89722 ± 133.505
2025-08-07 08:18:08,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1669.9877, 1295.4138, 1751.5775, 1755.764, 1555.9517, 1634.8827, 1601.054, 1629.7761, 1440.1156, 1524.4486]
2025-08-07 08:18:08,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:18:08,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 35 minutes, 53 seconds)
2025-08-07 08:19:49,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:20:02,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1578.79175 ± 104.219
2025-08-07 08:20:02,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1559.0013, 1676.3988, 1429.7241, 1619.3583, 1503.0042, 1493.8287, 1620.6635, 1649.1482, 1457.1063, 1779.6849]
2025-08-07 08:20:02,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:20:02,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 34 minutes)
2025-08-07 08:21:42,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:21:55,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1725.16870 ± 120.240
2025-08-07 08:21:55,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1917.623, 1489.1887, 1712.4545, 1630.5123, 1851.0703, 1840.5048, 1786.7047, 1645.6377, 1709.5654, 1668.4261]
2025-08-07 08:21:55,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:21:55,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1725.17) for latency ExtremeClogL1U23
2025-08-07 08:21:55,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 32 minutes, 6 seconds)
2025-08-07 08:23:35,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:23:48,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1542.09314 ± 135.417
2025-08-07 08:23:48,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1546.213, 1643.7018, 1651.5167, 1682.5338, 1199.7158, 1565.854, 1472.5709, 1479.2178, 1661.2661, 1518.3414]
2025-08-07 08:23:48,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:23:48,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 30 minutes, 12 seconds)
2025-08-07 08:25:29,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:25:42,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1639.05786 ± 157.700
2025-08-07 08:25:42,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1328.5934, 1529.2328, 1841.8685, 1808.4437, 1507.3668, 1600.3679, 1778.9567, 1533.6992, 1686.3485, 1775.6996]
2025-08-07 08:25:42,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:25:42,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 28 minutes, 19 seconds)
2025-08-07 08:27:22,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:27:35,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1708.95337 ± 152.380
2025-08-07 08:27:35,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1851.2516, 1832.1057, 1885.7648, 1607.1602, 1592.3485, 1747.4921, 1766.4895, 1542.3229, 1411.7904, 1852.8073]
2025-08-07 08:27:35,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:27:35,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 26 minutes, 26 seconds)
2025-08-07 08:29:15,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:29:28,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1751.48730 ± 158.714
2025-08-07 08:29:28,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1708.8099, 1832.7108, 1629.2975, 2043.5509, 1539.4224, 1954.7256, 1884.8604, 1601.9319, 1660.202, 1659.3612]
2025-08-07 08:29:28,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:29:28,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1751.49) for latency ExtremeClogL1U23
2025-08-07 08:29:28,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 24 minutes, 33 seconds)
2025-08-07 08:31:09,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:31:22,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1749.89490 ± 217.187
2025-08-07 08:31:22,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1690.7114, 1610.5542, 1837.2261, 2229.8696, 1924.103, 1429.0239, 1611.7201, 1911.1196, 1593.1517, 1661.4703]
2025-08-07 08:31:22,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:31:22,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 22 minutes, 40 seconds)
2025-08-07 08:33:02,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:33:15,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1774.28772 ± 112.478
2025-08-07 08:33:15,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1877.4006, 1835.26, 1581.8864, 1743.715, 1548.9738, 1869.8496, 1822.8217, 1849.679, 1760.9395, 1852.352]
2025-08-07 08:33:15,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:33:15,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1774.29) for latency ExtremeClogL1U23
2025-08-07 08:33:15,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 20 minutes, 47 seconds)
2025-08-07 08:34:56,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:35:09,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1824.58850 ± 146.637
2025-08-07 08:35:09,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1549.7462, 1849.7383, 2003.0897, 1776.9266, 1739.6678, 1770.7904, 2038.2035, 1895.25, 1958.9377, 1663.5332]
2025-08-07 08:35:09,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:35:09,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1824.59) for latency ExtremeClogL1U23
2025-08-07 08:35:09,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 18 minutes, 54 seconds)
2025-08-07 08:36:49,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:37:02,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1799.82190 ± 46.244
2025-08-07 08:37:02,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1803.9728, 1914.4666, 1763.9518, 1806.0425, 1820.5311, 1786.4241, 1809.4105, 1783.4445, 1785.8922, 1724.0822]
2025-08-07 08:37:02,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:37:02,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes)
2025-08-07 08:38:42,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:38:55,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1819.62305 ± 83.966
2025-08-07 08:38:55,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1934.7223, 1846.9999, 1782.5033, 1908.216, 1802.1158, 1879.4434, 1795.8491, 1820.6462, 1812.9525, 1612.7816]
2025-08-07 08:38:55,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:38:55,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 7 seconds)
2025-08-07 08:40:36,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:40:49,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1798.06970 ± 119.123
2025-08-07 08:40:49,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1860.9757, 1858.0568, 1681.103, 1683.641, 1918.885, 1587.7177, 1739.5797, 1998.2626, 1783.5647, 1868.9122]
2025-08-07 08:40:49,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:40:49,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 13 seconds)
2025-08-07 08:42:29,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:42:42,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1797.68750 ± 98.119
2025-08-07 08:42:42,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1849.9341, 1635.3707, 1699.6666, 1932.4564, 1827.9, 1955.1683, 1744.2549, 1816.7239, 1818.1047, 1697.2955]
2025-08-07 08:42:42,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:42:42,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 19 seconds)
2025-08-07 08:44:22,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:44:35,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1811.74585 ± 72.262
2025-08-07 08:44:35,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1798.2634, 1868.224, 1803.3805, 1868.5242, 1846.4258, 1632.2112, 1808.2699, 1913.8595, 1771.0286, 1807.2721]
2025-08-07 08:44:35,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:44:35,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 26 seconds)
2025-08-07 08:46:15,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:46:28,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1899.00903 ± 89.657
2025-08-07 08:46:28,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1826.3889, 1917.7534, 1837.486, 1852.5923, 1987.0471, 2025.392, 2011.2963, 1944.8884, 1857.5364, 1729.7098]
2025-08-07 08:46:28,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:46:28,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1899.01) for latency ExtremeClogL1U23
2025-08-07 08:46:28,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 32 seconds)
2025-08-07 08:48:09,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:48:22,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1803.82446 ± 85.003
2025-08-07 08:48:22,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1690.6592, 1811.4156, 1705.205, 1772.555, 1871.847, 1881.0396, 1748.8785, 1723.5931, 1876.2793, 1956.7723]
2025-08-07 08:48:22,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:48:22,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 39 seconds)
2025-08-07 08:50:02,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:50:15,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1873.11877 ± 110.229
2025-08-07 08:50:15,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1679.2878, 2003.2177, 1934.0203, 1811.9553, 1738.1735, 1884.9863, 1915.0942, 1806.7458, 2058.4812, 1899.2267]
2025-08-07 08:50:15,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:50:15,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 46 seconds)
2025-08-07 08:51:55,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:52:08,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1828.83984 ± 89.660
2025-08-07 08:52:08,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1700.1027, 1898.5714, 1897.5741, 1858.1394, 1866.2991, 1989.4291, 1734.5753, 1722.7806, 1758.6558, 1862.2716]
2025-08-07 08:52:08,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:52:08,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 53 seconds)
2025-08-07 08:53:49,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:54:01,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1856.78589 ± 116.174
2025-08-07 08:54:01,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1689.8494, 1704.8517, 2018.4572, 1876.2992, 1816.283, 1732.142, 1925.2849, 1985.1389, 1827.5139, 1992.0398]
2025-08-07 08:54:01,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:54:01,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1251 [DEBUG]: Training session finished
