2025-08-07 06:16:27,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc0-hopper/ExtremeClogL1U23-bpql-mem24
2025-08-07 06:16:27,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc0-hopper/ExtremeClogL1U23-bpql-mem24
2025-08-07 06:16:27,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1473110bfe10>}
2025-08-07 06:16:27,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1111 [DEBUG]: using device: cuda
2025-08-07 06:16:27,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1133 [INFO]: Creating new trainer
2025-08-07 06:16:27,343 baseline-bpql-noiseperc0-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=83, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-08-07 06:16:27,343 baseline-bpql-noiseperc0-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 06:16:28,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1194 [DEBUG]: Starting training session...
2025-08-07 06:16:28,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 1/100
2025-08-07 06:17:59,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:18:00,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 88.79870 ± 1.538
2025-08-07 06:18:00,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [86.42683, 88.17421, 90.20772, 89.8091, 89.89823, 88.06916, 87.32539, 88.23477, 91.90524, 87.936325]
2025-08-07 06:18:00,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [67.0, 68.0, 69.0, 69.0, 69.0, 68.0, 68.0, 68.0, 70.0, 68.0]
2025-08-07 06:18:00,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (88.80) for latency ExtremeClogL1U23
2025-08-07 06:18:00,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 32 minutes, 45 seconds)
2025-08-07 06:19:39,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:19:40,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 123.45884 ± 4.665
2025-08-07 06:19:40,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [119.41445, 125.33902, 124.5876, 125.83394, 123.63198, 111.71218, 126.65505, 122.90562, 129.84618, 124.66239]
2025-08-07 06:19:40,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [84.0, 87.0, 86.0, 86.0, 85.0, 79.0, 87.0, 85.0, 89.0, 86.0]
2025-08-07 06:19:40,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (123.46) for latency ExtremeClogL1U23
2025-08-07 06:19:40,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 36 minutes, 50 seconds)
2025-08-07 06:21:17,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:21:18,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 97.02356 ± 4.094
2025-08-07 06:21:18,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [92.901955, 97.5502, 105.646484, 98.91957, 95.30427, 100.8612, 92.35154, 93.282974, 99.76608, 93.65134]
2025-08-07 06:21:18,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [67.0, 69.0, 74.0, 70.0, 68.0, 71.0, 66.0, 67.0, 70.0, 67.0]
2025-08-07 06:21:18,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 36 minutes, 32 seconds)
2025-08-07 06:22:57,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:22:59,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 231.61423 ± 41.481
2025-08-07 06:22:59,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [257.32108, 245.98125, 236.84947, 223.4349, 242.36046, 240.47443, 240.90326, 218.44301, 290.0236, 120.35094]
2025-08-07 06:22:59,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 129.0, 129.0, 122.0, 131.0, 127.0, 126.0, 118.0, 143.0, 78.0]
2025-08-07 06:22:59,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (231.61) for latency ExtremeClogL1U23
2025-08-07 06:22:59,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 36 minutes, 22 seconds)
2025-08-07 06:24:38,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:24:39,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 212.92220 ± 48.540
2025-08-07 06:24:39,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [253.51753, 146.47691, 247.60129, 201.16014, 255.26921, 271.66428, 141.0197, 143.95546, 228.14388, 240.41345]
2025-08-07 06:24:39,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [129.0, 90.0, 128.0, 112.0, 129.0, 136.0, 88.0, 89.0, 121.0, 128.0]
2025-08-07 06:24:39,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 35 minutes, 41 seconds)
2025-08-07 06:26:19,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:26:20,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 89.12083 ± 89.153
2025-08-07 06:26:20,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [44.75321, 81.07936, 43.30694, 44.43883, 43.45809, 43.519398, 80.468895, 351.64542, 80.19426, 78.34381]
2025-08-07 06:26:20,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [50.0, 70.0, 49.0, 50.0, 49.0, 49.0, 70.0, 264.0, 70.0, 69.0]
2025-08-07 06:26:20,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 36 minutes, 25 seconds)
2025-08-07 06:27:58,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:27:59,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 118.22722 ± 68.202
2025-08-07 06:27:59,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [90.07202, 305.4072, 84.51951, 179.40436, 86.88834, 89.83799, 87.244156, 86.5571, 85.40901, 86.932526]
2025-08-07 06:27:59,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [64.0, 161.0, 62.0, 109.0, 63.0, 64.0, 63.0, 63.0, 62.0, 63.0]
2025-08-07 06:27:59,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 34 minutes, 54 seconds)
2025-08-07 06:29:37,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:29:40,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 272.25049 ± 138.694
2025-08-07 06:29:40,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [520.7227, 395.8558, 209.99611, 111.82525, 167.9956, 162.65158, 188.67938, 445.7527, 146.73462, 372.2913]
2025-08-07 06:29:40,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [285.0, 343.0, 184.0, 78.0, 123.0, 120.0, 169.0, 265.0, 109.0, 264.0]
2025-08-07 06:29:40,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (272.25) for latency ExtremeClogL1U23
2025-08-07 06:29:40,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 33 minutes, 51 seconds)
2025-08-07 06:31:24,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:31:26,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 193.29288 ± 283.863
2025-08-07 06:31:26,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [79.91956, 78.10959, 142.75131, 105.387085, 79.73516, 1041.9951, 139.69444, 79.59359, 82.46646, 103.27663]
2025-08-07 06:31:26,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [57.0, 56.0, 108.0, 78.0, 57.0, 1000.0, 106.0, 57.0, 58.0, 80.0]
2025-08-07 06:31:26,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 33 minutes, 57 seconds)
2025-08-07 06:33:03,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:33:08,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 505.26401 ± 276.177
2025-08-07 06:33:08,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [652.3513, 558.6773, 233.60657, 203.79959, 705.6889, 470.15756, 230.63261, 682.91125, 1092.6624, 222.15305]
2025-08-07 06:33:08,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [461.0, 283.0, 164.0, 165.0, 532.0, 297.0, 158.0, 466.0, 965.0, 167.0]
2025-08-07 06:33:08,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (505.26) for latency ExtremeClogL1U23
2025-08-07 06:33:08,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 32 minutes, 33 seconds)
2025-08-07 06:34:48,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:34:51,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 358.92224 ± 94.818
2025-08-07 06:34:51,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [387.3546, 443.38474, 370.77188, 396.0003, 218.5851, 233.01332, 501.60916, 221.02518, 425.48325, 391.9949]
2025-08-07 06:34:51,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [202.0, 219.0, 202.0, 194.0, 134.0, 144.0, 231.0, 138.0, 211.0, 223.0]
2025-08-07 06:34:51,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 31 minutes, 38 seconds)
2025-08-07 06:36:29,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:36:30,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 289.52530 ± 30.603
2025-08-07 06:36:30,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [313.07608, 292.87234, 302.0789, 316.3828, 317.7363, 250.57297, 304.72482, 308.92657, 266.0633, 222.81873]
2025-08-07 06:36:30,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [148.0, 141.0, 144.0, 146.0, 149.0, 130.0, 142.0, 145.0, 133.0, 119.0]
2025-08-07 06:36:30,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 29 minutes, 52 seconds)
2025-08-07 06:38:09,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:38:11,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 367.42227 ± 86.484
2025-08-07 06:38:11,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [323.5781, 410.21994, 292.4406, 413.37668, 233.02507, 423.32425, 262.36404, 336.8804, 502.5276, 476.48602]
2025-08-07 06:38:11,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [149.0, 174.0, 142.0, 172.0, 125.0, 173.0, 133.0, 151.0, 190.0, 182.0]
2025-08-07 06:38:11,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 28 minutes, 13 seconds)
2025-08-07 06:39:50,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:39:53,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 494.64679 ± 85.821
2025-08-07 06:39:53,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [530.77094, 695.2414, 522.5454, 473.70483, 428.67313, 530.0106, 537.8953, 392.96054, 441.03973, 393.62604]
2025-08-07 06:39:53,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [235.0, 293.0, 262.0, 211.0, 202.0, 240.0, 227.0, 206.0, 202.0, 206.0]
2025-08-07 06:39:53,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 25 minutes, 20 seconds)
2025-08-07 06:41:34,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:41:37,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 515.55188 ± 147.709
2025-08-07 06:41:37,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [607.5024, 347.65555, 496.56693, 597.8674, 795.0642, 361.16388, 540.86804, 596.08606, 550.36914, 262.37488]
2025-08-07 06:41:37,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [280.0, 182.0, 247.0, 265.0, 356.0, 204.0, 252.0, 273.0, 259.0, 155.0]
2025-08-07 06:41:37,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (515.55) for latency ExtremeClogL1U23
2025-08-07 06:41:37,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 24 minutes, 15 seconds)
2025-08-07 06:43:14,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:43:17,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 430.50742 ± 80.132
2025-08-07 06:43:17,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [310.9433, 423.93716, 492.4378, 447.86728, 435.96548, 540.7846, 472.43585, 262.84567, 491.03467, 426.82248]
2025-08-07 06:43:17,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [143.0, 178.0, 204.0, 179.0, 181.0, 213.0, 195.0, 134.0, 195.0, 174.0]
2025-08-07 06:43:17,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 21 minutes, 39 seconds)
2025-08-07 06:44:55,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:44:58,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 418.01245 ± 178.887
2025-08-07 06:44:58,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [261.7163, 675.5464, 822.1649, 427.6423, 285.46713, 400.6679, 298.5227, 304.5469, 429.59552, 274.2545]
2025-08-07 06:44:58,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [141.0, 302.0, 354.0, 211.0, 151.0, 200.0, 155.0, 160.0, 207.0, 147.0]
2025-08-07 06:44:58,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 20 minutes, 25 seconds)
2025-08-07 06:46:38,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:46:41,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 480.91473 ± 31.476
2025-08-07 06:46:41,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [524.34314, 464.71304, 473.97192, 448.15695, 466.47162, 556.0018, 475.95312, 458.761, 475.56522, 465.20923]
2025-08-07 06:46:41,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [223.0, 197.0, 211.0, 201.0, 215.0, 237.0, 216.0, 213.0, 212.0, 212.0]
2025-08-07 06:46:41,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 19 minutes, 20 seconds)
2025-08-07 06:48:19,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:48:22,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 454.52710 ± 31.487
2025-08-07 06:48:22,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [500.72604, 477.04578, 429.51724, 516.2471, 462.43396, 438.92792, 427.34247, 431.55646, 438.90628, 422.56808]
2025-08-07 06:48:22,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [198.0, 189.0, 181.0, 205.0, 190.0, 186.0, 180.0, 181.0, 182.0, 180.0]
2025-08-07 06:48:22,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 17 minutes, 16 seconds)
2025-08-07 06:50:01,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:50:04,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 442.23404 ± 111.513
2025-08-07 06:50:04,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [631.5667, 440.92575, 466.23322, 285.0437, 299.90417, 493.7507, 542.2681, 282.11575, 485.38025, 495.15176]
2025-08-07 06:50:04,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [251.0, 196.0, 207.0, 152.0, 154.0, 204.0, 226.0, 150.0, 204.0, 218.0]
2025-08-07 06:50:04,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 15 minutes, 5 seconds)
2025-08-07 06:51:44,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:51:47,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 462.31500 ± 72.536
2025-08-07 06:51:47,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [350.73953, 494.7266, 304.99924, 515.1577, 496.45572, 496.9372, 470.7367, 512.9947, 440.12308, 540.2796]
2025-08-07 06:51:47,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [180.0, 204.0, 157.0, 214.0, 208.0, 208.0, 201.0, 209.0, 191.0, 213.0]
2025-08-07 06:51:47,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 14 minutes, 24 seconds)
2025-08-07 06:53:24,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:53:27,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 481.33823 ± 32.242
2025-08-07 06:53:27,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [509.18402, 515.54175, 435.95847, 502.57285, 441.78632, 511.28815, 451.27402, 505.99292, 440.69568, 499.0878]
2025-08-07 06:53:27,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [198.0, 199.0, 182.0, 196.0, 178.0, 198.0, 183.0, 196.0, 184.0, 195.0]
2025-08-07 06:53:27,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 12 minutes, 18 seconds)
2025-08-07 06:55:07,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:55:09,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 360.30902 ± 197.664
2025-08-07 06:55:09,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [577.07605, 196.39821, 208.97325, 205.17949, 555.8656, 509.44696, 202.38284, 731.807, 212.72731, 203.23376]
2025-08-07 06:55:09,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [240.0, 107.0, 111.0, 110.0, 222.0, 228.0, 109.0, 305.0, 112.0, 109.0]
2025-08-07 06:55:09,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 10 minutes, 29 seconds)
2025-08-07 06:56:49,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:56:53,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 734.43182 ± 170.716
2025-08-07 06:56:53,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [827.7418, 882.5689, 731.1669, 711.6591, 665.95435, 902.75977, 362.99176, 878.8483, 869.9225, 510.70517]
2025-08-07 06:56:53,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [345.0, 382.0, 342.0, 313.0, 308.0, 359.0, 177.0, 344.0, 412.0, 233.0]
2025-08-07 06:56:53,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (734.43) for latency ExtremeClogL1U23
2025-08-07 06:56:53,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 9 minutes, 36 seconds)
2025-08-07 06:58:32,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:58:35,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 556.33887 ± 175.562
2025-08-07 06:58:35,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [573.4157, 719.23535, 399.715, 611.4514, 438.27377, 291.65128, 526.8093, 918.1928, 405.16574, 679.4787]
2025-08-07 06:58:35,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [247.0, 311.0, 188.0, 278.0, 208.0, 148.0, 260.0, 349.0, 188.0, 270.0]
2025-08-07 06:58:35,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 7 minutes, 47 seconds)
2025-08-07 07:00:15,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:00:18,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 565.99182 ± 216.340
2025-08-07 07:00:18,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [525.6436, 956.12177, 369.24176, 757.7457, 378.24216, 368.9048, 808.4888, 745.6791, 358.77338, 391.0769]
2025-08-07 07:00:18,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [230.0, 421.0, 187.0, 290.0, 189.0, 185.0, 360.0, 309.0, 185.0, 191.0]
2025-08-07 07:00:19,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 6 minutes, 10 seconds)
2025-08-07 07:01:58,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:02:01,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 432.88135 ± 251.913
2025-08-07 07:02:01,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [746.43695, 398.31128, 288.4635, 269.4578, 688.1686, 120.735085, 259.02484, 825.95667, 622.3249, 109.93427]
2025-08-07 07:02:01,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [312.0, 194.0, 149.0, 143.0, 253.0, 82.0, 138.0, 321.0, 253.0, 77.0]
2025-08-07 07:02:01,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 5 minutes, 6 seconds)
2025-08-07 07:03:39,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:03:41,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 382.04337 ± 102.275
2025-08-07 07:03:41,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [468.3254, 294.84464, 531.87396, 293.02213, 307.97046, 500.7703, 302.21588, 522.17224, 299.70114, 299.53738]
2025-08-07 07:03:41,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [207.0, 153.0, 221.0, 152.0, 161.0, 224.0, 161.0, 226.0, 154.0, 154.0]
2025-08-07 07:03:41,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 2 minutes, 52 seconds)
2025-08-07 07:05:25,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:05:29,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 547.01874 ± 238.308
2025-08-07 07:05:29,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [914.0276, 547.34375, 466.64066, 385.61697, 399.20926, 1053.5845, 602.2672, 381.49075, 255.49702, 464.50977]
2025-08-07 07:05:29,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [371.0, 253.0, 199.0, 187.0, 191.0, 444.0, 262.0, 184.0, 137.0, 218.0]
2025-08-07 07:05:29,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 1 minute, 59 seconds)
2025-08-07 07:07:03,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:07:07,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 659.07684 ± 239.146
2025-08-07 07:07:07,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [495.58487, 984.1217, 576.95886, 318.2813, 315.31198, 582.10834, 727.7214, 713.9809, 1058.4122, 818.2873]
2025-08-07 07:07:07,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [219.0, 387.0, 237.0, 161.0, 155.0, 247.0, 324.0, 314.0, 421.0, 335.0]
2025-08-07 07:07:07,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 59 minutes, 31 seconds)
2025-08-07 07:08:49,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:08:54,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 881.42493 ± 341.889
2025-08-07 07:08:54,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1540.3994, 868.46497, 695.4529, 1047.0365, 1206.442, 353.70108, 695.10126, 553.1308, 1193.3702, 661.1503]
2025-08-07 07:08:54,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [638.0, 369.0, 325.0, 465.0, 544.0, 182.0, 331.0, 262.0, 467.0, 319.0]
2025-08-07 07:08:54,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (881.42) for latency ExtremeClogL1U23
2025-08-07 07:08:54,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 58 minutes, 33 seconds)
2025-08-07 07:10:32,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:10:34,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 296.23181 ± 6.868
2025-08-07 07:10:34,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [300.79535, 296.045, 304.8544, 279.5917, 292.15656, 299.52423, 303.91278, 297.38333, 293.00644, 295.04858]
2025-08-07 07:10:34,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [151.0, 151.0, 154.0, 146.0, 150.0, 149.0, 152.0, 150.0, 149.0, 150.0]
2025-08-07 07:10:34,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 56 minutes, 13 seconds)
2025-08-07 07:12:14,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:12:20,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1123.82214 ± 541.546
2025-08-07 07:12:20,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [731.525, 922.78284, 1272.6698, 2188.2634, 1780.8156, 593.5111, 1474.1992, 482.02676, 1236.8489, 555.5787]
2025-08-07 07:12:20,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [301.0, 419.0, 500.0, 861.0, 658.0, 263.0, 584.0, 221.0, 508.0, 246.0]
2025-08-07 07:12:20,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (1123.82) for latency ExtremeClogL1U23
2025-08-07 07:12:20,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 55 minutes, 52 seconds)
2025-08-07 07:14:01,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:14:03,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 443.03897 ± 387.469
2025-08-07 07:14:03,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [324.21512, 324.49814, 239.97176, 322.9523, 322.77127, 326.24994, 321.37302, 1603.0417, 322.02783, 323.28867]
2025-08-07 07:14:03,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [162.0, 164.0, 128.0, 163.0, 163.0, 161.0, 160.0, 605.0, 163.0, 162.0]
2025-08-07 07:14:03,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 53 minutes, 13 seconds)
2025-08-07 07:15:43,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:15:50,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1357.23486 ± 620.974
2025-08-07 07:15:50,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2484.1611, 1877.8109, 996.3812, 993.54346, 2447.474, 1104.2048, 889.68414, 1041.6343, 821.2386, 916.2161]
2025-08-07 07:15:50,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [882.0, 689.0, 363.0, 359.0, 1000.0, 384.0, 310.0, 379.0, 314.0, 327.0]
2025-08-07 07:15:50,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (1357.23) for latency ExtremeClogL1U23
2025-08-07 07:15:50,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 53 minutes, 17 seconds)
2025-08-07 07:17:31,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:17:39,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1587.86450 ± 550.359
2025-08-07 07:17:39,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1640.4736, 2582.336, 1432.6115, 1427.4823, 1202.5682, 1510.4583, 2682.1768, 1032.4683, 1204.0189, 1164.0516]
2025-08-07 07:17:39,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [649.0, 901.0, 567.0, 563.0, 498.0, 569.0, 1000.0, 375.0, 483.0, 497.0]
2025-08-07 07:17:39,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (1587.86) for latency ExtremeClogL1U23
2025-08-07 07:17:39,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 51 minutes, 57 seconds)
2025-08-07 07:19:19,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:19:22,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 587.00018 ± 417.115
2025-08-07 07:19:22,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1214.6627, 200.4232, 207.97496, 213.37746, 1012.33386, 210.93555, 1208.3967, 564.5999, 206.49535, 830.8022]
2025-08-07 07:19:22,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [381.0, 109.0, 111.0, 113.0, 319.0, 112.0, 378.0, 213.0, 111.0, 268.0]
2025-08-07 07:19:22,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 50 minutes, 50 seconds)
2025-08-07 07:20:59,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:21:07,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1418.02234 ± 591.942
2025-08-07 07:21:07,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2663.6074, 1168.8262, 472.7207, 1527.8878, 737.2716, 964.2308, 1611.3064, 1618.8105, 1521.0079, 1894.5544]
2025-08-07 07:21:07,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 482.0, 210.0, 595.0, 300.0, 380.0, 599.0, 644.0, 571.0, 651.0]
2025-08-07 07:21:07,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 48 minutes, 47 seconds)
2025-08-07 07:22:49,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:22:56,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1254.17163 ± 719.443
2025-08-07 07:22:56,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [787.1493, 429.47394, 1394.3519, 2636.457, 2275.1384, 399.7666, 944.76166, 668.05084, 1427.0792, 1579.4878]
2025-08-07 07:22:56,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [300.0, 189.0, 550.0, 984.0, 845.0, 180.0, 386.0, 265.0, 473.0, 541.0]
2025-08-07 07:22:56,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 48 minutes, 13 seconds)
2025-08-07 07:24:33,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:24:38,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1054.55786 ± 346.619
2025-08-07 07:24:38,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1616.8097, 1350.6473, 1598.4479, 972.95026, 579.80835, 778.6006, 1124.6488, 736.137, 760.31244, 1027.2151]
2025-08-07 07:24:38,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [522.0, 446.0, 539.0, 304.0, 211.0, 295.0, 342.0, 281.0, 288.0, 370.0]
2025-08-07 07:24:38,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 45 minutes, 30 seconds)
2025-08-07 07:26:22,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:26:26,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 834.64813 ± 317.088
2025-08-07 07:26:26,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1198.5077, 1188.874, 701.68085, 1508.3715, 626.709, 530.82794, 647.30334, 624.50867, 666.29065, 653.4073]
2025-08-07 07:26:26,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [413.0, 380.0, 269.0, 487.0, 244.0, 213.0, 249.0, 246.0, 254.0, 249.0]
2025-08-07 07:26:26,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 43 minutes, 41 seconds)
2025-08-07 07:28:03,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:28:09,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1009.58582 ± 406.626
2025-08-07 07:28:09,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1266.4838, 555.92615, 711.9565, 1662.5448, 828.73254, 531.78284, 580.92303, 1568.3967, 1050.9437, 1338.169]
2025-08-07 07:28:09,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [542.0, 243.0, 295.0, 644.0, 375.0, 241.0, 257.0, 590.0, 407.0, 505.0]
2025-08-07 07:28:09,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 41 minutes, 53 seconds)
2025-08-07 07:29:49,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:29:55,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1445.70422 ± 658.333
2025-08-07 07:29:55,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1392.8818, 3262.319, 903.9374, 1193.5522, 1706.7074, 819.5566, 1468.8868, 1285.1034, 1409.966, 1014.1309]
2025-08-07 07:29:55,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [447.0, 1000.0, 301.0, 390.0, 540.0, 280.0, 467.0, 405.0, 448.0, 336.0]
2025-08-07 07:29:55,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 40 minutes, 22 seconds)
2025-08-07 07:31:33,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:31:36,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 683.72333 ± 269.220
2025-08-07 07:31:36,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [423.67523, 414.04156, 883.3318, 1104.2136, 904.75446, 414.10257, 428.32126, 922.9669, 422.47632, 919.3494]
2025-08-07 07:31:36,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [188.0, 184.0, 300.0, 352.0, 305.0, 183.0, 185.0, 308.0, 184.0, 303.0]
2025-08-07 07:31:36,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 37 minutes, 9 seconds)
2025-08-07 07:33:17,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:33:19,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 186.70863 ± 2.972
2025-08-07 07:33:19,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [189.32582, 184.27922, 187.86147, 189.51424, 184.96606, 190.11284, 181.98386, 188.01646, 189.0407, 181.98576]
2025-08-07 07:33:19,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [105.0, 102.0, 104.0, 105.0, 102.0, 105.0, 101.0, 104.0, 104.0, 101.0]
2025-08-07 07:33:19,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 35 minutes, 31 seconds)
2025-08-07 07:34:59,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:35:05,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1287.83386 ± 408.051
2025-08-07 07:35:05,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1366.6644, 1344.3743, 1563.5426, 1849.4727, 1704.069, 750.53033, 1273.8224, 1349.7548, 394.42792, 1281.6807]
2025-08-07 07:35:05,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [428.0, 422.0, 533.0, 624.0, 525.0, 275.0, 404.0, 423.0, 178.0, 407.0]
2025-08-07 07:35:05,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 33 minutes, 22 seconds)
2025-08-07 07:36:47,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:36:54,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1439.36169 ± 787.476
2025-08-07 07:36:54,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2897.7915, 728.26794, 2927.1206, 772.20544, 1423.4437, 1286.5238, 753.0978, 911.2204, 1591.3271, 1102.6177]
2025-08-07 07:36:54,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 280.0, 1000.0, 293.0, 503.0, 457.0, 285.0, 340.0, 495.0, 385.0]
2025-08-07 07:36:54,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 32 minutes, 51 seconds)
2025-08-07 07:38:31,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:38:38,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1526.87341 ± 613.484
2025-08-07 07:38:38,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1011.76526, 1120.0996, 1045.2709, 1137.7266, 2748.5789, 1040.8666, 1059.0253, 2083.5542, 1676.7867, 2345.0603]
2025-08-07 07:38:38,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [366.0, 367.0, 381.0, 387.0, 895.0, 372.0, 374.0, 717.0, 509.0, 803.0]
2025-08-07 07:38:38,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 30 minutes, 38 seconds)
2025-08-07 07:40:17,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:40:22,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1147.33142 ± 773.929
2025-08-07 07:40:22,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1407.2917, 856.44617, 469.77966, 177.41037, 486.87323, 1043.363, 786.84424, 1509.9657, 2980.3347, 1755.0059]
2025-08-07 07:40:22,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [486.0, 302.0, 200.0, 99.0, 207.0, 356.0, 294.0, 513.0, 951.0, 585.0]
2025-08-07 07:40:22,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 29 minutes, 26 seconds)
2025-08-07 07:42:04,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:42:09,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1033.37329 ± 220.613
2025-08-07 07:42:09,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1405.1809, 1160.9265, 1030.076, 870.37177, 683.76245, 1395.937, 1058.901, 894.5572, 867.37726, 966.6436]
2025-08-07 07:42:09,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [440.0, 377.0, 362.0, 283.0, 263.0, 441.0, 346.0, 297.0, 285.0, 348.0]
2025-08-07 07:42:09,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 28 minutes, 22 seconds)
2025-08-07 07:43:47,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:43:51,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 990.66388 ± 587.466
2025-08-07 07:43:51,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [222.64525, 232.77852, 1649.2723, 1501.3491, 248.36197, 1623.291, 1304.568, 533.42914, 1567.2332, 1023.70996]
2025-08-07 07:43:51,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [116.0, 120.0, 545.0, 502.0, 125.0, 536.0, 456.0, 214.0, 474.0, 325.0]
2025-08-07 07:43:51,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 25 minutes, 57 seconds)
2025-08-07 07:45:32,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:45:38,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1263.00415 ± 474.287
2025-08-07 07:45:38,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1203.3854, 1112.8976, 1080.9175, 1147.636, 1141.2394, 1276.8942, 1098.6588, 2452.264, 1622.2435, 493.9054]
2025-08-07 07:45:38,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [379.0, 359.0, 380.0, 363.0, 361.0, 400.0, 350.0, 732.0, 486.0, 200.0]
2025-08-07 07:45:38,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 23 minutes, 46 seconds)
2025-08-07 07:47:20,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:47:23,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 513.16858 ± 672.302
2025-08-07 07:47:23,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [189.04541, 2178.79, 213.33789, 178.57564, 188.70428, 1457.9734, 174.44995, 178.8114, 194.73584, 177.26166]
2025-08-07 07:47:23,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [106.0, 662.0, 117.0, 102.0, 107.0, 486.0, 100.0, 102.0, 110.0, 101.0]
2025-08-07 07:47:23,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 22 minutes, 12 seconds)
2025-08-07 07:49:01,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:49:06,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1220.67798 ± 295.193
2025-08-07 07:49:06,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1155.5199, 1172.3925, 1173.8545, 1861.904, 1603.7885, 1030.2811, 1154.6945, 1202.4036, 1138.1421, 713.79974]
2025-08-07 07:49:06,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [368.0, 376.0, 372.0, 565.0, 496.0, 341.0, 369.0, 376.0, 361.0, 270.0]
2025-08-07 07:49:06,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 20 minutes, 17 seconds)
2025-08-07 07:50:48,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:50:55,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1551.87659 ± 443.721
2025-08-07 07:50:55,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1925.8967, 2219.4636, 2072.3723, 1597.6373, 1059.6644, 1592.2343, 1248.9749, 1606.9708, 673.9801, 1521.5707]
2025-08-07 07:50:55,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [597.0, 693.0, 636.0, 508.0, 401.0, 504.0, 404.0, 504.0, 277.0, 518.0]
2025-08-07 07:50:55,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 18 minutes, 53 seconds)
2025-08-07 07:52:37,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:52:41,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1132.72266 ± 59.173
2025-08-07 07:52:41,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1141.3065, 1194.4768, 1159.8823, 1124.1028, 1131.7805, 1070.5562, 1137.9629, 1057.571, 1055.2942, 1254.2944]
2025-08-07 07:52:41,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [373.0, 387.0, 377.0, 364.0, 369.0, 348.0, 366.0, 347.0, 349.0, 401.0]
2025-08-07 07:52:41,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 17 minutes, 46 seconds)
2025-08-07 07:54:18,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:54:28,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2376.56299 ± 822.032
2025-08-07 07:54:28,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1084.2203, 3130.569, 1572.4037, 3095.009, 1679.9451, 3213.5046, 1576.4164, 1996.0725, 3213.484, 3204.0066]
2025-08-07 07:54:28,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [337.0, 1000.0, 482.0, 1000.0, 505.0, 1000.0, 478.0, 600.0, 1000.0, 1000.0]
2025-08-07 07:54:28,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (2376.56) for latency ExtremeClogL1U23
2025-08-07 07:54:28,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 15 minutes, 59 seconds)
2025-08-07 07:56:04,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:56:11,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1532.99719 ± 848.804
2025-08-07 07:56:11,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [899.95233, 1529.9878, 1488.159, 3244.1763, 872.6817, 1388.709, 1302.503, 709.4752, 864.9074, 3029.4202]
2025-08-07 07:56:11,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [333.0, 482.0, 497.0, 1000.0, 331.0, 488.0, 405.0, 272.0, 321.0, 934.0]
2025-08-07 07:56:11,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 13 minutes, 55 seconds)
2025-08-07 07:57:51,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:57:56,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1133.51978 ± 528.824
2025-08-07 07:57:56,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1021.5434, 621.86475, 631.6069, 910.2245, 907.7005, 622.9284, 2290.7869, 1220.0754, 1231.757, 1876.7107]
2025-08-07 07:57:56,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [368.0, 250.0, 252.0, 342.0, 341.0, 248.0, 699.0, 407.0, 401.0, 593.0]
2025-08-07 07:57:56,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 12 minutes, 25 seconds)
2025-08-07 07:59:36,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:59:40,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 875.55011 ± 643.954
2025-08-07 07:59:40,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1166.3204, 322.61627, 320.54318, 2360.5217, 394.54755, 312.60944, 1123.8243, 1431.4222, 1002.73364, 320.3622]
2025-08-07 07:59:40,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [380.0, 159.0, 158.0, 731.0, 178.0, 156.0, 371.0, 456.0, 331.0, 159.0]
2025-08-07 07:59:40,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 9 minutes, 59 seconds)
2025-08-07 08:01:15,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:01:23,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1839.43713 ± 793.939
2025-08-07 08:01:23,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3146.9358, 1178.2756, 1224.3314, 3126.7136, 1217.4747, 2394.3384, 1170.6465, 1285.423, 2404.7278, 1245.5048]
2025-08-07 08:01:23,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 378.0, 395.0, 1000.0, 397.0, 741.0, 381.0, 411.0, 731.0, 401.0]
2025-08-07 08:01:23,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 7 minutes, 45 seconds)
2025-08-07 08:03:04,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:03:12,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1812.45142 ± 951.012
2025-08-07 08:03:12,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2939.449, 633.83295, 3038.948, 1014.3497, 2368.5996, 1934.6884, 1589.8845, 928.3266, 595.85516, 3080.5808]
2025-08-07 08:03:12,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [925.0, 252.0, 1000.0, 332.0, 782.0, 625.0, 488.0, 327.0, 248.0, 1000.0]
2025-08-07 08:03:12,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 6 minutes, 21 seconds)
2025-08-07 08:04:48,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:04:52,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 813.90039 ± 841.302
2025-08-07 08:04:52,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [214.06259, 213.77089, 1539.1685, 2809.5044, 217.05128, 1301.1914, 216.02507, 1214.8258, 196.00983, 217.39438]
2025-08-07 08:04:52,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [114.0, 114.0, 475.0, 855.0, 116.0, 404.0, 116.0, 386.0, 107.0, 116.0]
2025-08-07 08:04:52,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 4 minutes, 17 seconds)
2025-08-07 08:06:32,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:06:41,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2041.16638 ± 832.781
2025-08-07 08:06:41,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1666.9364, 3201.0496, 743.4111, 1520.1492, 3155.8923, 3248.7832, 2044.3298, 1972.0673, 1287.59, 1571.4575]
2025-08-07 08:06:41,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [524.0, 1000.0, 278.0, 467.0, 1000.0, 1000.0, 634.0, 608.0, 398.0, 490.0]
2025-08-07 08:06:41,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 3 minutes)
2025-08-07 08:08:18,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:08:28,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2432.73975 ± 1033.823
2025-08-07 08:08:28,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [747.07477, 3186.5408, 3176.6143, 2356.9385, 972.0238, 3227.3318, 3226.5493, 3193.9412, 991.22766, 3249.157]
2025-08-07 08:08:28,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [264.0, 1000.0, 1000.0, 711.0, 314.0, 1000.0, 1000.0, 1000.0, 321.0, 1000.0]
2025-08-07 08:08:28,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (2432.74) for latency ExtremeClogL1U23
2025-08-07 08:08:28,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 1 minute, 39 seconds)
2025-08-07 08:10:08,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:10:14,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1437.87891 ± 727.313
2025-08-07 08:10:14,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [681.6757, 1400.1309, 675.18713, 1263.2523, 1700.4635, 2080.9368, 715.2717, 3189.2263, 1413.4681, 1259.1764]
2025-08-07 08:10:14,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [261.0, 433.0, 258.0, 396.0, 519.0, 628.0, 269.0, 986.0, 434.0, 396.0]
2025-08-07 08:10:14,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 14 seconds)
2025-08-07 08:11:51,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:11:55,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 727.21521 ± 711.187
2025-08-07 08:11:55,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2623.0596, 380.4435, 431.89252, 393.02634, 388.33594, 383.40622, 375.2728, 430.77365, 1487.5142, 378.4274]
2025-08-07 08:11:55,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [793.0, 173.0, 184.0, 174.0, 174.0, 172.0, 171.0, 184.0, 460.0, 173.0]
2025-08-07 08:11:55,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 57 minutes, 31 seconds)
2025-08-07 08:13:32,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:13:37,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1051.61499 ± 691.160
2025-08-07 08:13:37,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1006.6292, 831.53613, 1029.2708, 3033.1616, 863.6838, 973.3164, 445.44464, 449.04782, 984.1144, 899.9461]
2025-08-07 08:13:37,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [314.0, 264.0, 319.0, 894.0, 270.0, 308.0, 182.0, 182.0, 302.0, 284.0]
2025-08-07 08:13:37,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 55 minutes, 59 seconds)
2025-08-07 08:15:18,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:15:30,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2903.16479 ± 512.716
2025-08-07 08:15:30,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3106.3157, 1473.3242, 3163.071, 3099.3123, 3125.091, 3197.3552, 2506.471, 3082.9673, 3124.2651, 3153.4749]
2025-08-07 08:15:30,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 458.0, 1000.0, 1000.0, 1000.0, 1000.0, 770.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:15:30,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (2903.16) for latency ExtremeClogL1U23
2025-08-07 08:15:30,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 54 minutes, 42 seconds)
2025-08-07 08:17:02,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:17:08,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1522.08105 ± 847.320
2025-08-07 08:17:08,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1522.4445, 879.9894, 1205.9543, 1480.5872, 1249.3085, 849.2279, 870.0314, 3148.3586, 867.8665, 3147.0415]
2025-08-07 08:17:08,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [462.0, 307.0, 410.0, 450.0, 385.0, 301.0, 305.0, 1000.0, 305.0, 1000.0]
2025-08-07 08:17:08,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 52 minutes, 1 second)
2025-08-07 08:18:48,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:18:59,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2573.12866 ± 768.688
2025-08-07 08:18:59,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2368.086, 1153.2015, 2870.5122, 3224.577, 3224.3113, 3207.4128, 3186.902, 1822.3707, 1482.7042, 3191.2068]
2025-08-07 08:18:59,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [748.0, 390.0, 893.0, 1000.0, 1000.0, 1000.0, 1000.0, 585.0, 504.0, 1000.0]
2025-08-07 08:18:59,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 50 minutes, 42 seconds)
2025-08-07 08:20:38,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:20:46,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1963.94299 ± 777.036
2025-08-07 08:20:46,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3315.7043, 1062.0366, 2020.3209, 2239.0208, 1032.6438, 3195.093, 1032.4523, 1940.1058, 1729.1307, 2072.9226]
2025-08-07 08:20:46,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [993.0, 371.0, 647.0, 688.0, 365.0, 1000.0, 367.0, 585.0, 527.0, 634.0]
2025-08-07 08:20:46,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 49 minutes, 35 seconds)
2025-08-07 08:22:21,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:22:27,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1410.48096 ± 225.825
2025-08-07 08:22:27,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1225.4828, 1434.1326, 1207.3733, 1186.8832, 1249.5098, 1223.0066, 1454.3533, 1589.9036, 1635.8726, 1898.2922]
2025-08-07 08:22:27,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [416.0, 460.0, 414.0, 406.0, 422.0, 418.0, 496.0, 519.0, 505.0, 569.0]
2025-08-07 08:22:27,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 47 minutes, 44 seconds)
2025-08-07 08:24:07,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:24:12,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1369.62915 ± 453.991
2025-08-07 08:24:12,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1023.24677, 986.9901, 1427.593, 1054.5752, 922.45764, 1301.5608, 2512.9285, 1756.2439, 1478.7552, 1231.9402]
2025-08-07 08:24:12,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [314.0, 308.0, 431.0, 330.0, 311.0, 398.0, 752.0, 532.0, 450.0, 379.0]
2025-08-07 08:24:12,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 45 minutes, 14 seconds)
2025-08-07 08:25:51,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:25:57,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1487.91858 ± 970.768
2025-08-07 08:25:57,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [784.4559, 3183.628, 2369.8203, 1096.9235, 779.40063, 794.72754, 789.97565, 1064.4108, 796.8921, 3218.951]
2025-08-07 08:25:57,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [289.0, 960.0, 716.0, 376.0, 291.0, 293.0, 289.0, 372.0, 290.0, 1000.0]
2025-08-07 08:25:57,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 44 minutes, 3 seconds)
2025-08-07 08:27:35,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:27:47,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2831.34473 ± 598.705
2025-08-07 08:27:47,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1635.868, 2106.88, 3196.4827, 2060.027, 3214.085, 3208.2124, 3213.4526, 3228.7532, 3219.11, 3230.5762]
2025-08-07 08:27:47,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [490.0, 674.0, 1000.0, 658.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:27:47,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 42 minutes, 18 seconds)
2025-08-07 08:29:26,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:29:34,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1857.56421 ± 1113.407
2025-08-07 08:29:34,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1184.5709, 623.13104, 624.3443, 906.5854, 3122.6582, 3148.381, 623.5497, 3016.0435, 2131.5781, 3194.8005]
2025-08-07 08:29:34,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [396.0, 232.0, 232.0, 316.0, 1000.0, 1000.0, 232.0, 900.0, 644.0, 1000.0]
2025-08-07 08:29:34,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 40 minutes, 28 seconds)
2025-08-07 08:31:16,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:31:28,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2973.91699 ± 535.126
2025-08-07 08:31:28,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1876.0326, 3245.7878, 3271.7446, 3217.9148, 3258.1733, 3217.0686, 3235.5745, 3243.995, 3240.0786, 1932.801]
2025-08-07 08:31:28,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [558.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 957.0, 1000.0, 1000.0, 578.0]
2025-08-07 08:31:28,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (2973.92) for latency ExtremeClogL1U23
2025-08-07 08:31:28,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 39 minutes, 41 seconds)
2025-08-07 08:33:02,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:33:07,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1116.63342 ± 435.026
2025-08-07 08:33:07,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [444.6569, 1512.0969, 1659.0115, 1150.4862, 444.87607, 869.15155, 1058.2753, 909.0134, 1421.2811, 1697.4849]
2025-08-07 08:33:07,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [183.0, 446.0, 539.0, 389.0, 183.0, 310.0, 324.0, 314.0, 468.0, 549.0]
2025-08-07 08:33:07,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 37 minutes, 24 seconds)
2025-08-07 08:34:50,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:34:59,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2133.60303 ± 691.477
2025-08-07 08:34:59,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1774.2048, 1693.048, 1200.2773, 2850.4905, 1599.1676, 3275.6365, 1826.9474, 1665.6825, 3210.1013, 2240.4746]
2025-08-07 08:34:59,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [529.0, 507.0, 399.0, 844.0, 486.0, 1000.0, 544.0, 498.0, 1000.0, 680.0]
2025-08-07 08:34:59,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 36 minutes, 6 seconds)
2025-08-07 08:36:33,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:36:39,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1652.00098 ± 1010.886
2025-08-07 08:36:39,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2030.5319, 2176.7327, 252.27417, 253.75882, 2404.4092, 2218.3906, 2183.0225, 250.4137, 1451.5321, 3298.9446]
2025-08-07 08:36:39,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [600.0, 644.0, 128.0, 128.0, 712.0, 672.0, 646.0, 127.0, 477.0, 1000.0]
2025-08-07 08:36:39,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 33 minutes, 41 seconds)
2025-08-07 08:38:20,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:38:28,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1911.35327 ± 878.706
2025-08-07 08:38:28,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3209.5098, 1759.508, 3231.7783, 1335.2656, 1621.167, 1086.4183, 1308.4347, 3219.8975, 1162.1415, 1179.4144]
2025-08-07 08:38:28,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 525.0, 1000.0, 409.0, 485.0, 374.0, 404.0, 1000.0, 391.0, 394.0]
2025-08-07 08:38:28,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 32 minutes, 1 second)
2025-08-07 08:40:04,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:40:14,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2310.60962 ± 928.026
2025-08-07 08:40:14,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1564.8085, 3212.6008, 3208.068, 1536.2864, 1201.2798, 3197.974, 3187.3677, 1866.9386, 3234.5027, 896.2704]
2025-08-07 08:40:14,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [502.0, 1000.0, 1000.0, 501.0, 406.0, 1000.0, 1000.0, 591.0, 1000.0, 307.0]
2025-08-07 08:40:14,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 29 minutes, 46 seconds)
2025-08-07 08:41:50,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:41:56,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1148.04834 ± 1362.839
2025-08-07 08:41:56,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3241.2146, 264.4722, 260.24118, 3222.134, 264.7444, 235.2438, 269.05023, 261.95367, 235.55928, 3225.87]
2025-08-07 08:41:56,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 133.0, 131.0, 1000.0, 133.0, 122.0, 134.0, 132.0, 122.0, 1000.0]
2025-08-07 08:41:56,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 28 minutes, 12 seconds)
2025-08-07 08:43:33,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:43:40,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1557.88245 ± 1230.482
2025-08-07 08:43:40,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [522.9465, 623.27875, 524.18567, 2956.6853, 3101.0334, 3110.396, 614.09375, 522.11505, 518.1632, 3085.9268]
2025-08-07 08:43:40,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [212.0, 248.0, 212.0, 889.0, 1000.0, 1000.0, 247.0, 212.0, 210.0, 965.0]
2025-08-07 08:43:40,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 26 minutes, 4 seconds)
2025-08-07 08:45:25,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:45:35,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2164.68042 ± 1341.323
2025-08-07 08:45:35,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3203.3481, 3198.0156, 1740.9281, 3210.0564, 3232.2798, 222.6995, 219.568, 3205.7666, 228.00339, 3186.1396]
2025-08-07 08:45:35,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 513.0, 1000.0, 1000.0, 116.0, 115.0, 1000.0, 119.0, 1000.0]
2025-08-07 08:45:35,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 24 minutes, 59 seconds)
2025-08-07 08:47:04,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:47:17,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 3043.99512 ± 405.119
2025-08-07 08:47:17,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3187.5012, 1828.9197, 3170.926, 3167.9788, 3171.6384, 3186.8037, 3195.423, 3184.282, 3169.6355, 3176.8438]
2025-08-07 08:47:17,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 538.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:47:17,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (3044.00) for latency ExtremeClogL1U23
2025-08-07 08:47:17,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 55 seconds)
2025-08-07 08:48:59,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:49:08,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2048.09326 ± 619.829
2025-08-07 08:49:08,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2491.5703, 1334.9464, 3007.3513, 1792.0123, 1690.3359, 3256.95, 1399.0967, 1852.3955, 1776.699, 1879.572]
2025-08-07 08:49:08,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [740.0, 392.0, 893.0, 535.0, 504.0, 1000.0, 408.0, 553.0, 531.0, 561.0]
2025-08-07 08:49:08,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 21 minutes, 20 seconds)
2025-08-07 08:50:44,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:50:53,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2287.51611 ± 672.909
2025-08-07 08:50:53,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1460.863, 1710.8917, 2005.2362, 3061.6536, 2280.0083, 2187.3472, 1500.5466, 3257.6736, 3381.2134, 2029.727]
2025-08-07 08:50:53,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [436.0, 502.0, 593.0, 931.0, 679.0, 654.0, 448.0, 971.0, 1000.0, 628.0]
2025-08-07 08:50:53,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 19 minutes, 41 seconds)
2025-08-07 08:52:30,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:52:43,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 3176.21631 ± 48.330
2025-08-07 08:52:43,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3139.2239, 3213.9353, 3189.1084, 3219.6033, 3171.2056, 3193.0579, 3214.334, 3209.6494, 3159.802, 3052.2437]
2025-08-07 08:52:43,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:52:43,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1226 [INFO]: New best (3176.22) for latency ExtremeClogL1U23
2025-08-07 08:52:43,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 18 minutes, 5 seconds)
2025-08-07 08:54:26,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:54:29,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 622.36188 ± 633.019
2025-08-07 08:54:29,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [376.06415, 419.6826, 436.9808, 380.1106, 384.02173, 447.43625, 421.52975, 2519.7756, 389.68115, 448.33606]
2025-08-07 08:54:29,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [168.0, 178.0, 182.0, 168.0, 168.0, 185.0, 178.0, 753.0, 169.0, 185.0]
2025-08-07 08:54:29,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 16 minutes, 2 seconds)
2025-08-07 08:56:02,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:56:07,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1259.91663 ± 1345.246
2025-08-07 08:56:07,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1323.8589, 85.30609, 3190.7612, 75.86742, 1312.1296, 3180.0066, 83.60034, 3189.1213, 81.01872, 77.494774]
2025-08-07 08:56:07,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [402.0, 64.0, 1000.0, 59.0, 400.0, 1000.0, 63.0, 1000.0, 62.0, 60.0]
2025-08-07 08:56:07,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 14 minutes, 8 seconds)
2025-08-07 08:57:47,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:57:54,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1622.49695 ± 1359.173
2025-08-07 08:57:54,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3188.117, 3182.2666, 1941.6172, 3186.0042, 336.8154, 333.04272, 229.4404, 324.35513, 3177.885, 325.42694]
2025-08-07 08:57:54,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 583.0, 1000.0, 157.0, 157.0, 119.0, 152.0, 1000.0, 152.0]
2025-08-07 08:57:54,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 16 seconds)
2025-08-07 08:59:31,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:59:39,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2102.18677 ± 758.658
2025-08-07 08:59:39,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3232.128, 1798.1918, 2056.1665, 2050.125, 1992.5521, 3204.4812, 885.965, 2572.725, 915.2018, 2314.3306]
2025-08-07 08:59:39,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 532.0, 616.0, 656.0, 595.0, 1000.0, 314.0, 774.0, 318.0, 678.0]
2025-08-07 08:59:40,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 31 seconds)
2025-08-07 09:01:26,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:01:38,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2732.43677 ± 1021.642
2025-08-07 09:01:38,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3239.0835, 596.1759, 3265.4993, 3247.86, 3260.6487, 3210.3489, 3270.1167, 3169.3682, 3277.7317, 787.5337]
2025-08-07 09:01:38,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 240.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 288.0]
2025-08-07 09:01:38,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 54 seconds)
2025-08-07 09:03:09,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:03:16,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1728.46973 ± 1217.429
2025-08-07 09:03:16,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3214.8357, 709.9019, 3227.896, 3210.0884, 534.12476, 586.39, 3194.035, 711.95026, 950.88727, 944.58875]
2025-08-07 09:03:16,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 239.0, 1000.0, 1000.0, 196.0, 214.0, 1000.0, 246.0, 298.0, 295.0]
2025-08-07 09:03:16,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 1 second)
2025-08-07 09:04:53,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:05:04,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2577.82251 ± 1027.703
2025-08-07 09:05:04,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [874.9342, 3245.4092, 1327.1316, 3218.8652, 3247.9756, 854.28204, 3247.621, 3276.6394, 3230.0852, 3255.2786]
2025-08-07 09:05:04,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [315.0, 1000.0, 421.0, 1000.0, 1000.0, 309.0, 1000.0, 1000.0, 1000.0, 974.0]
2025-08-07 09:05:04,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 21 seconds)
2025-08-07 09:06:44,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:06:56,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2853.23804 ± 777.640
2025-08-07 09:06:56,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3229.067, 3244.933, 3219.6836, 1021.00336, 3235.6086, 3241.7732, 3275.1768, 3231.3435, 1622.4641, 3211.3262]
2025-08-07 09:06:56,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 324.0, 1000.0, 1000.0, 1000.0, 1000.0, 484.0, 1000.0]
2025-08-07 09:06:56,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 36 seconds)
2025-08-07 09:08:34,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:08:43,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2256.25977 ± 880.256
2025-08-07 09:08:43,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1893.5338, 3284.7502, 1906.0109, 3284.7266, 1363.0665, 3270.7637, 1191.298, 3293.325, 1930.6293, 1144.4939]
2025-08-07 09:08:43,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [533.0, 1000.0, 576.0, 1000.0, 432.0, 1000.0, 366.0, 1000.0, 574.0, 367.0]
2025-08-07 09:08:43,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 48 seconds)
2025-08-07 09:10:23,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:10:36,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 3048.37646 ± 373.868
2025-08-07 09:10:36,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3228.5425, 3304.6418, 3120.1782, 3223.1448, 2426.5874, 3295.7883, 3276.01, 3207.4883, 3199.2896, 2202.0916]
2025-08-07 09:10:36,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 764.0, 1000.0, 1000.0, 1000.0, 1000.0, 697.0]
2025-08-07 09:10:36,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-hopper):1251 [DEBUG]: Training session finished
