2025-05-11 15:46:25,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-bpql-mem2
2025-05-11 15:46:25,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-bpql-mem2
2025-05-11 15:46:25,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7e90ac3cde80>}
2025-05-11 15:46:25,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1111 [DEBUG]: using device: cpu
2025-05-11 15:46:25,319 baseline-bpql-noisy-hopper:77 [WARNING]: args.assumed_delay != args.horizon: 2 != 24
2025-05-11 15:46:25,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1133 [INFO]: Creating new trainer
2025-05-11 15:46:25,328 baseline-bpql-noisy-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=17, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-05-11 15:46:25,328 baseline-bpql-noisy-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-11 15:46:25,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1194 [DEBUG]: Starting training session...
2025-05-11 15:46:25,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 1/100
2025-05-11 15:48:55,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:48:55,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 142.88942 ± 92.515
2025-05-11 15:48:55,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [245.47348, 211.40079, 214.03989, 213.04227, 25.690594, 30.502237, 49.64547, 211.34517, 211.24007, 16.514194]
2025-05-11 15:48:55,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [126.0, 92.0, 93.0, 93.0, 18.0, 32.0, 43.0, 92.0, 92.0, 14.0]
2025-05-11 15:48:55,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (142.89) for latency ExtremeClogL1U23
2025-05-11 15:48:55,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-11 15:48:55,818 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 15:48:55,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 7 minutes, 58 seconds)
2025-05-11 15:51:37,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:51:38,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 211.04822 ± 87.904
2025-05-11 15:51:38,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [69.8907, 60.64344, 296.18448, 172.44566, 271.34174, 138.32782, 289.95612, 278.41934, 274.25723, 259.0157]
2025-05-11 15:51:38,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [45.0, 35.0, 173.0, 88.0, 146.0, 83.0, 160.0, 135.0, 135.0, 133.0]
2025-05-11 15:51:38,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (211.05) for latency ExtremeClogL1U23
2025-05-11 15:51:38,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-11 15:51:38,233 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 15:51:38,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 15 minutes, 22 seconds)
2025-05-11 15:54:22,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:54:24,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 214.66574 ± 112.835
2025-05-11 15:54:24,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [29.77139, 34.46476, 361.7825, 147.5629, 270.98236, 243.73544, 384.4512, 210.38939, 208.65997, 254.85765]
2025-05-11 15:54:24,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [45.0, 42.0, 229.0, 146.0, 191.0, 147.0, 252.0, 167.0, 143.0, 234.0]
2025-05-11 15:54:24,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (214.67) for latency ExtremeClogL1U23
2025-05-11 15:54:24,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-11 15:54:24,302 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 15:54:24,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 18 minutes)
2025-05-11 15:57:01,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:57:02,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 224.70610 ± 81.542
2025-05-11 15:57:02,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [99.45309, 289.76868, 311.24655, 227.5391, 50.141933, 297.6919, 218.32277, 236.11166, 237.99185, 278.79346]
2025-05-11 15:57:02,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [54.0, 148.0, 211.0, 101.0, 35.0, 156.0, 97.0, 164.0, 109.0, 163.0]
2025-05-11 15:57:02,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (224.71) for latency ExtremeClogL1U23
2025-05-11 15:57:02,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-11 15:57:02,840 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 15:57:02,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 14 minutes, 55 seconds)
2025-05-11 15:59:49,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:59:51,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 204.07974 ± 141.693
2025-05-11 15:59:51,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [361.27524, 32.681747, 371.6117, 105.69639, 30.805882, 324.1701, 343.70813, 301.26596, 25.251307, 144.33101]
2025-05-11 15:59:51,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [169.0, 28.0, 351.0, 73.0, 34.0, 159.0, 156.0, 155.0, 28.0, 90.0]
2025-05-11 15:59:51,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 15 minutes, 7 seconds)
2025-05-11 16:02:38,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 16:02:39,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 213.61116 ± 116.323
2025-05-11 16:02:39,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [337.45862, 232.18214, 282.8696, 144.96408, 35.391663, 275.02023, 342.00693, 338.60498, 114.94304, 32.67006]
2025-05-11 16:02:39,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [157.0, 117.0, 133.0, 86.0, 37.0, 131.0, 160.0, 166.0, 65.0, 28.0]
2025-05-11 16:02:39,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 18 minutes, 9 seconds)
2025-05-11 16:05:26,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 16:05:28,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 317.83975 ± 85.598
2025-05-11 16:05:28,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [359.63367, 228.09166, 367.73074, 375.87784, 354.92703, 355.24493, 364.17963, 328.34967, 91.265686, 353.0967]
2025-05-11 16:05:28,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [150.0, 104.0, 154.0, 162.0, 144.0, 140.0, 141.0, 133.0, 61.0, 140.0]
2025-05-11 16:05:28,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (317.84) for latency ExtremeClogL1U23
2025-05-11 16:05:28,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-11 16:05:28,487 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 16:05:28,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 17 minutes, 22 seconds)
2025-05-11 16:08:17,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 16:08:19,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 212.32596 ± 138.029
2025-05-11 16:08:19,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [340.1105, 209.24924, 123.228645, 340.95715, 350.12177, 350.19897, 7.902383, 33.571953, 49.14727, 318.77164]
2025-05-11 16:08:19,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [134.0, 94.0, 65.0, 132.0, 136.0, 135.0, 9.0, 38.0, 57.0, 160.0]
2025-05-11 16:08:19,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 15 minutes, 59 seconds)
2025-05-11 16:11:06,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 16:11:09,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 322.22217 ± 175.219
2025-05-11 16:11:09,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [392.91898, 471.40375, 115.29859, 28.193655, 529.0987, 490.40338, 84.84787, 278.147, 458.0926, 373.81726]
2025-05-11 16:11:09,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [196.0, 287.0, 66.0, 49.0, 474.0, 440.0, 79.0, 244.0, 256.0, 177.0]
2025-05-11 16:11:09,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (322.22) for latency ExtremeClogL1U23
2025-05-11 16:11:09,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-11 16:11:09,016 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 16:11:09,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 16 minutes, 40 seconds)
2025-05-11 16:14:00,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 16:14:01,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 318.22855 ± 84.343
2025-05-11 16:14:01,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [383.0883, 350.9115, 368.8134, 354.95416, 235.01904, 362.62305, 94.00418, 365.6132, 339.3096, 327.94888]
2025-05-11 16:14:01,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [197.0, 128.0, 133.0, 136.0, 99.0, 134.0, 53.0, 165.0, 126.0, 124.0]
2025-05-11 16:14:01,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 15 minutes, 12 seconds)
2025-05-11 16:16:49,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 16:16:50,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 271.34891 ± 169.073
2025-05-11 16:16:50,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [289.60318, 8.954268, 331.60056, 44.47436, 33.305714, 364.81485, 353.70944, 442.61182, 507.47165, 336.94318]
2025-05-11 16:16:50,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [136.0, 15.0, 127.0, 48.0, 36.0, 151.0, 132.0, 180.0, 182.0, 139.0]
2025-05-11 16:16:50,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 12 minutes, 31 seconds)
2025-05-11 16:19:38,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 16:19:40,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 343.58261 ± 154.532
2025-05-11 16:19:40,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [287.05908, 520.2839, 208.70432, 558.3296, 275.14972, 38.88789, 368.6064, 286.86316, 351.78653, 540.1553]
2025-05-11 16:19:40,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [114.0, 174.0, 93.0, 183.0, 114.0, 43.0, 151.0, 116.0, 134.0, 177.0]
2025-05-11 16:19:40,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (343.58) for latency ExtremeClogL1U23
2025-05-11 16:19:40,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-11 16:19:40,147 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 16:19:40,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 9 minutes, 49 seconds)
2025-05-11 16:22:31,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 16:22:33,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 393.38770 ± 16.130
2025-05-11 16:22:33,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [394.06177, 394.98456, 347.49802, 398.99005, 398.4995, 397.80087, 398.2819, 399.57248, 392.04498, 412.14276]
2025-05-11 16:22:33,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [149.0, 149.0, 135.0, 152.0, 150.0, 153.0, 151.0, 152.0, 150.0, 157.0]
2025-05-11 16:22:33,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (393.39) for latency ExtremeClogL1U23
2025-05-11 16:22:33,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-11 16:22:33,592 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 16:22:33,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 7 minutes, 49 seconds)
2025-05-11 16:25:21,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 16:25:23,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 405.24585 ± 107.066
2025-05-11 16:25:23,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [484.25607, 259.52222, 500.18732, 185.99403, 460.77036, 314.92114, 508.35504, 409.21185, 440.86777, 488.37256]
2025-05-11 16:25:23,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [177.0, 108.0, 207.0, 84.0, 160.0, 123.0, 179.0, 151.0, 159.0, 195.0]
2025-05-11 16:25:23,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (405.25) for latency ExtremeClogL1U23
2025-05-11 16:25:23,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-11 16:25:23,453 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 16:25:23,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 4 minutes, 56 seconds)
2025-05-11 16:28:15,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 16:28:17,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 410.24103 ± 139.497
2025-05-11 16:28:17,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [491.338, 484.7458, 490.96298, 478.33862, 489.20337, 327.95157, 36.137226, 329.50015, 489.76962, 484.463]
2025-05-11 16:28:17,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [167.0, 165.0, 167.0, 162.0, 165.0, 126.0, 38.0, 125.0, 166.0, 165.0]
2025-05-11 16:28:17,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (410.24) for latency ExtremeClogL1U23
2025-05-11 16:28:17,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-11 16:28:17,236 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 16:28:17,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 2 minutes, 21 seconds)
2025-05-11 16:31:05,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 16:31:07,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 287.69507 ± 199.250
2025-05-11 16:31:07,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [80.827675, 228.13094, 530.85297, 526.60297, 227.53212, 83.17428, 501.4215, 43.01394, 127.365875, 528.0282]
2025-05-11 16:31:07,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [49.0, 96.0, 207.0, 176.0, 100.0, 79.0, 171.0, 25.0, 65.0, 201.0]
2025-05-11 16:31:07,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 59 minutes, 45 seconds)
2025-05-11 16:33:56,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 16:33:58,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 362.80502 ± 203.207
2025-05-11 16:33:58,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [378.759, 454.85742, 623.43195, 562.9649, 218.75623, 560.7353, 490.00403, 38.681004, 260.428, 39.43247]
2025-05-11 16:33:58,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [157.0, 165.0, 223.0, 185.0, 96.0, 195.0, 169.0, 40.0, 125.0, 46.0]
2025-05-11 16:33:58,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 57 minutes, 23 seconds)
2025-05-11 16:36:48,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 16:36:49,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 377.10651 ± 160.045
2025-05-11 16:36:49,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [524.59357, 578.6642, 194.0613, 414.73807, 177.07722, 440.9217, 365.02063, 598.02734, 121.39648, 356.56445]
2025-05-11 16:36:49,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [186.0, 195.0, 95.0, 163.0, 83.0, 179.0, 143.0, 207.0, 64.0, 167.0]
2025-05-11 16:36:49,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 54 minutes, 4 seconds)
2025-05-11 16:39:39,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 16:39:40,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 388.42679 ± 153.233
2025-05-11 16:39:40,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [510.5776, 312.6314, 136.33028, 587.57324, 490.89166, 141.1446, 525.66766, 339.6431, 330.23184, 509.57648]
2025-05-11 16:39:40,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [174.0, 121.0, 112.0, 189.0, 166.0, 71.0, 176.0, 124.0, 126.0, 171.0]
2025-05-11 16:39:40,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 51 minutes, 27 seconds)
2025-05-11 16:42:28,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 16:42:30,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 564.89417 ± 106.364
2025-05-11 16:42:30,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [652.2092, 588.63214, 296.31958, 569.224, 659.1823, 447.7617, 640.21515, 617.90424, 605.5191, 571.97424]
2025-05-11 16:42:30,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [209.0, 188.0, 119.0, 219.0, 209.0, 162.0, 206.0, 201.0, 192.0, 189.0]
2025-05-11 16:42:30,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (564.89) for latency ExtremeClogL1U23
2025-05-11 16:42:30,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-11 16:42:30,848 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 16:42:30,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 47 minutes, 37 seconds)
2025-05-11 16:45:19,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 16:45:20,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 333.71875 ± 193.641
2025-05-11 16:45:20,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [521.8867, 489.46283, 395.4355, 457.41434, 389.3964, 45.441902, 24.166311, 520.6225, 67.6774, 425.6834]
2025-05-11 16:45:20,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [194.0, 167.0, 139.0, 170.0, 140.0, 45.0, 28.0, 190.0, 40.0, 189.0]
2025-05-11 16:45:20,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 44 minutes, 43 seconds)
2025-05-11 16:48:09,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 16:48:11,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 441.86884 ± 247.710
2025-05-11 16:48:11,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [74.43993, 460.84946, 59.1445, 640.9405, 116.39666, 437.99918, 637.841, 670.0782, 667.6566, 653.3426]
2025-05-11 16:48:11,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [50.0, 204.0, 52.0, 205.0, 61.0, 147.0, 203.0, 219.0, 212.0, 212.0]
2025-05-11 16:48:11,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 41 minutes, 43 seconds)
2025-05-11 16:51:00,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 16:51:01,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 394.23578 ± 220.934
2025-05-11 16:51:01,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [533.8489, 160.6456, 570.3671, 222.68459, 620.96027, 534.21783, 80.73443, 756.207, 242.29947, 220.39294]
2025-05-11 16:51:01,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [176.0, 80.0, 189.0, 101.0, 213.0, 177.0, 47.0, 267.0, 102.0, 96.0]
2025-05-11 16:51:01,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 38 minutes, 37 seconds)
2025-05-11 16:53:50,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 16:53:52,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 433.38052 ± 189.793
2025-05-11 16:53:52,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [629.70685, 349.97522, 266.1477, 307.1126, 613.1053, 55.80667, 656.001, 352.41672, 628.55444, 474.97888]
2025-05-11 16:53:52,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [204.0, 139.0, 111.0, 131.0, 197.0, 41.0, 223.0, 138.0, 206.0, 171.0]
2025-05-11 16:53:52,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 35 minutes, 46 seconds)
2025-05-11 16:56:39,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 16:56:40,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 386.24240 ± 309.245
2025-05-11 16:56:40,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [45.77883, 90.28163, 668.537, 8.213545, 33.747433, 783.0168, 670.5238, 723.1772, 580.7784, 258.3696]
2025-05-11 16:56:40,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [33.0, 62.0, 243.0, 10.0, 42.0, 266.0, 227.0, 231.0, 186.0, 113.0]
2025-05-11 16:56:40,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 32 minutes, 28 seconds)
2025-05-11 16:59:29,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 16:59:31,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 345.96637 ± 142.037
2025-05-11 16:59:31,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [210.41864, 329.40952, 205.37218, 536.63934, 227.88356, 169.55382, 566.001, 381.68808, 521.9691, 310.72876]
2025-05-11 16:59:31,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 136.0, 91.0, 195.0, 97.0, 117.0, 213.0, 137.0, 172.0, 130.0]
2025-05-11 16:59:31,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 29 minutes, 50 seconds)
2025-05-11 17:02:22,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 17:02:24,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 438.64615 ± 188.426
2025-05-11 17:02:24,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [575.6196, 598.1097, 303.85495, 182.03952, 641.61896, 605.53973, 624.37787, 303.06613, 437.48434, 114.75104]
2025-05-11 17:02:24,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [192.0, 192.0, 124.0, 86.0, 205.0, 193.0, 232.0, 133.0, 156.0, 71.0]
2025-05-11 17:02:24,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 27 minutes, 37 seconds)
2025-05-11 17:05:11,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 17:05:13,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 556.90076 ± 117.271
2025-05-11 17:05:13,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [569.70416, 573.453, 587.4175, 615.36414, 583.393, 590.32495, 675.0821, 581.2592, 576.652, 216.35806]
2025-05-11 17:05:13,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [195.0, 186.0, 189.0, 195.0, 188.0, 215.0, 224.0, 188.0, 186.0, 93.0]
2025-05-11 17:05:13,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 24 minutes, 23 seconds)
2025-05-11 17:08:03,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 17:08:04,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 425.00162 ± 148.176
2025-05-11 17:08:04,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [409.25623, 211.67223, 569.25116, 387.7954, 618.10706, 551.79474, 560.1186, 169.62389, 462.87668, 309.52005]
2025-05-11 17:08:04,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [142.0, 94.0, 184.0, 136.0, 197.0, 180.0, 181.0, 82.0, 162.0, 125.0]
2025-05-11 17:08:04,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 21 minutes, 43 seconds)
2025-05-11 17:10:54,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 17:10:55,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 423.55460 ± 299.825
2025-05-11 17:10:55,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [104.393845, 1053.8005, 29.236694, 347.1679, 518.4943, 554.9528, 9.284059, 534.19775, 555.25366, 528.76465]
2025-05-11 17:10:55,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [62.0, 454.0, 40.0, 175.0, 172.0, 180.0, 10.0, 175.0, 175.0, 175.0]
2025-05-11 17:10:55,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 19 minutes, 30 seconds)
2025-05-11 17:13:43,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 17:13:45,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 416.33405 ± 193.647
2025-05-11 17:13:45,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [324.3806, 286.98807, 579.50354, 529.6445, 530.837, 561.6557, 68.66929, 106.30462, 573.4424, 601.91473]
2025-05-11 17:13:45,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [124.0, 131.0, 186.0, 175.0, 175.0, 182.0, 62.0, 60.0, 201.0, 202.0]
2025-05-11 17:13:45,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 16 minutes, 21 seconds)
2025-05-11 17:16:33,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 17:16:35,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 460.79263 ± 99.460
2025-05-11 17:16:35,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [547.5007, 587.11975, 542.20166, 354.08398, 394.79178, 395.30698, 297.06876, 389.7855, 572.5341, 527.5332]
2025-05-11 17:16:35,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [180.0, 188.0, 179.0, 133.0, 144.0, 156.0, 125.0, 158.0, 193.0, 173.0]
2025-05-11 17:16:35,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 12 minutes, 52 seconds)
2025-05-11 17:19:25,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 17:19:27,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 485.71420 ± 128.057
2025-05-11 17:19:27,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [531.9021, 542.1521, 545.464, 537.0098, 108.09119, 529.7229, 540.22253, 515.679, 544.8819, 462.01648]
2025-05-11 17:19:27,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [176.0, 179.0, 179.0, 180.0, 59.0, 176.0, 178.0, 173.0, 178.0, 163.0]
2025-05-11 17:19:27,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 10 minutes, 46 seconds)
2025-05-11 17:22:16,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 17:22:18,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 463.83466 ± 187.172
2025-05-11 17:22:18,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [597.60223, 639.647, 8.032881, 443.98047, 268.76526, 408.5649, 572.4202, 494.40237, 626.0491, 578.8819]
2025-05-11 17:22:18,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [211.0, 205.0, 9.0, 184.0, 156.0, 153.0, 186.0, 196.0, 203.0, 186.0]
2025-05-11 17:22:18,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 7 minutes, 46 seconds)
2025-05-11 17:25:05,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 17:25:06,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 460.52002 ± 217.976
2025-05-11 17:25:06,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [599.3809, 34.65566, 36.09416, 623.3255, 565.0566, 485.29523, 649.0159, 529.3043, 503.32642, 579.7456]
2025-05-11 17:25:06,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [206.0, 28.0, 40.0, 206.0, 184.0, 165.0, 250.0, 173.0, 175.0, 227.0]
2025-05-11 17:25:06,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 4 minutes, 25 seconds)
2025-05-11 17:27:48,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 17:27:50,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 443.37054 ± 209.913
2025-05-11 17:27:50,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [34.71291, 239.84111, 648.8634, 571.8142, 586.1254, 610.3711, 637.4038, 440.52014, 154.8738, 509.17984]
2025-05-11 17:27:50,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [27.0, 103.0, 221.0, 185.0, 188.0, 196.0, 206.0, 157.0, 76.0, 209.0]
2025-05-11 17:27:50,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 22 seconds)
2025-05-11 17:30:33,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 17:30:34,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 451.42563 ± 187.647
2025-05-11 17:30:34,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [564.134, 14.715372, 522.2295, 543.65375, 149.93648, 556.64795, 525.46533, 550.4245, 565.4618, 521.5872]
2025-05-11 17:30:34,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [183.0, 15.0, 177.0, 178.0, 72.0, 180.0, 174.0, 218.0, 184.0, 173.0]
2025-05-11 17:30:34,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 56 minutes, 17 seconds)
2025-05-11 17:33:18,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 17:33:20,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 516.29858 ± 134.572
2025-05-11 17:33:20,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [129.41908, 470.8778, 599.8183, 528.4698, 562.919, 558.6177, 561.6358, 571.6012, 555.5613, 624.06573]
2025-05-11 17:33:20,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [65.0, 175.0, 198.0, 175.0, 185.0, 183.0, 183.0, 187.0, 183.0, 199.0]
2025-05-11 17:33:20,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 52 minutes, 9 seconds)
2025-05-11 17:36:03,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 17:36:04,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 438.87939 ± 190.221
2025-05-11 17:36:04,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [549.85504, 84.428894, 739.4654, 573.0664, 569.32385, 473.71918, 519.484, 215.24075, 237.26814, 426.9422]
2025-05-11 17:36:04,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [182.0, 80.0, 260.0, 186.0, 183.0, 162.0, 182.0, 94.0, 100.0, 143.0]
2025-05-11 17:36:04,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 48 minutes, 5 seconds)
2025-05-11 17:38:47,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 17:38:49,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 488.80460 ± 192.644
2025-05-11 17:38:49,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [615.06287, 580.882, 8.172079, 664.1417, 345.80176, 557.66125, 613.6245, 578.65955, 593.6761, 330.36432]
2025-05-11 17:38:49,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [196.0, 187.0, 9.0, 234.0, 122.0, 188.0, 196.0, 187.0, 193.0, 152.0]
2025-05-11 17:38:49,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 44 minutes, 27 seconds)
2025-05-11 17:41:34,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 17:41:35,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 326.54858 ± 251.432
2025-05-11 17:41:35,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [535.17096, 424.7183, 48.271923, 607.9101, 557.1314, 25.310894, 38.74363, 626.20215, 393.83734, 8.188985]
2025-05-11 17:41:35,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [180.0, 150.0, 34.0, 209.0, 183.0, 35.0, 42.0, 214.0, 144.0, 14.0]
2025-05-11 17:41:35,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 42 minutes, 16 seconds)
2025-05-11 17:44:17,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 17:44:19,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 512.35925 ± 125.628
2025-05-11 17:44:19,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [662.16644, 608.2613, 614.232, 238.67116, 564.09534, 471.2205, 378.96237, 629.27875, 441.2386, 515.46655]
2025-05-11 17:44:19,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [240.0, 194.0, 198.0, 101.0, 198.0, 162.0, 142.0, 214.0, 158.0, 184.0]
2025-05-11 17:44:19,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 39 minutes, 26 seconds)
2025-05-11 17:47:02,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 17:47:03,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 407.59430 ± 217.221
2025-05-11 17:47:03,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [100.92378, 629.0615, 615.52545, 157.97447, 492.93164, 627.0714, 64.68422, 577.2708, 301.88104, 508.6184]
2025-05-11 17:47:03,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [56.0, 214.0, 197.0, 74.0, 178.0, 203.0, 42.0, 189.0, 118.0, 183.0]
2025-05-11 17:47:03,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 36 minutes, 22 seconds)
2025-05-11 17:49:48,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 17:49:50,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 425.97003 ± 199.639
2025-05-11 17:49:50,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [147.73662, 527.87476, 594.61835, 591.71674, 555.5594, 181.22345, 456.28104, 59.69828, 580.7406, 564.2511]
2025-05-11 17:49:50,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [99.0, 175.0, 204.0, 191.0, 182.0, 87.0, 181.0, 61.0, 187.0, 184.0]
2025-05-11 17:49:50,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 34 minutes, 2 seconds)
2025-05-11 17:52:32,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 17:52:34,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 497.65045 ± 220.420
2025-05-11 17:52:34,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [664.2323, 629.0036, 93.038994, 624.4392, 60.449947, 620.7849, 452.17847, 513.28754, 656.2637, 662.82556]
2025-05-11 17:52:34,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [211.0, 201.0, 53.0, 201.0, 40.0, 200.0, 160.0, 199.0, 209.0, 212.0]
2025-05-11 17:52:34,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 31 minutes, 12 seconds)
2025-05-11 17:55:18,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 17:55:19,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 482.90811 ± 237.401
2025-05-11 17:55:19,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [640.0566, 645.2903, 404.34708, 56.225945, 21.869568, 653.0217, 466.16696, 677.5991, 634.86725, 629.63605]
2025-05-11 17:55:19,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [204.0, 205.0, 151.0, 51.0, 27.0, 207.0, 165.0, 230.0, 202.0, 211.0]
2025-05-11 17:55:19,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 28 minutes, 22 seconds)
2025-05-11 17:58:01,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 17:58:02,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 453.99268 ± 218.247
2025-05-11 17:58:02,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [627.40076, 49.146603, 601.11334, 633.8114, 141.52869, 231.67415, 502.1234, 624.63324, 451.19254, 677.3025]
2025-05-11 17:58:02,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [200.0, 34.0, 194.0, 204.0, 71.0, 98.0, 172.0, 203.0, 163.0, 235.0]
2025-05-11 17:58:02,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 25 minutes, 27 seconds)
2025-05-11 18:00:46,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 18:00:48,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 619.35498 ± 164.097
2025-05-11 18:00:48,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [613.7762, 652.8976, 624.4637, 429.1569, 622.8047, 623.4622, 375.86752, 1034.1519, 620.9907, 595.97845]
2025-05-11 18:00:48,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [196.0, 224.0, 198.0, 147.0, 199.0, 198.0, 141.0, 391.0, 198.0, 193.0]
2025-05-11 18:00:48,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (619.35) for latency ExtremeClogL1U23
2025-05-11 18:00:48,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-11 18:00:48,520 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 18:00:48,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 22 minutes, 58 seconds)
2025-05-11 18:03:30,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 18:03:32,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 546.80920 ± 203.145
2025-05-11 18:03:32,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [654.32294, 674.031, 652.9989, 208.72247, 678.1556, 643.51575, 524.6281, 634.4205, 98.08257, 699.2139]
2025-05-11 18:03:32,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [209.0, 215.0, 211.0, 94.0, 247.0, 207.0, 182.0, 202.0, 55.0, 238.0]
2025-05-11 18:03:32,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 19 minutes, 46 seconds)
2025-05-11 18:06:17,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 18:06:18,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 383.89056 ± 209.998
2025-05-11 18:06:18,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [626.1995, 87.69141, 641.6293, 608.452, 461.8744, 311.03885, 221.70697, 354.35864, 28.582506, 497.3724]
2025-05-11 18:06:18,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [205.0, 52.0, 204.0, 194.0, 163.0, 144.0, 98.0, 157.0, 24.0, 161.0]
2025-05-11 18:06:18,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 17 minutes, 27 seconds)
2025-05-11 18:09:00,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 18:09:02,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 479.47217 ± 226.679
2025-05-11 18:09:02,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [647.0833, 554.57306, 643.2387, 273.25894, 697.5075, 217.681, 656.2158, 9.576713, 676.8451, 418.74194]
2025-05-11 18:09:02,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [207.0, 183.0, 205.0, 112.0, 223.0, 106.0, 215.0, 10.0, 216.0, 150.0]
2025-05-11 18:09:02,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 14 minutes, 22 seconds)
2025-05-11 18:11:44,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 18:11:45,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 534.21130 ± 185.400
2025-05-11 18:11:45,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [687.937, 435.17847, 586.3085, 625.0258, 639.7401, 32.5057, 653.22534, 601.3615, 446.24545, 634.58527]
2025-05-11 18:11:45,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [218.0, 154.0, 202.0, 200.0, 202.0, 26.0, 207.0, 196.0, 179.0, 202.0]
2025-05-11 18:11:45,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 11 minutes, 42 seconds)
2025-05-11 18:14:32,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 18:14:34,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 483.70328 ± 256.576
2025-05-11 18:14:34,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [632.58484, 38.06162, 75.87407, 652.2265, 655.72375, 661.70355, 170.79904, 670.3289, 635.4185, 644.3118]
2025-05-11 18:14:34,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [205.0, 31.0, 63.0, 208.0, 208.0, 209.0, 79.0, 215.0, 215.0, 206.0]
2025-05-11 18:14:34,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 9 minutes, 23 seconds)
2025-05-11 18:17:15,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 18:17:17,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 394.10791 ± 261.186
2025-05-11 18:17:17,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [617.06824, 622.65985, 8.197696, 35.90148, 628.885, 325.89716, 587.96936, 9.439796, 491.40295, 613.65765]
2025-05-11 18:17:17,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [199.0, 216.0, 9.0, 42.0, 226.0, 118.0, 193.0, 10.0, 159.0, 196.0]
2025-05-11 18:17:17,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 6 minutes, 30 seconds)
2025-05-11 18:20:04,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 18:20:05,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 333.21332 ± 291.075
2025-05-11 18:20:05,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [674.4256, 96.73128, 34.230812, 44.905807, 683.58527, 297.32623, 659.35223, 92.25406, 699.0614, 50.260803]
2025-05-11 18:20:05,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [224.0, 54.0, 35.0, 51.0, 241.0, 119.0, 221.0, 53.0, 252.0, 35.0]
2025-05-11 18:20:05,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 4 minutes)
2025-05-11 18:22:53,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 18:22:55,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 492.40244 ± 187.981
2025-05-11 18:22:55,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [621.6669, 174.8284, 650.95154, 608.8207, 584.70026, 624.34906, 354.84445, 126.78824, 595.7768, 581.2979]
2025-05-11 18:22:55,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [199.0, 79.0, 218.0, 197.0, 191.0, 201.0, 139.0, 66.0, 192.0, 188.0]
2025-05-11 18:22:55,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 2 minutes, 8 seconds)
2025-05-11 18:25:44,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 18:25:46,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 581.53601 ± 46.969
2025-05-11 18:25:46,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [548.2233, 527.0763, 641.8682, 587.1032, 615.82, 603.13184, 619.17426, 491.50772, 552.1161, 629.3395]
2025-05-11 18:25:46,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [183.0, 175.0, 206.0, 191.0, 197.0, 193.0, 198.0, 160.0, 186.0, 201.0]
2025-05-11 18:25:46,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 25 seconds)
2025-05-11 18:28:29,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 18:28:32,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 642.54010 ± 420.035
2025-05-11 18:28:32,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [770.74835, 1498.0448, 424.5423, 135.66856, 1188.9095, 236.10263, 207.19458, 683.02264, 808.87665, 472.29114]
2025-05-11 18:28:32,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [256.0, 488.0, 161.0, 78.0, 369.0, 109.0, 88.0, 229.0, 254.0, 173.0]
2025-05-11 18:28:32,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (642.54) for latency ExtremeClogL1U23
2025-05-11 18:28:32,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-11 18:28:32,397 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 18:28:32,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 57 minutes, 19 seconds)
2025-05-11 18:31:22,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 18:31:24,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 441.59409 ± 153.659
2025-05-11 18:31:24,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [625.7788, 383.09805, 531.21686, 385.15747, 136.39995, 603.0803, 426.49326, 287.53357, 646.07855, 391.1041]
2025-05-11 18:31:24,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [200.0, 136.0, 194.0, 153.0, 70.0, 194.0, 165.0, 159.0, 209.0, 158.0]
2025-05-11 18:31:24,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 55 minutes, 47 seconds)
2025-05-11 18:34:07,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 18:34:09,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 415.68726 ± 250.213
2025-05-11 18:34:09,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [486.57224, 254.7302, 11.976896, 171.52777, 623.0843, 652.9084, 655.28827, 406.56662, 774.0515, 120.16634]
2025-05-11 18:34:09,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [178.0, 119.0, 14.0, 118.0, 201.0, 208.0, 226.0, 151.0, 250.0, 62.0]
2025-05-11 18:34:09,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 52 minutes, 31 seconds)
2025-05-11 18:36:55,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 18:36:57,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 431.92188 ± 241.355
2025-05-11 18:36:57,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [44.2659, 681.9765, 663.36707, 634.5313, 154.17006, 198.5356, 283.49286, 651.3839, 681.1923, 326.3034]
2025-05-11 18:36:57,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [33.0, 238.0, 210.0, 202.0, 75.0, 102.0, 114.0, 208.0, 215.0, 127.0]
2025-05-11 18:36:57,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 49 minutes, 25 seconds)
2025-05-11 18:39:37,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 18:39:39,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 428.84113 ± 247.977
2025-05-11 18:39:39,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [612.43506, 348.6484, 330.8993, 738.55334, 653.063, 202.41428, 66.45494, 557.05365, 60.156563, 718.7324]
2025-05-11 18:39:39,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [196.0, 133.0, 130.0, 230.0, 208.0, 94.0, 52.0, 196.0, 54.0, 229.0]
2025-05-11 18:39:39,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 45 minutes, 31 seconds)
2025-05-11 18:42:20,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 18:42:22,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 530.48383 ± 217.894
2025-05-11 18:42:22,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [557.72107, 668.0616, 665.0813, 56.54709, 661.1385, 714.6506, 167.8871, 512.30273, 632.47845, 668.9696]
2025-05-11 18:42:22,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [186.0, 235.0, 211.0, 38.0, 226.0, 255.0, 79.0, 163.0, 204.0, 213.0]
2025-05-11 18:42:22,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 42 minutes, 24 seconds)
2025-05-11 18:45:06,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 18:45:07,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 488.90021 ± 205.719
2025-05-11 18:45:07,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [611.9772, 577.1553, 451.18008, 591.8623, 79.47964, 678.98193, 577.3137, 637.29144, 577.9071, 105.85309]
2025-05-11 18:45:07,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [196.0, 187.0, 153.0, 191.0, 48.0, 237.0, 185.0, 204.0, 188.0, 56.0]
2025-05-11 18:45:07,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 38 minutes, 46 seconds)
2025-05-11 18:47:49,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 18:47:50,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 482.33853 ± 189.055
2025-05-11 18:47:50,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [295.7481, 609.82825, 651.7908, 649.44305, 297.14407, 670.16364, 174.12064, 629.1594, 591.124, 254.86336]
2025-05-11 18:47:50,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [111.0, 196.0, 207.0, 209.0, 127.0, 222.0, 92.0, 202.0, 198.0, 106.0]
2025-05-11 18:47:50,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 35 minutes, 49 seconds)
2025-05-11 18:50:37,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 18:50:38,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 486.74237 ± 236.717
2025-05-11 18:50:38,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [725.3653, 562.18207, 481.9658, 54.28859, 656.73444, 528.76056, 27.294199, 679.161, 486.32797, 665.3436]
2025-05-11 18:50:38,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [229.0, 183.0, 167.0, 58.0, 213.0, 201.0, 32.0, 253.0, 167.0, 212.0]
2025-05-11 18:50:38,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 33 minutes, 8 seconds)
2025-05-11 18:53:24,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 18:53:26,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 418.44635 ± 243.037
2025-05-11 18:53:26,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [651.0702, 93.17166, 548.4303, 679.80255, 664.91235, 655.44476, 339.07376, 373.63168, 45.477737, 133.44829]
2025-05-11 18:53:26,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [207.0, 53.0, 181.0, 224.0, 213.0, 209.0, 131.0, 160.0, 45.0, 106.0]
2025-05-11 18:53:26,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 31 minutes)
2025-05-11 18:56:13,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 18:56:15,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 624.65979 ± 40.013
2025-05-11 18:56:15,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [632.82135, 651.79047, 653.0733, 616.2195, 643.2156, 624.4866, 649.0013, 509.88654, 639.31024, 626.7939]
2025-05-11 18:56:15,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [202.0, 208.0, 209.0, 198.0, 207.0, 199.0, 208.0, 173.0, 204.0, 201.0]
2025-05-11 18:56:16,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 28 minutes, 52 seconds)
2025-05-11 18:58:59,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 18:59:01,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 538.91229 ± 162.087
2025-05-11 18:59:01,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [616.783, 618.9844, 618.60486, 641.65796, 654.1234, 273.93323, 342.08713, 657.2791, 269.74786, 695.922]
2025-05-11 18:59:01,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [199.0, 198.0, 198.0, 205.0, 207.0, 112.0, 160.0, 209.0, 111.0, 220.0]
2025-05-11 18:59:01,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 26 minutes, 9 seconds)
2025-05-11 19:01:48,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 19:01:50,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 420.55817 ± 247.190
2025-05-11 19:01:50,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [500.86685, 641.075, 12.702675, 6.573788, 511.09125, 656.7406, 196.46701, 385.622, 652.428, 642.0142]
2025-05-11 19:01:50,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [171.0, 205.0, 15.0, 9.0, 174.0, 211.0, 88.0, 151.0, 209.0, 203.0]
2025-05-11 19:01:50,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 23 minutes, 55 seconds)
2025-05-11 19:04:35,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 19:04:37,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 491.22675 ± 231.175
2025-05-11 19:04:37,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [640.05237, 662.63135, 441.87756, 643.9431, 573.57654, 652.207, 5.8821945, 90.21126, 556.29486, 645.59143]
2025-05-11 19:04:37,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [204.0, 210.0, 153.0, 205.0, 186.0, 206.0, 8.0, 69.0, 181.0, 205.0]
2025-05-11 19:04:37,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 21 minutes, 1 second)
2025-05-11 19:07:26,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 19:07:27,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 402.51196 ± 253.028
2025-05-11 19:07:27,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [652.20355, 62.28929, 224.77824, 86.6865, 618.4592, 671.49866, 658.611, 652.61816, 219.35167, 178.62358]
2025-05-11 19:07:27,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [208.0, 40.0, 97.0, 50.0, 199.0, 215.0, 210.0, 206.0, 97.0, 85.0]
2025-05-11 19:07:27,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 18 minutes, 29 seconds)
2025-05-11 19:10:12,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 19:10:13,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 365.26361 ± 240.965
2025-05-11 19:10:13,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [56.090725, 665.8359, 421.56528, 632.4914, 175.69926, 538.3605, 63.2768, 69.60077, 390.63327, 639.0823]
2025-05-11 19:10:13,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [35.0, 239.0, 153.0, 202.0, 81.0, 184.0, 40.0, 75.0, 154.0, 205.0]
2025-05-11 19:10:13,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 15 minutes, 24 seconds)
2025-05-11 19:13:00,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 19:13:01,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 379.67526 ± 309.104
2025-05-11 19:13:01,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [631.67847, 862.7032, 651.35297, 107.96822, 135.94128, 629.99475, 625.6407, 13.922691, 94.674576, 42.875965]
2025-05-11 19:13:01,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [202.0, 327.0, 211.0, 57.0, 105.0, 201.0, 200.0, 15.0, 72.0, 49.0]
2025-05-11 19:13:01,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 12 minutes, 48 seconds)
2025-05-11 19:15:47,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 19:15:48,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 367.30844 ± 242.333
2025-05-11 19:15:48,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [612.4737, 326.5244, 220.96353, 632.012, 604.41815, 98.00973, 493.66098, 621.06116, 34.506645, 29.454224]
2025-05-11 19:15:48,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [197.0, 152.0, 130.0, 207.0, 196.0, 53.0, 173.0, 200.0, 28.0, 35.0]
2025-05-11 19:15:48,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 9 minutes, 54 seconds)
2025-05-11 19:18:36,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 19:18:37,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 401.17398 ± 214.360
2025-05-11 19:18:37,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [307.428, 207.06549, 655.69684, 608.44745, 502.09122, 93.5303, 646.1531, 243.10995, 612.854, 135.36333]
2025-05-11 19:18:37,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [120.0, 94.0, 241.0, 197.0, 171.0, 64.0, 249.0, 102.0, 213.0, 67.0]
2025-05-11 19:18:37,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 7 minutes, 16 seconds)
2025-05-11 19:21:23,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 19:21:24,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 498.65414 ± 210.767
2025-05-11 19:21:24,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [600.90515, 598.72864, 585.2013, 97.617615, 656.29736, 604.3617, 609.8355, 61.50065, 604.36206, 567.732]
2025-05-11 19:21:24,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [194.0, 193.0, 188.0, 54.0, 226.0, 195.0, 196.0, 40.0, 196.0, 194.0]
2025-05-11 19:21:24,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 4 minutes, 11 seconds)
2025-05-11 19:24:08,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 19:24:09,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 382.65704 ± 239.135
2025-05-11 19:24:09,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [624.55035, 531.4214, 688.54944, 58.953396, 398.73767, 287.9236, 37.134598, 464.6593, 647.53827, 87.10233]
2025-05-11 19:24:09,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [201.0, 181.0, 242.0, 40.0, 148.0, 115.0, 39.0, 164.0, 217.0, 51.0]
2025-05-11 19:24:09,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 1 minute, 16 seconds)
2025-05-11 19:26:57,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 19:26:59,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 536.23230 ± 187.492
2025-05-11 19:26:59,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [644.1964, 623.1576, 166.76071, 366.0871, 272.1621, 774.0013, 566.32416, 644.9147, 673.63226, 631.0871]
2025-05-11 19:26:59,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [207.0, 202.0, 81.0, 151.0, 109.0, 242.0, 189.0, 207.0, 254.0, 201.0]
2025-05-11 19:26:59,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 58 minutes, 40 seconds)
2025-05-11 19:29:46,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 19:29:48,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 402.94113 ± 261.436
2025-05-11 19:29:48,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [412.7755, 53.99944, 670.76886, 385.35654, 682.2439, 503.48444, 7.9595757, 36.066692, 636.8686, 639.88794]
2025-05-11 19:29:48,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [180.0, 61.0, 237.0, 142.0, 230.0, 171.0, 9.0, 39.0, 206.0, 205.0]
2025-05-11 19:29:48,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 55 minutes, 58 seconds)
2025-05-11 19:32:38,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 19:32:40,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 511.45514 ± 185.021
2025-05-11 19:32:40,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [402.23236, 8.065075, 590.74316, 623.40735, 444.31958, 649.9927, 611.1681, 640.84186, 556.2732, 587.5076]
2025-05-11 19:32:40,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [143.0, 9.0, 192.0, 200.0, 153.0, 215.0, 196.0, 206.0, 184.0, 219.0]
2025-05-11 19:32:40,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 53 minutes, 20 seconds)
2025-05-11 19:35:30,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 19:35:32,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 516.14124 ± 196.109
2025-05-11 19:35:32,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [232.3456, 628.1412, 800.5713, 625.8395, 636.6436, 634.0171, 607.3562, 384.75473, 473.42007, 138.32326]
2025-05-11 19:35:32,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [137.0, 238.0, 261.0, 203.0, 204.0, 204.0, 196.0, 143.0, 176.0, 68.0]
2025-05-11 19:35:32,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 50 minutes, 51 seconds)
2025-05-11 19:38:22,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 19:38:24,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 512.90436 ± 169.660
2025-05-11 19:38:24,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [636.2254, 607.6238, 527.72784, 676.8455, 632.9607, 198.64536, 379.52252, 231.14775, 661.79144, 576.5532]
2025-05-11 19:38:24,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [203.0, 194.0, 178.0, 225.0, 204.0, 90.0, 142.0, 130.0, 237.0, 186.0]
2025-05-11 19:38:24,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 48 minutes, 26 seconds)
2025-05-11 19:41:12,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 19:41:13,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 479.43292 ± 241.828
2025-05-11 19:41:13,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [629.9792, 489.64554, 649.3758, 451.8485, 7.92636, 678.6158, 552.7628, 27.267303, 655.1539, 651.7537]
2025-05-11 19:41:13,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [222.0, 168.0, 208.0, 158.0, 9.0, 229.0, 182.0, 38.0, 211.0, 210.0]
2025-05-11 19:41:13,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 45 minutes, 32 seconds)
2025-05-11 19:44:03,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 19:44:04,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 493.89313 ± 212.824
2025-05-11 19:44:04,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [630.24536, 638.44324, 628.6167, 610.7179, 28.32849, 232.07954, 641.79974, 616.9896, 287.87357, 623.8369]
2025-05-11 19:44:04,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [204.0, 206.0, 201.0, 197.0, 26.0, 99.0, 205.0, 200.0, 113.0, 199.0]
2025-05-11 19:44:04,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 42 minutes, 49 seconds)
2025-05-11 19:46:54,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 19:46:56,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 463.08789 ± 227.733
2025-05-11 19:46:56,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [595.54504, 38.27605, 627.6483, 637.27637, 458.526, 614.8857, 43.015427, 622.4225, 630.86115, 362.42227]
2025-05-11 19:46:56,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [192.0, 29.0, 202.0, 205.0, 162.0, 199.0, 25.0, 227.0, 203.0, 134.0]
2025-05-11 19:46:56,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 39 minutes, 58 seconds)
2025-05-11 19:49:45,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 19:49:47,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 461.69891 ± 175.398
2025-05-11 19:49:47,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [483.58954, 686.8331, 631.9777, 358.71225, 214.60486, 422.0821, 260.9947, 672.9179, 627.8825, 257.39435]
2025-05-11 19:49:47,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [193.0, 230.0, 203.0, 132.0, 94.0, 155.0, 110.0, 222.0, 209.0, 108.0]
2025-05-11 19:49:47,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 37 minutes, 2 seconds)
2025-05-11 19:52:39,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 19:52:41,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 490.03412 ± 237.321
2025-05-11 19:52:41,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [486.6033, 88.24189, 738.8383, 675.5292, 558.5298, 635.05005, 230.26424, 704.4032, 662.4156, 120.465744]
2025-05-11 19:52:41,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [172.0, 54.0, 251.0, 216.0, 189.0, 208.0, 106.0, 224.0, 212.0, 61.0]
2025-05-11 19:52:41,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 34 minutes, 16 seconds)
2025-05-11 19:55:27,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 19:55:29,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 469.21771 ± 211.604
2025-05-11 19:55:29,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [244.88649, 210.4253, 704.16235, 670.83527, 277.3664, 648.77234, 187.85641, 676.5656, 667.15155, 404.15582]
2025-05-11 19:55:29,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [105.0, 110.0, 249.0, 215.0, 129.0, 206.0, 86.0, 216.0, 213.0, 150.0]
2025-05-11 19:55:29,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 31 minutes, 22 seconds)
2025-05-11 19:58:21,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 19:58:23,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 474.07635 ± 243.352
2025-05-11 19:58:23,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [122.50683, 698.02466, 674.6019, 444.09158, 107.088684, 746.4311, 663.79083, 432.62567, 174.69174, 676.91113]
2025-05-11 19:58:23,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [64.0, 254.0, 217.0, 167.0, 59.0, 241.0, 213.0, 161.0, 83.0, 269.0]
2025-05-11 19:58:23,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 28 minutes, 36 seconds)
2025-05-11 20:01:12,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:01:14,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 496.36676 ± 177.582
2025-05-11 20:01:14,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [640.7055, 192.36609, 644.28424, 639.2983, 611.08307, 619.64154, 335.56842, 636.612, 434.17767, 209.9309]
2025-05-11 20:01:14,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [204.0, 90.0, 219.0, 205.0, 196.0, 200.0, 128.0, 206.0, 158.0, 93.0]
2025-05-11 20:01:14,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 25 minutes, 44 seconds)
2025-05-11 20:04:01,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:04:03,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 559.73157 ± 102.513
2025-05-11 20:04:03,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [660.08813, 633.8924, 654.6123, 600.4377, 661.9751, 455.2547, 443.30154, 443.27408, 404.85223, 639.6273]
2025-05-11 20:04:03,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [215.0, 205.0, 210.0, 214.0, 212.0, 182.0, 154.0, 157.0, 160.0, 207.0]
2025-05-11 20:04:03,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 22 minutes, 49 seconds)
2025-05-11 20:06:53,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:06:55,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 542.60785 ± 148.820
2025-05-11 20:06:55,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [529.0766, 657.2634, 638.6053, 664.1945, 258.76764, 388.63138, 333.14496, 654.61, 649.9908, 651.7938]
2025-05-11 20:06:55,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [183.0, 212.0, 204.0, 214.0, 108.0, 135.0, 129.0, 211.0, 208.0, 209.0]
2025-05-11 20:06:55,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 19 minutes, 55 seconds)
2025-05-11 20:09:45,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:09:47,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 406.99713 ± 289.284
2025-05-11 20:09:47,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [637.4013, 745.01843, 84.67876, 674.85223, 644.80963, 126.71534, 395.94397, 45.863293, 691.4753, 23.213144]
2025-05-11 20:09:47,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [208.0, 249.0, 50.0, 224.0, 207.0, 64.0, 148.0, 27.0, 239.0, 23.0]
2025-05-11 20:09:47,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 17 minutes, 9 seconds)
2025-05-11 20:12:36,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:12:38,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 517.00525 ± 227.332
2025-05-11 20:12:38,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [231.04941, 686.74786, 443.03217, 694.3696, 462.42987, 532.3325, 683.0457, 7.243786, 727.5448, 702.2564]
2025-05-11 20:12:38,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [105.0, 260.0, 161.0, 221.0, 154.0, 188.0, 218.0, 9.0, 235.0, 223.0]
2025-05-11 20:12:38,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 14 minutes, 15 seconds)
2025-05-11 20:15:29,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:15:32,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 536.78937 ± 292.010
2025-05-11 20:15:32,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [698.6395, 576.6267, 483.1484, 1114.5505, 31.033314, 629.7421, 279.25467, 195.67062, 666.9889, 692.2387]
2025-05-11 20:15:32,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [232.0, 195.0, 187.0, 382.0, 26.0, 212.0, 116.0, 101.0, 215.0, 220.0]
2025-05-11 20:15:32,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 11 minutes, 25 seconds)
2025-05-11 20:18:23,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:18:26,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 569.82428 ± 154.200
2025-05-11 20:18:26,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [536.7952, 631.1224, 660.28925, 297.37415, 618.6351, 742.73157, 657.9587, 631.96, 664.1409, 257.23538]
2025-05-11 20:18:26,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [178.0, 202.0, 210.0, 118.0, 214.0, 237.0, 209.0, 203.0, 212.0, 107.0]
2025-05-11 20:18:26,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 8 minutes, 37 seconds)
2025-05-11 20:21:13,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:21:15,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 538.28906 ± 189.406
2025-05-11 20:21:15,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [661.419, 674.66455, 434.62268, 488.30353, 205.1124, 732.2887, 648.6971, 664.2947, 197.57071, 675.9173]
2025-05-11 20:21:15,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [213.0, 216.0, 173.0, 161.0, 94.0, 281.0, 210.0, 214.0, 89.0, 220.0]
2025-05-11 20:21:15,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 44 seconds)
2025-05-11 20:24:06,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:24:08,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 476.92496 ± 170.576
2025-05-11 20:24:08,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [606.5142, 508.5179, 631.2183, 623.48956, 96.4545, 367.76227, 630.5852, 333.34216, 369.86392, 601.5016]
2025-05-11 20:24:08,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [194.0, 171.0, 204.0, 199.0, 54.0, 163.0, 201.0, 128.0, 153.0, 193.0]
2025-05-11 20:24:08,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 52 seconds)
2025-05-11 20:26:54,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:26:56,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 524.05310 ± 210.157
2025-05-11 20:26:56,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [692.9126, 177.39812, 653.65326, 138.49992, 740.5297, 603.503, 653.2545, 337.13974, 664.28375, 579.3562]
2025-05-11 20:26:56,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [247.0, 81.0, 225.0, 71.0, 250.0, 195.0, 208.0, 131.0, 245.0, 189.0]
2025-05-11 20:26:56,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1251 [DEBUG]: Training session finished
