2025-05-11 15:58:26,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-bpql-mem4
2025-05-11 15:58:26,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-bpql-mem4
2025-05-11 15:58:26,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x7fecdb9c5c70>}
2025-05-11 15:58:26,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1111 [DEBUG]: using device: cpu
2025-05-11 15:58:26,741 baseline-bpql-noisy-hopper:77 [WARNING]: args.assumed_delay != args.horizon: 4 != 24
2025-05-11 15:58:26,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1133 [INFO]: Creating new trainer
2025-05-11 15:58:26,752 baseline-bpql-noisy-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=23, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-05-11 15:58:26,752 baseline-bpql-noisy-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-11 15:58:27,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1194 [DEBUG]: Starting training session...
2025-05-11 15:58:27,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 1/100
2025-05-11 16:01:01,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:01:02,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 105.79926 ± 68.459
2025-05-11 16:01:02,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [40.736355, 48.93213, 87.60048, 188.65616, 43.74121, 199.49432, 169.35594, 193.23589, 39.234333, 47.005653]
2025-05-11 16:01:02,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [28.0, 32.0, 47.0, 89.0, 29.0, 91.0, 79.0, 90.0, 27.0, 31.0]
2025-05-11 16:01:02,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (105.80) for latency MM1Queue_a033_s075
2025-05-11 16:01:02,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-11 16:01:02,342 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 16:01:02,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 16 minutes, 17 seconds)
2025-05-11 16:03:46,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:03:46,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 10.68825 ± 3.629
2025-05-11 16:03:46,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [21.538889, 9.971214, 9.564184, 9.672734, 8.905652, 9.408666, 9.540625, 9.167117, 9.291589, 9.821785]
2025-05-11 16:03:46,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [52.0, 40.0, 39.0, 41.0, 38.0, 39.0, 40.0, 40.0, 39.0, 39.0]
2025-05-11 16:03:46,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 21 minutes, 16 seconds)
2025-05-11 16:06:38,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:06:40,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 204.70352 ± 155.233
2025-05-11 16:06:40,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [240.73064, 97.49046, 451.9666, 372.28806, 11.315885, 14.798969, 147.99655, 38.201073, 333.68292, 338.56403]
2025-05-11 16:06:40,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [205.0, 79.0, 288.0, 271.0, 12.0, 16.0, 116.0, 33.0, 290.0, 254.0]
2025-05-11 16:06:40,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (204.70) for latency MM1Queue_a033_s075
2025-05-11 16:06:40,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-11 16:06:40,255 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 16:06:40,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 25 minutes, 48 seconds)
2025-05-11 16:09:22,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:09:23,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 253.43570 ± 65.794
2025-05-11 16:09:23,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [297.95447, 223.6845, 296.62305, 209.39886, 257.23843, 281.84842, 304.56784, 79.62977, 296.33023, 287.0812]
2025-05-11 16:09:23,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [128.0, 103.0, 131.0, 97.0, 121.0, 125.0, 131.0, 47.0, 126.0, 127.0]
2025-05-11 16:09:23,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (253.44) for latency MM1Queue_a033_s075
2025-05-11 16:09:23,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-11 16:09:23,688 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 16:09:23,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 22 minutes, 40 seconds)
2025-05-11 16:12:05,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:12:06,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 197.60036 ± 88.331
2025-05-11 16:12:06,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [153.00467, 352.85754, 133.26556, 125.20247, 314.49432, 322.34137, 175.68123, 128.74582, 122.620705, 147.78976]
2025-05-11 16:12:06,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 152.0, 79.0, 75.0, 146.0, 148.0, 99.0, 79.0, 74.0, 88.0]
2025-05-11 16:12:06,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 19 minutes, 32 seconds)
2025-05-11 16:14:50,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:14:55,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 407.90509 ± 178.997
2025-05-11 16:14:55,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [456.8205, 299.2693, 335.7077, 155.41933, 457.00186, 244.85658, 533.8966, 843.83075, 388.80563, 363.4427]
2025-05-11 16:14:55,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [297.0, 162.0, 305.0, 139.0, 409.0, 207.0, 472.0, 825.0, 353.0, 182.0]
2025-05-11 16:14:55,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1226 [INFO]: New best (407.91) for latency MM1Queue_a033_s075
2025-05-11 16:14:55,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1229 [INFO]: saving network
2025-05-11 16:14:55,526 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 16:14:55,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 21 minutes, 3 seconds)
2025-05-11 16:17:45,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:17:48,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 301.05255 ± 155.133
2025-05-11 16:17:48,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [58.35373, 257.7888, 44.676765, 223.28448, 495.62033, 312.73087, 350.28186, 475.2569, 494.28604, 298.24567]
2025-05-11 16:17:48,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [48.0, 190.0, 36.0, 209.0, 350.0, 128.0, 325.0, 368.0, 447.0, 287.0]
2025-05-11 16:17:48,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 20 minutes, 52 seconds)
2025-05-11 16:20:35,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:20:36,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 260.84464 ± 89.795
2025-05-11 16:20:36,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [308.49786, 88.92375, 302.38846, 328.2372, 282.90146, 307.05463, 314.63644, 76.45976, 301.60208, 297.7445]
2025-05-11 16:20:36,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [127.0, 55.0, 126.0, 145.0, 121.0, 127.0, 127.0, 48.0, 124.0, 132.0]
2025-05-11 16:20:36,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 16 minutes, 31 seconds)
2025-05-11 16:23:23,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:23:24,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 216.18242 ± 127.496
2025-05-11 16:23:24,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [313.7071, 46.066814, 324.04987, 45.857784, 308.8779, 319.14786, 36.895527, 123.597824, 346.6287, 296.99493]
2025-05-11 16:23:24,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [132.0, 34.0, 152.0, 35.0, 125.0, 130.0, 30.0, 63.0, 130.0, 130.0]
2025-05-11 16:23:24,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 14 minutes, 59 seconds)
2025-05-11 16:26:11,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:26:12,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 239.15605 ± 102.387
2025-05-11 16:26:12,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [32.80646, 290.39087, 310.93555, 305.8046, 303.14645, 286.98093, 158.792, 306.86057, 318.7657, 77.0774]
2025-05-11 16:26:12,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [29.0, 125.0, 132.0, 139.0, 128.0, 123.0, 83.0, 128.0, 135.0, 49.0]
2025-05-11 16:26:12,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 13 minutes, 50 seconds)
2025-05-11 16:28:59,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:29:00,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 230.37634 ± 72.433
2025-05-11 16:29:00,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [307.2502, 226.77245, 301.33945, 312.55014, 109.34788, 232.65674, 305.65857, 165.58795, 130.94705, 211.65309]
2025-05-11 16:29:00,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [123.0, 100.0, 121.0, 125.0, 59.0, 98.0, 122.0, 83.0, 68.0, 95.0]
2025-05-11 16:29:00,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 10 minutes, 41 seconds)
2025-05-11 16:31:46,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:31:48,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 205.82986 ± 102.990
2025-05-11 16:31:48,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [202.33727, 316.4371, 24.744797, 132.22586, 256.08826, 79.34684, 260.26926, 135.36693, 311.52563, 339.95663]
2025-05-11 16:31:48,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [90.0, 128.0, 21.0, 70.0, 115.0, 49.0, 116.0, 72.0, 127.0, 145.0]
2025-05-11 16:31:48,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 6 minutes, 17 seconds)
2025-05-11 16:34:35,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:34:36,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 204.44691 ± 96.282
2025-05-11 16:34:36,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [168.6993, 306.81027, 234.82924, 193.9838, 308.99762, 44.470192, 316.88638, 98.13363, 282.65842, 89.00027]
2025-05-11 16:34:36,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [79.0, 123.0, 103.0, 87.0, 125.0, 33.0, 138.0, 55.0, 116.0, 52.0]
2025-05-11 16:34:36,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 3 minutes, 31 seconds)
2025-05-11 16:37:24,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:37:25,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 274.03430 ± 95.740
2025-05-11 16:37:25,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [325.75888, 306.1823, 318.45245, 331.31332, 323.2397, 322.59125, 127.16111, 322.81314, 317.08554, 45.745384]
2025-05-11 16:37:25,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [139.0, 128.0, 135.0, 145.0, 136.0, 139.0, 70.0, 134.0, 139.0, 33.0]
2025-05-11 16:37:25,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 1 minute, 15 seconds)
2025-05-11 16:40:14,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:40:15,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 261.49405 ± 97.937
2025-05-11 16:40:15,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [312.61588, 42.738716, 312.0739, 297.37097, 299.33273, 293.38275, 292.7776, 313.31744, 97.36335, 353.96725]
2025-05-11 16:40:15,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [126.0, 32.0, 129.0, 120.0, 123.0, 119.0, 120.0, 125.0, 53.0, 132.0]
2025-05-11 16:40:15,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 3 hours, 58 minutes, 47 seconds)
2025-05-11 16:43:04,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:43:05,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 263.43454 ± 101.412
2025-05-11 16:43:05,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [70.392395, 321.65863, 305.31976, 321.6643, 328.1509, 320.03418, 314.5157, 331.1467, 57.385002, 264.07797]
2025-05-11 16:43:05,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [46.0, 136.0, 126.0, 130.0, 139.0, 132.0, 129.0, 143.0, 42.0, 117.0]
2025-05-11 16:43:05,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 56 minutes, 43 seconds)
2025-05-11 16:45:52,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:45:53,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 284.97345 ± 52.572
2025-05-11 16:45:53,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [201.39291, 296.86154, 309.53055, 305.3054, 285.67523, 309.8889, 334.2007, 166.71709, 320.2474, 319.91467]
2025-05-11 16:45:53,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [92.0, 123.0, 141.0, 132.0, 137.0, 128.0, 152.0, 80.0, 134.0, 136.0]
2025-05-11 16:45:53,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 53 minutes, 54 seconds)
2025-05-11 16:48:40,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:48:41,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 176.89195 ± 113.194
2025-05-11 16:48:41,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [311.37473, 53.48154, 50.03456, 120.743904, 305.34338, 301.5245, 317.6694, 97.77796, 45.239403, 165.73012]
2025-05-11 16:48:41,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [128.0, 31.0, 38.0, 64.0, 122.0, 123.0, 129.0, 56.0, 34.0, 79.0]
2025-05-11 16:48:41,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 50 minutes, 54 seconds)
2025-05-11 16:51:27,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:51:28,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 255.74431 ± 107.805
2025-05-11 16:51:28,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [343.53653, 317.6819, 184.74036, 316.5555, 320.2739, 323.89523, 311.60013, 324.8297, 70.71633, 43.613422]
2025-05-11 16:51:28,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [149.0, 131.0, 97.0, 135.0, 132.0, 131.0, 131.0, 131.0, 45.0, 34.0]
2025-05-11 16:51:28,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 47 minutes, 29 seconds)
2025-05-11 16:54:17,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:54:18,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 169.02457 ± 116.029
2025-05-11 16:54:18,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [12.718004, 313.19836, 80.47982, 230.09587, 324.8311, 28.541918, 322.8742, 146.21431, 71.25615, 160.03587]
2025-05-11 16:54:18,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [13.0, 127.0, 48.0, 101.0, 132.0, 23.0, 131.0, 75.0, 44.0, 78.0]
2025-05-11 16:54:18,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 44 minutes, 51 seconds)
2025-05-11 16:57:03,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:57:05,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 261.85626 ± 105.613
2025-05-11 16:57:05,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [313.34824, 314.396, 304.8782, 316.97797, 320.0802, 321.8908, 60.16736, 42.208828, 301.5625, 323.0524]
2025-05-11 16:57:05,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [129.0, 129.0, 122.0, 130.0, 138.0, 141.0, 41.0, 32.0, 121.0, 132.0]
2025-05-11 16:57:05,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 41 minutes, 1 second)
2025-05-11 16:59:54,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:59:55,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 221.12036 ± 92.744
2025-05-11 16:59:55,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [72.11914, 225.26265, 168.33714, 137.33475, 254.67937, 314.02158, 314.61484, 320.80396, 313.42685, 90.603615]
2025-05-11 16:59:55,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [46.0, 102.0, 81.0, 71.0, 121.0, 127.0, 130.0, 133.0, 129.0, 53.0]
2025-05-11 16:59:55,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 38 minutes, 59 seconds)
2025-05-11 17:02:41,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:02:42,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 286.77048 ± 78.665
2025-05-11 17:02:42,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [320.95645, 52.82742, 300.8855, 312.40247, 325.71875, 300.69318, 321.77728, 317.14413, 293.5639, 321.7358]
2025-05-11 17:02:42,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 36.0, 127.0, 124.0, 134.0, 132.0, 135.0, 129.0, 134.0, 130.0]
2025-05-11 17:02:42,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 35 minutes, 59 seconds)
2025-05-11 17:05:29,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:05:30,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 205.95273 ± 130.788
2025-05-11 17:05:30,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [335.26526, 320.93954, 17.771748, 305.3119, 216.9029, 338.46448, 32.999954, 147.00903, 26.633013, 318.2296]
2025-05-11 17:05:30,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [134.0, 135.0, 19.0, 127.0, 97.0, 140.0, 27.0, 75.0, 22.0, 130.0]
2025-05-11 17:05:30,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 33 minutes, 17 seconds)
2025-05-11 17:08:19,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:08:20,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 203.52628 ± 103.201
2025-05-11 17:08:20,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [312.30368, 180.12259, 192.01492, 178.17169, 300.96542, 36.01486, 31.261225, 318.18164, 172.97072, 313.256]
2025-05-11 17:08:20,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [125.0, 85.0, 88.0, 93.0, 122.0, 29.0, 25.0, 128.0, 81.0, 127.0]
2025-05-11 17:08:20,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 30 minutes, 23 seconds)
2025-05-11 17:11:06,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:11:07,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 164.18300 ± 125.338
2025-05-11 17:11:07,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [121.271225, 322.60632, 80.69312, 320.14487, 48.71057, 310.21494, 303.81714, 80.28364, 32.171272, 21.916851]
2025-05-11 17:11:07,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [65.0, 133.0, 46.0, 132.0, 35.0, 128.0, 126.0, 46.0, 30.0, 18.0]
2025-05-11 17:11:07,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 27 minutes, 40 seconds)
2025-05-11 17:13:53,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:13:55,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 282.08701 ± 80.246
2025-05-11 17:13:55,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [311.2056, 320.75143, 307.40433, 91.76919, 331.67508, 331.84537, 330.19086, 158.88736, 325.64948, 311.4915]
2025-05-11 17:13:55,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [127.0, 131.0, 126.0, 51.0, 131.0, 138.0, 135.0, 75.0, 133.0, 140.0]
2025-05-11 17:13:55,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 24 minutes, 13 seconds)
2025-05-11 17:16:43,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:16:44,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 277.82611 ± 80.520
2025-05-11 17:16:44,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [312.77362, 326.6316, 258.48126, 311.64832, 334.61874, 194.85367, 320.20306, 69.507965, 323.37564, 326.16708]
2025-05-11 17:16:44,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [126.0, 134.0, 132.0, 128.0, 139.0, 90.0, 131.0, 43.0, 133.0, 136.0]
2025-05-11 17:16:44,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 22 minutes, 5 seconds)
2025-05-11 17:19:32,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:19:33,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 276.61145 ± 80.304
2025-05-11 17:19:33,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [121.59597, 118.35548, 330.6753, 264.6778, 322.8646, 318.43036, 326.71216, 323.28006, 314.8765, 324.6462]
2025-05-11 17:19:33,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [64.0, 64.0, 138.0, 116.0, 131.0, 131.0, 133.0, 131.0, 129.0, 134.0]
2025-05-11 17:19:33,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 19 minutes, 35 seconds)
2025-05-11 17:22:20,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:22:21,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 291.79208 ± 88.633
2025-05-11 17:22:21,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [26.916986, 326.1378, 300.91486, 321.65567, 317.36523, 317.60355, 328.46216, 326.8952, 323.7775, 328.19162]
2025-05-11 17:22:21,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 135.0, 127.0, 141.0, 127.0, 130.0, 133.0, 135.0, 131.0, 136.0]
2025-05-11 17:22:21,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 16 minutes, 20 seconds)
2025-05-11 17:25:07,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:25:08,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 297.37219 ± 80.279
2025-05-11 17:25:08,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [326.31616, 309.7869, 324.87646, 321.28955, 321.88293, 57.622902, 320.98508, 324.464, 342.75546, 323.74243]
2025-05-11 17:25:08,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [134.0, 118.0, 133.0, 132.0, 132.0, 40.0, 129.0, 133.0, 137.0, 133.0]
2025-05-11 17:25:08,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 13 minutes, 29 seconds)
2025-05-11 17:27:51,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:27:53,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 281.10968 ± 75.118
2025-05-11 17:27:53,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [321.2422, 123.217995, 318.87488, 319.25275, 320.54486, 139.26927, 318.0375, 320.17993, 307.75983, 322.71783]
2025-05-11 17:27:53,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [132.0, 63.0, 128.0, 131.0, 132.0, 69.0, 128.0, 131.0, 127.0, 133.0]
2025-05-11 17:27:53,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 9 minutes, 55 seconds)
2025-05-11 17:30:36,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:30:38,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 285.53168 ± 61.061
2025-05-11 17:30:38,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [317.68842, 339.71994, 332.25693, 303.44077, 153.48328, 266.37802, 315.1491, 321.40878, 318.89615, 186.89528]
2025-05-11 17:30:38,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [129.0, 138.0, 141.0, 139.0, 75.0, 115.0, 125.0, 132.0, 128.0, 102.0]
2025-05-11 17:30:38,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 6 minutes, 7 seconds)
2025-05-11 17:33:20,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:33:21,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 282.23297 ± 87.224
2025-05-11 17:33:21,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [30.5208, 244.405, 324.3096, 314.71796, 321.54233, 321.59583, 320.0775, 326.4028, 293.3572, 325.4007]
2025-05-11 17:33:21,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [27.0, 106.0, 131.0, 128.0, 131.0, 131.0, 129.0, 134.0, 122.0, 131.0]
2025-05-11 17:33:21,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 2 minutes, 9 seconds)
2025-05-11 17:36:05,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:36:06,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 290.95486 ± 86.912
2025-05-11 17:36:06,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [30.814114, 324.01605, 330.09894, 318.1935, 325.2698, 307.58035, 315.35864, 316.85422, 318.5281, 322.835]
2025-05-11 17:36:06,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 132.0, 136.0, 128.0, 133.0, 126.0, 127.0, 130.0, 128.0, 133.0]
2025-05-11 17:36:06,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 58 minutes, 43 seconds)
2025-05-11 17:38:50,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:38:51,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 279.41406 ± 62.318
2025-05-11 17:38:51,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [336.24753, 218.01035, 320.23282, 298.47836, 149.22862, 309.05246, 323.39908, 319.4456, 198.01584, 322.03012]
2025-05-11 17:38:51,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [141.0, 98.0, 131.0, 121.0, 74.0, 135.0, 130.0, 129.0, 91.0, 134.0]
2025-05-11 17:38:51,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 55 minutes, 35 seconds)
2025-05-11 17:41:32,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:41:34,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 300.52383 ± 65.281
2025-05-11 17:41:34,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [105.67728, 321.2126, 315.2239, 314.12067, 321.0623, 332.46088, 326.98297, 321.72543, 333.01404, 313.75827]
2025-05-11 17:41:34,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [63.0, 129.0, 128.0, 128.0, 133.0, 140.0, 132.0, 131.0, 138.0, 128.0]
2025-05-11 17:41:34,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 52 minutes, 25 seconds)
2025-05-11 17:44:18,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:44:19,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 277.20868 ± 84.521
2025-05-11 17:44:19,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [313.4752, 323.20743, 305.3707, 50.770996, 189.9303, 319.2339, 316.35425, 321.25436, 312.01352, 320.47614]
2025-05-11 17:44:19,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [126.0, 131.0, 128.0, 36.0, 86.0, 128.0, 130.0, 133.0, 129.0, 130.0]
2025-05-11 17:44:19,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 49 minutes, 42 seconds)
2025-05-11 17:47:01,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:47:02,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 280.13705 ± 68.442
2025-05-11 17:47:02,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [306.7938, 301.79968, 320.76147, 320.88345, 119.666695, 308.20697, 316.83035, 172.2436, 310.20026, 323.9844]
2025-05-11 17:47:02,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [121.0, 123.0, 130.0, 128.0, 61.0, 129.0, 127.0, 82.0, 125.0, 133.0]
2025-05-11 17:47:02,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 46 minutes, 54 seconds)
2025-05-11 17:49:46,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:49:48,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 256.73782 ± 85.778
2025-05-11 17:49:48,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [117.87048, 321.5518, 153.67096, 326.44797, 327.3192, 143.5351, 319.83035, 318.34634, 203.38226, 335.4236]
2025-05-11 17:49:48,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [63.0, 132.0, 76.0, 134.0, 137.0, 71.0, 131.0, 129.0, 95.0, 144.0]
2025-05-11 17:49:48,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 44 minutes, 17 seconds)
2025-05-11 17:52:31,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:52:32,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 261.95175 ± 96.842
2025-05-11 17:52:32,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [76.72849, 314.7622, 324.599, 320.30176, 322.77106, 215.11673, 322.18637, 322.5328, 80.77115, 319.74783]
2025-05-11 17:52:32,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [48.0, 126.0, 129.0, 132.0, 130.0, 96.0, 131.0, 128.0, 49.0, 130.0]
2025-05-11 17:52:32,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 41 minutes, 26 seconds)
2025-05-11 17:55:14,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:55:15,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 299.09967 ± 75.928
2025-05-11 17:55:15,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [324.65356, 325.27237, 322.98724, 327.32236, 321.0724, 326.68677, 322.83377, 71.39312, 322.2456, 326.5293]
2025-05-11 17:55:15,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [136.0, 135.0, 133.0, 136.0, 131.0, 137.0, 131.0, 42.0, 131.0, 133.0]
2025-05-11 17:55:15,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 38 minutes, 51 seconds)
2025-05-11 17:57:56,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:57:58,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 299.08609 ± 72.210
2025-05-11 17:57:58,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [82.66981, 318.70142, 318.6167, 319.73764, 326.70544, 321.64355, 328.07217, 325.63123, 325.25864, 323.82452]
2025-05-11 17:57:58,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [47.0, 129.0, 128.0, 131.0, 135.0, 132.0, 134.0, 132.0, 132.0, 130.0]
2025-05-11 17:57:58,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 35 minutes, 36 seconds)
2025-05-11 18:00:41,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:00:42,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 306.07266 ± 42.390
2025-05-11 18:00:42,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [180.07504, 316.64667, 316.27994, 328.27216, 327.39792, 314.3737, 322.40042, 309.20245, 325.16617, 320.91177]
2025-05-11 18:00:42,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 127.0, 128.0, 133.0, 132.0, 125.0, 130.0, 127.0, 132.0, 129.0]
2025-05-11 18:00:42,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 33 minutes, 5 seconds)
2025-05-11 18:03:24,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:03:25,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 207.69722 ± 117.731
2025-05-11 18:03:25,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [123.269424, 316.00363, 60.291233, 167.68695, 314.92438, 323.41132, 23.69101, 102.04005, 322.5722, 323.08203]
2025-05-11 18:03:25,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [63.0, 127.0, 37.0, 87.0, 128.0, 132.0, 25.0, 57.0, 130.0, 127.0]
2025-05-11 18:03:25,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 29 minutes, 52 seconds)
2025-05-11 18:06:09,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:06:10,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 316.08496 ± 7.694
2025-05-11 18:06:10,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [311.61487, 319.9447, 312.36832, 314.80017, 318.16296, 328.086, 301.35883, 321.1977, 308.0069, 325.30936]
2025-05-11 18:06:10,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [126.0, 127.0, 123.0, 131.0, 127.0, 132.0, 123.0, 130.0, 126.0, 131.0]
2025-05-11 18:06:10,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 27 minutes, 15 seconds)
2025-05-11 18:08:51,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:08:52,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 278.95749 ± 70.837
2025-05-11 18:08:52,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [295.61966, 91.42391, 325.11285, 320.05408, 246.86015, 226.21896, 320.61026, 320.14386, 323.4753, 320.05582]
2025-05-11 18:08:52,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [125.0, 56.0, 132.0, 131.0, 109.0, 107.0, 130.0, 133.0, 132.0, 130.0]
2025-05-11 18:08:52,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 24 minutes, 22 seconds)
2025-05-11 18:11:35,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:11:36,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 248.44199 ± 90.091
2025-05-11 18:11:36,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [320.00027, 115.413574, 320.03882, 322.73474, 82.20912, 303.79642, 222.0272, 322.4119, 165.05034, 310.73758]
2025-05-11 18:11:36,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [129.0, 60.0, 129.0, 131.0, 49.0, 120.0, 95.0, 129.0, 78.0, 124.0]
2025-05-11 18:11:36,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 21 minutes, 49 seconds)
2025-05-11 18:14:20,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:14:22,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 245.01791 ± 106.480
2025-05-11 18:14:22,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [165.15125, 323.87695, 319.26474, 349.95273, 318.5052, 324.7481, 319.93872, 106.9802, 187.84967, 33.91173]
2025-05-11 18:14:22,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 131.0, 129.0, 137.0, 126.0, 133.0, 130.0, 59.0, 89.0, 27.0]
2025-05-11 18:14:22,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 19 minutes, 18 seconds)
2025-05-11 18:17:04,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:17:06,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 265.05807 ± 86.343
2025-05-11 18:17:06,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [322.61603, 324.50485, 318.1345, 147.10173, 162.88565, 322.28674, 317.3186, 96.14181, 318.6313, 320.9598]
2025-05-11 18:17:06,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 129.0, 130.0, 72.0, 78.0, 130.0, 126.0, 55.0, 129.0, 131.0]
2025-05-11 18:17:06,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 16 minutes, 44 seconds)
2025-05-11 18:19:59,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:20:00,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 317.74677 ± 2.373
2025-05-11 18:20:00,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [313.41483, 317.57977, 321.0769, 315.4999, 318.457, 318.25513, 318.24387, 320.10324, 314.71835, 320.11832]
2025-05-11 18:20:00,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [126.0, 128.0, 130.0, 127.0, 128.0, 129.0, 129.0, 129.0, 128.0, 129.0]
2025-05-11 18:20:00,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 15 minutes, 35 seconds)
2025-05-11 18:22:51,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:22:52,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 294.03244 ± 52.480
2025-05-11 18:22:52,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [316.57217, 318.19644, 264.6305, 319.16272, 318.8225, 322.34598, 145.1675, 294.99918, 318.59296, 321.83444]
2025-05-11 18:22:52,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [127.0, 128.0, 114.0, 131.0, 128.0, 129.0, 70.0, 121.0, 129.0, 129.0]
2025-05-11 18:22:52,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 14 minutes, 22 seconds)
2025-05-11 18:25:45,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:25:46,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 213.27039 ± 132.141
2025-05-11 18:25:46,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [314.3127, 124.43805, 39.33462, 21.451138, 321.98178, 322.9394, 324.9738, 33.195164, 315.1872, 314.89005]
2025-05-11 18:25:46,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [128.0, 65.0, 30.0, 22.0, 129.0, 130.0, 131.0, 26.0, 130.0, 127.0]
2025-05-11 18:25:46,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 13 minutes, 12 seconds)
2025-05-11 18:28:38,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:28:39,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 260.09222 ± 83.280
2025-05-11 18:28:39,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [310.1627, 101.552315, 317.97543, 310.53955, 313.94998, 316.40005, 313.5347, 314.7853, 158.13266, 143.88977]
2025-05-11 18:28:39,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [124.0, 57.0, 128.0, 124.0, 126.0, 126.0, 125.0, 126.0, 75.0, 70.0]
2025-05-11 18:28:39,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 11 minutes, 26 seconds)
2025-05-11 18:31:30,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:31:32,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 285.81976 ± 71.549
2025-05-11 18:31:32,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [106.75159, 322.05798, 188.19405, 323.73508, 317.79672, 319.9877, 318.79648, 320.97528, 321.18588, 318.717]
2025-05-11 18:31:32,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [57.0, 129.0, 88.0, 132.0, 128.0, 130.0, 129.0, 129.0, 131.0, 128.0]
2025-05-11 18:31:32,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 9 minutes, 54 seconds)
2025-05-11 18:34:20,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:34:21,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 287.05759 ± 88.727
2025-05-11 18:34:21,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [326.6292, 328.49374, 328.41913, 28.833399, 313.55334, 338.48322, 316.6009, 316.49728, 317.88205, 255.18349]
2025-05-11 18:34:21,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [132.0, 134.0, 138.0, 24.0, 128.0, 145.0, 126.0, 125.0, 126.0, 110.0]
2025-05-11 18:34:21,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 6 minutes, 13 seconds)
2025-05-11 18:37:04,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:37:06,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 291.34735 ± 88.032
2025-05-11 18:37:06,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [326.32245, 322.76343, 312.57272, 323.5271, 319.53976, 27.505247, 319.04565, 316.10873, 322.03845, 324.04962]
2025-05-11 18:37:06,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [132.0, 130.0, 125.0, 131.0, 129.0, 24.0, 133.0, 127.0, 130.0, 133.0]
2025-05-11 18:37:06,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 2 minutes, 19 seconds)
2025-05-11 18:39:50,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:39:51,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 302.33282 ± 35.911
2025-05-11 18:39:51,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [279.4107, 317.13025, 309.18036, 201.12297, 319.331, 316.74457, 319.6728, 323.21786, 325.3857, 312.13217]
2025-05-11 18:39:51,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [116.0, 129.0, 125.0, 93.0, 129.0, 129.0, 130.0, 132.0, 132.0, 127.0]
2025-05-11 18:39:51,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 58 minutes, 16 seconds)
2025-05-11 18:42:31,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:42:33,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 321.00204 ± 4.394
2025-05-11 18:42:33,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [319.6045, 323.38947, 315.88794, 320.8126, 322.4648, 323.06366, 323.6995, 311.3219, 328.17673, 321.5995]
2025-05-11 18:42:33,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [128.0, 133.0, 126.0, 130.0, 130.0, 130.0, 130.0, 123.0, 134.0, 128.0]
2025-05-11 18:42:33,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 53 minutes, 56 seconds)
2025-05-11 18:45:13,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:45:15,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 317.18109 ± 27.595
2025-05-11 18:45:15,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [331.06607, 235.2857, 322.31863, 318.87366, 332.22314, 327.322, 325.66245, 324.86218, 323.26492, 330.9323]
2025-05-11 18:45:15,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 103.0, 131.0, 129.0, 134.0, 132.0, 135.0, 131.0, 130.0, 132.0]
2025-05-11 18:45:15,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 49 minutes, 45 seconds)
2025-05-11 18:47:58,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:48:00,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 234.94421 ± 112.112
2025-05-11 18:48:00,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [325.49603, 86.16331, 321.64914, 324.42053, 104.50236, 323.7975, 181.65839, 317.2962, 323.1968, 41.26188]
2025-05-11 18:48:00,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 51.0, 130.0, 131.0, 58.0, 134.0, 86.0, 125.0, 130.0, 32.0]
2025-05-11 18:48:00,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 46 minutes, 24 seconds)
2025-05-11 18:50:47,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:50:48,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 263.15356 ± 102.906
2025-05-11 18:50:48,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [319.93298, 38.483093, 313.7949, 296.82336, 78.54964, 311.84528, 318.8378, 319.1879, 317.35104, 316.72958]
2025-05-11 18:50:48,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [129.0, 30.0, 125.0, 116.0, 49.0, 124.0, 128.0, 128.0, 128.0, 128.0]
2025-05-11 18:50:48,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 44 minutes, 11 seconds)
2025-05-11 18:53:37,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:53:38,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 252.73613 ± 101.938
2025-05-11 18:53:38,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [319.50827, 320.52866, 140.42368, 318.5986, 311.3525, 91.550255, 65.76383, 318.42386, 319.25018, 321.9614]
2025-05-11 18:53:38,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 130.0, 69.0, 128.0, 125.0, 52.0, 43.0, 129.0, 128.0, 130.0]
2025-05-11 18:53:38,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 42 minutes)
2025-05-11 18:56:24,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:56:25,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 301.96237 ± 58.082
2025-05-11 18:56:25,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [315.13824, 320.73865, 322.56116, 324.30225, 321.1205, 322.08228, 324.08365, 321.90475, 319.82224, 127.870026]
2025-05-11 18:56:25,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [126.0, 132.0, 130.0, 132.0, 129.0, 133.0, 133.0, 131.0, 129.0, 67.0]
2025-05-11 18:56:25,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 39 minutes, 53 seconds)
2025-05-11 18:59:14,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:59:16,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 243.13525 ± 103.941
2025-05-11 18:59:16,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [168.07748, 329.412, 32.982613, 318.10782, 325.10892, 325.63913, 142.65393, 147.79276, 334.95596, 306.62173]
2025-05-11 18:59:16,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [81.0, 133.0, 27.0, 127.0, 132.0, 131.0, 72.0, 71.0, 145.0, 122.0]
2025-05-11 18:59:16,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 38 minutes, 6 seconds)
2025-05-11 19:02:01,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:02:03,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 305.50513 ± 59.996
2025-05-11 19:02:03,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [320.7354, 325.95728, 328.25763, 322.62515, 332.5325, 325.6812, 125.82759, 328.83234, 320.88, 323.7222]
2025-05-11 19:02:03,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [129.0, 130.0, 132.0, 132.0, 134.0, 132.0, 66.0, 133.0, 129.0, 131.0]
2025-05-11 19:02:03,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 35 minutes, 34 seconds)
2025-05-11 19:04:51,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:04:52,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 326.02402 ± 1.511
2025-05-11 19:04:52,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [322.3498, 325.4603, 327.1319, 326.08286, 327.11148, 324.7095, 328.00174, 326.24896, 326.17905, 326.96472]
2025-05-11 19:04:52,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 132.0, 132.0, 132.0, 132.0, 130.0, 132.0, 132.0, 133.0, 131.0]
2025-05-11 19:04:52,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 32 minutes, 51 seconds)
2025-05-11 19:07:39,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:07:41,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 322.57077 ± 4.908
2025-05-11 19:07:41,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [319.62463, 334.19418, 322.70193, 321.3494, 322.18823, 323.86047, 319.1091, 319.31586, 315.78055, 327.58353]
2025-05-11 19:07:41,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [129.0, 132.0, 129.0, 129.0, 131.0, 130.0, 128.0, 128.0, 126.0, 131.0]
2025-05-11 19:07:41,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 29 minutes, 54 seconds)
2025-05-11 19:10:28,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:10:30,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 305.76654 ± 46.283
2025-05-11 19:10:30,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [324.6124, 318.36758, 318.0713, 326.68356, 320.7243, 318.9975, 320.33078, 318.77847, 323.92624, 167.1734]
2025-05-11 19:10:30,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 129.0, 130.0, 132.0, 130.0, 128.0, 130.0, 129.0, 130.0, 79.0]
2025-05-11 19:10:30,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 27 minutes, 17 seconds)
2025-05-11 19:13:19,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:13:21,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 300.70206 ± 56.040
2025-05-11 19:13:21,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [321.7787, 323.58707, 325.1845, 256.34897, 320.84906, 323.5196, 326.22885, 145.02203, 326.52368, 337.9782]
2025-05-11 19:13:21,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 131.0, 140.0, 104.0, 132.0, 129.0, 132.0, 74.0, 134.0, 137.0]
2025-05-11 19:13:21,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 24 minutes, 29 seconds)
2025-05-11 19:16:07,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:16:09,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 287.01935 ± 74.140
2025-05-11 19:16:09,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [322.21893, 320.734, 322.01843, 83.97004, 219.4136, 319.2602, 321.08374, 319.6021, 319.695, 322.19724]
2025-05-11 19:16:09,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 129.0, 128.0, 50.0, 99.0, 128.0, 129.0, 134.0, 127.0, 131.0]
2025-05-11 19:16:09,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 21 minutes, 45 seconds)
2025-05-11 19:18:55,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:18:56,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 287.59195 ± 62.685
2025-05-11 19:18:56,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [323.67517, 324.06577, 317.2567, 321.44955, 179.85071, 307.52783, 146.98782, 318.0406, 321.62546, 315.43973]
2025-05-11 19:18:56,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 130.0, 128.0, 129.0, 84.0, 123.0, 71.0, 128.0, 130.0, 127.0]
2025-05-11 19:18:56,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 18 minutes, 45 seconds)
2025-05-11 19:21:41,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:21:43,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 313.80032 ± 24.299
2025-05-11 19:21:43,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [321.59692, 317.30707, 306.56253, 320.25937, 336.7929, 320.24646, 244.79854, 332.42548, 317.50577, 320.5081]
2025-05-11 19:21:43,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 130.0, 130.0, 129.0, 145.0, 129.0, 106.0, 144.0, 128.0, 130.0]
2025-05-11 19:21:43,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 15 minutes, 45 seconds)
2025-05-11 19:24:27,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:24:29,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 322.16193 ± 4.231
2025-05-11 19:24:29,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [319.67276, 318.25076, 315.74255, 330.55222, 321.23523, 323.73764, 318.67438, 327.36615, 323.97025, 322.41714]
2025-05-11 19:24:29,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [133.0, 128.0, 138.0, 139.0, 129.0, 131.0, 129.0, 137.0, 130.0, 130.0]
2025-05-11 19:24:29,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 12 minutes, 42 seconds)
2025-05-11 19:27:16,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:27:17,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 304.78998 ± 44.302
2025-05-11 19:27:17,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [322.4945, 319.05127, 321.69977, 318.5195, 317.6999, 316.65002, 320.0989, 316.70593, 322.93768, 172.04233]
2025-05-11 19:27:17,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 129.0, 129.0, 129.0, 128.0, 126.0, 129.0, 129.0, 130.0, 79.0]
2025-05-11 19:27:17,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 9 minutes, 43 seconds)
2025-05-11 19:30:07,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:30:09,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 306.57471 ± 47.530
2025-05-11 19:30:09,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [320.46655, 325.1912, 165.08289, 325.19, 324.78928, 325.3008, 318.28015, 330.7425, 323.33444, 307.36914]
2025-05-11 19:30:09,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [128.0, 131.0, 78.0, 130.0, 130.0, 131.0, 130.0, 135.0, 130.0, 128.0]
2025-05-11 19:30:09,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 7 minutes, 12 seconds)
2025-05-11 19:32:56,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:32:58,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 275.13580 ± 100.324
2025-05-11 19:32:58,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [53.907295, 324.53268, 324.7067, 96.99619, 328.9209, 324.32028, 323.19882, 325.56696, 321.78287, 327.42523]
2025-05-11 19:32:58,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [39.0, 131.0, 131.0, 53.0, 133.0, 130.0, 130.0, 130.0, 129.0, 133.0]
2025-05-11 19:32:58,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 4 minutes, 29 seconds)
2025-05-11 19:35:47,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:35:49,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 326.56622 ± 1.374
2025-05-11 19:35:49,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [326.1471, 323.57474, 325.00156, 326.48685, 327.65445, 326.49533, 328.1953, 327.7746, 327.96805, 326.3643]
2025-05-11 19:35:49,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 131.0, 132.0, 131.0, 131.0, 134.0, 134.0, 131.0, 131.0, 132.0]
2025-05-11 19:35:49,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 2 minutes, 1 second)
2025-05-11 19:38:37,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:38:38,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 297.14026 ± 40.734
2025-05-11 19:38:38,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [314.3275, 309.8302, 319.88367, 317.38156, 188.01389, 312.7333, 316.39514, 317.87625, 319.65088, 255.31041]
2025-05-11 19:38:38,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [129.0, 129.0, 128.0, 125.0, 88.0, 124.0, 129.0, 129.0, 129.0, 109.0]
2025-05-11 19:38:38,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 59 minutes, 27 seconds)
2025-05-11 19:41:27,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:41:28,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 299.96710 ± 61.423
2025-05-11 19:41:28,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [322.63718, 320.1611, 321.52396, 325.37463, 306.89594, 316.55325, 319.60736, 319.00592, 331.35513, 116.556404]
2025-05-11 19:41:28,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 129.0, 129.0, 130.0, 123.0, 129.0, 129.0, 130.0, 135.0, 62.0]
2025-05-11 19:41:28,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 56 minutes, 43 seconds)
2025-05-11 19:44:16,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:44:18,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 313.61749 ± 22.123
2025-05-11 19:44:18,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [327.2323, 322.65305, 323.05435, 317.73056, 315.01187, 323.07745, 322.50906, 248.37141, 323.28644, 313.2484]
2025-05-11 19:44:18,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 130.0, 131.0, 130.0, 128.0, 132.0, 131.0, 108.0, 130.0, 124.0]
2025-05-11 19:44:18,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 53 minutes, 45 seconds)
2025-05-11 19:47:07,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:47:08,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 305.83954 ± 56.247
2025-05-11 19:47:08,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [324.93237, 324.30792, 328.6123, 325.3473, 326.33044, 143.49669, 328.77573, 282.44925, 347.87894, 326.2646]
2025-05-11 19:47:08,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [134.0, 131.0, 133.0, 131.0, 132.0, 74.0, 138.0, 117.0, 153.0, 131.0]
2025-05-11 19:47:08,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 51 minutes, 3 seconds)
2025-05-11 19:49:56,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:49:58,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 304.29553 ± 44.057
2025-05-11 19:49:58,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [317.41235, 309.60623, 325.56552, 321.73834, 318.885, 320.46942, 172.69647, 322.1427, 315.22363, 319.21567]
2025-05-11 19:49:58,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [132.0, 124.0, 131.0, 129.0, 130.0, 131.0, 82.0, 129.0, 128.0, 128.0]
2025-05-11 19:49:58,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 48 minutes, 7 seconds)
2025-05-11 19:52:48,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:52:49,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 318.02786 ± 12.076
2025-05-11 19:52:49,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [328.70364, 325.7967, 304.15457, 318.20453, 321.5903, 324.37073, 287.26273, 324.6189, 321.5097, 324.0667]
2025-05-11 19:52:49,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 132.0, 124.0, 127.0, 129.0, 130.0, 117.0, 130.0, 131.0, 132.0]
2025-05-11 19:52:49,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 45 minutes, 23 seconds)
2025-05-11 19:55:37,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:55:38,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 291.58215 ± 72.223
2025-05-11 19:55:38,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [78.35713, 325.4721, 290.2898, 322.27948, 320.64767, 289.8805, 322.37415, 322.96457, 321.87796, 321.6782]
2025-05-11 19:55:38,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [48.0, 129.0, 120.0, 129.0, 128.0, 122.0, 130.0, 134.0, 129.0, 130.0]
2025-05-11 19:55:38,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 42 minutes, 30 seconds)
2025-05-11 19:58:27,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:58:29,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 281.75223 ± 70.729
2025-05-11 19:58:29,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [325.45016, 303.74072, 321.93362, 321.97345, 140.31647, 141.17352, 312.42624, 317.30844, 316.5879, 316.61166]
2025-05-11 19:58:29,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 125.0, 133.0, 129.0, 71.0, 74.0, 126.0, 127.0, 127.0, 126.0]
2025-05-11 19:58:29,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 39 minutes, 42 seconds)
2025-05-11 20:01:18,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:01:20,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 312.35452 ± 17.178
2025-05-11 20:01:20,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [327.48865, 317.3928, 320.51227, 324.1142, 270.6496, 322.03793, 317.8055, 316.9978, 318.14246, 288.40396]
2025-05-11 20:01:20,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [145.0, 127.0, 129.0, 132.0, 117.0, 130.0, 126.0, 128.0, 129.0, 119.0]
2025-05-11 20:01:20,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 36 minutes, 52 seconds)
2025-05-11 20:04:09,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:04:11,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 293.68936 ± 83.902
2025-05-11 20:04:11,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [318.83154, 324.4839, 327.684, 318.3474, 321.06448, 328.41708, 42.48583, 323.18704, 308.80597, 323.58633]
2025-05-11 20:04:11,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [126.0, 130.0, 134.0, 127.0, 128.0, 134.0, 38.0, 129.0, 122.0, 132.0]
2025-05-11 20:04:11,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 34 minutes, 6 seconds)
2025-05-11 20:06:57,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:06:59,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 317.47137 ± 12.763
2025-05-11 20:06:59,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [280.13843, 323.93896, 320.41153, 325.54932, 324.946, 315.47516, 320.9021, 320.89883, 323.22046, 319.23282]
2025-05-11 20:06:59,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [115.0, 132.0, 129.0, 131.0, 132.0, 126.0, 129.0, 130.0, 131.0, 131.0]
2025-05-11 20:06:59,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 31 minutes, 9 seconds)
2025-05-11 20:09:48,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:09:50,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 320.10278 ± 2.074
2025-05-11 20:09:50,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [317.0806, 317.63422, 321.2233, 322.58734, 323.31375, 318.5373, 320.35654, 322.33975, 318.9373, 319.01773]
2025-05-11 20:09:50,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [127.0, 130.0, 129.0, 129.0, 129.0, 127.0, 129.0, 129.0, 129.0, 128.0]
2025-05-11 20:09:50,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 28 minutes, 22 seconds)
2025-05-11 20:12:38,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:12:40,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 285.71997 ± 80.224
2025-05-11 20:12:40,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [85.275154, 329.4625, 325.50626, 325.52695, 320.92935, 176.39336, 333.75317, 314.28656, 322.51755, 323.54883]
2025-05-11 20:12:40,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [53.0, 132.0, 131.0, 132.0, 130.0, 82.0, 143.0, 125.0, 130.0, 131.0]
2025-05-11 20:12:40,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 25 minutes, 31 seconds)
2025-05-11 20:15:27,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:15:29,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 296.82639 ± 52.957
2025-05-11 20:15:29,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [315.9735, 316.1433, 312.31323, 319.94116, 315.96692, 138.55461, 305.08334, 307.77933, 316.32495, 320.1835]
2025-05-11 20:15:29,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [128.0, 126.0, 125.0, 127.0, 125.0, 69.0, 121.0, 124.0, 127.0, 127.0]
2025-05-11 20:15:29,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 22 minutes, 38 seconds)
2025-05-11 20:18:17,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:18:19,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 320.69037 ± 7.838
2025-05-11 20:18:19,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [325.7165, 320.70593, 324.4677, 324.5638, 324.80252, 321.07828, 298.57642, 326.96887, 322.60266, 317.42105]
2025-05-11 20:18:19,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 130.0, 131.0, 131.0, 131.0, 130.0, 127.0, 132.0, 131.0, 127.0]
2025-05-11 20:18:19,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 19 minutes, 47 seconds)
2025-05-11 20:21:06,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:21:08,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 314.80069 ± 13.711
2025-05-11 20:21:08,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [324.29117, 324.67993, 322.28113, 298.52118, 280.2515, 315.3581, 315.9772, 324.9275, 320.95636, 320.76294]
2025-05-11 20:21:08,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 133.0, 131.0, 119.0, 116.0, 127.0, 126.0, 131.0, 129.0, 131.0]
2025-05-11 20:21:08,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 16 minutes, 58 seconds)
2025-05-11 20:23:57,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:23:59,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 314.31860 ± 18.534
2025-05-11 20:23:59,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [325.47766, 321.38556, 319.9247, 305.8194, 319.1743, 320.71582, 323.33102, 319.7071, 326.5538, 261.09656]
2025-05-11 20:23:59,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 129.0, 130.0, 121.0, 129.0, 127.0, 129.0, 129.0, 133.0, 111.0]
2025-05-11 20:23:59,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 14 minutes, 8 seconds)
2025-05-11 20:26:47,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:26:49,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 323.74051 ± 2.945
2025-05-11 20:26:49,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [326.0036, 328.36478, 324.25198, 323.48804, 327.6761, 320.74597, 321.31412, 321.45575, 325.1039, 319.00067]
2025-05-11 20:26:49,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [134.0, 132.0, 131.0, 130.0, 132.0, 127.0, 130.0, 129.0, 131.0, 128.0]
2025-05-11 20:26:49,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 11 minutes, 19 seconds)
2025-05-11 20:29:35,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:29:37,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 311.01642 ± 36.637
2025-05-11 20:29:37,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [307.091, 325.37515, 323.2579, 202.74225, 328.33798, 316.3246, 326.91183, 328.42255, 324.4626, 327.23822]
2025-05-11 20:29:37,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [128.0, 132.0, 128.0, 130.0, 133.0, 129.0, 133.0, 136.0, 132.0, 132.0]
2025-05-11 20:29:37,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 8 minutes, 28 seconds)
2025-05-11 20:32:23,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:32:24,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 305.04102 ± 54.670
2025-05-11 20:32:24,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [141.4889, 323.2074, 323.85306, 325.5602, 323.40793, 324.69257, 311.72833, 322.76883, 326.8787, 326.82434]
2025-05-11 20:32:24,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [72.0, 129.0, 129.0, 131.0, 129.0, 129.0, 126.0, 128.0, 131.0, 132.0]
2025-05-11 20:32:24,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 38 seconds)
2025-05-11 20:35:12,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:35:13,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 326.19873 ± 2.019
2025-05-11 20:35:13,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [325.93732, 324.76, 327.43298, 326.78806, 330.46454, 327.46307, 326.2735, 324.8524, 325.67358, 322.3417]
2025-05-11 20:35:13,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 131.0, 133.0, 133.0, 134.0, 132.0, 133.0, 131.0, 132.0, 131.0]
2025-05-11 20:35:13,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 49 seconds)
2025-05-11 20:37:59,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:38:00,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1221 [DEBUG]: Total Reward: 322.32227 ± 3.506
2025-05-11 20:38:00,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1222 [DEBUG]: All rewards: [321.5977, 316.0462, 323.80823, 323.33694, 324.4407, 324.05508, 320.75516, 320.9892, 329.65283, 318.54062]
2025-05-11 20:38:00,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [133.0, 125.0, 130.0, 129.0, 131.0, 130.0, 129.0, 128.0, 132.0, 131.0]
2025-05-11 20:38:00,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1251 [DEBUG]: Training session finished
