2025-08-07 09:32:12,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc5-hopper/MM1Queue_a033_s075-bpql-mem16
2025-08-07 09:32:12,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc5-hopper/MM1Queue_a033_s075-bpql-mem16
2025-08-07 09:32:12,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x152e0588fbd0>}
2025-08-07 09:32:12,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1111 [DEBUG]: using device: cuda
2025-08-07 09:32:12,059 baseline-bpql-noiseperc5-hopper:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 09:32:12,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1133 [INFO]: Creating new trainer
2025-08-07 09:32:12,075 baseline-bpql-noiseperc5-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=59, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-08-07 09:32:12,075 baseline-bpql-noiseperc5-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 09:32:13,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1194 [DEBUG]: Starting training session...
2025-08-07 09:32:13,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 1/100
2025-08-07 09:33:44,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:33:45,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 52.42750 ± 1.015
2025-08-07 09:33:45,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [50.162914, 51.96867, 53.31687, 52.83283, 52.881508, 53.379784, 52.462296, 53.338238, 52.870472, 51.06141]
2025-08-07 09:33:45,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [37.0, 37.0, 35.0, 35.0, 36.0, 37.0, 34.0, 36.0, 36.0, 37.0]
2025-08-07 09:33:45,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (52.43) for latency MM1Queue_a033_s075
2025-08-07 09:33:45,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 30 minutes, 47 seconds)
2025-08-07 09:35:25,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:35:29,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 349.72748 ± 120.194
2025-08-07 09:35:29,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [459.36618, 421.94763, 306.83942, 380.23727, 407.82166, 396.25327, 362.23355, 450.0912, 286.27716, 26.207676]
2025-08-07 09:35:29,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [426.0, 382.0, 281.0, 359.0, 386.0, 356.0, 321.0, 414.0, 251.0, 27.0]
2025-08-07 09:35:29,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (349.73) for latency MM1Queue_a033_s075
2025-08-07 09:35:29,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 39 minutes, 47 seconds)
2025-08-07 09:37:05,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:37:07,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 198.60010 ± 124.728
2025-08-07 09:37:07,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [224.62694, 87.98343, 111.54596, 394.22955, 45.861927, 102.71231, 166.43489, 121.8458, 394.7722, 335.988]
2025-08-07 09:37:07,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [222.0, 65.0, 127.0, 374.0, 39.0, 92.0, 151.0, 115.0, 370.0, 306.0]
2025-08-07 09:37:07,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 38 minutes, 12 seconds)
2025-08-07 09:38:46,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:38:54,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 691.83380 ± 342.867
2025-08-07 09:38:54,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [630.70154, 745.8342, 988.17334, 100.06294, 934.21844, 923.311, 695.2755, 21.856953, 786.0321, 1092.8721]
2025-08-07 09:38:54,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [441.0, 566.0, 1000.0, 111.0, 1000.0, 1000.0, 799.0, 24.0, 634.0, 1000.0]
2025-08-07 09:38:54,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (691.83) for latency MM1Queue_a033_s075
2025-08-07 09:38:54,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 40 minutes, 13 seconds)
2025-08-07 09:40:36,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:40:47,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 805.42670 ± 244.464
2025-08-07 09:40:47,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [949.3623, 963.5427, 294.71317, 930.63556, 936.07214, 344.68655, 923.0345, 860.3811, 935.0289, 916.81024]
2025-08-07 09:40:47,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 232.0, 1000.0, 1000.0, 236.0, 1000.0, 908.0, 1000.0, 1000.0]
2025-08-07 09:40:47,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (805.43) for latency MM1Queue_a033_s075
2025-08-07 09:40:47,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 42 minutes, 32 seconds)
2025-08-07 09:42:21,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:42:23,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 223.68611 ± 77.642
2025-08-07 09:42:23,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [80.69004, 190.85028, 190.807, 217.24033, 309.5463, 322.3421, 272.3782, 327.3361, 164.9625, 160.70827]
2025-08-07 09:42:23,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [49.0, 108.0, 104.0, 112.0, 160.0, 155.0, 131.0, 148.0, 86.0, 94.0]
2025-08-07 09:42:23,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 42 minutes, 15 seconds)
2025-08-07 09:44:02,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:44:03,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 250.11319 ± 72.044
2025-08-07 09:44:03,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [306.5415, 209.16891, 351.1116, 218.11974, 82.2107, 284.58463, 292.53873, 241.90108, 304.71637, 210.23856]
2025-08-07 09:44:03,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [174.0, 142.0, 212.0, 125.0, 56.0, 175.0, 159.0, 147.0, 180.0, 113.0]
2025-08-07 09:44:03,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 39 minutes, 27 seconds)
2025-08-07 09:45:43,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:45:46,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 370.84006 ± 358.667
2025-08-07 09:45:46,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [46.349598, 264.7933, 426.47345, 1046.5504, 341.01007, 1038.603, 311.47318, 68.18527, 51.028103, 113.93407]
2025-08-07 09:45:46,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [41.0, 150.0, 236.0, 1000.0, 188.0, 1000.0, 162.0, 54.0, 40.0, 86.0]
2025-08-07 09:45:46,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 39 minutes, 16 seconds)
2025-08-07 09:47:26,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:47:28,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 364.25742 ± 90.078
2025-08-07 09:47:28,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [485.08902, 303.74014, 303.50186, 324.57013, 534.0103, 253.1327, 293.79068, 465.38013, 334.84503, 344.51437]
2025-08-07 09:47:28,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [332.0, 150.0, 150.0, 167.0, 381.0, 143.0, 144.0, 284.0, 183.0, 195.0]
2025-08-07 09:47:28,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 35 minutes, 57 seconds)
2025-08-07 09:49:06,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:49:08,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 218.19168 ± 27.984
2025-08-07 09:49:08,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [271.99548, 182.07672, 261.01825, 207.14983, 184.25694, 220.31296, 207.6618, 200.02858, 226.72792, 220.68825]
2025-08-07 09:49:08,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [139.0, 94.0, 125.0, 110.0, 96.0, 113.0, 108.0, 102.0, 111.0, 112.0]
2025-08-07 09:49:08,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 30 minutes, 14 seconds)
2025-08-07 09:50:46,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:50:48,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 174.46243 ± 56.084
2025-08-07 09:50:48,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [165.66739, 176.76886, 292.29962, 103.1406, 193.05103, 202.42477, 183.81766, 110.579926, 101.03841, 215.83597]
2025-08-07 09:50:48,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 146.0, 200.0, 72.0, 151.0, 149.0, 142.0, 80.0, 69.0, 163.0]
2025-08-07 09:50:48,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 29 minutes, 49 seconds)
2025-08-07 09:52:27,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:52:29,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 294.01270 ± 121.059
2025-08-07 09:52:29,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [337.2561, 42.348984, 358.05746, 65.42636, 358.46005, 343.40594, 345.1239, 351.04984, 393.36823, 345.63007]
2025-08-07 09:52:29,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [164.0, 37.0, 168.0, 48.0, 186.0, 157.0, 172.0, 186.0, 208.0, 177.0]
2025-08-07 09:52:29,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 28 minutes, 9 seconds)
2025-08-07 09:54:06,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:54:08,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 322.09283 ± 104.837
2025-08-07 09:54:08,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [107.83074, 376.54654, 385.88293, 132.06989, 331.54105, 426.6906, 362.71112, 325.7163, 387.2453, 384.69357]
2025-08-07 09:54:08,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [67.0, 176.0, 179.0, 80.0, 166.0, 192.0, 166.0, 161.0, 186.0, 182.0]
2025-08-07 09:54:08,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 25 minutes, 29 seconds)
2025-08-07 09:55:49,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:55:51,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 396.53592 ± 66.821
2025-08-07 09:55:51,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [406.40332, 410.4908, 235.71721, 419.79398, 402.27094, 393.90363, 322.77448, 444.14468, 444.29645, 485.56366]
2025-08-07 09:55:51,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [195.0, 215.0, 138.0, 210.0, 200.0, 200.0, 169.0, 208.0, 222.0, 248.0]
2025-08-07 09:55:51,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 24 minutes, 9 seconds)
2025-08-07 09:57:34,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:57:37,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 485.15787 ± 163.715
2025-08-07 09:57:37,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [459.3107, 240.9114, 258.59094, 598.3278, 525.096, 777.2157, 483.8574, 355.06802, 469.31543, 683.8854]
2025-08-07 09:57:37,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [265.0, 138.0, 151.0, 298.0, 285.0, 420.0, 272.0, 181.0, 262.0, 318.0]
2025-08-07 09:57:37,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 24 minutes, 20 seconds)
2025-08-07 09:59:11,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:59:15,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 582.64417 ± 226.524
2025-08-07 09:59:15,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [770.4926, 66.05153, 508.94498, 558.13586, 578.6666, 1032.6383, 554.6502, 591.3226, 610.8405, 554.698]
2025-08-07 09:59:15,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [417.0, 51.0, 246.0, 316.0, 354.0, 387.0, 359.0, 355.0, 298.0, 275.0]
2025-08-07 09:59:15,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 21 minutes, 54 seconds)
2025-08-07 10:00:53,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:00:56,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 654.00171 ± 407.893
2025-08-07 10:00:56,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [717.98083, 309.35645, 159.9473, 315.22473, 766.6844, 1661.6937, 875.78937, 357.542, 773.0913, 602.70685]
2025-08-07 10:00:56,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [308.0, 152.0, 85.0, 155.0, 270.0, 675.0, 328.0, 158.0, 299.0, 229.0]
2025-08-07 10:00:56,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 20 minutes, 19 seconds)
2025-08-07 10:02:39,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:02:43,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 949.04395 ± 356.994
2025-08-07 10:02:43,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [344.48712, 825.9796, 925.00903, 764.5165, 779.88477, 1109.9349, 1187.9135, 939.1192, 811.0388, 1802.5555]
2025-08-07 10:02:43,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [170.0, 308.0, 357.0, 275.0, 284.0, 435.0, 586.0, 369.0, 297.0, 659.0]
2025-08-07 10:02:43,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (949.04) for latency MM1Queue_a033_s075
2025-08-07 10:02:43,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 20 minutes, 46 seconds)
2025-08-07 10:04:22,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:04:27,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 866.90643 ± 684.383
2025-08-07 10:04:27,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1796.9298, 2411.8499, 769.1734, 591.2202, 966.19336, 295.29187, 889.49866, 196.64764, 178.17816, 574.0819]
2025-08-07 10:04:27,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [766.0, 1000.0, 299.0, 275.0, 365.0, 146.0, 448.0, 105.0, 96.0, 265.0]
2025-08-07 10:04:27,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 19 minutes, 13 seconds)
2025-08-07 10:06:03,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:06:06,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 742.72620 ± 486.018
2025-08-07 10:06:06,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [575.0836, 949.3984, 433.8719, 1221.1096, 229.86703, 369.25888, 965.36096, 1866.1688, 286.9971, 530.146]
2025-08-07 10:06:06,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [235.0, 326.0, 203.0, 457.0, 115.0, 156.0, 388.0, 700.0, 145.0, 204.0]
2025-08-07 10:06:06,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 15 minutes, 46 seconds)
2025-08-07 10:07:49,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:07:54,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1200.70642 ± 685.750
2025-08-07 10:07:54,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2179.1074, 438.86584, 708.16437, 1131.2317, 530.8965, 978.6873, 736.1195, 2227.903, 859.4684, 2216.62]
2025-08-07 10:07:54,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [787.0, 201.0, 293.0, 384.0, 254.0, 333.0, 304.0, 771.0, 329.0, 821.0]
2025-08-07 10:07:54,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (1200.71) for latency MM1Queue_a033_s075
2025-08-07 10:07:54,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 16 minutes, 48 seconds)
2025-08-07 10:09:30,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:09:32,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 464.37085 ± 282.283
2025-08-07 10:09:32,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [410.4983, 217.742, 213.44197, 647.33014, 198.10956, 208.27951, 863.60016, 307.0004, 559.22034, 1018.48615]
2025-08-07 10:09:32,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [171.0, 116.0, 120.0, 276.0, 110.0, 115.0, 308.0, 158.0, 249.0, 444.0]
2025-08-07 10:09:32,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 14 minutes, 12 seconds)
2025-08-07 10:11:16,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:11:21,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1135.26074 ± 730.302
2025-08-07 10:11:21,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1096.273, 191.22626, 1178.9873, 1160.8042, 2769.0376, 1636.2078, 1524.2034, 1020.88513, 730.4704, 44.51305]
2025-08-07 10:11:21,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [425.0, 102.0, 409.0, 384.0, 1000.0, 530.0, 533.0, 418.0, 245.0, 39.0]
2025-08-07 10:11:21,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 12 minutes, 55 seconds)
2025-08-07 10:12:57,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:13:01,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 703.66180 ± 483.881
2025-08-07 10:13:01,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [915.8596, 1010.65717, 1559.8964, 357.6857, 218.42159, 290.5995, 354.0884, 1465.2874, 648.17596, 215.94669]
2025-08-07 10:13:01,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [347.0, 414.0, 603.0, 172.0, 110.0, 148.0, 170.0, 563.0, 276.0, 112.0]
2025-08-07 10:13:01,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 10 minutes, 13 seconds)
2025-08-07 10:14:40,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:14:45,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1098.18384 ± 667.289
2025-08-07 10:14:45,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1352.6484, 1059.4203, 286.9825, 853.25, 1829.1478, 2057.0586, 2112.754, 604.85645, 376.54175, 449.1779]
2025-08-07 10:14:45,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [498.0, 409.0, 150.0, 343.0, 749.0, 781.0, 801.0, 262.0, 167.0, 208.0]
2025-08-07 10:14:45,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 9 minutes, 44 seconds)
2025-08-07 10:16:29,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:16:33,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 835.51642 ± 499.439
2025-08-07 10:16:33,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1341.7755, 215.20848, 659.88763, 449.84494, 182.95695, 1245.2714, 473.55624, 1780.3607, 998.32983, 1007.97235]
2025-08-07 10:16:33,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [473.0, 112.0, 276.0, 202.0, 105.0, 432.0, 207.0, 674.0, 352.0, 405.0]
2025-08-07 10:16:33,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 7 minutes, 56 seconds)
2025-08-07 10:18:08,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:18:12,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 848.45862 ± 485.543
2025-08-07 10:18:12,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1143.2231, 1934.266, 178.40747, 460.07532, 637.81537, 339.48392, 654.0897, 898.6464, 1092.6069, 1145.9713]
2025-08-07 10:18:12,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [432.0, 672.0, 96.0, 208.0, 275.0, 163.0, 267.0, 352.0, 425.0, 424.0]
2025-08-07 10:18:12,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 6 minutes, 36 seconds)
2025-08-07 10:19:55,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:20:03,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1591.92358 ± 734.024
2025-08-07 10:20:03,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2621.9124, 1650.0879, 1322.2144, 425.68384, 1931.6594, 2655.9678, 436.2024, 1896.8877, 1178.3169, 1800.3046]
2025-08-07 10:20:03,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [993.0, 602.0, 464.0, 199.0, 698.0, 1000.0, 199.0, 658.0, 431.0, 655.0]
2025-08-07 10:20:03,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (1591.92) for latency MM1Queue_a033_s075
2025-08-07 10:20:03,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 5 minutes, 10 seconds)
2025-08-07 10:21:42,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:21:45,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 814.68573 ± 774.406
2025-08-07 10:21:45,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1311.8082, 1143.1039, 33.69119, 279.5181, 1300.5214, 1228.3629, 2499.61, 276.27875, 40.785236, 33.17775]
2025-08-07 10:21:45,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [443.0, 422.0, 33.0, 143.0, 496.0, 460.0, 845.0, 138.0, 39.0, 31.0]
2025-08-07 10:21:45,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 4 minutes, 11 seconds)
2025-08-07 10:23:21,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:23:26,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1021.49573 ± 750.467
2025-08-07 10:23:26,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1484.8064, 1183.1462, 252.6617, 282.8243, 875.18036, 2536.2034, 181.91472, 1852.0037, 319.19543, 1247.0204]
2025-08-07 10:23:26,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [530.0, 431.0, 132.0, 152.0, 355.0, 932.0, 102.0, 639.0, 149.0, 490.0]
2025-08-07 10:23:26,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 1 minute, 30 seconds)
2025-08-07 10:25:10,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:25:15,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1140.42993 ± 917.977
2025-08-07 10:25:15,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1135.1013, 161.61736, 1133.8048, 1411.6024, 2757.6785, 2823.9802, 198.23415, 392.3065, 470.20978, 919.7637]
2025-08-07 10:25:15,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [409.0, 90.0, 400.0, 510.0, 945.0, 1000.0, 104.0, 179.0, 207.0, 332.0]
2025-08-07 10:25:15,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 4 seconds)
2025-08-07 10:26:53,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:27:01,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1873.01538 ± 895.274
2025-08-07 10:27:01,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2811.6104, 2715.366, 403.931, 2170.7812, 1216.688, 2797.2551, 2785.7039, 671.43146, 2049.1829, 1108.2048]
2025-08-07 10:27:01,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 186.0, 743.0, 451.0, 1000.0, 960.0, 271.0, 688.0, 399.0]
2025-08-07 10:27:01,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (1873.02) for latency MM1Queue_a033_s075
2025-08-07 10:27:01,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 59 minutes, 48 seconds)
2025-08-07 10:28:36,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:28:38,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 467.88550 ± 223.416
2025-08-07 10:28:38,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [596.7587, 202.85083, 532.8698, 579.037, 39.194633, 658.64264, 164.89041, 629.08905, 642.45764, 633.0644]
2025-08-07 10:28:38,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [194.0, 109.0, 194.0, 204.0, 38.0, 223.0, 91.0, 222.0, 224.0, 219.0]
2025-08-07 10:28:38,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 55 minutes, 6 seconds)
2025-08-07 10:30:19,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:30:24,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1254.61401 ± 531.725
2025-08-07 10:30:24,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1202.8384, 415.43594, 1510.9375, 1567.8892, 827.92, 2537.537, 1321.7795, 1100.72, 1028.9757, 1032.1075]
2025-08-07 10:30:24,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [406.0, 172.0, 498.0, 533.0, 269.0, 839.0, 447.0, 409.0, 338.0, 345.0]
2025-08-07 10:30:24,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 54 minutes, 8 seconds)
2025-08-07 10:32:07,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:32:13,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1356.22290 ± 914.511
2025-08-07 10:32:13,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1501.2336, 134.37242, 2932.0544, 1690.2128, 1843.1548, 860.7452, 2655.871, 1268.03, 301.78958, 374.76443]
2025-08-07 10:32:13,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [486.0, 75.0, 1000.0, 587.0, 610.0, 328.0, 968.0, 417.0, 163.0, 171.0]
2025-08-07 10:32:13,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 54 minutes, 12 seconds)
2025-08-07 10:33:50,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:33:56,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1293.21460 ± 874.145
2025-08-07 10:33:56,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1259.1392, 1429.9755, 236.35158, 2687.5938, 1539.907, 1040.642, 2884.8875, 110.59728, 599.6056, 1143.4459]
2025-08-07 10:33:56,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [444.0, 544.0, 122.0, 960.0, 535.0, 396.0, 1000.0, 64.0, 252.0, 438.0]
2025-08-07 10:33:56,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 51 minutes, 11 seconds)
2025-08-07 10:35:32,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:35:37,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1346.92834 ± 795.878
2025-08-07 10:35:37,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1022.201, 935.13824, 2381.0107, 633.7005, 1320.8077, 917.19763, 1626.2134, 1436.8867, 3026.9797, 169.14793]
2025-08-07 10:35:37,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [327.0, 290.0, 757.0, 239.0, 412.0, 289.0, 525.0, 475.0, 1000.0, 95.0]
2025-08-07 10:35:37,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 48 minutes, 29 seconds)
2025-08-07 10:37:19,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:37:24,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1175.38904 ± 683.437
2025-08-07 10:37:24,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [841.7198, 1304.9741, 2793.1313, 1076.4081, 1882.8313, 464.0779, 449.0907, 639.6153, 1367.4265, 934.6155]
2025-08-07 10:37:24,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [325.0, 495.0, 1000.0, 394.0, 680.0, 185.0, 180.0, 250.0, 509.0, 345.0]
2025-08-07 10:37:24,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 48 minutes, 44 seconds)
2025-08-07 10:39:04,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:39:10,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1338.80212 ± 1036.343
2025-08-07 10:39:10,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [382.98135, 638.35626, 400.41745, 413.17133, 2984.725, 1353.5531, 395.99945, 2963.801, 2504.8093, 1350.2056]
2025-08-07 10:39:10,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [172.0, 264.0, 176.0, 176.0, 1000.0, 422.0, 167.0, 1000.0, 869.0, 483.0]
2025-08-07 10:39:10,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 46 minutes, 48 seconds)
2025-08-07 10:40:47,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:40:52,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1240.51648 ± 436.624
2025-08-07 10:40:52,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1198.4111, 392.0133, 1196.4261, 1824.6527, 1189.4496, 1140.7428, 1191.7374, 1210.5059, 2103.9211, 957.30493]
2025-08-07 10:40:52,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [380.0, 165.0, 373.0, 583.0, 376.0, 355.0, 377.0, 389.0, 652.0, 307.0]
2025-08-07 10:40:52,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 43 minutes, 48 seconds)
2025-08-07 10:42:36,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:42:43,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1841.60571 ± 1127.330
2025-08-07 10:42:43,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2970.351, 2935.9578, 2396.349, 411.54156, 1987.7485, 2926.1858, 391.8596, 3110.627, 419.43695, 865.9986]
2025-08-07 10:42:43,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 813.0, 184.0, 639.0, 1000.0, 173.0, 1000.0, 187.0, 284.0]
2025-08-07 10:42:43,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 43 minutes, 43 seconds)
2025-08-07 10:44:18,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:44:22,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 951.77441 ± 637.731
2025-08-07 10:44:22,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [239.47894, 1097.8777, 2125.77, 223.6744, 316.50546, 1597.3042, 1145.0902, 1242.8429, 1320.235, 208.96516]
2025-08-07 10:44:22,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [125.0, 365.0, 705.0, 115.0, 153.0, 505.0, 372.0, 395.0, 413.0, 109.0]
2025-08-07 10:44:22,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 41 minutes, 21 seconds)
2025-08-07 10:46:02,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:46:12,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2197.15796 ± 863.383
2025-08-07 10:46:12,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2933.41, 2932.2622, 2910.499, 1718.2472, 1982.9338, 2912.5845, 225.45526, 2939.861, 1989.6442, 1426.6824]
2025-08-07 10:46:12,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 617.0, 623.0, 1000.0, 119.0, 1000.0, 702.0, 530.0]
2025-08-07 10:46:12,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (2197.16) for latency MM1Queue_a033_s075
2025-08-07 10:46:12,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 40 minutes, 11 seconds)
2025-08-07 10:47:55,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:47:58,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 913.91583 ± 708.093
2025-08-07 10:47:58,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [394.52563, 134.09186, 1188.5819, 171.45451, 1610.4124, 1808.0839, 1465.2882, 407.47665, 69.78531, 1889.4586]
2025-08-07 10:47:58,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [180.0, 75.0, 398.0, 98.0, 553.0, 567.0, 468.0, 175.0, 58.0, 623.0]
2025-08-07 10:47:58,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 38 minutes, 44 seconds)
2025-08-07 10:49:38,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:49:42,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 991.09766 ± 680.225
2025-08-07 10:49:42,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [708.1654, 1183.1981, 282.06485, 1290.3693, 617.65, 1645.3707, 2403.2473, 201.05394, 1360.456, 219.40073]
2025-08-07 10:49:42,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [288.0, 412.0, 153.0, 419.0, 249.0, 523.0, 799.0, 115.0, 426.0, 121.0]
2025-08-07 10:49:42,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 37 minutes, 11 seconds)
2025-08-07 10:51:15,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:51:21,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1248.70020 ± 1084.886
2025-08-07 10:51:21,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2908.3938, 139.06917, 295.38766, 274.40253, 306.9192, 240.25021, 2179.2925, 2931.559, 1791.3046, 1420.4236]
2025-08-07 10:51:21,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 81.0, 141.0, 134.0, 153.0, 122.0, 770.0, 1000.0, 622.0, 502.0]
2025-08-07 10:51:21,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 33 minutes, 6 seconds)
2025-08-07 10:53:00,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:53:09,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1982.81482 ± 976.933
2025-08-07 10:53:09,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [537.9541, 2271.769, 438.62817, 2910.1196, 2964.0747, 911.89044, 2896.3203, 2915.2366, 2387.7317, 1594.4247]
2025-08-07 10:53:09,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [221.0, 808.0, 190.0, 1000.0, 1000.0, 343.0, 1000.0, 1000.0, 830.0, 578.0]
2025-08-07 10:53:09,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 33 minutes, 4 seconds)
2025-08-07 10:54:47,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:54:53,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1291.32959 ± 1137.729
2025-08-07 10:54:53,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2889.5466, 151.31218, 976.58826, 1488.8673, 439.728, 242.58551, 2913.433, 153.91287, 683.0647, 2974.258]
2025-08-07 10:54:53,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 80.0, 367.0, 504.0, 193.0, 121.0, 1000.0, 87.0, 265.0, 1000.0]
2025-08-07 10:54:53,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 30 minutes, 19 seconds)
2025-08-07 10:56:30,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:56:37,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1822.56665 ± 1022.969
2025-08-07 10:56:37,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1370.1913, 1133.6788, 1633.2935, 2969.5293, 3042.3535, 2969.848, 2939.985, 638.6434, 1371.8728, 156.2699]
2025-08-07 10:56:37,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [432.0, 411.0, 576.0, 1000.0, 1000.0, 1000.0, 1000.0, 252.0, 443.0, 88.0]
2025-08-07 10:56:37,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 28 minutes, 11 seconds)
2025-08-07 10:58:12,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:58:17,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1253.32874 ± 797.556
2025-08-07 10:58:17,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1135.3708, 1385.009, 102.475365, 1127.2919, 3043.3726, 1151.1764, 1218.388, 868.4378, 2152.3486, 349.4167]
2025-08-07 10:58:17,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [380.0, 458.0, 64.0, 375.0, 1000.0, 377.0, 405.0, 276.0, 700.0, 160.0]
2025-08-07 10:58:17,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 25 minutes, 50 seconds)
2025-08-07 10:59:56,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:00:03,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1663.05786 ± 844.076
2025-08-07 11:00:03,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2148.6882, 917.9371, 1661.7087, 2970.5144, 1752.0913, 203.12466, 2957.0916, 1190.3573, 933.2226, 1895.8417]
2025-08-07 11:00:03,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [731.0, 337.0, 550.0, 1000.0, 597.0, 107.0, 1000.0, 425.0, 335.0, 649.0]
2025-08-07 11:00:03,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 25 minutes, 15 seconds)
2025-08-07 11:01:40,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:01:46,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1623.16711 ± 857.426
2025-08-07 11:01:46,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1162.6743, 1684.4402, 933.74274, 1931.7588, 1065.8994, 138.56688, 1248.6765, 2108.579, 2967.886, 2989.4475]
2025-08-07 11:01:46,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [393.0, 585.0, 302.0, 637.0, 351.0, 82.0, 393.0, 651.0, 1000.0, 1000.0]
2025-08-07 11:01:46,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 22 minutes, 48 seconds)
2025-08-07 11:03:32,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:03:40,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1973.91626 ± 956.822
2025-08-07 11:03:40,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1680.9515, 431.51788, 576.3492, 1668.5889, 2912.795, 2931.1453, 2955.1804, 1125.4849, 2618.298, 2838.8518]
2025-08-07 11:03:40,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [586.0, 187.0, 235.0, 596.0, 1000.0, 1000.0, 1000.0, 429.0, 914.0, 982.0]
2025-08-07 11:03:40,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 22 minutes, 40 seconds)
2025-08-07 11:05:14,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:05:22,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2009.30933 ± 1120.611
2025-08-07 11:05:22,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1457.9293, 3021.0874, 1642.626, 2993.3406, 98.57147, 3006.4553, 143.58665, 2959.6926, 3023.5278, 1746.277]
2025-08-07 11:05:22,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [491.0, 1000.0, 514.0, 1000.0, 64.0, 1000.0, 79.0, 1000.0, 1000.0, 539.0]
2025-08-07 11:05:22,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 20 minutes, 25 seconds)
2025-08-07 11:07:01,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:07:08,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1662.77954 ± 976.641
2025-08-07 11:07:08,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [913.34216, 2978.34, 1443.2561, 103.8678, 2915.7632, 2853.479, 1707.817, 673.0509, 904.17615, 2134.7026]
2025-08-07 11:07:08,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [344.0, 1000.0, 477.0, 68.0, 1000.0, 964.0, 556.0, 262.0, 339.0, 720.0]
2025-08-07 11:07:08,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 19 minutes, 32 seconds)
2025-08-07 11:08:37,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:08:46,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2300.72583 ± 739.831
2025-08-07 11:08:46,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2600.0527, 3072.822, 3062.915, 1404.6532, 2125.5916, 821.854, 1870.0283, 2851.4265, 3108.6152, 2089.2996]
2025-08-07 11:08:46,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [820.0, 1000.0, 957.0, 453.0, 682.0, 302.0, 591.0, 891.0, 1000.0, 662.0]
2025-08-07 11:08:46,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (2300.73) for latency MM1Queue_a033_s075
2025-08-07 11:08:46,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 16 minutes, 43 seconds)
2025-08-07 11:10:24,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:10:31,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1683.90588 ± 1056.952
2025-08-07 11:10:31,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [221.94785, 1230.4078, 1847.3401, 3001.834, 377.1103, 912.72424, 2924.4246, 777.50586, 2649.95, 2895.8152]
2025-08-07 11:10:31,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [119.0, 436.0, 646.0, 1000.0, 159.0, 347.0, 1000.0, 296.0, 860.0, 1000.0]
2025-08-07 11:10:31,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 15 minutes, 12 seconds)
2025-08-07 11:12:14,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:12:23,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2057.86182 ± 1286.051
2025-08-07 11:12:23,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2948.4106, 2943.8103, 92.369316, 86.65159, 2943.1116, 2931.5103, 2831.6658, 2835.9126, 2860.902, 104.27307]
2025-08-07 11:12:23,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 59.0, 56.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 66.0]
2025-08-07 11:12:23,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 13 minutes, 8 seconds)
2025-08-07 11:13:59,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:14:04,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1383.31128 ± 802.973
2025-08-07 11:14:04,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3070.8938, 1291.1106, 1255.3225, 1767.38, 1772.7587, 1249.2706, 1215.8679, 160.88794, 1884.5448, 165.07547]
2025-08-07 11:14:04,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [955.0, 408.0, 398.0, 552.0, 559.0, 393.0, 389.0, 90.0, 609.0, 96.0]
2025-08-07 11:14:04,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 11 minutes, 23 seconds)
2025-08-07 11:15:38,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:15:46,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2070.14526 ± 1172.000
2025-08-07 11:15:46,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [95.400826, 2837.5044, 3130.5786, 1068.9602, 3004.2769, 3018.647, 2621.2544, 2992.7825, 1902.7968, 29.249529]
2025-08-07 11:15:46,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [62.0, 912.0, 1000.0, 356.0, 1000.0, 1000.0, 823.0, 934.0, 592.0, 34.0]
2025-08-07 11:15:46,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 9 minutes, 3 seconds)
2025-08-07 11:17:23,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:17:29,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1510.65784 ± 957.176
2025-08-07 11:17:29,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2635.9692, 1152.9186, 1594.773, 1360.1506, 1001.13965, 96.33917, 1520.9202, 92.29861, 2772.9666, 2879.103]
2025-08-07 11:17:29,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [839.0, 374.0, 518.0, 436.0, 331.0, 64.0, 500.0, 63.0, 888.0, 1000.0]
2025-08-07 11:17:29,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 8 minutes)
2025-08-07 11:19:09,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:19:16,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2039.66858 ± 948.982
2025-08-07 11:19:16,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1675.8192, 3058.1216, 3022.0825, 680.7698, 2097.2163, 41.359516, 2037.6046, 2379.2566, 2570.9792, 2833.476]
2025-08-07 11:19:16,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [532.0, 966.0, 1000.0, 235.0, 660.0, 35.0, 647.0, 744.0, 824.0, 906.0]
2025-08-07 11:19:16,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 6 minutes, 33 seconds)
2025-08-07 11:20:47,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:20:57,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2403.68311 ± 903.162
2025-08-07 11:20:57,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3017.5964, 3042.7632, 2863.3691, 3069.3164, 1604.6903, 2997.573, 1820.9282, 2497.7036, 2968.4048, 154.4867]
2025-08-07 11:20:57,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 908.0, 1000.0, 566.0, 1000.0, 576.0, 795.0, 1000.0, 85.0]
2025-08-07 11:20:57,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (2403.68) for latency MM1Queue_a033_s075
2025-08-07 11:20:57,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 3 minutes, 23 seconds)
2025-08-07 11:22:36,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:22:41,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1489.70825 ± 880.277
2025-08-07 11:22:41,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2916.6104, 1368.5391, 1534.6105, 1153.4049, 1073.7632, 154.74745, 955.3515, 3111.9592, 744.4245, 1883.6719]
2025-08-07 11:22:41,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [916.0, 434.0, 479.0, 385.0, 343.0, 88.0, 350.0, 1000.0, 281.0, 586.0]
2025-08-07 11:22:41,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 2 minutes, 3 seconds)
2025-08-07 11:24:15,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:24:19,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1055.99475 ± 405.984
2025-08-07 11:24:19,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [155.38454, 1264.2922, 1371.2474, 1311.7947, 1124.8679, 1094.7429, 600.8019, 1689.0358, 998.9559, 948.8243]
2025-08-07 11:24:19,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [86.0, 389.0, 439.0, 401.0, 357.0, 335.0, 233.0, 529.0, 314.0, 303.0]
2025-08-07 11:24:19,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 59 minutes, 54 seconds)
2025-08-07 11:25:58,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:26:04,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1530.66309 ± 1163.728
2025-08-07 11:26:04,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2220.3608, 1662.6943, 2974.7764, 133.97882, 91.88806, 2919.3513, 2963.9685, 1647.4049, 81.10743, 611.09973]
2025-08-07 11:26:04,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [714.0, 530.0, 1000.0, 72.0, 64.0, 1000.0, 1000.0, 535.0, 55.0, 248.0]
2025-08-07 11:26:04,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 58 minutes, 24 seconds)
2025-08-07 11:27:38,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:27:44,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1399.59631 ± 221.413
2025-08-07 11:27:44,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1330.7816, 1263.1676, 1853.5513, 1480.6625, 1183.2675, 1411.9972, 1648.6946, 1419.693, 1019.30975, 1384.8383]
2025-08-07 11:27:44,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [422.0, 405.0, 574.0, 473.0, 391.0, 443.0, 516.0, 461.0, 327.0, 434.0]
2025-08-07 11:27:44,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 55 minutes, 47 seconds)
2025-08-07 11:29:21,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:29:26,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1473.97180 ± 964.190
2025-08-07 11:29:26,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [330.47073, 1653.3376, 1400.0553, 1538.1748, 1036.4277, 1635.6998, 3153.172, 528.0199, 3125.3386, 339.02136]
2025-08-07 11:29:26,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [145.0, 536.0, 441.0, 485.0, 364.0, 545.0, 1000.0, 211.0, 1000.0, 157.0]
2025-08-07 11:29:26,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 54 minutes, 21 seconds)
2025-08-07 11:31:04,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:31:08,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1117.06006 ± 362.191
2025-08-07 11:31:08,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1127.5903, 1223.965, 940.4447, 1116.0469, 1331.4231, 1243.7864, 1346.8951, 103.55392, 1386.7289, 1350.1669]
2025-08-07 11:31:08,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [371.0, 394.0, 309.0, 348.0, 419.0, 409.0, 425.0, 68.0, 454.0, 426.0]
2025-08-07 11:31:08,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 52 minutes, 19 seconds)
2025-08-07 11:32:44,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:32:51,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1796.85474 ± 675.080
2025-08-07 11:32:51,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [463.87808, 1925.5653, 2094.7712, 2660.8682, 1411.409, 2374.9646, 2700.8076, 1331.3948, 1156.9398, 1847.9492]
2025-08-07 11:32:51,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [193.0, 604.0, 673.0, 847.0, 463.0, 744.0, 838.0, 435.0, 399.0, 585.0]
2025-08-07 11:32:51,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 51 minutes, 8 seconds)
2025-08-07 11:34:28,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:34:32,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1139.27026 ± 488.851
2025-08-07 11:34:32,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1446.398, 1158.7532, 1145.8213, 87.38117, 1082.2076, 1215.4742, 2093.6448, 1168.0032, 660.1528, 1334.8661]
2025-08-07 11:34:32,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [460.0, 374.0, 378.0, 62.0, 346.0, 397.0, 660.0, 379.0, 261.0, 419.0]
2025-08-07 11:34:32,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 49 minutes, 7 seconds)
2025-08-07 11:36:09,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:36:13,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1102.13708 ± 497.906
2025-08-07 11:36:13,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1350.646, 1366.8707, 116.55413, 1561.0911, 925.4109, 1537.9062, 1452.2968, 1312.544, 1174.5493, 223.50151]
2025-08-07 11:36:13,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [429.0, 429.0, 71.0, 488.0, 317.0, 480.0, 463.0, 424.0, 376.0, 118.0]
2025-08-07 11:36:13,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 47 minutes, 33 seconds)
2025-08-07 11:37:48,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:37:52,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1215.79773 ± 194.286
2025-08-07 11:37:52,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1009.1501, 1061.3491, 1273.6892, 1527.6223, 1135.8466, 1158.9093, 1502.187, 1083.6401, 978.21045, 1427.3743]
2025-08-07 11:37:52,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [327.0, 339.0, 414.0, 488.0, 368.0, 363.0, 470.0, 369.0, 319.0, 456.0]
2025-08-07 11:37:52,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 45 minutes, 32 seconds)
2025-08-07 11:39:35,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:39:42,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1724.36523 ± 987.877
2025-08-07 11:39:42,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1077.5074, 1219.6167, 1004.11957, 2905.4966, 533.577, 187.37117, 2994.7263, 2630.1577, 2040.8464, 2650.2336]
2025-08-07 11:39:42,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [396.0, 428.0, 359.0, 1000.0, 211.0, 103.0, 1000.0, 833.0, 636.0, 884.0]
2025-08-07 11:39:42,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 44 minutes, 33 seconds)
2025-08-07 11:41:18,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:41:24,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1518.45935 ± 655.939
2025-08-07 11:41:24,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1668.4651, 2586.1567, 1155.3861, 1573.98, 99.61155, 1993.3989, 1834.4243, 2074.351, 1042.7158, 1156.1034]
2025-08-07 11:41:24,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [532.0, 813.0, 376.0, 506.0, 69.0, 624.0, 595.0, 651.0, 356.0, 383.0]
2025-08-07 11:41:24,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 42 minutes, 45 seconds)
2025-08-07 11:42:58,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:43:04,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1489.38672 ± 771.559
2025-08-07 11:43:04,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1798.339, 1601.2429, 1555.0428, 3062.5679, 1169.5122, 543.33374, 2138.152, 179.68642, 1757.2819, 1088.7085]
2025-08-07 11:43:04,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [574.0, 511.0, 501.0, 978.0, 406.0, 211.0, 690.0, 103.0, 595.0, 362.0]
2025-08-07 11:43:04,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 40 minutes, 54 seconds)
2025-08-07 11:44:44,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:44:50,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1489.34009 ± 724.591
2025-08-07 11:44:50,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1585.4363, 339.7865, 1867.8582, 429.89954, 3078.7358, 1588.3027, 1641.5822, 1594.6307, 1558.6788, 1208.49]
2025-08-07 11:44:50,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [497.0, 160.0, 583.0, 184.0, 1000.0, 520.0, 565.0, 503.0, 496.0, 402.0]
2025-08-07 11:44:50,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 39 minutes, 36 seconds)
2025-08-07 11:46:20,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:46:26,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1400.04272 ± 855.340
2025-08-07 11:46:26,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [362.8822, 203.37546, 981.2462, 1372.5741, 918.55493, 1031.4498, 2771.5571, 2447.4053, 2534.0542, 1377.3284]
2025-08-07 11:46:26,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [162.0, 104.0, 325.0, 459.0, 330.0, 355.0, 889.0, 802.0, 807.0, 432.0]
2025-08-07 11:46:26,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 37 minutes, 39 seconds)
2025-08-07 11:48:05,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:48:09,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1276.00061 ± 725.556
2025-08-07 11:48:09,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2116.3083, 1405.1268, 1309.6223, 2338.5317, 1085.761, 260.65057, 374.27847, 2111.3142, 370.01617, 1388.395]
2025-08-07 11:48:09,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [682.0, 444.0, 432.0, 745.0, 351.0, 123.0, 161.0, 683.0, 162.0, 439.0]
2025-08-07 11:48:09,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 35 minutes, 31 seconds)
2025-08-07 11:49:48,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:49:58,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2596.31201 ± 864.463
2025-08-07 11:49:58,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3023.1416, 3032.8796, 2818.5444, 3031.2053, 3077.7754, 3067.2068, 2138.2437, 3085.0469, 155.93616, 2533.1387]
2025-08-07 11:49:58,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 904.0, 1000.0, 1000.0, 1000.0, 677.0, 1000.0, 91.0, 839.0]
2025-08-07 11:49:58,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (2596.31) for latency MM1Queue_a033_s075
2025-08-07 11:49:58,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 34 minutes, 16 seconds)
2025-08-07 11:51:31,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:51:41,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2499.91528 ± 684.772
2025-08-07 11:51:41,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1896.1265, 2581.282, 2912.2112, 2984.308, 1809.2223, 2977.1895, 2981.756, 2991.6326, 2955.227, 910.1976]
2025-08-07 11:51:41,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [654.0, 870.0, 1000.0, 1000.0, 586.0, 1000.0, 994.0, 1000.0, 1000.0, 331.0]
2025-08-07 11:51:41,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 32 minutes, 45 seconds)
2025-08-07 11:53:21,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:53:29,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1881.51526 ± 1198.363
2025-08-07 11:53:29,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3010.4673, 3111.9868, 3023.5637, 1572.3126, 2347.676, 347.47592, 103.07505, 2059.934, 3080.4126, 158.24884]
2025-08-07 11:53:29,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 542.0, 798.0, 159.0, 72.0, 710.0, 1000.0, 91.0]
2025-08-07 11:53:29,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 31 minutes, 7 seconds)
2025-08-07 11:55:06,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:55:13,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1922.02466 ± 707.890
2025-08-07 11:55:13,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2580.6677, 1853.8297, 3029.4663, 1809.485, 3108.4124, 953.33765, 1631.9996, 1394.109, 1693.6887, 1165.2493]
2025-08-07 11:55:13,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [808.0, 594.0, 1000.0, 574.0, 1000.0, 322.0, 522.0, 453.0, 537.0, 372.0]
2025-08-07 11:55:13,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 29 minutes, 51 seconds)
2025-08-07 11:56:54,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:57:00,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1588.93506 ± 447.178
2025-08-07 11:57:00,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1811.1631, 1618.8667, 1123.5532, 2613.5352, 1512.0132, 1287.3165, 1135.0135, 2095.2307, 1291.4246, 1401.234]
2025-08-07 11:57:00,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [574.0, 513.0, 370.0, 822.0, 486.0, 402.0, 371.0, 656.0, 399.0, 447.0]
2025-08-07 11:57:00,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 28 minutes, 18 seconds)
2025-08-07 11:58:28,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:58:35,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1811.52441 ± 605.208
2025-08-07 11:58:35,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1865.609, 3085.925, 1807.6156, 2780.619, 1328.5745, 1852.8784, 1391.2496, 1338.092, 1303.7491, 1360.9312]
2025-08-07 11:58:35,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [606.0, 981.0, 571.0, 912.0, 448.0, 597.0, 445.0, 448.0, 425.0, 467.0]
2025-08-07 11:58:35,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 51 seconds)
2025-08-07 12:00:16,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:00:27,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2610.56567 ± 624.187
2025-08-07 12:00:27,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3064.225, 2401.551, 3075.8523, 2043.1729, 1019.12286, 3025.9094, 3013.1575, 2570.9797, 2860.673, 3031.0103]
2025-08-07 12:00:27,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 811.0, 1000.0, 642.0, 326.0, 1000.0, 1000.0, 845.0, 936.0, 1000.0]
2025-08-07 12:00:27,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (2610.57) for latency MM1Queue_a033_s075
2025-08-07 12:00:27,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 24 minutes, 31 seconds)
2025-08-07 12:02:01,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:02:09,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2014.71387 ± 1144.315
2025-08-07 12:02:09,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [147.28232, 3018.2722, 1284.1628, 143.15355, 2914.0637, 2949.4287, 2841.242, 2863.2966, 2852.48, 1133.7563]
2025-08-07 12:02:09,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [82.0, 1000.0, 459.0, 82.0, 1000.0, 1000.0, 965.0, 1000.0, 1000.0, 365.0]
2025-08-07 12:02:09,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 32 seconds)
2025-08-07 12:03:45,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:03:53,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2243.97021 ± 1175.466
2025-08-07 12:03:53,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3073.6326, 2996.4001, 1284.3395, 104.1088, 143.98096, 3051.2498, 3006.1453, 2830.8894, 2909.2996, 3039.6555]
2025-08-07 12:03:53,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 407.0, 70.0, 84.0, 1000.0, 1000.0, 933.0, 925.0, 1000.0]
2025-08-07 12:03:53,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 49 seconds)
2025-08-07 12:05:33,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:05:39,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1635.48987 ± 696.696
2025-08-07 12:05:39,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1372.5574, 1423.8832, 2493.6577, 1606.7628, 1790.2705, 1338.4563, 434.58173, 1598.8112, 1167.7299, 3128.1887]
2025-08-07 12:05:39,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [444.0, 444.0, 775.0, 490.0, 553.0, 418.0, 183.0, 511.0, 375.0, 1000.0]
2025-08-07 12:05:39,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 19 minutes)
2025-08-07 12:07:12,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:07:17,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1220.82104 ± 549.177
2025-08-07 12:07:17,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2118.1707, 1549.4468, 182.99251, 1479.6493, 1121.917, 352.26578, 1079.0172, 1575.9083, 1351.0735, 1397.7708]
2025-08-07 12:07:17,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [666.0, 474.0, 101.0, 454.0, 351.0, 157.0, 343.0, 521.0, 449.0, 452.0]
2025-08-07 12:07:17,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 23 seconds)
2025-08-07 12:09:00,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:09:08,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2270.50439 ± 683.660
2025-08-07 12:09:08,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2504.6096, 3063.5374, 1363.6041, 1561.0198, 3105.2917, 1574.1117, 3222.3044, 2011.581, 2639.536, 1659.4486]
2025-08-07 12:09:08,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [785.0, 1000.0, 428.0, 488.0, 1000.0, 498.0, 1000.0, 616.0, 864.0, 535.0]
2025-08-07 12:09:08,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 38 seconds)
2025-08-07 12:10:41,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:10:50,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2185.77661 ± 988.558
2025-08-07 12:10:50,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2250.8079, 123.53571, 3073.8584, 3073.982, 3032.921, 1623.8309, 2979.325, 3037.3591, 1541.325, 1120.8218]
2025-08-07 12:10:50,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [759.0, 80.0, 1000.0, 1000.0, 1000.0, 521.0, 1000.0, 1000.0, 491.0, 348.0]
2025-08-07 12:10:50,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 53 seconds)
2025-08-07 12:12:33,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:12:43,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2590.39648 ± 873.011
2025-08-07 12:12:43,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [286.10684, 2684.4753, 3026.3042, 2990.024, 3026.6143, 1651.5143, 3056.283, 3065.122, 3069.5344, 3047.9856]
2025-08-07 12:12:43,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [134.0, 847.0, 1000.0, 1000.0, 1000.0, 529.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:12:43,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 21 seconds)
2025-08-07 12:14:16,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:14:21,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1476.91211 ± 706.840
2025-08-07 12:14:21,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2001.6565, 1152.0497, 2324.2917, 2802.741, 1428.9215, 1010.6472, 1285.7545, 163.89926, 1519.0234, 1080.1366]
2025-08-07 12:14:21,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [647.0, 368.0, 726.0, 884.0, 458.0, 325.0, 405.0, 89.0, 479.0, 344.0]
2025-08-07 12:14:21,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 27 seconds)
2025-08-07 12:15:58,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:16:04,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1685.59473 ± 1145.352
2025-08-07 12:16:04,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2593.3428, 638.35065, 3002.7104, 3009.6055, 65.931725, 3033.918, 1360.7444, 171.6272, 2120.0334, 859.6831]
2025-08-07 12:16:04,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [826.0, 249.0, 1000.0, 1000.0, 54.0, 1000.0, 457.0, 100.0, 709.0, 315.0]
2025-08-07 12:16:04,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 47 seconds)
2025-08-07 12:17:39,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:17:44,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1247.21338 ± 944.434
2025-08-07 12:17:44,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1733.4476, 2558.9495, 1289.7178, 1051.7672, 3030.6052, 77.62049, 1303.5087, 288.95218, 1084.568, 52.996857]
2025-08-07 12:17:44,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [548.0, 843.0, 403.0, 372.0, 1000.0, 58.0, 404.0, 144.0, 342.0, 46.0]
2025-08-07 12:17:44,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 52 seconds)
2025-08-07 12:19:21,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:19:29,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2033.01880 ± 1116.734
2025-08-07 12:19:29,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3061.227, 839.4629, 3038.183, 2955.9495, 60.89751, 3091.1887, 2232.0557, 906.9646, 1121.5121, 3022.747]
2025-08-07 12:19:29,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 316.0, 1000.0, 1000.0, 49.0, 1000.0, 716.0, 315.0, 353.0, 1000.0]
2025-08-07 12:19:29,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 11 seconds)
2025-08-07 12:21:10,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:21:21,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2722.68726 ± 719.105
2025-08-07 12:21:21,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3160.63, 3048.3174, 3131.2214, 3019.099, 1951.7006, 3144.0317, 3028.3154, 3088.2307, 823.3186, 2832.008]
2025-08-07 12:21:21,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 631.0, 1000.0, 1000.0, 1000.0, 300.0, 888.0]
2025-08-07 12:21:21,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (2722.69) for latency MM1Queue_a033_s075
2025-08-07 12:21:21,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 27 seconds)
2025-08-07 12:22:53,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:23:02,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2288.22510 ± 889.249
2025-08-07 12:23:02,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3030.852, 2987.2366, 2991.2903, 3003.5808, 2984.263, 1234.5933, 2168.671, 1382.0613, 467.94113, 2631.7605]
2025-08-07 12:23:02,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 395.0, 741.0, 441.0, 187.0, 875.0]
2025-08-07 12:23:02,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 44 seconds)
2025-08-07 12:24:38,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:24:47,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2438.21436 ± 809.332
2025-08-07 12:24:47,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2867.3635, 2344.4966, 2076.7903, 308.1232, 2111.0923, 2421.566, 3102.1646, 2943.7068, 3121.855, 3084.9844]
2025-08-07 12:24:47,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [893.0, 736.0, 636.0, 137.0, 665.0, 777.0, 1000.0, 909.0, 1000.0, 962.0]
2025-08-07 12:24:47,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1251 [DEBUG]: Training session finished
