2025-08-07 10:59:43,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc25-walker2d/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:59:43,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc25-walker2d/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:59:43,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x1470c635a910>}
2025-08-07 10:59:43,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1111 [DEBUG]: using device: cuda
2025-08-07 10:59:43,699 baseline-bpql-noiseperc25-walker2d:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 10:59:43,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1133 [INFO]: Creating new trainer
2025-08-07 10:59:43,716 baseline-bpql-noiseperc25-walker2d:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=113, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 10:59:43,716 baseline-bpql-noiseperc25-walker2d:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 10:59:44,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1194 [DEBUG]: Starting training session...
2025-08-07 10:59:44,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 1/100
2025-08-07 11:01:19,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:01:19,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 12.88763 ± 11.053
2025-08-07 11:01:19,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [23.319576, -4.2712274, 7.6600523, 3.6252275, 23.222881, 29.93782, 3.166564, 23.268152, 3.0591438, 15.888111]
2025-08-07 11:01:19,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [58.0, 69.0, 33.0, 15.0, 73.0, 57.0, 19.0, 56.0, 15.0, 30.0]
2025-08-07 11:01:19,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (12.89) for latency MM1Queue_a033_s075
2025-08-07 11:01:19,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 36 minutes, 35 seconds)
2025-08-07 11:03:01,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:03:02,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 69.96385 ± 66.594
2025-08-07 11:03:02,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [142.41974, 224.80724, 104.12075, 9.31127, 18.10841, 25.558226, 85.25103, 16.808002, 23.733257, 49.52066]
2025-08-07 11:03:02,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [179.0, 142.0, 110.0, 32.0, 86.0, 163.0, 98.0, 45.0, 48.0, 59.0]
2025-08-07 11:03:02,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (69.96) for latency MM1Queue_a033_s075
2025-08-07 11:03:02,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 41 minutes, 56 seconds)
2025-08-07 11:04:46,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:04:46,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 36.32352 ± 84.472
2025-08-07 11:04:46,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-53.156326, 0.8919768, -2.6494045, -5.3962293, 244.76466, 30.816013, 1.2269759, 144.2478, -1.5017483, 3.9914172]
2025-08-07 11:04:46,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [183.0, 12.0, 17.0, 21.0, 262.0, 88.0, 12.0, 123.0, 11.0, 15.0]
2025-08-07 11:04:46,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 42 minutes, 52 seconds)
2025-08-07 11:06:28,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:06:29,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 13.21782 ± 15.878
2025-08-07 11:06:29,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1.3396324, 1.3245264, 6.283301, 8.635404, 42.26824, 3.969698, 3.471545, 1.6967816, 44.575478, 18.613575]
2025-08-07 11:06:29,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 29.0, 124.0, 29.0, 73.0, 16.0, 53.0, 14.0, 48.0, 46.0]
2025-08-07 11:06:29,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 41 minutes, 50 seconds)
2025-08-07 11:08:12,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:08:13,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 67.24170 ± 98.107
2025-08-07 11:08:13,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [12.691053, 178.74026, 51.745213, 2.495493, 2.80212, 3.5766122, 317.50992, 65.99875, 38.45538, -1.5978602]
2025-08-07 11:08:13,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [30.0, 158.0, 77.0, 15.0, 15.0, 106.0, 239.0, 82.0, 55.0, 13.0]
2025-08-07 11:08:13,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 41 minutes)
2025-08-07 11:09:54,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:09:55,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 32.03244 ± 37.388
2025-08-07 11:09:55,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [56.011433, 28.4579, 9.635609, 29.858269, 69.306656, -16.522142, 28.187344, 4.4389067, -4.782984, 115.73343]
2025-08-07 11:09:55,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [76.0, 35.0, 23.0, 60.0, 121.0, 120.0, 50.0, 15.0, 76.0, 107.0]
2025-08-07 11:09:55,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 41 minutes, 32 seconds)
2025-08-07 11:11:37,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:11:37,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 70.25549 ± 77.800
2025-08-07 11:11:37,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [197.85034, 0.70197594, 6.292612, 37.041557, 16.974445, 87.634094, 220.41913, -1.5358471, 109.61326, 27.563364]
2025-08-07 11:11:37,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [120.0, 19.0, 30.0, 43.0, 31.0, 146.0, 117.0, 10.0, 127.0, 53.0]
2025-08-07 11:11:37,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (70.26) for latency MM1Queue_a033_s075
2025-08-07 11:11:37,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 39 minutes, 36 seconds)
2025-08-07 11:13:20,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:13:21,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 41.78709 ± 76.974
2025-08-07 11:13:21,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [2.8254242, -1.4646926, -0.7092337, 155.78033, -1.1062601, 228.17471, 6.901227, 8.556052, 15.31513, 3.5981572]
2025-08-07 11:13:21,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 15.0, 16.0, 156.0, 58.0, 138.0, 110.0, 32.0, 77.0, 15.0]
2025-08-07 11:13:21,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 37 minutes, 46 seconds)
2025-08-07 11:15:04,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:15:04,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 87.98695 ± 104.269
2025-08-07 11:15:04,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [29.97616, 170.16084, 5.144292, 118.89645, -0.21783227, 10.875424, 159.08772, 1.9076418, 45.48065, 338.55823]
2025-08-07 11:15:04,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [39.0, 119.0, 17.0, 93.0, 10.0, 29.0, 116.0, 13.0, 97.0, 180.0]
2025-08-07 11:15:04,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (87.99) for latency MM1Queue_a033_s075
2025-08-07 11:15:04,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 36 minutes, 22 seconds)
2025-08-07 11:16:47,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:16:48,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 94.12909 ± 95.406
2025-08-07 11:16:48,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [36.17185, 187.05983, 3.8502643, 31.730335, 173.92007, -3.1009924, 7.4264693, 43.69628, 275.18723, 185.34952]
2025-08-07 11:16:48,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [49.0, 119.0, 17.0, 38.0, 119.0, 16.0, 61.0, 68.0, 162.0, 128.0]
2025-08-07 11:16:48,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (94.13) for latency MM1Queue_a033_s075
2025-08-07 11:16:48,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 34 minutes, 31 seconds)
2025-08-07 11:18:33,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:18:33,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 22.83342 ± 43.785
2025-08-07 11:18:33,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [2.7098155, 21.377491, 18.254534, 3.6753554, 6.5791497, 152.65529, 2.995062, 7.0931387, 0.03989627, 12.954507]
2025-08-07 11:18:33,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 37.0, 32.0, 15.0, 17.0, 178.0, 14.0, 18.0, 12.0, 26.0]
2025-08-07 11:18:33,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 33 minutes, 49 seconds)
2025-08-07 11:20:17,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:20:18,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 64.64240 ± 107.313
2025-08-07 11:20:18,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [67.047264, 0.7720622, -0.3672415, 10.11502, 0.8169715, -1.9629488, 45.617966, 342.3805, 181.76768, 0.2366588]
2025-08-07 11:20:18,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [73.0, 16.0, 23.0, 22.0, 11.0, 13.0, 59.0, 232.0, 115.0, 13.0]
2025-08-07 11:20:18,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 32 minutes, 36 seconds)
2025-08-07 11:22:00,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:22:01,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 94.32698 ± 104.918
2025-08-07 11:22:01,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [25.20287, 252.09825, 2.3326528, 28.441786, -0.4043594, 214.44658, -1.1252704, 212.56389, 207.05304, 2.6603742]
2025-08-07 11:22:01,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [50.0, 153.0, 35.0, 44.0, 10.0, 138.0, 16.0, 136.0, 120.0, 12.0]
2025-08-07 11:22:01,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (94.33) for latency MM1Queue_a033_s075
2025-08-07 11:22:01,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 30 minutes, 49 seconds)
2025-08-07 11:23:43,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:23:44,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 92.62249 ± 90.653
2025-08-07 11:23:44,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [165.65692, 186.98361, 82.87359, 1.9941922, 238.14325, 200.97783, 1.6872702, 3.2663958, 5.3317223, 39.310146]
2025-08-07 11:23:44,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [98.0, 113.0, 104.0, 16.0, 142.0, 115.0, 14.0, 14.0, 17.0, 48.0]
2025-08-07 11:23:44,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 28 minutes, 57 seconds)
2025-08-07 11:25:25,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:25:27,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 102.67717 ± 136.523
2025-08-07 11:25:27,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [0.06886563, 0.8926135, 2.1919641, 269.61957, 104.59746, 4.584188, 33.004612, 202.16689, 1.9874958, 407.65805]
2025-08-07 11:25:27,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 16.0, 13.0, 174.0, 181.0, 15.0, 89.0, 119.0, 11.0, 334.0]
2025-08-07 11:25:27,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (102.68) for latency MM1Queue_a033_s075
2025-08-07 11:25:27,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 27 minutes, 2 seconds)
2025-08-07 11:27:14,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:27:16,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 119.65141 ± 136.329
2025-08-07 11:27:16,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [344.69806, 2.494678, 0.029616699, 2.647237, 197.508, 197.29463, 3.8419251, 1.6453508, 95.45973, 350.89487]
2025-08-07 11:27:16,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [238.0, 17.0, 12.0, 13.0, 130.0, 264.0, 16.0, 14.0, 71.0, 193.0]
2025-08-07 11:27:16,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (119.65) for latency MM1Queue_a033_s075
2025-08-07 11:27:16,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 26 minutes, 15 seconds)
2025-08-07 11:28:57,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:28:57,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 54.87187 ± 82.839
2025-08-07 11:28:57,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-0.9245565, 0.5564132, 196.26366, 2.405104, 7.6716633, 1.3918388, 206.93819, 2.144068, 131.59286, 0.67941743]
2025-08-07 11:28:57,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 14.0, 106.0, 16.0, 32.0, 12.0, 120.0, 15.0, 223.0, 15.0]
2025-08-07 11:28:57,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 23 minutes, 45 seconds)
2025-08-07 11:30:42,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:30:43,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 167.64352 ± 82.807
2025-08-07 11:30:43,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [196.08585, 193.72665, 198.988, 89.71394, 282.3993, 177.7858, 91.12369, 11.584008, 141.45494, 293.57315]
2025-08-07 11:30:43,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [113.0, 144.0, 114.0, 63.0, 164.0, 102.0, 69.0, 32.0, 92.0, 170.0]
2025-08-07 11:30:43,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (167.64) for latency MM1Queue_a033_s075
2025-08-07 11:30:43,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 22 minutes, 44 seconds)
2025-08-07 11:32:29,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:32:30,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 73.81271 ± 81.195
2025-08-07 11:32:30,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-1.0272512, 202.92636, 179.8573, 5.0416713, 177.34734, -0.04233535, 20.68043, 0.0029049914, 36.55553, 116.785126]
2025-08-07 11:32:30,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 124.0, 175.0, 16.0, 106.0, 13.0, 51.0, 13.0, 108.0, 120.0]
2025-08-07 11:32:30,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 21 minutes, 59 seconds)
2025-08-07 11:34:16,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:34:16,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 26.23511 ± 38.560
2025-08-07 11:34:16,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [4.3211975, 3.6029, -2.9760315, 0.99611473, 117.669075, 4.218656, -2.378397, 30.346779, 78.20656, 28.34423]
2025-08-07 11:34:16,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [41.0, 16.0, 15.0, 17.0, 90.0, 14.0, 15.0, 46.0, 132.0, 43.0]
2025-08-07 11:34:16,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 21 minutes, 10 seconds)
2025-08-07 11:35:58,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:35:59,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 97.47541 ± 97.810
2025-08-07 11:35:59,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [3.7231195, 160.68857, 234.39252, 46.23745, 58.19897, 74.707825, 305.61047, 72.185135, 7.0966244, 11.91344]
2025-08-07 11:35:59,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 137.0, 140.0, 56.0, 67.0, 102.0, 254.0, 70.0, 15.0, 113.0]
2025-08-07 11:35:59,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 17 minutes, 55 seconds)
2025-08-07 11:37:41,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:37:42,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 188.21751 ± 150.343
2025-08-07 11:37:42,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [58.42757, 295.34433, 346.0719, -2.756898, 254.10806, 225.29306, 457.77567, 36.84904, 2.1960075, 208.86632]
2025-08-07 11:37:42,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [110.0, 168.0, 161.0, 16.0, 139.0, 112.0, 227.0, 95.0, 14.0, 115.0]
2025-08-07 11:37:42,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (188.22) for latency MM1Queue_a033_s075
2025-08-07 11:37:42,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 16 minutes, 25 seconds)
2025-08-07 11:39:25,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:39:26,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 97.68202 ± 111.203
2025-08-07 11:39:26,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [262.98184, -0.4722577, 68.601715, 13.90985, 18.590286, -0.9391813, 95.10637, 236.12083, 282.62094, 0.29981884]
2025-08-07 11:39:26,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [162.0, 15.0, 76.0, 38.0, 40.0, 14.0, 140.0, 147.0, 146.0, 13.0]
2025-08-07 11:39:26,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 14 minutes, 16 seconds)
2025-08-07 11:41:11,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:41:12,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 171.94955 ± 128.152
2025-08-07 11:41:12,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [3.1631498, 329.9133, 202.42339, 212.94539, 1.2677431, 0.938296, 199.77348, 382.2209, 237.52885, 149.32106]
2025-08-07 11:41:12,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 172.0, 113.0, 143.0, 16.0, 11.0, 112.0, 206.0, 142.0, 188.0]
2025-08-07 11:41:12,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 12 minutes, 21 seconds)
2025-08-07 11:42:58,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:43:00,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 113.93452 ± 160.640
2025-08-07 11:43:00,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [4.467639, 1.2728527, -1.0603501, 4.6868443, 33.966267, 455.32834, 3.1300178, 368.91516, 74.53407, 194.1044]
2025-08-07 11:43:00,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 14.0, 16.0, 42.0, 273.0, 16.0, 224.0, 115.0, 215.0]
2025-08-07 11:43:00,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 10 minutes, 51 seconds)
2025-08-07 11:44:43,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:44:45,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 179.62958 ± 180.196
2025-08-07 11:44:45,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [196.18427, 2.9107974, 457.2811, 346.5656, 40.541668, 283.99588, 1.6905398, 443.4929, -2.2890346, 25.922062]
2025-08-07 11:44:45,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [106.0, 13.0, 311.0, 223.0, 50.0, 233.0, 15.0, 211.0, 11.0, 65.0]
2025-08-07 11:44:45,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 9 minutes, 35 seconds)
2025-08-07 11:46:31,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:46:33,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 184.27817 ± 234.165
2025-08-07 11:46:33,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [249.30818, 5.769507, 5.2415376, 404.65973, 380.50934, 7.138496, 52.852886, 716.961, 16.451927, 3.8891397]
2025-08-07 11:46:33,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [130.0, 16.0, 15.0, 265.0, 257.0, 20.0, 75.0, 486.0, 51.0, 13.0]
2025-08-07 11:46:33,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 9 minutes, 12 seconds)
2025-08-07 11:48:15,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:48:16,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 225.53792 ± 161.288
2025-08-07 11:48:16,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [69.8407, 339.98828, 73.99229, 1.4832784, 464.25476, 3.2414036, 356.19708, 308.25357, 349.2502, 288.87756]
2025-08-07 11:48:16,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [132.0, 193.0, 95.0, 11.0, 269.0, 16.0, 168.0, 140.0, 127.0, 212.0]
2025-08-07 11:48:16,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (225.54) for latency MM1Queue_a033_s075
2025-08-07 11:48:16,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 7 minutes, 9 seconds)
2025-08-07 11:49:59,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:50:00,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 240.11600 ± 118.451
2025-08-07 11:50:00,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [49.1922, 249.64832, 248.70804, 1.3540875, 249.81633, 304.8395, 416.59525, 296.7346, 255.82834, 328.44342]
2025-08-07 11:50:00,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [63.0, 137.0, 126.0, 11.0, 237.0, 169.0, 213.0, 165.0, 130.0, 154.0]
2025-08-07 11:50:00,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (240.12) for latency MM1Queue_a033_s075
2025-08-07 11:50:00,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 4 minutes, 59 seconds)
2025-08-07 11:51:44,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:51:45,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 179.54956 ± 119.714
2025-08-07 11:51:45,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [25.242138, 243.52882, 333.43466, 162.43236, 2.9294677, 226.76392, 255.32137, 264.76645, -4.3150997, 285.39148]
2025-08-07 11:51:45,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [43.0, 196.0, 159.0, 166.0, 13.0, 128.0, 142.0, 190.0, 16.0, 136.0]
2025-08-07 11:51:45,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 2 minutes, 40 seconds)
2025-08-07 11:53:27,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:53:29,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 193.14500 ± 140.234
2025-08-07 11:53:29,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [40.62163, 126.511246, 230.47482, 0.95261323, 381.59995, 303.6704, 310.62555, 367.12396, -2.815607, 172.68544]
2025-08-07 11:53:29,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [46.0, 104.0, 117.0, 17.0, 217.0, 247.0, 189.0, 213.0, 22.0, 95.0]
2025-08-07 11:53:29,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 32 seconds)
2025-08-07 11:55:14,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:55:15,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 182.65028 ± 173.342
2025-08-07 11:55:15,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-1.3779697, 91.70091, 302.90863, 386.09222, 1.773353, 322.6787, 76.36218, 140.9569, 0.4051753, 505.00266]
2025-08-07 11:55:15,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [12.0, 93.0, 135.0, 222.0, 14.0, 157.0, 84.0, 114.0, 14.0, 251.0]
2025-08-07 11:55:15,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 58 minutes, 25 seconds)
2025-08-07 11:56:58,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:56:59,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 120.39278 ± 166.736
2025-08-07 11:56:59,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-0.16171993, 0.54532135, 2.838734, 465.79855, 6.487197, 8.75795, 295.2724, 9.099201, 334.8386, 80.45149]
2025-08-07 11:56:59,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 23.0, 14.0, 221.0, 17.0, 18.0, 189.0, 50.0, 161.0, 65.0]
2025-08-07 11:56:59,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 56 minutes, 41 seconds)
2025-08-07 11:58:43,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:58:45,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 270.84344 ± 195.855
2025-08-07 11:58:45,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [348.05835, 447.74356, 276.08078, 5.9489837, 38.842316, 10.540028, 426.3021, 198.98509, 618.373, 337.56036]
2025-08-07 11:58:45,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [169.0, 299.0, 132.0, 18.0, 54.0, 74.0, 358.0, 118.0, 292.0, 191.0]
2025-08-07 11:58:45,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (270.84) for latency MM1Queue_a033_s075
2025-08-07 11:58:45,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 55 minutes, 26 seconds)
2025-08-07 12:00:31,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:00:33,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 211.82950 ± 171.961
2025-08-07 12:00:33,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [74.61763, 114.53497, 377.69617, 305.57208, 565.4571, 338.5965, 169.07948, 5.556067, 163.35779, 3.8272634]
2025-08-07 12:00:33,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [62.0, 111.0, 196.0, 142.0, 426.0, 183.0, 147.0, 14.0, 128.0, 16.0]
2025-08-07 12:00:33,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 54 minutes, 14 seconds)
2025-08-07 12:02:14,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:02:16,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 343.14801 ± 236.625
2025-08-07 12:02:16,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [90.65697, 672.3138, 653.4655, 365.1901, 469.725, -0.7399367, 277.59103, 3.7462673, 534.976, 364.55545]
2025-08-07 12:02:16,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [83.0, 341.0, 376.0, 206.0, 237.0, 15.0, 127.0, 15.0, 327.0, 240.0]
2025-08-07 12:02:16,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (343.15) for latency MM1Queue_a033_s075
2025-08-07 12:02:16,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 52 minutes, 29 seconds)
2025-08-07 12:04:01,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:04:03,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 285.02737 ± 140.399
2025-08-07 12:04:03,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [143.68256, 312.49863, 193.98082, 445.92868, 327.81177, 241.62662, 507.58987, -1.5117427, 317.4729, 361.1935]
2025-08-07 12:04:03,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [146.0, 154.0, 115.0, 208.0, 161.0, 143.0, 253.0, 18.0, 184.0, 196.0]
2025-08-07 12:04:03,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 50 minutes, 43 seconds)
2025-08-07 12:05:45,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:05:47,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 247.59290 ± 139.952
2025-08-07 12:05:47,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [390.44113, 292.8508, 358.4929, 1.3076233, 0.5931004, 401.72562, 265.79672, 221.15924, 353.90994, 189.65189]
2025-08-07 12:05:47,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [200.0, 167.0, 248.0, 13.0, 13.0, 191.0, 124.0, 113.0, 196.0, 134.0]
2025-08-07 12:05:47,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 49 minutes, 13 seconds)
2025-08-07 12:07:28,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:07:29,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 165.39999 ± 150.252
2025-08-07 12:07:29,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [319.12726, 8.532222, 253.83453, 0.17866439, 296.75043, 303.9543, 83.24247, 6.5115175, 381.52478, 0.34389132]
2025-08-07 12:07:29,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [210.0, 17.0, 137.0, 14.0, 129.0, 147.0, 122.0, 18.0, 196.0, 15.0]
2025-08-07 12:07:29,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 46 minutes, 35 seconds)
2025-08-07 12:09:12,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:09:13,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 154.52861 ± 143.214
2025-08-07 12:09:13,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [3.0025635, 133.69867, -5.553303, 283.16614, 2.1928394, 339.4937, 365.031, 298.93393, 56.686993, 68.63351]
2025-08-07 12:09:13,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 150.0, 16.0, 143.0, 15.0, 137.0, 165.0, 147.0, 65.0, 55.0]
2025-08-07 12:09:13,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 44 minutes, 2 seconds)
2025-08-07 12:10:57,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:10:58,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 172.27094 ± 172.197
2025-08-07 12:10:58,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1.6643133, 291.61615, 6.122572, 3.1863587, 409.78748, 305.82904, 397.38995, 297.9491, 2.566907, 6.5974736]
2025-08-07 12:10:58,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 162.0, 16.0, 17.0, 185.0, 155.0, 223.0, 148.0, 16.0, 16.0]
2025-08-07 12:10:58,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 42 minutes, 39 seconds)
2025-08-07 12:12:41,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:12:42,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 130.67532 ± 158.726
2025-08-07 12:12:42,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [441.1206, 249.01938, 348.0303, 203.05577, 0.7391706, 0.90630156, 5.614016, 0.15698197, -2.7578988, 60.868603]
2025-08-07 12:12:42,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [229.0, 133.0, 162.0, 111.0, 14.0, 17.0, 19.0, 11.0, 16.0, 156.0]
2025-08-07 12:12:42,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 40 minutes, 21 seconds)
2025-08-07 12:14:29,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:14:31,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 201.42201 ± 187.415
2025-08-07 12:14:31,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [512.04376, -3.0777283, 1.136959, 342.0568, 0.3927899, 158.4216, 38.483543, 456.73935, 336.21902, 171.80403]
2025-08-07 12:14:31,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [290.0, 9.0, 11.0, 170.0, 16.0, 120.0, 74.0, 234.0, 166.0, 159.0]
2025-08-07 12:14:31,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 39 minutes, 28 seconds)
2025-08-07 12:16:16,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:16:17,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 106.56152 ± 166.932
2025-08-07 12:16:17,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [2.061606, 151.84346, 2.9520302, 502.9716, 335.53333, 58.93847, -0.986213, 4.714246, 9.148391, -1.5617846]
2025-08-07 12:16:17,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 96.0, 13.0, 214.0, 174.0, 61.0, 14.0, 16.0, 45.0, 10.0]
2025-08-07 12:16:17,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 38 minutes, 27 seconds)
2025-08-07 12:18:01,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:18:03,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 214.16008 ± 156.297
2025-08-07 12:18:03,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [419.56332, 311.49063, -1.3589687, 377.96454, 5.772517, 376.15503, 263.21082, 156.31833, 5.002489, 227.48227]
2025-08-07 12:18:03,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [279.0, 168.0, 10.0, 200.0, 17.0, 175.0, 145.0, 95.0, 14.0, 124.0]
2025-08-07 12:18:03,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 37 minutes, 7 seconds)
2025-08-07 12:19:46,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:19:48,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 194.11047 ± 183.320
2025-08-07 12:19:48,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [2.66494, 439.28342, 338.3404, 397.8082, 441.8695, 3.6252325, -0.8013849, -3.759129, 162.5169, 159.5567]
2025-08-07 12:19:48,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 254.0, 186.0, 208.0, 247.0, 19.0, 13.0, 17.0, 88.0, 69.0]
2025-08-07 12:19:48,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 35 minutes, 22 seconds)
2025-08-07 12:21:32,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:21:33,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 242.41463 ± 170.441
2025-08-07 12:21:34,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [325.10492, 427.78476, 429.8192, 377.82816, 3.5573938, 384.80106, -1.9285201, 301.98315, 148.13614, 27.059816]
2025-08-07 12:21:34,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [178.0, 204.0, 226.0, 217.0, 17.0, 224.0, 9.0, 146.0, 110.0, 85.0]
2025-08-07 12:21:34,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 33 minutes, 56 seconds)
2025-08-07 12:23:17,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:23:18,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 289.58157 ± 148.959
2025-08-07 12:23:18,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [203.72922, 339.94797, 272.87152, 341.5196, 271.83002, 320.50015, 632.983, -0.346481, 214.28062, 298.50006]
2025-08-07 12:23:18,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [177.0, 179.0, 162.0, 171.0, 173.0, 156.0, 264.0, 14.0, 132.0, 150.0]
2025-08-07 12:23:19,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 31 minutes, 28 seconds)
2025-08-07 12:25:02,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:25:03,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 176.47409 ± 127.343
2025-08-07 12:25:03,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [368.38232, 150.18842, 2.3075688, 4.4632688, 207.84047, 321.87177, 250.91818, 0.41682822, 245.02487, 213.32718]
2025-08-07 12:25:03,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [173.0, 195.0, 13.0, 17.0, 106.0, 147.0, 228.0, 15.0, 184.0, 131.0]
2025-08-07 12:25:03,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 29 minutes, 29 seconds)
2025-08-07 12:26:43,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:26:45,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 218.28738 ± 170.914
2025-08-07 12:26:45,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [337.77637, -0.9026899, 515.304, 1.1491014, 217.6535, -1.8336444, 258.31967, 151.87816, 305.25278, 398.2764]
2025-08-07 12:26:45,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [234.0, 15.0, 322.0, 18.0, 144.0, 9.0, 141.0, 99.0, 155.0, 239.0]
2025-08-07 12:26:45,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 27 minutes, 5 seconds)
2025-08-07 12:28:29,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:28:31,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 228.67940 ± 174.653
2025-08-07 12:28:31,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-1.6292936, 377.95535, 473.36557, 0.6395615, 193.00676, 389.41354, 159.66805, 392.55167, 304.2354, -2.4124777]
2025-08-07 12:28:31,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 182.0, 277.0, 12.0, 99.0, 196.0, 72.0, 186.0, 152.0, 11.0]
2025-08-07 12:28:31,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 25 minutes, 25 seconds)
2025-08-07 12:30:14,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:30:15,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 175.70877 ± 160.078
2025-08-07 12:30:15,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [294.76138, 295.83813, 108.91491, 1.431451, 4.4503675, 272.48022, 351.20416, 0.626219, 425.5638, 1.8170073]
2025-08-07 12:30:15,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [150.0, 165.0, 94.0, 14.0, 15.0, 150.0, 187.0, 15.0, 206.0, 13.0]
2025-08-07 12:30:15,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 23 minutes, 25 seconds)
2025-08-07 12:32:04,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:32:06,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 229.75241 ± 139.746
2025-08-07 12:32:06,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [216.17542, 260.64566, 2.5257976, 285.341, 113.8331, 292.22745, 305.0035, 2.5868113, 417.84637, 401.3391]
2025-08-07 12:32:06,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [121.0, 194.0, 12.0, 154.0, 102.0, 165.0, 169.0, 13.0, 175.0, 207.0]
2025-08-07 12:32:06,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 22 minutes, 33 seconds)
2025-08-07 12:33:48,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:33:50,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 245.26758 ± 147.249
2025-08-07 12:33:50,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [286.3263, 298.55524, 383.20245, 1.1684824, 324.1769, 454.5001, 247.65245, 336.58926, 1.6188265, 118.88581]
2025-08-07 12:33:50,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [150.0, 148.0, 179.0, 11.0, 212.0, 275.0, 136.0, 175.0, 14.0, 116.0]
2025-08-07 12:33:50,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 20 minutes, 43 seconds)
2025-08-07 12:35:34,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:35:35,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 210.08260 ± 149.971
2025-08-07 12:35:35,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [283.2745, 220.42906, 370.00003, 175.2479, 367.7136, -1.9207827, 358.3402, 325.84863, -1.3318368, 3.224635]
2025-08-07 12:35:35,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [156.0, 118.0, 194.0, 101.0, 172.0, 14.0, 209.0, 165.0, 9.0, 18.0]
2025-08-07 12:35:35,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 19 minutes, 32 seconds)
2025-08-07 12:37:20,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:37:21,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 172.17569 ± 148.448
2025-08-07 12:37:21,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [277.93124, 3.7021115, 217.65909, 420.73505, 4.9045486, 233.7342, 287.5976, -1.4888238, -1.9177492, 278.8994]
2025-08-07 12:37:21,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [140.0, 16.0, 113.0, 175.0, 15.0, 173.0, 174.0, 10.0, 16.0, 144.0]
2025-08-07 12:37:21,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 17 minutes, 48 seconds)
2025-08-07 12:39:04,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:39:05,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 142.34502 ± 181.258
2025-08-07 12:39:05,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [373.34702, -0.3367434, -1.5455326, 7.370624, 204.82355, 402.429, -4.818909, 433.30405, 5.922132, 2.955024]
2025-08-07 12:39:05,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [195.0, 12.0, 12.0, 21.0, 176.0, 195.0, 17.0, 201.0, 16.0, 13.0]
2025-08-07 12:39:05,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 15 minutes, 59 seconds)
2025-08-07 12:40:48,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:40:49,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 233.03308 ± 209.217
2025-08-07 12:40:49,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [419.05167, 419.2713, 6.3903785, 2.233973, 2.0373569, 608.3529, 2.011051, 251.59473, 264.41217, 354.97534]
2025-08-07 12:40:49,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [253.0, 223.0, 17.0, 16.0, 14.0, 292.0, 11.0, 128.0, 136.0, 199.0]
2025-08-07 12:40:49,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 13 minutes, 20 seconds)
2025-08-07 12:42:36,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:42:37,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 233.20963 ± 191.007
2025-08-07 12:42:37,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [220.06512, 288.57925, 596.72144, -2.0860431, 368.31277, 3.2100866, 2.7956955, 117.9933, 361.08676, 375.41763]
2025-08-07 12:42:37,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [112.0, 182.0, 311.0, 10.0, 207.0, 16.0, 14.0, 119.0, 162.0, 196.0]
2025-08-07 12:42:37,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 12 minutes, 5 seconds)
2025-08-07 12:44:19,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:44:21,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 322.10535 ± 102.683
2025-08-07 12:44:21,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [175.78764, 338.54053, 177.27777, 341.04013, 283.6569, 345.6902, 317.3438, 566.4682, 346.95032, 328.2981]
2025-08-07 12:44:21,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [101.0, 178.0, 104.0, 161.0, 153.0, 184.0, 166.0, 237.0, 159.0, 210.0]
2025-08-07 12:44:21,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 10 minutes, 4 seconds)
2025-08-07 12:46:07,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:46:09,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 203.86838 ± 146.844
2025-08-07 12:46:09,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [333.77048, 435.9206, 2.0231016, 1.7117662, 2.3902977, 262.53317, 168.33553, 293.64246, 242.59196, 295.76456]
2025-08-07 12:46:09,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [184.0, 215.0, 14.0, 15.0, 13.0, 132.0, 93.0, 164.0, 146.0, 155.0]
2025-08-07 12:46:09,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 8 minutes, 32 seconds)
2025-08-07 12:47:50,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:47:51,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 164.73572 ± 148.453
2025-08-07 12:47:51,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [254.32608, 3.2396414, -1.0796239, 173.88464, 169.69176, -3.6817565, 341.09116, 322.72528, 3.0766573, 384.08344]
2025-08-07 12:47:51,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [167.0, 16.0, 15.0, 100.0, 135.0, 13.0, 171.0, 159.0, 13.0, 198.0]
2025-08-07 12:47:51,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 6 minutes, 38 seconds)
2025-08-07 12:49:31,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:49:33,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 235.24422 ± 161.395
2025-08-07 12:49:33,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [291.85822, 0.10884486, 299.81458, 293.1181, 310.19836, 33.19162, 296.63184, 526.3751, 0.30030054, 300.84494]
2025-08-07 12:49:33,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [166.0, 12.0, 149.0, 148.0, 153.0, 122.0, 184.0, 397.0, 11.0, 156.0]
2025-08-07 12:49:33,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 4 minutes, 34 seconds)
2025-08-07 12:51:13,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:51:15,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 332.90308 ± 178.983
2025-08-07 12:51:15,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [471.93158, 264.3118, 581.5345, 222.19334, -1.6671705, 256.0558, 378.05283, 222.21068, 302.83395, 631.57355]
2025-08-07 12:51:15,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [250.0, 140.0, 260.0, 133.0, 17.0, 137.0, 172.0, 114.0, 288.0, 299.0]
2025-08-07 12:51:15,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 2 minutes, 11 seconds)
2025-08-07 12:52:58,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:52:59,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 242.97131 ± 163.474
2025-08-07 12:52:59,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [392.61298, 2.447636, 225.1825, 5.694072, 257.1567, 534.8631, 219.98904, 166.93811, 432.86954, 191.95949]
2025-08-07 12:52:59,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [196.0, 18.0, 116.0, 17.0, 130.0, 318.0, 112.0, 90.0, 214.0, 107.0]
2025-08-07 12:52:59,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 29 seconds)
2025-08-07 12:54:44,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:54:46,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 260.98175 ± 214.154
2025-08-07 12:54:46,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [7.528715, 1.7844635, 589.8274, 166.11784, 106.20438, 332.30115, 98.262184, 415.1382, 270.16135, 622.49176]
2025-08-07 12:54:46,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 16.0, 317.0, 92.0, 147.0, 184.0, 104.0, 258.0, 132.0, 368.0]
2025-08-07 12:54:46,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 58 minutes, 35 seconds)
2025-08-07 12:56:26,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:56:27,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 210.03935 ± 178.156
2025-08-07 12:56:27,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [174.62254, 332.0807, 59.59364, 1.4684001, 3.5871012, 375.03036, 3.51662, 530.33484, 307.89166, 312.2678]
2025-08-07 12:56:27,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [90.0, 155.0, 63.0, 16.0, 15.0, 190.0, 16.0, 300.0, 149.0, 170.0]
2025-08-07 12:56:27,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 56 minutes, 43 seconds)
2025-08-07 12:58:12,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:58:13,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 130.37543 ± 201.485
2025-08-07 12:58:13,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [336.68753, 363.0168, -0.36560607, 27.760048, -4.352359, 0.33746937, 5.6906266, 574.4336, 1.1851776, -0.63900954]
2025-08-07 12:58:13,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [201.0, 164.0, 11.0, 87.0, 15.0, 11.0, 17.0, 323.0, 12.0, 12.0]
2025-08-07 12:58:13,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 55 minutes, 28 seconds)
2025-08-07 12:59:53,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:59:55,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 272.51324 ± 199.859
2025-08-07 12:59:55,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [556.6986, 497.80542, 447.6823, -2.0727787, -0.71167797, 390.52448, 291.4488, 2.530946, 270.4154, 270.81137]
2025-08-07 12:59:55,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [271.0, 297.0, 231.0, 15.0, 13.0, 190.0, 169.0, 14.0, 137.0, 118.0]
2025-08-07 12:59:55,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 53 minutes, 40 seconds)
2025-08-07 13:01:38,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:01:39,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 221.21750 ± 155.802
2025-08-07 13:01:39,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [255.4774, 1.3373282, 371.70026, 204.17503, 338.70828, 292.39636, 435.7388, 2.2804577, -0.9302714, 311.29147]
2025-08-07 13:01:39,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [150.0, 13.0, 165.0, 176.0, 155.0, 152.0, 208.0, 16.0, 16.0, 160.0]
2025-08-07 13:01:39,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 51 minutes, 59 seconds)
2025-08-07 13:03:20,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:03:22,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 189.82825 ± 188.257
2025-08-07 13:03:22,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [537.3483, -0.0021637217, 277.05362, 428.13287, 88.58259, 5.803809, 300.2719, 1.0179317, -0.52030224, 260.59372]
2025-08-07 13:03:22,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [290.0, 14.0, 147.0, 180.0, 118.0, 15.0, 152.0, 11.0, 16.0, 139.0]
2025-08-07 13:03:22,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 49 minutes, 52 seconds)
2025-08-07 13:05:03,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:05:04,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 230.39116 ± 192.151
2025-08-07 13:05:04,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [338.79095, 1.5513644, 458.27286, 3.3743396, 2.2284455, 4.2526445, 421.39893, 444.26926, 283.8082, 345.9647]
2025-08-07 13:05:04,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [246.0, 11.0, 242.0, 15.0, 14.0, 14.0, 198.0, 204.0, 217.0, 156.0]
2025-08-07 13:05:04,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 48 minutes, 16 seconds)
2025-08-07 13:06:48,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:06:49,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 172.70438 ± 164.598
2025-08-07 13:06:49,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [235.09764, -2.3454416, 226.48305, 1.8107613, 126.07037, 337.3127, 1.6000904, 0.20197462, 478.8002, 322.0124]
2025-08-07 13:06:49,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [185.0, 15.0, 120.0, 12.0, 119.0, 170.0, 18.0, 15.0, 229.0, 189.0]
2025-08-07 13:06:49,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 46 minutes, 27 seconds)
2025-08-07 13:08:32,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:08:34,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 237.06396 ± 282.407
2025-08-07 13:08:34,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [846.8709, 251.84279, 0.9938164, 577.5604, 0.7104524, 410.82312, 276.59988, 2.8296423, 5.7668614, -3.3584075]
2025-08-07 13:08:34,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [357.0, 175.0, 15.0, 266.0, 27.0, 264.0, 185.0, 14.0, 29.0, 15.0]
2025-08-07 13:08:34,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 44 minutes, 58 seconds)
2025-08-07 13:10:15,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:10:17,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 290.73657 ± 185.581
2025-08-07 13:10:17,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [326.58264, 259.9614, 224.19875, 5.8697596, 598.2676, 551.66534, 395.70877, 4.37193, 263.90335, 276.83636]
2025-08-07 13:10:17,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [160.0, 122.0, 111.0, 16.0, 268.0, 251.0, 160.0, 15.0, 125.0, 136.0]
2025-08-07 13:10:17,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 43 minutes, 7 seconds)
2025-08-07 13:11:58,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:12:00,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 304.90213 ± 224.357
2025-08-07 13:12:00,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [183.88094, 265.38055, 2.5132968, 338.79532, 8.264311, 321.84824, 614.0046, 684.26215, 132.52597, 497.54614]
2025-08-07 13:12:00,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [346.0, 273.0, 13.0, 140.0, 27.0, 205.0, 281.0, 372.0, 152.0, 226.0]
2025-08-07 13:12:00,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 41 minutes, 29 seconds)
2025-08-07 13:13:46,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:13:48,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 309.68878 ± 238.868
2025-08-07 13:13:48,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [491.15616, -1.3942847, 548.6392, 498.48364, 226.46928, -0.0083062155, 6.43137, 410.40573, 680.82434, 235.88092]
2025-08-07 13:13:48,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [218.0, 11.0, 239.0, 229.0, 231.0, 11.0, 18.0, 160.0, 319.0, 114.0]
2025-08-07 13:13:48,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 40 minutes, 7 seconds)
2025-08-07 13:15:31,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:15:32,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 242.27971 ± 259.762
2025-08-07 13:15:32,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [2.6725626, 268.51474, 636.27515, 0.8441286, 397.98425, -1.1721201, 0.46978715, -0.46958497, 588.9013, 528.7768]
2025-08-07 13:15:32,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 203.0, 391.0, 16.0, 224.0, 10.0, 27.0, 15.0, 280.0, 260.0]
2025-08-07 13:15:32,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 38 minutes, 22 seconds)
2025-08-07 13:17:13,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:17:17,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 605.85590 ± 246.620
2025-08-07 13:17:17,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [567.5181, 543.7351, 210.39996, 612.7997, 373.69595, 620.97327, 705.98047, 914.4374, 1102.9918, 406.02734]
2025-08-07 13:17:17,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [285.0, 241.0, 154.0, 288.0, 176.0, 278.0, 285.0, 377.0, 463.0, 262.0]
2025-08-07 13:17:17,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (605.86) for latency MM1Queue_a033_s075
2025-08-07 13:17:17,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 36 minutes, 35 seconds)
2025-08-07 13:18:55,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:18:57,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 314.48462 ± 267.611
2025-08-07 13:18:57,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [0.44217592, 331.95224, 585.4445, 467.69876, -1.1730285, 511.21094, 640.1957, 601.88696, 3.9521348, 3.2357614]
2025-08-07 13:18:57,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 140.0, 239.0, 202.0, 8.0, 211.0, 314.0, 354.0, 18.0, 15.0]
2025-08-07 13:18:57,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 34 minutes, 39 seconds)
2025-08-07 13:20:41,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:20:43,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 266.45258 ± 214.231
2025-08-07 13:20:43,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [8.640167, 522.48566, 447.77005, 165.76909, 249.96896, -4.6516757, 238.64017, 0.8527247, 597.3869, 437.66376]
2025-08-07 13:20:43,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 224.0, 174.0, 155.0, 105.0, 8.0, 121.0, 18.0, 277.0, 192.0]
2025-08-07 13:20:43,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 33 minutes, 5 seconds)
2025-08-07 13:22:26,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:22:27,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 275.50446 ± 245.594
2025-08-07 13:22:27,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [235.19745, 111.87919, 578.9303, 264.06476, 491.39142, 0.84061104, 329.72855, 7.053118, 734.3188, 1.6403654]
2025-08-07 13:22:27,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [168.0, 163.0, 270.0, 134.0, 224.0, 10.0, 147.0, 18.0, 366.0, 16.0]
2025-08-07 13:22:27,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 31 minutes, 10 seconds)
2025-08-07 13:24:08,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:24:10,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 439.50995 ± 351.492
2025-08-07 13:24:10,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [819.09186, 3.3696468, 330.62744, -0.5778431, 397.4007, 1055.3151, 771.55444, 555.50793, 3.0178506, 459.79242]
2025-08-07 13:24:10,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [388.0, 18.0, 132.0, 10.0, 165.0, 446.0, 404.0, 252.0, 16.0, 179.0]
2025-08-07 13:24:10,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 29 minutes, 19 seconds)
2025-08-07 13:25:49,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:25:51,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 285.42450 ± 425.878
2025-08-07 13:25:51,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [0.45764625, 579.6066, 3.456975, 0.22666839, 0.30024704, 1366.7812, 551.926, 0.0705107, 345.97244, 5.4466205]
2025-08-07 13:25:51,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 243.0, 14.0, 17.0, 14.0, 592.0, 250.0, 15.0, 184.0, 15.0]
2025-08-07 13:25:51,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 27 minutes, 24 seconds)
2025-08-07 13:27:32,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:27:34,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 359.88168 ± 316.384
2025-08-07 13:27:34,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-5.8634276, 407.10373, 2.2391505, 377.60074, 303.75333, 548.773, 426.51282, 5.748464, 1111.0406, 421.90863]
2025-08-07 13:27:34,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 170.0, 16.0, 150.0, 195.0, 226.0, 211.0, 15.0, 475.0, 177.0]
2025-08-07 13:27:34,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 51 seconds)
2025-08-07 13:29:14,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:29:16,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 321.50931 ± 210.139
2025-08-07 13:29:16,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [4.8450575, 628.9626, 467.9297, 0.4171342, 232.344, 232.23845, 269.49283, 449.9285, 616.53754, 312.39728]
2025-08-07 13:29:16,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 242.0, 200.0, 11.0, 114.0, 112.0, 131.0, 187.0, 353.0, 134.0]
2025-08-07 13:29:16,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 23 minutes, 56 seconds)
2025-08-07 13:30:53,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:30:54,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 172.95531 ± 215.125
2025-08-07 13:30:54,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [470.83063, 171.66248, 191.00229, 2.1215231, 2.6604483, 3.3228567, -0.84691757, 642.53595, 247.81395, -1.55012]
2025-08-07 13:30:54,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [197.0, 148.0, 108.0, 15.0, 15.0, 14.0, 12.0, 248.0, 199.0, 13.0]
2025-08-07 13:30:54,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 21 minutes, 56 seconds)
2025-08-07 13:32:33,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:32:33,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 184.26089 ± 282.864
2025-08-07 13:32:33,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [6.3455396, 5.35391, 6.0126367, 6.6665387, 654.24414, 0.83505154, 4.9173265, 8.461146, 733.5894, 416.18338]
2025-08-07 13:32:33,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 16.0, 15.0, 47.0, 267.0, 13.0, 16.0, 19.0, 304.0, 171.0]
2025-08-07 13:32:33,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 8 seconds)
2025-08-07 13:34:11,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:34:14,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 424.93790 ± 326.971
2025-08-07 13:34:14,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [934.39386, 977.3523, 217.61606, 590.0319, 0.66475767, 560.3428, 450.74365, 2.669421, 279.60693, 235.95749]
2025-08-07 13:34:14,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [460.0, 432.0, 156.0, 346.0, 13.0, 238.0, 205.0, 13.0, 119.0, 180.0]
2025-08-07 13:34:14,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 26 seconds)
2025-08-07 13:35:53,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:35:55,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 401.22382 ± 321.191
2025-08-07 13:35:55,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [714.0334, 264.84335, 2.9061773, 772.6817, 483.09805, 871.75256, -1.1641259, 626.8705, -2.2522035, 279.4688]
2025-08-07 13:35:55,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [283.0, 163.0, 14.0, 312.0, 269.0, 330.0, 17.0, 309.0, 11.0, 180.0]
2025-08-07 13:35:55,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 42 seconds)
2025-08-07 13:37:36,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:37:38,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 589.39246 ± 303.061
2025-08-07 13:37:38,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [616.8364, 451.03058, 381.50967, 998.0896, 960.2195, 849.2329, 439.69873, 842.66486, 346.32736, 8.315212]
2025-08-07 13:37:38,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [205.0, 190.0, 144.0, 411.0, 416.0, 330.0, 205.0, 304.0, 138.0, 35.0]
2025-08-07 13:37:38,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 4 seconds)
2025-08-07 13:39:15,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:39:15,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 120.69308 ± 163.053
2025-08-07 13:39:15,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [86.14496, 431.93744, 4.8421893, 18.280712, 320.48767, 2.9171436, 0.4790925, 5.422611, -1.8558518, 338.2748]
2025-08-07 13:39:15,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [60.0, 179.0, 18.0, 53.0, 126.0, 15.0, 12.0, 16.0, 8.0, 137.0]
2025-08-07 13:39:15,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 22 seconds)
2025-08-07 13:40:57,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:40:58,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 378.78159 ± 333.952
2025-08-07 13:40:58,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [792.9439, 508.47348, 472.57098, 403.3043, 746.4894, 848.1639, 4.684322, 2.1810043, 1.8858814, 7.1189985]
2025-08-07 13:40:58,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [274.0, 172.0, 180.0, 164.0, 260.0, 341.0, 16.0, 16.0, 14.0, 17.0]
2025-08-07 13:40:58,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 46 seconds)
2025-08-07 13:42:38,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:42:40,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 330.93042 ± 292.420
2025-08-07 13:42:40,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [766.34485, 465.54132, 778.3409, 251.61514, 410.95004, 548.2457, 0.77090377, -0.6160629, 0.65450346, 87.4567]
2025-08-07 13:42:40,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [352.0, 222.0, 334.0, 113.0, 148.0, 254.0, 12.0, 46.0, 13.0, 111.0]
2025-08-07 13:42:40,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 7 seconds)
2025-08-07 13:44:15,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:44:19,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 758.42493 ± 558.268
2025-08-07 13:44:19,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1.5767517, 823.4147, 582.01996, 1746.8547, 880.22095, 1519.4395, 896.7712, 934.5585, 198.40741, 0.98546636]
2025-08-07 13:44:19,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [12.0, 302.0, 239.0, 763.0, 329.0, 638.0, 365.0, 388.0, 108.0, 42.0]
2025-08-07 13:44:19,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (758.42) for latency MM1Queue_a033_s075
2025-08-07 13:44:19,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 24 seconds)
2025-08-07 13:46:01,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:46:02,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 343.27097 ± 294.997
2025-08-07 13:46:02,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [515.1267, 845.13, 181.59691, 4.0122285, 326.4963, 3.3742645, 272.8855, 518.0308, 765.86816, 0.18863146]
2025-08-07 13:46:02,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [207.0, 339.0, 108.0, 14.0, 140.0, 16.0, 118.0, 238.0, 302.0, 12.0]
2025-08-07 13:46:02,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 43 seconds)
2025-08-07 13:47:38,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:47:39,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 299.21237 ± 317.593
2025-08-07 13:47:39,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [768.82416, 330.4769, -2.2412648, 3.141679, -2.6005363, -1.9233799, -0.7657136, 589.2101, 676.37665, 631.6252]
2025-08-07 13:47:39,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [286.0, 144.0, 14.0, 17.0, 13.0, 10.0, 11.0, 242.0, 416.0, 272.0]
2025-08-07 13:47:39,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 2 seconds)
2025-08-07 13:49:19,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:49:21,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 444.93646 ± 319.165
2025-08-07 13:49:21,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [0.99001884, 2.0036361, 405.9461, 730.32043, 632.17694, 813.465, 848.4589, 481.84402, 535.489, -1.3293829]
2025-08-07 13:49:21,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 12.0, 162.0, 309.0, 244.0, 268.0, 329.0, 191.0, 262.0, 12.0]
2025-08-07 13:49:21,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 21 seconds)
2025-08-07 13:51:07,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:51:08,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 242.31502 ± 263.206
2025-08-07 13:51:08,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [0.34409764, -0.82358617, 0.7489794, 167.72229, 284.26407, 3.2962224, 666.2414, 173.738, 738.6858, 388.93286]
2025-08-07 13:51:08,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [12.0, 9.0, 12.0, 120.0, 135.0, 16.0, 302.0, 180.0, 302.0, 147.0]
2025-08-07 13:51:08,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 41 seconds)
2025-08-07 13:52:46,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:52:47,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 333.36490 ± 348.779
2025-08-07 13:52:47,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [3.3193634, -0.67199916, -1.2265594, 422.6884, 809.76074, 768.6795, 0.098866954, 125.77745, 319.39124, 885.83203]
2025-08-07 13:52:47,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 18.0, 9.0, 182.0, 287.0, 259.0, 15.0, 81.0, 130.0, 351.0]
2025-08-07 13:52:47,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1251 [DEBUG]: Training session finished
