2025-08-07 10:43:21,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc15-walker2d/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:43:21,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc15-walker2d/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:43:21,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x1480ed9d6c90>}
2025-08-07 10:43:21,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1111 [DEBUG]: using device: cuda
2025-08-07 10:43:21,919 baseline-bpql-noiseperc15-walker2d:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 10:43:21,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1133 [INFO]: Creating new trainer
2025-08-07 10:43:21,935 baseline-bpql-noiseperc15-walker2d:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=113, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 10:43:21,935 baseline-bpql-noiseperc15-walker2d:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 10:43:23,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1194 [DEBUG]: Starting training session...
2025-08-07 10:43:23,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 1/100
2025-08-07 10:44:56,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:44:57,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 12.89088 ± 10.148
2025-08-07 10:44:57,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [-4.3440022, 29.37093, 1.2926824, 21.509296, 12.610798, 10.768061, 7.2763186, 6.161658, 21.761606, 22.501436]
2025-08-07 10:44:57,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [46.0, 49.0, 69.0, 45.0, 41.0, 58.0, 34.0, 29.0, 48.0, 47.0]
2025-08-07 10:44:57,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (12.89) for latency MM1Queue_a033_s075
2025-08-07 10:44:57,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 35 minutes, 45 seconds)
2025-08-07 10:46:39,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:46:40,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 28.12047 ± 29.066
2025-08-07 10:46:40,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [39.42964, 72.30713, 50.52221, 10.642146, 17.962067, 56.03698, 3.0225968, -20.475292, -3.0755744, 54.83278]
2025-08-07 10:46:40,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [161.0, 74.0, 143.0, 62.0, 24.0, 88.0, 16.0, 117.0, 131.0, 93.0]
2025-08-07 10:46:40,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (28.12) for latency MM1Queue_a033_s075
2025-08-07 10:46:40,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 41 minutes, 36 seconds)
2025-08-07 10:48:25,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:48:25,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 52.79250 ± 48.118
2025-08-07 10:48:25,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [23.310965, 42.689854, 13.832928, 24.63594, 61.954124, 29.908007, 189.13391, 43.144596, 32.889256, 66.42545]
2025-08-07 10:48:25,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [35.0, 52.0, 23.0, 38.0, 62.0, 47.0, 122.0, 130.0, 36.0, 62.0]
2025-08-07 10:48:25,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (52.79) for latency MM1Queue_a033_s075
2025-08-07 10:48:25,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 43 minutes, 7 seconds)
2025-08-07 10:50:10,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:50:10,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 39.07012 ± 43.356
2025-08-07 10:50:10,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3.127946, 58.8621, 64.13307, 13.512378, 5.1122985, 43.87388, 10.471806, 40.288174, 150.35796, 0.9615669]
2025-08-07 10:50:10,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 60.0, 69.0, 24.0, 17.0, 64.0, 21.0, 183.0, 114.0, 14.0]
2025-08-07 10:50:10,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 43 minutes, 8 seconds)
2025-08-07 10:51:52,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:51:52,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 29.33601 ± 22.295
2025-08-07 10:51:52,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [15.306741, 26.573523, 37.391445, 15.949501, 5.208195, 74.592094, 33.479458, 6.7257705, 63.295166, 14.8381605]
2025-08-07 10:51:52,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [29.0, 46.0, 53.0, 25.0, 16.0, 71.0, 51.0, 17.0, 71.0, 24.0]
2025-08-07 10:51:52,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 41 minutes, 24 seconds)
2025-08-07 10:53:35,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:53:36,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 74.47427 ± 58.490
2025-08-07 10:53:36,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [45.972374, 5.717837, 41.51564, 163.6789, 106.90626, 126.74752, 55.70388, 166.75633, 23.163574, 8.580435]
2025-08-07 10:53:36,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [50.0, 16.0, 54.0, 132.0, 103.0, 194.0, 109.0, 124.0, 32.0, 20.0]
2025-08-07 10:53:36,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (74.47) for latency MM1Queue_a033_s075
2025-08-07 10:53:36,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 42 minutes, 33 seconds)
2025-08-07 10:55:16,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:55:17,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 41.61021 ± 39.897
2025-08-07 10:55:17,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [17.392273, 16.744087, 88.775475, 130.50346, 30.405903, 77.14359, 13.669116, 5.6288447, 18.257517, 17.58184]
2025-08-07 10:55:17,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [28.0, 28.0, 148.0, 95.0, 40.0, 114.0, 25.0, 24.0, 27.0, 26.0]
2025-08-07 10:55:17,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 40 minutes, 7 seconds)
2025-08-07 10:56:58,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:56:58,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 47.44666 ± 42.801
2025-08-07 10:56:58,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [86.06561, 15.778455, 10.251591, 59.240063, 2.7556453, 25.682747, 84.86782, 9.323847, 141.79251, 38.708344]
2025-08-07 10:56:58,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [85.0, 27.0, 27.0, 57.0, 15.0, 31.0, 78.0, 24.0, 133.0, 57.0]
2025-08-07 10:56:58,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 37 minutes, 18 seconds)
2025-08-07 10:58:40,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:58:41,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 127.89144 ± 87.794
2025-08-07 10:58:41,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [23.18497, 163.25479, 97.36267, 68.923195, 250.76067, 265.2199, 232.166, 27.57034, 72.35974, 78.11216]
2025-08-07 10:58:41,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [34.0, 100.0, 77.0, 101.0, 132.0, 166.0, 125.0, 37.0, 71.0, 65.0]
2025-08-07 10:58:41,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (127.89) for latency MM1Queue_a033_s075
2025-08-07 10:58:41,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 34 minutes, 43 seconds)
2025-08-07 11:00:21,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:00:22,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 143.11243 ± 98.186
2025-08-07 11:00:22,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [113.66783, 5.070104, 207.70525, 195.89418, 293.63983, 3.8678443, 228.89973, 183.75343, 184.83527, 13.790819]
2025-08-07 11:00:22,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [191.0, 15.0, 127.0, 135.0, 199.0, 15.0, 136.0, 129.0, 217.0, 23.0]
2025-08-07 11:00:22,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (143.11) for latency MM1Queue_a033_s075
2025-08-07 11:00:22,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 33 minutes)
2025-08-07 11:02:04,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:02:05,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 98.58498 ± 68.916
2025-08-07 11:02:05,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [184.6648, 10.787112, 101.86374, 6.0428147, 30.522688, 123.40362, 30.25259, 172.06291, 154.35052, 171.89902]
2025-08-07 11:02:05,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [108.0, 28.0, 92.0, 21.0, 39.0, 152.0, 37.0, 163.0, 106.0, 164.0]
2025-08-07 11:02:05,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 31 minutes, 3 seconds)
2025-08-07 11:03:47,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:03:49,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 138.78339 ± 96.954
2025-08-07 11:03:49,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [217.34836, 347.61404, 61.358906, 153.42433, 63.325127, 187.24501, 65.23544, 36.75678, 44.38017, 211.14578]
2025-08-07 11:03:49,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [111.0, 196.0, 121.0, 85.0, 62.0, 108.0, 112.0, 38.0, 63.0, 115.0]
2025-08-07 11:03:49,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 30 minutes, 4 seconds)
2025-08-07 11:05:29,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:05:30,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 151.47141 ± 97.997
2025-08-07 11:05:30,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [81.653076, 131.5263, 168.64551, 341.1372, 255.69766, 220.6792, 192.42578, 59.045933, 37.03692, 26.866457]
2025-08-07 11:05:30,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [65.0, 135.0, 94.0, 177.0, 132.0, 126.0, 283.0, 58.0, 42.0, 39.0]
2025-08-07 11:05:30,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (151.47) for latency MM1Queue_a033_s075
2025-08-07 11:05:30,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 28 minutes, 33 seconds)
2025-08-07 11:07:11,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:07:12,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 130.48712 ± 113.465
2025-08-07 11:07:12,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [157.92169, 343.9512, 36.890453, 234.77173, 89.85143, 5.502997, 236.66096, 5.7723346, 187.37134, 6.177175]
2025-08-07 11:07:12,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [113.0, 196.0, 40.0, 142.0, 116.0, 16.0, 128.0, 16.0, 115.0, 17.0]
2025-08-07 11:07:12,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 26 minutes, 39 seconds)
2025-08-07 11:08:54,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:08:55,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 136.59683 ± 73.297
2025-08-07 11:08:55,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [158.17561, 195.33827, 152.39372, 163.99736, 61.254128, 256.34833, 7.143536, 207.07265, 58.345585, 105.898994]
2025-08-07 11:08:55,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [97.0, 115.0, 120.0, 95.0, 69.0, 153.0, 16.0, 113.0, 86.0, 87.0]
2025-08-07 11:08:55,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 25 minutes, 16 seconds)
2025-08-07 11:10:37,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:10:38,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 185.87785 ± 84.827
2025-08-07 11:10:38,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [273.64124, 71.81733, 156.18187, 108.81202, 56.56136, 281.804, 238.02261, 139.9626, 286.43518, 245.54039]
2025-08-07 11:10:38,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [175.0, 70.0, 121.0, 72.0, 61.0, 185.0, 156.0, 110.0, 175.0, 218.0]
2025-08-07 11:10:38,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (185.88) for latency MM1Queue_a033_s075
2025-08-07 11:10:38,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 23 minutes, 43 seconds)
2025-08-07 11:12:19,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:12:20,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 101.01051 ± 100.096
2025-08-07 11:12:20,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [7.042655, 249.4214, 300.94504, 69.8551, 17.358156, 160.2266, 25.920223, 122.876785, 3.962528, 52.496616]
2025-08-07 11:12:20,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 130.0, 171.0, 100.0, 30.0, 86.0, 39.0, 121.0, 17.0, 59.0]
2025-08-07 11:12:20,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 21 minutes, 27 seconds)
2025-08-07 11:14:02,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:14:04,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 178.07146 ± 182.187
2025-08-07 11:14:04,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [9.934684, 258.61475, 41.128452, 154.89491, 183.4185, 105.379875, 330.3087, 62.060936, 3.598221, 631.3756]
2025-08-07 11:14:04,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [50.0, 183.0, 59.0, 114.0, 160.0, 135.0, 142.0, 95.0, 15.0, 553.0]
2025-08-07 11:14:04,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 20 minutes, 18 seconds)
2025-08-07 11:15:47,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:15:48,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 133.36453 ± 81.828
2025-08-07 11:15:48,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [306.37692, 40.64719, 55.068344, 167.60881, 199.65242, 150.26189, 67.33064, 184.29266, 126.20619, 36.200153]
2025-08-07 11:15:48,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [288.0, 71.0, 78.0, 164.0, 257.0, 82.0, 73.0, 124.0, 77.0, 45.0]
2025-08-07 11:15:48,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 19 minutes, 23 seconds)
2025-08-07 11:17:28,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:17:29,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 135.68398 ± 134.440
2025-08-07 11:17:29,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [37.035633, 142.05983, 470.17447, 93.51822, 211.22177, 31.423372, 246.00175, 58.55102, 4.275661, 62.57797]
2025-08-07 11:17:29,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [83.0, 82.0, 210.0, 116.0, 119.0, 48.0, 145.0, 95.0, 14.0, 93.0]
2025-08-07 11:17:29,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 17 minutes, 3 seconds)
2025-08-07 11:19:09,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:19:10,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 201.99826 ± 115.963
2025-08-07 11:19:10,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [311.35916, 73.20825, 131.94328, 123.13761, 124.50246, 381.0198, 366.98346, 282.07016, 173.00615, 52.752247]
2025-08-07 11:19:10,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [152.0, 89.0, 220.0, 81.0, 120.0, 194.0, 195.0, 157.0, 86.0, 56.0]
2025-08-07 11:19:10,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (202.00) for latency MM1Queue_a033_s075
2025-08-07 11:19:10,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 14 minutes, 51 seconds)
2025-08-07 11:20:51,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:20:53,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 249.73491 ± 157.928
2025-08-07 11:20:53,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [467.50336, 48.914143, 233.03212, 376.97235, 154.36531, 4.093796, 373.66336, 85.985664, 349.08197, 403.73706]
2025-08-07 11:20:53,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [251.0, 66.0, 135.0, 236.0, 89.0, 14.0, 179.0, 94.0, 195.0, 316.0]
2025-08-07 11:20:53,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (249.73) for latency MM1Queue_a033_s075
2025-08-07 11:20:53,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 13 minutes, 23 seconds)
2025-08-07 11:22:33,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:22:35,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 172.23715 ± 89.328
2025-08-07 11:22:35,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [75.49107, 274.47546, 106.09371, 221.11829, 159.1414, 188.60213, 273.4745, 3.8209198, 134.90598, 285.24802]
2025-08-07 11:22:35,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [95.0, 152.0, 163.0, 117.0, 157.0, 99.0, 118.0, 15.0, 93.0, 193.0]
2025-08-07 11:22:35,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 11 minutes, 8 seconds)
2025-08-07 11:24:14,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:24:15,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 163.91862 ± 158.684
2025-08-07 11:24:15,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [250.91852, 2.8141208, 390.69293, 79.78226, 4.6255608, 459.49173, 43.880978, 64.1084, 277.63315, 65.23867]
2025-08-07 11:24:15,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [136.0, 17.0, 198.0, 103.0, 16.0, 212.0, 82.0, 62.0, 133.0, 105.0]
2025-08-07 11:24:15,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 8 minutes, 27 seconds)
2025-08-07 11:25:56,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:25:57,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 190.55548 ± 161.932
2025-08-07 11:25:57,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [373.2225, 494.05963, 306.20572, 255.5096, 83.579765, 236.64377, 6.8166165, 2.0192835, 6.7295775, 140.7683]
2025-08-07 11:25:57,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [211.0, 233.0, 152.0, 162.0, 80.0, 112.0, 17.0, 17.0, 16.0, 88.0]
2025-08-07 11:25:57,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 6 minutes, 58 seconds)
2025-08-07 11:27:34,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:27:36,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 241.07170 ± 166.215
2025-08-07 11:27:36,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [90.38838, 66.86015, 396.5479, 293.83078, 4.123511, 186.74304, 500.3651, 88.35985, 416.30893, 367.18933]
2025-08-07 11:27:36,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [65.0, 55.0, 230.0, 148.0, 17.0, 202.0, 258.0, 85.0, 317.0, 175.0]
2025-08-07 11:27:36,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 4 minutes, 37 seconds)
2025-08-07 11:29:15,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:29:16,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 195.38040 ± 128.972
2025-08-07 11:29:16,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [284.47168, 388.14676, 106.98582, 2.1667264, 310.63876, 88.66175, 258.78528, 295.54068, 215.31093, 3.0956368]
2025-08-07 11:29:16,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [161.0, 190.0, 120.0, 16.0, 163.0, 137.0, 148.0, 126.0, 116.0, 15.0]
2025-08-07 11:29:16,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 2 minutes, 31 seconds)
2025-08-07 11:30:55,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:30:57,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 117.29946 ± 123.863
2025-08-07 11:30:57,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [38.57163, 222.16985, 47.464394, 41.55406, 76.22409, 426.91724, 4.714028, 5.417536, 151.30264, 158.65923]
2025-08-07 11:30:57,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [85.0, 193.0, 91.0, 90.0, 107.0, 194.0, 17.0, 17.0, 276.0, 85.0]
2025-08-07 11:30:57,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 26 seconds)
2025-08-07 11:32:35,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:32:36,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 149.34407 ± 148.249
2025-08-07 11:32:36,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [46.578773, 19.848024, 15.230494, 318.2976, 404.0615, 84.07306, 255.19513, 31.21396, 319.80075, -0.8586346]
2025-08-07 11:32:36,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [60.0, 50.0, 40.0, 179.0, 285.0, 75.0, 198.0, 36.0, 165.0, 14.0]
2025-08-07 11:32:36,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 58 minutes, 29 seconds)
2025-08-07 11:34:17,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:34:19,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 251.23340 ± 147.832
2025-08-07 11:34:19,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [227.10356, 136.87895, 419.3511, 353.63052, 261.73303, 493.576, 98.574104, 61.862305, 383.45157, 76.17273]
2025-08-07 11:34:19,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [116.0, 170.0, 325.0, 179.0, 118.0, 292.0, 145.0, 177.0, 309.0, 93.0]
2025-08-07 11:34:19,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (251.23) for latency MM1Queue_a033_s075
2025-08-07 11:34:19,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 57 minutes, 8 seconds)
2025-08-07 11:35:57,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:35:58,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 188.83963 ± 173.368
2025-08-07 11:35:58,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [253.12624, 352.17996, 2.93797, 97.468, 2.1272056, 373.87585, 3.9647775, 438.38043, 357.10403, 7.2319283]
2025-08-07 11:35:58,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [129.0, 158.0, 14.0, 151.0, 17.0, 167.0, 16.0, 222.0, 201.0, 17.0]
2025-08-07 11:35:58,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 55 minutes, 38 seconds)
2025-08-07 11:37:37,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:37:39,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 157.90504 ± 131.608
2025-08-07 11:37:39,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [5.171652, 45.228073, 97.08548, 284.4367, 274.8686, 277.10712, 4.00803, 95.3769, 399.2061, 96.56172]
2025-08-07 11:37:39,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 46.0, 130.0, 163.0, 149.0, 141.0, 14.0, 111.0, 205.0, 137.0]
2025-08-07 11:37:39,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 53 minutes, 49 seconds)
2025-08-07 11:39:16,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:39:17,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 139.84476 ± 103.554
2025-08-07 11:39:17,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [71.8155, 52.178654, 46.938404, 275.61942, 322.44296, 199.94867, 99.810455, 32.513123, 54.71855, 242.46196]
2025-08-07 11:39:17,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [58.0, 48.0, 44.0, 178.0, 166.0, 109.0, 54.0, 39.0, 58.0, 132.0]
2025-08-07 11:39:17,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 51 minutes, 52 seconds)
2025-08-07 11:40:55,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:40:56,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 176.82050 ± 147.020
2025-08-07 11:40:56,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3.1774535, 293.95398, 293.18967, 339.77832, 55.09219, 39.376637, 28.427134, 284.2858, 39.512085, 391.41162]
2025-08-07 11:40:56,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 130.0, 141.0, 225.0, 53.0, 66.0, 68.0, 142.0, 45.0, 214.0]
2025-08-07 11:40:56,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 50 minutes, 1 second)
2025-08-07 11:42:36,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:42:37,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 196.23660 ± 181.832
2025-08-07 11:42:37,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [61.86284, 299.158, 494.01523, 52.739704, 2.9767056, 52.8345, 36.587822, 131.79097, 483.28772, 347.11264]
2025-08-07 11:42:37,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [59.0, 143.0, 284.0, 52.0, 15.0, 68.0, 43.0, 79.0, 270.0, 157.0]
2025-08-07 11:42:37,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 47 minutes, 57 seconds)
2025-08-07 11:44:15,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:44:17,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 211.94638 ± 133.646
2025-08-07 11:44:17,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [69.05604, 305.43542, 155.61589, 48.217815, 445.73422, 92.3101, 301.67496, 354.0519, 75.37062, 271.9967]
2025-08-07 11:44:17,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [64.0, 342.0, 86.0, 83.0, 256.0, 79.0, 157.0, 185.0, 183.0, 120.0]
2025-08-07 11:44:17,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 46 minutes, 18 seconds)
2025-08-07 11:45:53,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:45:55,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 246.05171 ± 182.991
2025-08-07 11:45:55,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [456.57318, 508.6718, 327.23907, 355.34485, 301.72122, 34.793518, 3.1468415, 99.39158, 370.90207, 2.732816]
2025-08-07 11:45:55,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [221.0, 331.0, 174.0, 170.0, 192.0, 42.0, 16.0, 115.0, 188.0, 17.0]
2025-08-07 11:45:55,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 44 minutes, 14 seconds)
2025-08-07 11:47:33,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:47:34,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 128.90268 ± 81.846
2025-08-07 11:47:34,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [251.06891, 130.19266, 4.2377415, 129.27878, 75.8098, 232.16579, 183.54822, 42.801014, 198.95099, 40.972816]
2025-08-07 11:47:34,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [158.0, 116.0, 18.0, 105.0, 87.0, 129.0, 180.0, 45.0, 107.0, 46.0]
2025-08-07 11:47:34,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 42 minutes, 38 seconds)
2025-08-07 11:49:12,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:49:15,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 339.73749 ± 231.926
2025-08-07 11:49:15,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [821.85913, 303.7608, 3.8788347, 318.14795, 222.0677, 636.30664, 168.00305, 409.1695, 113.70404, 400.4772]
2025-08-07 11:49:15,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [591.0, 178.0, 17.0, 165.0, 270.0, 308.0, 124.0, 321.0, 76.0, 255.0]
2025-08-07 11:49:15,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (339.74) for latency MM1Queue_a033_s075
2025-08-07 11:49:15,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 41 minutes, 21 seconds)
2025-08-07 11:50:53,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:50:55,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 253.82568 ± 134.452
2025-08-07 11:50:55,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [212.35458, 5.896755, 291.3037, 392.86417, 348.59305, 169.0431, 398.2318, 46.619587, 296.90402, 376.44598]
2025-08-07 11:50:55,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [121.0, 15.0, 134.0, 239.0, 220.0, 161.0, 207.0, 82.0, 154.0, 218.0]
2025-08-07 11:50:55,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 39 minutes, 31 seconds)
2025-08-07 11:52:34,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:52:36,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 288.19913 ± 168.506
2025-08-07 11:52:36,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [291.4824, 496.9681, 389.5549, 403.648, 346.74075, 372.85944, 7.5900297, 132.01866, 437.2067, 3.9220767]
2025-08-07 11:52:36,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [144.0, 213.0, 204.0, 218.0, 177.0, 229.0, 17.0, 97.0, 220.0, 15.0]
2025-08-07 11:52:36,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 38 minutes, 8 seconds)
2025-08-07 11:54:13,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:54:15,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 324.65247 ± 200.697
2025-08-07 11:54:15,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [320.79736, 25.396101, 506.18826, 429.7438, 76.00251, 3.1008952, 435.35654, 500.0617, 563.83905, 386.0385]
2025-08-07 11:54:15,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [152.0, 40.0, 214.0, 171.0, 80.0, 16.0, 222.0, 228.0, 287.0, 227.0]
2025-08-07 11:54:15,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 36 minutes, 39 seconds)
2025-08-07 11:55:53,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:55:54,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 213.27121 ± 132.253
2025-08-07 11:55:54,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [219.0474, 45.341278, 77.824295, 52.67379, 75.24786, 309.26474, 373.6095, 318.77194, 259.878, 401.05353]
2025-08-07 11:55:54,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [115.0, 101.0, 87.0, 50.0, 78.0, 157.0, 236.0, 167.0, 179.0, 232.0]
2025-08-07 11:55:54,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 35 minutes, 2 seconds)
2025-08-07 11:57:34,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:57:35,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 128.53970 ± 102.078
2025-08-07 11:57:35,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [206.76906, 1.8853968, 38.194614, 171.37975, 50.704155, 7.4431024, 255.52307, 240.5385, 53.79493, 259.1644]
2025-08-07 11:57:35,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [119.0, 14.0, 47.0, 100.0, 82.0, 17.0, 133.0, 112.0, 70.0, 154.0]
2025-08-07 11:57:35,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 33 minutes, 24 seconds)
2025-08-07 11:59:20,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:59:22,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 228.75562 ± 150.352
2025-08-07 11:59:22,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [277.15872, 385.17618, 4.273758, 442.3886, 194.4695, 175.53696, 435.693, 91.55777, 245.61967, 35.68178]
2025-08-07 11:59:22,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [154.0, 238.0, 17.0, 250.0, 139.0, 162.0, 237.0, 71.0, 136.0, 38.0]
2025-08-07 11:59:22,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 32 minutes, 56 seconds)
2025-08-07 12:01:01,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:01:03,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 342.52686 ± 155.231
2025-08-07 12:01:03,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [257.6065, 584.46027, 354.7975, 276.71298, 306.3975, 502.2075, 525.0106, 363.71945, 33.80519, 220.55095]
2025-08-07 12:01:03,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [149.0, 367.0, 172.0, 148.0, 149.0, 276.0, 270.0, 185.0, 44.0, 120.0]
2025-08-07 12:01:03,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (342.53) for latency MM1Queue_a033_s075
2025-08-07 12:01:03,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 31 minutes, 14 seconds)
2025-08-07 12:02:45,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:02:47,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 269.65256 ± 126.060
2025-08-07 12:02:47,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [463.98065, 26.385828, 382.24905, 202.34619, 300.6188, 183.90411, 230.69315, 358.96136, 389.71878, 157.66763]
2025-08-07 12:02:47,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [273.0, 36.0, 216.0, 118.0, 166.0, 173.0, 178.0, 181.0, 207.0, 96.0]
2025-08-07 12:02:47,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 30 minutes, 27 seconds)
2025-08-07 12:04:28,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:04:30,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 278.14804 ± 142.973
2025-08-07 12:04:30,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [72.78696, 405.38416, 422.7307, 192.196, 392.85168, 227.1268, 372.67484, 409.52933, 283.32465, 2.8755102]
2025-08-07 12:04:30,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [177.0, 224.0, 220.0, 159.0, 224.0, 133.0, 232.0, 209.0, 169.0, 16.0]
2025-08-07 12:04:30,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 29 minutes, 24 seconds)
2025-08-07 12:06:13,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:06:16,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 338.86142 ± 153.391
2025-08-07 12:06:16,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [619.431, 306.27518, 403.6792, 333.70615, 325.82394, 4.794034, 366.8611, 332.78934, 208.82033, 486.43408]
2025-08-07 12:06:16,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [409.0, 181.0, 222.0, 180.0, 192.0, 17.0, 269.0, 179.0, 117.0, 277.0]
2025-08-07 12:06:16,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 28 minutes, 30 seconds)
2025-08-07 12:07:58,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:08:00,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 279.52023 ± 107.362
2025-08-07 12:08:00,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [199.69841, 254.42645, 350.40927, 274.33673, 268.59244, 297.72888, 68.58292, 519.2118, 301.22006, 260.9956]
2025-08-07 12:08:00,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [107.0, 121.0, 213.0, 144.0, 173.0, 158.0, 87.0, 305.0, 171.0, 154.0]
2025-08-07 12:08:00,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 26 minutes, 23 seconds)
2025-08-07 12:09:43,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:09:45,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 280.74359 ± 125.904
2025-08-07 12:09:45,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [86.9292, 396.2607, 362.01315, 449.9411, 274.2479, 243.38661, 275.76477, 326.9929, 360.86942, 31.030182]
2025-08-07 12:09:45,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [156.0, 227.0, 201.0, 196.0, 147.0, 151.0, 143.0, 170.0, 206.0, 40.0]
2025-08-07 12:09:45,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 25 minutes, 15 seconds)
2025-08-07 12:11:25,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:11:27,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 310.91754 ± 153.858
2025-08-07 12:11:27,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [533.1079, 245.58823, 329.40494, 460.0484, 263.9873, 515.1159, 370.39243, 220.14485, 44.61409, 126.77145]
2025-08-07 12:11:27,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [290.0, 131.0, 157.0, 270.0, 123.0, 300.0, 206.0, 119.0, 50.0, 103.0]
2025-08-07 12:11:27,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 23 minutes, 14 seconds)
2025-08-07 12:13:09,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:13:10,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 215.76689 ± 142.529
2025-08-07 12:13:10,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [303.0673, 259.74063, 304.04694, 376.72562, 242.17549, 2.8931935, 7.536571, 6.0058103, 300.894, 354.58322]
2025-08-07 12:13:10,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [145.0, 121.0, 163.0, 229.0, 120.0, 14.0, 17.0, 16.0, 146.0, 191.0]
2025-08-07 12:13:10,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 21 minutes, 30 seconds)
2025-08-07 12:14:52,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:14:54,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 284.88617 ± 114.816
2025-08-07 12:14:54,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3.9783316, 243.78746, 383.64813, 295.49878, 296.99524, 363.6379, 290.88364, 189.91222, 433.56638, 346.95367]
2025-08-07 12:14:54,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 123.0, 191.0, 150.0, 189.0, 191.0, 149.0, 103.0, 233.0, 179.0]
2025-08-07 12:14:54,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 19 minutes, 25 seconds)
2025-08-07 12:16:36,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:16:37,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 217.47940 ± 200.971
2025-08-07 12:16:37,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [162.96457, 3.4573438, 290.92877, 14.987331, 2.366989, 288.23007, 425.25067, 3.9787774, 394.58212, 588.0472]
2025-08-07 12:16:37,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [96.0, 14.0, 155.0, 33.0, 16.0, 151.0, 189.0, 14.0, 234.0, 316.0]
2025-08-07 12:16:37,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 17 minutes, 35 seconds)
2025-08-07 12:18:20,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:18:21,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 238.30379 ± 163.931
2025-08-07 12:18:21,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1.4781848, 256.76953, 7.6300316, 345.31516, 422.58392, 447.74084, 284.78397, 314.32904, -1.6164347, 304.02365]
2025-08-07 12:18:21,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 131.0, 17.0, 147.0, 187.0, 254.0, 135.0, 179.0, 13.0, 202.0]
2025-08-07 12:18:21,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 15 minutes, 43 seconds)
2025-08-07 12:20:02,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:20:03,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 265.92316 ± 102.260
2025-08-07 12:20:03,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [323.6029, 308.32104, 325.81213, 400.2353, 257.75095, 279.7099, 3.6107244, 304.70425, 181.18591, 274.29834]
2025-08-07 12:20:03,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [139.0, 151.0, 134.0, 178.0, 111.0, 172.0, 16.0, 125.0, 298.0, 120.0]
2025-08-07 12:20:03,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 13 minutes, 58 seconds)
2025-08-07 12:21:48,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:21:49,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 341.54968 ± 78.949
2025-08-07 12:21:49,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [314.7472, 290.69028, 377.07996, 265.16473, 512.215, 295.96124, 431.41144, 388.15994, 258.9667, 281.1004]
2025-08-07 12:21:49,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [154.0, 141.0, 163.0, 137.0, 263.0, 124.0, 223.0, 168.0, 131.0, 137.0]
2025-08-07 12:21:49,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 12 minutes, 39 seconds)
2025-08-07 12:23:30,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:23:32,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 388.68753 ± 79.122
2025-08-07 12:23:32,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [317.92368, 234.15356, 424.67676, 388.06662, 482.19125, 282.9448, 473.15588, 405.7639, 437.4097, 440.58905]
2025-08-07 12:23:32,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [135.0, 105.0, 213.0, 155.0, 242.0, 139.0, 259.0, 168.0, 175.0, 228.0]
2025-08-07 12:23:32,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (388.69) for latency MM1Queue_a033_s075
2025-08-07 12:23:32,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 10 minutes, 47 seconds)
2025-08-07 12:25:14,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:25:16,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 298.08069 ± 118.171
2025-08-07 12:25:16,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [379.0314, 386.96933, 27.568165, 244.19482, 380.10828, 285.89206, 158.12378, 326.59332, 360.17487, 432.15094]
2025-08-07 12:25:16,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [150.0, 159.0, 92.0, 113.0, 177.0, 123.0, 86.0, 169.0, 182.0, 173.0]
2025-08-07 12:25:16,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 9 minutes, 5 seconds)
2025-08-07 12:26:58,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:26:59,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 306.46884 ± 159.597
2025-08-07 12:26:59,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [2.3814788, 437.28275, 416.90805, 7.1777935, 426.8851, 397.9565, 434.02368, 278.57663, 364.6297, 298.867]
2025-08-07 12:26:59,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 219.0, 201.0, 16.0, 172.0, 164.0, 184.0, 123.0, 146.0, 154.0]
2025-08-07 12:26:59,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 7 minutes, 22 seconds)
2025-08-07 12:28:41,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:28:43,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 317.69003 ± 179.895
2025-08-07 12:28:43,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [241.17052, 565.30054, 394.6422, 5.43074, 519.30035, 361.3624, 309.71738, 1.2681268, 386.49957, 392.2087]
2025-08-07 12:28:43,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [110.0, 252.0, 186.0, 17.0, 226.0, 149.0, 140.0, 12.0, 152.0, 157.0]
2025-08-07 12:28:43,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 5 minutes, 49 seconds)
2025-08-07 12:30:25,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:30:26,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 346.38190 ± 196.167
2025-08-07 12:30:26,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [564.42114, 567.7038, 495.364, 465.5447, 247.14116, 422.36124, 7.097633, 288.7017, 398.4044, 7.0793977]
2025-08-07 12:30:26,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [220.0, 241.0, 186.0, 192.0, 112.0, 170.0, 17.0, 115.0, 146.0, 17.0]
2025-08-07 12:30:26,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 3 minutes, 45 seconds)
2025-08-07 12:32:08,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:32:10,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 538.97504 ± 323.694
2025-08-07 12:32:10,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1037.4071, 848.6257, 328.2765, 907.83905, 753.8508, 466.87686, 465.42606, 126.27758, 1.7782001, 453.39243]
2025-08-07 12:32:10,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [411.0, 319.0, 140.0, 345.0, 296.0, 173.0, 173.0, 74.0, 13.0, 202.0]
2025-08-07 12:32:10,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (538.98) for latency MM1Queue_a033_s075
2025-08-07 12:32:10,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 2 minutes, 14 seconds)
2025-08-07 12:33:55,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:33:57,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 546.13965 ± 327.242
2025-08-07 12:33:57,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [428.36465, 4.558683, 952.4496, 319.91974, 593.8051, 859.10767, 3.5024612, 731.5462, 834.85425, 733.28796]
2025-08-07 12:33:57,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [162.0, 16.0, 339.0, 126.0, 218.0, 327.0, 16.0, 270.0, 311.0, 276.0]
2025-08-07 12:33:57,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (546.14) for latency MM1Queue_a033_s075
2025-08-07 12:33:57,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 51 seconds)
2025-08-07 12:35:38,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:35:40,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 445.61108 ± 110.871
2025-08-07 12:35:40,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [528.9324, 397.90503, 455.18103, 593.0354, 649.19244, 296.22455, 480.57455, 357.02032, 362.9699, 335.07523]
2025-08-07 12:35:40,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [214.0, 150.0, 172.0, 206.0, 308.0, 130.0, 175.0, 142.0, 136.0, 136.0]
2025-08-07 12:35:40,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 58 minutes, 59 seconds)
2025-08-07 12:37:21,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:37:23,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 475.83505 ± 175.710
2025-08-07 12:37:23,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [594.0664, 608.96826, 5.18752, 583.96625, 431.31604, 440.96698, 607.35675, 606.60583, 397.15475, 482.76196]
2025-08-07 12:37:23,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [219.0, 265.0, 16.0, 215.0, 157.0, 160.0, 247.0, 238.0, 150.0, 176.0]
2025-08-07 12:37:24,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 57 minutes, 15 seconds)
2025-08-07 12:39:06,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:39:09,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 682.67450 ± 295.609
2025-08-07 12:39:09,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [420.18503, 954.18286, 408.45728, 663.03094, 1277.2866, 624.6813, 974.1995, 224.27478, 639.8761, 640.57007]
2025-08-07 12:39:09,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [174.0, 356.0, 154.0, 229.0, 434.0, 246.0, 332.0, 118.0, 215.0, 232.0]
2025-08-07 12:39:09,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (682.67) for latency MM1Queue_a033_s075
2025-08-07 12:39:09,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 55 minutes, 42 seconds)
2025-08-07 12:40:52,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:40:54,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 434.52490 ± 309.564
2025-08-07 12:40:54,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [989.9305, 173.6911, 364.4006, 697.4182, 6.2776446, 4.1962256, 755.689, 547.78815, 299.9504, 505.90695]
2025-08-07 12:40:54,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [335.0, 98.0, 146.0, 256.0, 17.0, 16.0, 263.0, 187.0, 128.0, 190.0]
2025-08-07 12:40:54,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 54 minutes, 6 seconds)
2025-08-07 12:42:34,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:42:36,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 554.21527 ± 246.901
2025-08-07 12:42:36,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [661.7352, 156.21088, 4.086495, 604.0381, 632.0309, 790.4929, 696.53033, 680.5826, 744.157, 572.28845]
2025-08-07 12:42:36,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [229.0, 86.0, 14.0, 201.0, 218.0, 274.0, 245.0, 247.0, 269.0, 210.0]
2025-08-07 12:42:36,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 51 minutes, 51 seconds)
2025-08-07 12:44:16,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:44:19,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 663.50085 ± 299.511
2025-08-07 12:44:19,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [458.2989, 620.8823, 462.0777, 753.40155, 766.4062, 1124.4238, 671.1131, 1042.5258, 5.2949557, 730.58417]
2025-08-07 12:44:19,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [172.0, 226.0, 167.0, 257.0, 277.0, 448.0, 235.0, 375.0, 17.0, 261.0]
2025-08-07 12:44:19,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 50 minutes, 13 seconds)
2025-08-07 12:46:03,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:46:05,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 554.92566 ± 133.302
2025-08-07 12:46:05,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [492.5474, 728.4212, 728.39874, 385.37286, 703.75305, 398.60812, 375.19077, 527.6147, 616.4129, 592.9366]
2025-08-07 12:46:05,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [179.0, 269.0, 258.0, 143.0, 237.0, 156.0, 148.0, 206.0, 230.0, 224.0]
2025-08-07 12:46:05,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 48 minutes, 41 seconds)
2025-08-07 12:47:46,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:47:48,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 470.43610 ± 355.716
2025-08-07 12:47:48,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [836.7948, 182.5808, 443.74777, 3.2719634, 202.3878, 733.5083, 641.3716, 528.6419, 1130.5477, 1.5077654]
2025-08-07 12:47:48,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [289.0, 135.0, 173.0, 14.0, 98.0, 238.0, 233.0, 210.0, 413.0, 14.0]
2025-08-07 12:47:48,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 46 minutes, 46 seconds)
2025-08-07 12:49:31,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:49:34,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 570.91089 ± 245.913
2025-08-07 12:49:34,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [771.91144, 384.74615, 721.20654, 760.1045, 599.93866, 5.9507475, 782.57135, 734.2936, 648.9305, 299.455]
2025-08-07 12:49:34,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [271.0, 148.0, 244.0, 259.0, 241.0, 17.0, 273.0, 247.0, 254.0, 134.0]
2025-08-07 12:49:34,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 45 minutes, 3 seconds)
2025-08-07 12:51:14,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:51:17,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 631.05511 ± 326.634
2025-08-07 12:51:17,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [693.05927, 782.7749, 214.7626, 776.78424, 150.82762, 1077.3148, 1117.338, 227.51604, 725.59894, 544.57434]
2025-08-07 12:51:17,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [270.0, 289.0, 102.0, 269.0, 89.0, 385.0, 401.0, 111.0, 290.0, 201.0]
2025-08-07 12:51:17,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 43 minutes, 26 seconds)
2025-08-07 12:52:59,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:53:00,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 407.93890 ± 291.694
2025-08-07 12:53:00,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [494.49625, 38.76651, 153.6222, 668.3219, 156.68742, 421.9353, 6.644895, 896.92004, 721.8599, 520.1347]
2025-08-07 12:53:00,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [174.0, 69.0, 88.0, 228.0, 83.0, 151.0, 16.0, 336.0, 244.0, 181.0]
2025-08-07 12:53:00,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 41 minutes, 40 seconds)
2025-08-07 12:54:44,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:54:47,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 682.66992 ± 97.548
2025-08-07 12:54:47,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [689.1605, 656.713, 471.98117, 593.4095, 740.5955, 757.5883, 849.6533, 631.1048, 722.76886, 713.7243]
2025-08-07 12:54:47,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [252.0, 222.0, 170.0, 204.0, 243.0, 251.0, 327.0, 221.0, 233.0, 231.0]
2025-08-07 12:54:47,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 39 minutes, 58 seconds)
2025-08-07 12:56:27,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:56:29,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 478.51080 ± 300.796
2025-08-07 12:56:29,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [713.5139, 746.4778, 3.4026196, 741.82996, 837.4853, 3.16089, 330.39514, 728.81305, 290.44046, 389.5886]
2025-08-07 12:56:29,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [241.0, 247.0, 15.0, 260.0, 306.0, 13.0, 137.0, 278.0, 120.0, 146.0]
2025-08-07 12:56:29,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 38 minutes, 9 seconds)
2025-08-07 12:58:11,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:58:13,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 588.51306 ± 453.846
2025-08-07 12:58:13,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1038.9934, 667.1053, 375.70276, 796.72253, 532.2442, 4.498042, 4.8397627, 253.26163, 1569.2218, 642.5411]
2025-08-07 12:58:13,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [368.0, 237.0, 145.0, 279.0, 179.0, 16.0, 15.0, 107.0, 605.0, 317.0]
2025-08-07 12:58:13,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 36 minutes, 22 seconds)
2025-08-07 12:59:55,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:59:57,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 589.04413 ± 328.905
2025-08-07 12:59:57,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [824.57135, 683.58484, 319.8217, 4.819212, 5.4960117, 861.2922, 887.6291, 808.41907, 774.7194, 720.0884]
2025-08-07 12:59:57,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [286.0, 224.0, 130.0, 16.0, 14.0, 271.0, 323.0, 275.0, 294.0, 255.0]
2025-08-07 12:59:57,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 34 minutes, 40 seconds)
2025-08-07 13:01:40,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:01:43,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 782.05780 ± 294.738
2025-08-07 13:01:43,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [835.39325, 801.23926, 909.22095, 946.6574, 1135.9271, 1050.265, 3.34032, 682.1155, 690.2598, 766.15924]
2025-08-07 13:01:43,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [275.0, 260.0, 311.0, 335.0, 404.0, 371.0, 15.0, 218.0, 225.0, 256.0]
2025-08-07 13:01:43,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (782.06) for latency MM1Queue_a033_s075
2025-08-07 13:01:43,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 33 minutes, 7 seconds)
2025-08-07 13:03:26,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:03:29,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 748.62671 ± 92.597
2025-08-07 13:03:29,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [930.4011, 654.3836, 690.74146, 829.923, 619.1113, 746.4667, 672.406, 726.3702, 851.06604, 765.39813]
2025-08-07 13:03:29,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [321.0, 241.0, 272.0, 288.0, 208.0, 265.0, 266.0, 263.0, 300.0, 261.0]
2025-08-07 13:03:29,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 31 minutes, 20 seconds)
2025-08-07 13:05:09,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:05:11,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 555.77423 ± 215.675
2025-08-07 13:05:11,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [742.2655, 410.40295, 958.15765, 340.8172, 472.3192, 423.2024, 799.5928, 308.6362, 383.81354, 718.53467]
2025-08-07 13:05:11,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [262.0, 148.0, 331.0, 146.0, 166.0, 154.0, 291.0, 131.0, 147.0, 234.0]
2025-08-07 13:05:11,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 29 minutes, 37 seconds)
2025-08-07 13:06:56,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:06:59,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 699.43884 ± 371.116
2025-08-07 13:06:59,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3.434138, 837.28705, 777.3024, 833.93365, 980.8935, 937.6797, 7.011567, 1066.6982, 565.383, 984.7654]
2025-08-07 13:06:59,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 291.0, 292.0, 289.0, 333.0, 325.0, 18.0, 349.0, 206.0, 351.0]
2025-08-07 13:06:59,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 28 minutes)
2025-08-07 13:08:37,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:08:40,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 561.26733 ± 308.394
2025-08-07 13:08:40,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [625.22943, 758.9165, 126.15257, 574.9669, 232.1607, 772.3374, 967.61145, 769.5784, 780.35645, 5.3633733]
2025-08-07 13:08:40,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [233.0, 314.0, 74.0, 206.0, 127.0, 256.0, 324.0, 258.0, 266.0, 15.0]
2025-08-07 13:08:40,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 26 minutes, 7 seconds)
2025-08-07 13:10:22,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:10:26,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 777.06946 ± 361.602
2025-08-07 13:10:26,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [-0.2831421, 899.8685, 843.0496, 869.3988, 1035.7848, 166.04306, 1116.3124, 1026.3716, 1016.619, 797.5296]
2025-08-07 13:10:26,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [12.0, 328.0, 316.0, 274.0, 372.0, 230.0, 373.0, 370.0, 344.0, 267.0]
2025-08-07 13:10:26,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 24 minutes, 22 seconds)
2025-08-07 13:12:08,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:12:11,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 682.00842 ± 303.370
2025-08-07 13:12:11,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [911.69977, 7.5837355, 808.8633, 1206.7677, 709.1361, 854.96454, 727.3943, 574.81165, 602.9571, 415.90637]
2025-08-07 13:12:11,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [357.0, 18.0, 279.0, 401.0, 237.0, 319.0, 248.0, 191.0, 197.0, 155.0]
2025-08-07 13:12:11,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 37 seconds)
2025-08-07 13:13:51,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:13:55,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 781.96155 ± 330.789
2025-08-07 13:13:55,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1124.9474, 982.6944, 1315.0142, 680.52075, 621.8627, 2.9750829, 789.84753, 729.33954, 729.06866, 843.3453]
2025-08-07 13:13:55,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [407.0, 307.0, 445.0, 243.0, 206.0, 14.0, 263.0, 277.0, 284.0, 301.0]
2025-08-07 13:13:55,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 56 seconds)
2025-08-07 13:15:37,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:15:40,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 785.46790 ± 117.007
2025-08-07 13:15:40,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [713.44305, 802.24854, 683.8838, 985.9593, 839.7051, 685.4868, 659.5359, 976.9206, 835.7752, 671.7201]
2025-08-07 13:15:40,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [253.0, 266.0, 277.0, 353.0, 281.0, 266.0, 270.0, 370.0, 272.0, 231.0]
2025-08-07 13:15:40,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (785.47) for latency MM1Queue_a033_s075
2025-08-07 13:15:40,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 19 minutes, 6 seconds)
2025-08-07 13:17:23,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:17:26,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 771.76062 ± 177.795
2025-08-07 13:17:26,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [782.6516, 890.94226, 942.0755, 514.67346, 979.14856, 855.685, 763.9596, 906.4314, 667.41516, 414.6234]
2025-08-07 13:17:26,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [296.0, 295.0, 313.0, 186.0, 342.0, 285.0, 258.0, 309.0, 221.0, 152.0]
2025-08-07 13:17:26,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 32 seconds)
2025-08-07 13:19:06,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:19:09,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 763.08466 ± 284.996
2025-08-07 13:19:09,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [951.419, 788.02216, 874.30804, 774.3475, 0.4853717, 602.0923, 1072.7878, 1007.1188, 787.63617, 772.6292]
2025-08-07 13:19:09,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [322.0, 271.0, 309.0, 256.0, 15.0, 204.0, 353.0, 373.0, 273.0, 316.0]
2025-08-07 13:19:09,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 42 seconds)
2025-08-07 13:20:50,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:20:53,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 810.45734 ± 256.309
2025-08-07 13:20:53,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [913.3386, 872.05475, 1022.2393, 192.75511, 578.2855, 642.4259, 1013.73035, 1002.99713, 820.6531, 1046.0936]
2025-08-07 13:20:53,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [321.0, 308.0, 329.0, 103.0, 210.0, 233.0, 336.0, 363.0, 276.0, 337.0]
2025-08-07 13:20:53,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (810.46) for latency MM1Queue_a033_s075
2025-08-07 13:20:53,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 55 seconds)
2025-08-07 13:22:37,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:22:39,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 660.34485 ± 250.682
2025-08-07 13:22:39,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [802.23914, 719.49286, 459.83838, 810.23474, 656.10266, 814.0161, 711.1188, 682.21515, 946.34766, 1.8428506]
2025-08-07 13:22:39,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [287.0, 227.0, 170.0, 267.0, 233.0, 255.0, 236.0, 220.0, 310.0, 14.0]
2025-08-07 13:22:39,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 14 seconds)
2025-08-07 13:24:18,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:24:21,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 631.86444 ± 448.837
2025-08-07 13:24:21,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1105.799, 724.7142, 774.6296, 692.8915, 1.246352, 1300.9788, 3.6670835, 3.9392207, 962.5458, 748.233]
2025-08-07 13:24:21,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [369.0, 257.0, 273.0, 246.0, 16.0, 431.0, 16.0, 15.0, 320.0, 256.0]
2025-08-07 13:24:21,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 25 seconds)
2025-08-07 13:26:03,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:26:06,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 808.74457 ± 343.114
2025-08-07 13:26:06,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [2.8819265, 726.4765, 618.01355, 1022.0691, 854.305, 934.873, 747.7045, 1153.5542, 686.2782, 1341.2896]
2025-08-07 13:26:06,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 260.0, 215.0, 330.0, 310.0, 288.0, 257.0, 380.0, 232.0, 473.0]
2025-08-07 13:26:06,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 39 seconds)
2025-08-07 13:27:48,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:27:51,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 747.14868 ± 252.824
2025-08-07 13:27:51,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [741.0327, 847.5296, 711.0911, 825.3305, 1092.913, 733.125, 73.101845, 911.64136, 867.028, 668.6938]
2025-08-07 13:27:51,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [273.0, 269.0, 251.0, 286.0, 369.0, 270.0, 134.0, 293.0, 296.0, 226.0]
2025-08-07 13:27:51,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 57 seconds)
2025-08-07 13:29:33,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:29:36,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 883.41602 ± 187.425
2025-08-07 13:29:36,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1085.4261, 830.5696, 702.13995, 1121.9686, 1161.3682, 892.1375, 645.1866, 736.60693, 661.4005, 997.35614]
2025-08-07 13:29:36,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [446.0, 278.0, 230.0, 368.0, 361.0, 294.0, 216.0, 239.0, 232.0, 321.0]
2025-08-07 13:29:36,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (883.42) for latency MM1Queue_a033_s075
2025-08-07 13:29:36,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 13 seconds)
2025-08-07 13:31:18,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:31:21,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 784.71478 ± 246.953
2025-08-07 13:31:21,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [844.103, 817.5646, 762.32697, 891.1469, 985.22266, 84.69677, 836.93713, 722.31506, 973.6852, 929.1494]
2025-08-07 13:31:21,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [288.0, 284.0, 256.0, 341.0, 320.0, 139.0, 278.0, 247.0, 328.0, 309.0]
2025-08-07 13:31:21,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 28 seconds)
2025-08-07 13:33:03,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:33:05,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 574.63782 ± 236.125
2025-08-07 13:33:05,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [411.7592, 4.825319, 472.9958, 571.82007, 762.645, 798.8916, 797.26715, 488.68637, 641.6882, 795.7995]
2025-08-07 13:33:05,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [154.0, 16.0, 167.0, 189.0, 257.0, 262.0, 272.0, 167.0, 209.0, 265.0]
2025-08-07 13:33:05,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 44 seconds)
2025-08-07 13:34:47,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:34:49,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 712.94940 ± 321.577
2025-08-07 13:34:49,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [619.703, 863.3177, 831.6929, 857.0067, 4.0975523, 750.0015, 762.30414, 338.5743, 836.4204, 1266.3762]
2025-08-07 13:34:49,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [203.0, 283.0, 266.0, 274.0, 15.0, 251.0, 247.0, 147.0, 278.0, 430.0]
2025-08-07 13:34:49,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1251 [DEBUG]: Training session finished
