2025-08-07 10:41:52,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc10-walker2d/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:41:52,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc10-walker2d/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:41:52,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x154c7ff0ba90>}
2025-08-07 10:41:52,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1111 [DEBUG]: using device: cuda
2025-08-07 10:41:52,330 baseline-bpql-noiseperc10-walker2d:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 10:41:52,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1133 [INFO]: Creating new trainer
2025-08-07 10:41:52,346 baseline-bpql-noiseperc10-walker2d:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=113, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 10:41:52,346 baseline-bpql-noiseperc10-walker2d:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 10:41:53,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1194 [DEBUG]: Starting training session...
2025-08-07 10:41:53,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 1/100
2025-08-07 10:43:21,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:43:22,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 12.64194 ± 3.185
2025-08-07 10:43:22,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [10.482337, 9.74241, 15.4309025, 13.1200075, 11.331064, 11.629692, 20.062233, 10.227317, 15.054159, 9.339291]
2025-08-07 10:43:22,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [63.0, 41.0, 42.0, 39.0, 39.0, 38.0, 66.0, 40.0, 40.0, 37.0]
2025-08-07 10:43:22,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (12.64) for latency MM1Queue_a033_s075
2025-08-07 10:43:22,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 26 minutes, 48 seconds)
2025-08-07 10:44:58,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:44:59,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 36.80541 ± 26.058
2025-08-07 10:44:59,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [29.353662, 75.2075, -7.9392886, 13.961448, 47.732536, 41.0268, 40.721725, 9.068267, 79.30254, 39.618942]
2025-08-07 10:44:59,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [43.0, 73.0, 108.0, 72.0, 120.0, 50.0, 99.0, 51.0, 84.0, 57.0]
2025-08-07 10:44:59,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (36.81) for latency MM1Queue_a033_s075
2025-08-07 10:44:59,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 32 minutes, 8 seconds)
2025-08-07 10:46:35,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:46:36,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 52.85050 ± 47.793
2025-08-07 10:46:36,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [33.137466, 9.357592, 19.180624, 73.50574, 80.90607, 178.2784, 13.311163, 25.67779, 55.480118, 39.670036]
2025-08-07 10:46:36,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [40.0, 109.0, 37.0, 84.0, 108.0, 129.0, 24.0, 52.0, 102.0, 57.0]
2025-08-07 10:46:36,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (52.85) for latency MM1Queue_a033_s075
2025-08-07 10:46:36,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 32 minutes, 44 seconds)
2025-08-07 10:48:12,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:48:13,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 37.33280 ± 30.372
2025-08-07 10:48:13,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [34.616257, 40.471813, 0.56473833, 64.15553, 6.3391204, 71.02462, 4.127602, 45.30597, 12.372698, 94.34965]
2025-08-07 10:48:13,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [76.0, 64.0, 203.0, 65.0, 162.0, 92.0, 14.0, 84.0, 202.0, 122.0]
2025-08-07 10:48:14,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 32 minutes, 15 seconds)
2025-08-07 10:49:50,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:49:51,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 127.21299 ± 64.026
2025-08-07 10:49:51,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [69.80157, 64.58187, 96.36787, 38.388042, 92.66357, 229.92372, 148.2188, 233.31918, 133.56734, 165.29793]
2025-08-07 10:49:51,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [72.0, 65.0, 190.0, 56.0, 76.0, 144.0, 122.0, 127.0, 164.0, 102.0]
2025-08-07 10:49:51,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (127.21) for latency MM1Queue_a033_s075
2025-08-07 10:49:51,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 31 minutes, 22 seconds)
2025-08-07 10:51:26,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:51:27,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 44.06819 ± 20.990
2025-08-07 10:51:27,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [12.880753, 31.990332, 40.564564, 55.37984, 40.858562, 84.56161, 58.49286, 16.401352, 64.9176, 34.63441]
2025-08-07 10:51:27,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [62.0, 34.0, 47.0, 151.0, 47.0, 99.0, 78.0, 27.0, 114.0, 36.0]
2025-08-07 10:51:27,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 31 minutes, 57 seconds)
2025-08-07 10:53:02,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:53:04,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 171.63916 ± 95.625
2025-08-07 10:53:04,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [345.64188, 108.39703, 196.87221, 47.206226, 240.57878, 111.635345, 225.853, 20.054619, 166.37018, 253.78247]
2025-08-07 10:53:04,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [201.0, 179.0, 101.0, 48.0, 135.0, 96.0, 135.0, 28.0, 108.0, 148.0]
2025-08-07 10:53:04,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (171.64) for latency MM1Queue_a033_s075
2025-08-07 10:53:04,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 30 minutes, 14 seconds)
2025-08-07 10:54:40,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:54:41,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 104.73146 ± 56.500
2025-08-07 10:54:41,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [53.702267, 89.16902, 36.734272, 149.4654, 68.06387, 68.42603, 75.80874, 193.63383, 102.489006, 209.82213]
2025-08-07 10:54:41,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [52.0, 214.0, 46.0, 167.0, 73.0, 112.0, 80.0, 138.0, 108.0, 140.0]
2025-08-07 10:54:41,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 28 minutes, 44 seconds)
2025-08-07 10:56:17,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:56:18,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 106.83319 ± 89.851
2025-08-07 10:56:18,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [55.864964, 172.31459, 258.75586, 269.97452, 43.478615, 105.96344, 55.84257, 25.691599, 72.46389, 7.9819703]
2025-08-07 10:56:18,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [131.0, 135.0, 151.0, 179.0, 70.0, 110.0, 78.0, 29.0, 65.0, 19.0]
2025-08-07 10:56:18,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 27 minutes)
2025-08-07 10:57:55,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:57:57,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 193.34094 ± 95.462
2025-08-07 10:57:57,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [21.270636, 248.76756, 124.73263, 289.9457, 59.695995, 170.46075, 288.78802, 317.4983, 230.35739, 181.89233]
2025-08-07 10:57:57,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 252.0, 95.0, 198.0, 64.0, 132.0, 173.0, 147.0, 153.0, 124.0]
2025-08-07 10:57:57,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (193.34) for latency MM1Queue_a033_s075
2025-08-07 10:57:57,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 25 minutes, 48 seconds)
2025-08-07 10:59:32,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:59:33,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 106.41682 ± 62.195
2025-08-07 10:59:33,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [51.144375, 88.60119, 197.34213, 77.09151, 171.25539, 66.1824, 14.572262, 56.293602, 194.3988, 147.2866]
2025-08-07 10:59:33,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [130.0, 122.0, 145.0, 79.0, 123.0, 89.0, 23.0, 65.0, 136.0, 100.0]
2025-08-07 10:59:33,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 24 minutes, 15 seconds)
2025-08-07 11:01:09,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:01:11,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 189.44467 ± 134.850
2025-08-07 11:01:11,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [18.617992, 261.04196, 237.79875, 313.78433, 308.7747, 63.50329, 326.65988, 332.64734, 14.449248, 17.169147]
2025-08-07 11:01:11,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 137.0, 164.0, 183.0, 181.0, 70.0, 285.0, 164.0, 23.0, 24.0]
2025-08-07 11:01:11,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 22 minutes, 49 seconds)
2025-08-07 11:02:47,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:02:49,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 212.27168 ± 126.167
2025-08-07 11:02:49,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [313.17673, 222.4975, 102.01082, 30.159712, 301.30643, 5.231461, 288.70786, 204.25069, 227.10896, 428.2666]
2025-08-07 11:02:49,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [201.0, 144.0, 88.0, 32.0, 169.0, 17.0, 169.0, 143.0, 140.0, 213.0]
2025-08-07 11:02:49,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (212.27) for latency MM1Queue_a033_s075
2025-08-07 11:02:49,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 21 minutes, 20 seconds)
2025-08-07 11:04:24,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:04:25,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 137.18347 ± 83.608
2025-08-07 11:04:25,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [223.57372, 119.54902, 81.80856, 138.15215, 56.37382, 272.88937, 34.977406, 236.09772, 27.690254, 180.72267]
2025-08-07 11:04:25,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [126.0, 194.0, 111.0, 108.0, 81.0, 164.0, 35.0, 144.0, 31.0, 158.0]
2025-08-07 11:04:26,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 19 minutes, 42 seconds)
2025-08-07 11:06:02,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:06:03,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 156.95769 ± 109.420
2025-08-07 11:06:03,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [143.68648, 47.533993, 45.28873, 348.17786, 92.2605, 122.30058, 287.06857, 310.83627, 50.582645, 121.84119]
2025-08-07 11:06:03,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [161.0, 51.0, 46.0, 165.0, 75.0, 76.0, 155.0, 190.0, 51.0, 120.0]
2025-08-07 11:06:03,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 17 minutes, 45 seconds)
2025-08-07 11:07:39,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:07:40,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 108.86015 ± 91.128
2025-08-07 11:07:40,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [60.720074, 48.089767, 70.74369, 67.800934, 124.40714, 19.176477, 361.1586, 124.88794, 134.77675, 76.84008]
2025-08-07 11:07:40,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [57.0, 58.0, 85.0, 83.0, 105.0, 29.0, 216.0, 219.0, 184.0, 102.0]
2025-08-07 11:07:40,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 16 minutes, 25 seconds)
2025-08-07 11:09:16,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:09:18,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 169.48807 ± 102.408
2025-08-07 11:09:18,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [269.12933, 336.4491, 60.944283, 230.99901, 263.34988, 226.93733, 84.63409, 32.821957, 60.441917, 129.17387]
2025-08-07 11:09:18,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [132.0, 214.0, 61.0, 127.0, 148.0, 120.0, 62.0, 42.0, 51.0, 74.0]
2025-08-07 11:09:18,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 14 minutes, 42 seconds)
2025-08-07 11:10:55,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:10:56,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 174.50284 ± 132.921
2025-08-07 11:10:56,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [44.4724, 47.544632, 263.0733, 182.2611, 277.1352, 69.75907, 293.685, 444.49994, 88.77762, 33.82001]
2025-08-07 11:10:56,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [56.0, 44.0, 133.0, 120.0, 155.0, 62.0, 153.0, 185.0, 90.0, 38.0]
2025-08-07 11:10:56,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 13 minutes, 17 seconds)
2025-08-07 11:12:31,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:12:33,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 195.17967 ± 136.666
2025-08-07 11:12:33,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [330.02405, 199.31274, 58.336746, 170.04573, 232.96927, 508.0913, 220.04993, 75.82503, 141.0889, 16.053139]
2025-08-07 11:12:33,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [202.0, 138.0, 52.0, 135.0, 135.0, 292.0, 129.0, 96.0, 101.0, 25.0]
2025-08-07 11:12:33,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 11 minutes, 35 seconds)
2025-08-07 11:14:10,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:14:12,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 286.19318 ± 217.861
2025-08-07 11:14:12,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [25.650743, 493.39835, 366.48694, 158.6537, 60.062733, 175.7156, 225.48155, 790.2377, 383.11444, 183.12987]
2025-08-07 11:14:12,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [31.0, 235.0, 198.0, 132.0, 64.0, 86.0, 194.0, 414.0, 189.0, 272.0]
2025-08-07 11:14:12,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (286.19) for latency MM1Queue_a033_s075
2025-08-07 11:14:12,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 10 minutes, 17 seconds)
2025-08-07 11:15:48,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:15:50,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 173.53273 ± 119.814
2025-08-07 11:15:50,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [19.564152, 31.34611, 152.97008, 261.48724, 204.401, 293.72824, 255.98654, 374.16144, 7.8673706, 133.81506]
2025-08-07 11:15:50,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [30.0, 33.0, 124.0, 153.0, 138.0, 156.0, 147.0, 192.0, 18.0, 141.0]
2025-08-07 11:15:50,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 8 minutes, 51 seconds)
2025-08-07 11:17:26,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:17:27,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 194.23691 ± 137.175
2025-08-07 11:17:27,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [99.33161, 309.58868, 33.135334, 204.17278, 301.1125, 173.25049, 396.5622, 27.467583, 370.85074, 26.89731]
2025-08-07 11:17:27,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [110.0, 133.0, 35.0, 140.0, 172.0, 114.0, 235.0, 33.0, 183.0, 30.0]
2025-08-07 11:17:27,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 7 minutes, 19 seconds)
2025-08-07 11:19:02,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:19:03,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 146.60445 ± 95.211
2025-08-07 11:19:03,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [91.53296, 64.245415, 190.77875, 55.59472, 340.80206, 32.276546, 199.79263, 132.474, 93.674934, 264.87244]
2025-08-07 11:19:03,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [85.0, 63.0, 128.0, 122.0, 176.0, 38.0, 125.0, 118.0, 90.0, 157.0]
2025-08-07 11:19:03,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 5 minutes)
2025-08-07 11:20:39,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:20:40,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 179.50887 ± 156.254
2025-08-07 11:20:40,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [257.00186, 35.601982, 521.1382, 23.362747, 106.38527, 49.326614, 267.0304, 330.89594, 178.04533, 26.300428]
2025-08-07 11:20:40,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [149.0, 35.0, 269.0, 29.0, 99.0, 46.0, 161.0, 166.0, 120.0, 30.0]
2025-08-07 11:20:40,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 3 minutes, 27 seconds)
2025-08-07 11:22:15,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:22:16,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 134.84258 ± 114.102
2025-08-07 11:22:16,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [26.304264, 204.93265, 318.392, 43.89847, 271.55103, 18.503798, 24.446396, 28.19156, 153.20201, 259.00372]
2025-08-07 11:22:16,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [30.0, 114.0, 187.0, 103.0, 172.0, 27.0, 29.0, 33.0, 154.0, 247.0]
2025-08-07 11:22:16,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 1 minute, 9 seconds)
2025-08-07 11:23:50,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:23:52,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 266.47882 ± 151.681
2025-08-07 11:23:52,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [360.47424, 342.12323, 237.31602, 141.91191, 270.97192, 499.0161, 27.273142, 24.605413, 339.89233, 421.20367]
2025-08-07 11:23:52,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [283.0, 309.0, 164.0, 129.0, 138.0, 224.0, 35.0, 29.0, 151.0, 229.0]
2025-08-07 11:23:52,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 1 hour, 58 minutes, 54 seconds)
2025-08-07 11:25:26,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:25:28,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 134.13971 ± 124.661
2025-08-07 11:25:28,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [246.64685, 409.5019, 134.60056, 198.9867, 94.95399, 20.110367, 16.60386, 197.79579, 15.155958, 7.041185]
2025-08-07 11:25:28,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [392.0, 241.0, 111.0, 149.0, 89.0, 26.0, 26.0, 264.0, 23.0, 18.0]
2025-08-07 11:25:28,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 56 minutes, 51 seconds)
2025-08-07 11:27:02,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:27:03,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 211.45222 ± 111.786
2025-08-07 11:27:03,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [31.590973, 219.53839, 289.36923, 332.27658, 44.553055, 275.31296, 346.9867, 99.563446, 175.69838, 299.63254]
2025-08-07 11:27:03,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [36.0, 181.0, 248.0, 168.0, 47.0, 152.0, 167.0, 84.0, 107.0, 183.0]
2025-08-07 11:27:03,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 55 minutes, 8 seconds)
2025-08-07 11:28:37,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:28:39,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 313.49579 ± 205.390
2025-08-07 11:28:39,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [146.7099, 745.3396, 406.4864, 122.517136, 253.14229, 126.36404, 74.21348, 285.33536, 538.46893, 436.38077]
2025-08-07 11:28:39,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [121.0, 458.0, 184.0, 103.0, 133.0, 115.0, 68.0, 139.0, 277.0, 216.0]
2025-08-07 11:28:39,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (313.50) for latency MM1Queue_a033_s075
2025-08-07 11:28:39,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 53 minutes, 24 seconds)
2025-08-07 11:30:17,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:30:19,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 221.46631 ± 127.437
2025-08-07 11:30:19,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [64.54422, 30.345873, 314.26193, 173.29184, 420.53094, 323.9482, 344.41473, 117.638695, 296.94025, 128.7463]
2025-08-07 11:30:19,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [68.0, 34.0, 148.0, 125.0, 226.0, 147.0, 195.0, 108.0, 166.0, 144.0]
2025-08-07 11:30:19,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 52 minutes, 33 seconds)
2025-08-07 11:31:57,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:31:59,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 263.63931 ± 127.319
2025-08-07 11:31:59,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [221.00781, 422.85245, 72.938576, 360.63013, 225.07085, 389.0909, 46.350655, 388.35117, 322.21268, 187.88803]
2025-08-07 11:31:59,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [112.0, 236.0, 69.0, 184.0, 131.0, 187.0, 49.0, 207.0, 167.0, 103.0]
2025-08-07 11:31:59,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 51 minutes, 58 seconds)
2025-08-07 11:33:34,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:33:36,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 263.51611 ± 156.404
2025-08-07 11:33:36,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [182.22768, 111.84025, 200.83743, 271.18668, 645.54803, 41.195255, 325.71698, 318.0383, 203.19208, 335.37848]
2025-08-07 11:33:36,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [102.0, 107.0, 184.0, 201.0, 397.0, 42.0, 179.0, 188.0, 158.0, 288.0]
2025-08-07 11:33:36,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 50 minutes, 45 seconds)
2025-08-07 11:35:13,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:35:14,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 141.07405 ± 123.979
2025-08-07 11:35:14,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [318.32684, 16.125109, 28.503517, 173.62679, 31.999315, 39.895782, 314.7075, 308.35034, 28.844183, 150.36107]
2025-08-07 11:35:14,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [246.0, 25.0, 33.0, 144.0, 34.0, 42.0, 213.0, 232.0, 31.0, 154.0]
2025-08-07 11:35:14,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 49 minutes, 43 seconds)
2025-08-07 11:36:53,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:36:54,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 174.74611 ± 104.965
2025-08-07 11:36:54,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [442.07275, 55.762455, 244.70795, 111.594536, 124.071266, 203.59006, 202.14995, 161.42253, 116.56499, 85.524704]
2025-08-07 11:36:54,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [269.0, 83.0, 175.0, 139.0, 129.0, 177.0, 160.0, 145.0, 165.0, 110.0]
2025-08-07 11:36:54,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 48 minutes, 52 seconds)
2025-08-07 11:38:32,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:38:34,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 257.77255 ± 133.726
2025-08-07 11:38:34,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [239.09561, 63.105835, 202.00291, 162.17679, 73.2287, 464.80637, 407.08206, 224.76994, 406.1797, 335.27786]
2025-08-07 11:38:34,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [180.0, 108.0, 139.0, 120.0, 112.0, 241.0, 362.0, 138.0, 278.0, 185.0]
2025-08-07 11:38:34,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 47 minutes, 14 seconds)
2025-08-07 11:40:10,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:40:12,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 229.47562 ± 145.240
2025-08-07 11:40:12,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [305.56573, 134.37724, 456.1012, 246.14908, 281.27884, 4.5895543, 324.02515, 27.214727, 398.74603, 116.70842]
2025-08-07 11:40:12,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [212.0, 114.0, 287.0, 142.0, 209.0, 16.0, 244.0, 31.0, 206.0, 112.0]
2025-08-07 11:40:12,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 45 minutes, 10 seconds)
2025-08-07 11:41:50,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:41:52,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 195.04446 ± 194.983
2025-08-07 11:41:52,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [117.39982, 212.43993, 155.01317, 323.25137, 35.357513, 17.9541, 112.66416, 99.066154, 725.2377, 152.06068]
2025-08-07 11:41:52,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [95.0, 155.0, 123.0, 153.0, 36.0, 25.0, 108.0, 121.0, 390.0, 134.0]
2025-08-07 11:41:52,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 44 minutes, 3 seconds)
2025-08-07 11:43:28,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:43:30,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 287.41052 ± 125.658
2025-08-07 11:43:30,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [407.17148, 321.25027, 282.71765, 284.31833, 13.983575, 178.87715, 236.05865, 406.32043, 483.17044, 260.2372]
2025-08-07 11:43:30,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [164.0, 165.0, 152.0, 127.0, 23.0, 123.0, 158.0, 246.0, 259.0, 157.0]
2025-08-07 11:43:30,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 42 minutes, 27 seconds)
2025-08-07 11:45:07,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:45:09,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 375.07782 ± 197.263
2025-08-07 11:45:09,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [149.47615, 13.996544, 289.0606, 600.5166, 290.07886, 297.23367, 621.3381, 359.37003, 523.47784, 606.2301]
2025-08-07 11:45:09,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [126.0, 28.0, 120.0, 304.0, 216.0, 133.0, 349.0, 171.0, 265.0, 379.0]
2025-08-07 11:45:09,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (375.08) for latency MM1Queue_a033_s075
2025-08-07 11:45:09,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 40 minutes, 40 seconds)
2025-08-07 11:46:48,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:46:50,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 305.62140 ± 122.596
2025-08-07 11:46:50,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [369.85474, 212.93703, 257.45767, 390.73138, 328.2769, 494.13663, 25.677315, 271.3905, 415.95978, 289.7922]
2025-08-07 11:46:50,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [187.0, 127.0, 130.0, 268.0, 188.0, 217.0, 30.0, 198.0, 197.0, 171.0]
2025-08-07 11:46:50,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 39 minutes, 20 seconds)
2025-08-07 11:48:28,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:48:30,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 288.18585 ± 197.949
2025-08-07 11:48:30,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [377.8835, 742.1462, 110.68398, 193.05138, 50.36139, 119.75992, 341.85257, 453.2929, 342.84573, 149.98111]
2025-08-07 11:48:30,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [191.0, 367.0, 90.0, 124.0, 60.0, 93.0, 166.0, 272.0, 231.0, 134.0]
2025-08-07 11:48:30,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 37 minutes, 56 seconds)
2025-08-07 11:50:08,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:50:10,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 342.29355 ± 143.559
2025-08-07 11:50:10,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [580.7773, 244.58922, 317.8252, 419.67615, 406.04163, 371.40686, 482.01248, 326.46625, 246.99796, 27.142212]
2025-08-07 11:50:10,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [308.0, 153.0, 147.0, 212.0, 205.0, 210.0, 269.0, 177.0, 142.0, 30.0]
2025-08-07 11:50:10,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 36 minutes, 17 seconds)
2025-08-07 11:51:45,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:51:47,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 313.61945 ± 178.721
2025-08-07 11:51:47,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [497.24786, 589.6606, 296.54666, 367.2059, 40.394974, 4.0110836, 472.82465, 272.8697, 233.26521, 362.1679]
2025-08-07 11:51:47,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [250.0, 278.0, 189.0, 173.0, 40.0, 16.0, 250.0, 173.0, 129.0, 173.0]
2025-08-07 11:51:47,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 34 minutes, 23 seconds)
2025-08-07 11:53:24,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:53:26,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 332.06827 ± 160.681
2025-08-07 11:53:26,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [368.41696, 427.12366, 414.4194, 287.87744, 491.34195, 179.44865, 146.2353, 517.7085, 13.900187, 474.2107]
2025-08-07 11:53:26,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [330.0, 218.0, 208.0, 172.0, 232.0, 90.0, 105.0, 283.0, 25.0, 262.0]
2025-08-07 11:53:26,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 32 minutes, 45 seconds)
2025-08-07 11:55:04,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:55:06,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 336.48456 ± 196.849
2025-08-07 11:55:06,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [349.88364, 291.7581, 319.09744, 631.76776, 110.33316, 34.39088, 122.83646, 481.6124, 394.56113, 628.6045]
2025-08-07 11:55:06,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [169.0, 144.0, 140.0, 363.0, 110.0, 39.0, 105.0, 242.0, 202.0, 343.0]
2025-08-07 11:55:06,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 30 minutes, 49 seconds)
2025-08-07 11:56:43,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:56:46,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 398.97943 ± 183.071
2025-08-07 11:56:46,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [443.628, 471.72592, 604.4811, 407.09372, 350.73242, 756.4063, 68.8998, 320.8551, 365.20795, 200.76408]
2025-08-07 11:56:46,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [231.0, 226.0, 311.0, 225.0, 173.0, 402.0, 88.0, 168.0, 181.0, 104.0]
2025-08-07 11:56:46,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (398.98) for latency MM1Queue_a033_s075
2025-08-07 11:56:46,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 29 minutes, 16 seconds)
2025-08-07 11:58:24,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:58:26,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 367.65250 ± 199.309
2025-08-07 11:58:26,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [368.97882, 368.68973, 795.3446, 288.87576, 198.96613, 427.4151, 19.41269, 563.5455, 401.35086, 243.9458]
2025-08-07 11:58:26,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [232.0, 163.0, 441.0, 144.0, 151.0, 197.0, 27.0, 313.0, 231.0, 131.0]
2025-08-07 11:58:26,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 27 minutes, 43 seconds)
2025-08-07 12:00:04,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:00:07,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 487.30737 ± 136.876
2025-08-07 12:00:07,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [538.0881, 348.77945, 276.7243, 570.9754, 694.26935, 463.89856, 265.6233, 576.6436, 555.39764, 582.67444]
2025-08-07 12:00:07,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [288.0, 165.0, 149.0, 301.0, 299.0, 224.0, 197.0, 303.0, 262.0, 269.0]
2025-08-07 12:00:07,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (487.31) for latency MM1Queue_a033_s075
2025-08-07 12:00:07,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 26 minutes, 37 seconds)
2025-08-07 12:01:45,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:01:46,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 332.63702 ± 247.317
2025-08-07 12:01:46,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [213.75049, 45.673412, 212.46231, 17.926315, 842.2055, 393.63672, 363.04395, 251.1526, 692.1903, 294.3287]
2025-08-07 12:01:46,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [117.0, 47.0, 125.0, 25.0, 461.0, 199.0, 193.0, 132.0, 268.0, 144.0]
2025-08-07 12:01:46,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 25 minutes, 1 second)
2025-08-07 12:03:23,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:03:25,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 407.69434 ± 207.777
2025-08-07 12:03:25,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [280.53857, 21.837261, 625.38715, 291.85263, 423.80374, 458.78912, 838.59845, 316.66373, 342.63052, 476.84192]
2025-08-07 12:03:25,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [139.0, 28.0, 384.0, 153.0, 239.0, 227.0, 374.0, 144.0, 166.0, 205.0]
2025-08-07 12:03:25,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 23 minutes, 16 seconds)
2025-08-07 12:05:05,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:05:07,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 346.24948 ± 109.839
2025-08-07 12:05:07,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [378.40875, 420.3767, 323.13733, 481.323, 287.33307, 174.63947, 160.24013, 473.2443, 449.52896, 314.263]
2025-08-07 12:05:07,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [203.0, 199.0, 169.0, 233.0, 151.0, 90.0, 131.0, 224.0, 220.0, 158.0]
2025-08-07 12:05:07,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 21 minutes, 54 seconds)
2025-08-07 12:06:43,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:06:45,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 411.19928 ± 213.840
2025-08-07 12:06:45,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [843.06354, 333.34622, 313.14886, 246.30241, 329.61932, 462.55505, 549.1655, 644.22534, 356.58377, 33.982758]
2025-08-07 12:06:45,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [487.0, 166.0, 150.0, 124.0, 212.0, 212.0, 258.0, 316.0, 171.0, 40.0]
2025-08-07 12:06:45,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 19 minutes, 50 seconds)
2025-08-07 12:08:22,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:08:24,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 403.14310 ± 180.620
2025-08-07 12:08:24,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [618.3465, 462.45447, 386.60245, 19.956953, 463.0277, 394.57474, 481.58185, 584.466, 496.40466, 124.0154]
2025-08-07 12:08:24,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [338.0, 243.0, 219.0, 27.0, 250.0, 253.0, 194.0, 280.0, 283.0, 111.0]
2025-08-07 12:08:24,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 17 minutes, 53 seconds)
2025-08-07 12:10:00,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:10:02,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 393.62387 ± 70.855
2025-08-07 12:10:02,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [474.9339, 305.69037, 340.22467, 350.81622, 381.49536, 424.8774, 381.22647, 531.174, 444.03406, 301.76608]
2025-08-07 12:10:02,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [195.0, 165.0, 157.0, 168.0, 187.0, 194.0, 192.0, 224.0, 209.0, 239.0]
2025-08-07 12:10:02,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 16 minutes, 1 second)
2025-08-07 12:11:39,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:11:42,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 481.93604 ± 240.155
2025-08-07 12:11:42,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [598.1814, 321.0228, 395.0764, 847.18365, 517.3765, 503.71585, 14.373494, 456.8088, 861.25085, 304.37085]
2025-08-07 12:11:42,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [275.0, 177.0, 215.0, 344.0, 239.0, 219.0, 23.0, 232.0, 421.0, 164.0]
2025-08-07 12:11:42,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 14 minutes, 24 seconds)
2025-08-07 12:13:17,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:13:21,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 525.09381 ± 363.132
2025-08-07 12:13:21,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [136.57419, 491.92026, 685.4104, 674.15125, 319.6042, 271.55014, 765.52075, 164.19112, 1411.7499, 330.2658]
2025-08-07 12:13:21,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [117.0, 254.0, 386.0, 350.0, 178.0, 132.0, 455.0, 126.0, 896.0, 185.0]
2025-08-07 12:13:21,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (525.09) for latency MM1Queue_a033_s075
2025-08-07 12:13:21,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 12 minutes, 25 seconds)
2025-08-07 12:14:57,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:15:01,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 696.55463 ± 425.002
2025-08-07 12:15:01,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [288.03564, 1204.9879, 986.8764, 595.5343, 1479.497, 269.39127, 8.128722, 661.63873, 656.0396, 815.4168]
2025-08-07 12:15:01,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [127.0, 555.0, 445.0, 283.0, 860.0, 135.0, 18.0, 360.0, 267.0, 391.0]
2025-08-07 12:15:01,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (696.55) for latency MM1Queue_a033_s075
2025-08-07 12:15:01,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 10 minutes, 57 seconds)
2025-08-07 12:16:36,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:16:39,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 469.09531 ± 225.362
2025-08-07 12:16:39,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [302.95883, 331.95145, 524.46356, 276.74457, 825.4596, 359.33832, 215.1773, 302.82944, 806.4003, 745.6297]
2025-08-07 12:16:39,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [152.0, 172.0, 234.0, 136.0, 364.0, 171.0, 111.0, 137.0, 421.0, 333.0]
2025-08-07 12:16:39,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 9 minutes, 15 seconds)
2025-08-07 12:18:12,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:18:16,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 651.56079 ± 382.181
2025-08-07 12:18:16,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [998.97015, 779.86285, 593.3176, 961.21704, 648.10144, 1334.1302, 9.660037, 697.14575, 336.52676, 156.67563]
2025-08-07 12:18:16,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [506.0, 375.0, 273.0, 428.0, 382.0, 723.0, 19.0, 367.0, 177.0, 97.0]
2025-08-07 12:18:16,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 7 minutes, 26 seconds)
2025-08-07 12:19:54,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:19:58,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 832.96729 ± 364.343
2025-08-07 12:19:58,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [656.63635, 1441.4185, 601.1871, 1228.1074, 809.7897, 770.0046, 688.78705, 339.05942, 1365.7301, 428.95303]
2025-08-07 12:19:58,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [310.0, 699.0, 357.0, 596.0, 424.0, 386.0, 345.0, 207.0, 675.0, 244.0]
2025-08-07 12:19:58,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (832.97) for latency MM1Queue_a033_s075
2025-08-07 12:19:58,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 6 minutes, 15 seconds)
2025-08-07 12:21:31,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:21:34,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 509.39746 ± 274.764
2025-08-07 12:21:34,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [366.34055, 672.8826, 320.197, 302.65515, 313.3343, 393.35995, 645.2446, 381.79257, 1243.9288, 454.23926]
2025-08-07 12:21:34,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [159.0, 316.0, 168.0, 149.0, 132.0, 180.0, 285.0, 191.0, 521.0, 206.0]
2025-08-07 12:21:34,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 4 minutes, 2 seconds)
2025-08-07 12:23:09,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:23:12,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 561.88507 ± 226.776
2025-08-07 12:23:12,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [532.61993, 252.31575, 480.74017, 593.83, 876.6821, 552.9631, 1044.8339, 514.1132, 465.90753, 304.8446]
2025-08-07 12:23:12,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [257.0, 121.0, 220.0, 256.0, 373.0, 239.0, 456.0, 224.0, 225.0, 152.0]
2025-08-07 12:23:12,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 2 minutes, 14 seconds)
2025-08-07 12:24:48,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:24:51,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 606.02844 ± 348.196
2025-08-07 12:24:51,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1034.961, 262.66534, 291.63107, 547.50323, 1287.3282, 1008.3843, 398.26492, 529.27234, 317.18356, 383.09042]
2025-08-07 12:24:51,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [453.0, 125.0, 156.0, 246.0, 601.0, 436.0, 192.0, 221.0, 165.0, 170.0]
2025-08-07 12:24:51,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 43 seconds)
2025-08-07 12:26:25,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:26:27,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 491.44550 ± 145.643
2025-08-07 12:26:27,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [461.97665, 410.603, 503.5771, 576.06976, 368.73932, 286.84174, 744.5978, 708.5534, 324.53717, 528.95886]
2025-08-07 12:26:27,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [218.0, 201.0, 223.0, 244.0, 171.0, 145.0, 376.0, 300.0, 150.0, 242.0]
2025-08-07 12:26:27,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 59 minutes)
2025-08-07 12:28:01,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:28:04,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 532.16809 ± 216.683
2025-08-07 12:28:04,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [458.41028, 502.47446, 315.83292, 357.62, 773.4767, 490.7468, 728.6749, 258.5402, 985.77765, 450.12698]
2025-08-07 12:28:04,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [195.0, 231.0, 266.0, 184.0, 309.0, 211.0, 292.0, 120.0, 403.0, 190.0]
2025-08-07 12:28:04,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 56 minutes, 37 seconds)
2025-08-07 12:29:44,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:29:49,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 903.31140 ± 421.031
2025-08-07 12:29:49,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [678.57806, 1426.3436, 1261.264, 575.04553, 695.5792, 961.8298, 1655.9152, 1042.0422, 305.33942, 431.17703]
2025-08-07 12:29:49,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [305.0, 632.0, 568.0, 242.0, 318.0, 507.0, 738.0, 515.0, 157.0, 204.0]
2025-08-07 12:29:49,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (903.31) for latency MM1Queue_a033_s075
2025-08-07 12:29:49,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 56 minutes, 10 seconds)
2025-08-07 12:31:27,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:31:31,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 838.12732 ± 321.292
2025-08-07 12:31:31,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [606.1066, 882.964, 806.23596, 262.2365, 942.08954, 1165.1633, 1237.1411, 1186.5594, 939.5344, 353.24228]
2025-08-07 12:31:31,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [315.0, 384.0, 343.0, 136.0, 425.0, 529.0, 542.0, 527.0, 408.0, 151.0]
2025-08-07 12:31:31,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 54 minutes, 53 seconds)
2025-08-07 12:33:12,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:33:15,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 662.17810 ± 236.957
2025-08-07 12:33:15,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [702.9807, 261.6438, 918.871, 945.7339, 758.7392, 751.5091, 275.6094, 904.6226, 588.9511, 513.1203]
2025-08-07 12:33:15,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [298.0, 130.0, 372.0, 371.0, 364.0, 354.0, 121.0, 408.0, 252.0, 189.0]
2025-08-07 12:33:15,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 53 minutes, 45 seconds)
2025-08-07 12:34:55,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:34:58,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 602.43250 ± 337.550
2025-08-07 12:34:58,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [701.727, 6.4924264, 872.51886, 284.9175, 546.8663, 237.60162, 1126.3304, 1036.7197, 631.58307, 579.5681]
2025-08-07 12:34:58,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [293.0, 17.0, 347.0, 155.0, 233.0, 123.0, 530.0, 402.0, 254.0, 254.0]
2025-08-07 12:34:58,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 52 minutes, 43 seconds)
2025-08-07 12:36:35,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:36:40,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 992.03888 ± 438.021
2025-08-07 12:36:40,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [673.8212, 217.94304, 756.8901, 1709.4453, 818.77527, 922.52045, 1717.3395, 1233.1783, 1041.6692, 828.80707]
2025-08-07 12:36:40,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [291.0, 117.0, 307.0, 722.0, 328.0, 465.0, 785.0, 509.0, 410.0, 347.0]
2025-08-07 12:36:40,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (992.04) for latency MM1Queue_a033_s075
2025-08-07 12:36:40,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 51 minutes, 34 seconds)
2025-08-07 12:38:15,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:38:19,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 620.39905 ± 236.477
2025-08-07 12:38:19,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [630.58685, 762.57306, 272.83026, 1039.3063, 542.33014, 464.97522, 998.2516, 402.3911, 618.8636, 471.88223]
2025-08-07 12:38:19,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [273.0, 311.0, 140.0, 503.0, 235.0, 183.0, 395.0, 491.0, 253.0, 226.0]
2025-08-07 12:38:19,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 49 minutes, 15 seconds)
2025-08-07 12:39:55,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:40:00,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 912.18231 ± 534.932
2025-08-07 12:40:00,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [278.10587, 443.72208, 799.61755, 1118.4767, 1591.908, 2109.381, 469.04498, 908.32294, 654.6165, 748.6276]
2025-08-07 12:40:00,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [136.0, 197.0, 360.0, 442.0, 665.0, 1000.0, 212.0, 392.0, 337.0, 318.0]
2025-08-07 12:40:00,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 47 minutes, 28 seconds)
2025-08-07 12:41:39,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:41:42,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 718.01837 ± 197.412
2025-08-07 12:41:42,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [428.9281, 452.81345, 653.4744, 610.60565, 791.7428, 649.7612, 963.7733, 1021.9088, 657.6042, 949.5717]
2025-08-07 12:41:42,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [176.0, 259.0, 257.0, 307.0, 327.0, 280.0, 503.0, 436.0, 285.0, 444.0]
2025-08-07 12:41:42,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 45 minutes, 39 seconds)
2025-08-07 12:43:11,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:43:15,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 690.87061 ± 271.477
2025-08-07 12:43:15,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [313.17258, 677.17413, 414.9176, 1064.5272, 775.9489, 1025.8257, 846.025, 654.9724, 886.99756, 249.14488]
2025-08-07 12:43:15,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [196.0, 254.0, 179.0, 453.0, 343.0, 513.0, 441.0, 264.0, 390.0, 127.0]
2025-08-07 12:43:15,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 43 minutes, 4 seconds)
2025-08-07 12:44:45,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:44:50,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 858.91425 ± 430.317
2025-08-07 12:44:50,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [182.64429, 956.8665, 1021.62244, 230.02081, 833.002, 674.30286, 924.79016, 997.92126, 1804.1006, 963.87164]
2025-08-07 12:44:50,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [95.0, 435.0, 422.0, 111.0, 362.0, 277.0, 380.0, 1000.0, 778.0, 414.0]
2025-08-07 12:44:50,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 40 minutes, 51 seconds)
2025-08-07 12:46:23,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:46:28,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1131.81860 ± 447.699
2025-08-07 12:46:28,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [2133.5288, 733.9281, 1289.9382, 1450.1178, 1137.0857, 559.5015, 893.3139, 1461.0101, 973.29205, 686.4704]
2025-08-07 12:46:28,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 312.0, 566.0, 664.0, 551.0, 250.0, 357.0, 657.0, 437.0, 269.0]
2025-08-07 12:46:28,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (1131.82) for latency MM1Queue_a033_s075
2025-08-07 12:46:28,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 39 minutes, 9 seconds)
2025-08-07 12:48:00,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:48:03,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 650.06805 ± 509.754
2025-08-07 12:48:03,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1114.8556, 1989.0428, 656.4784, 423.16464, 325.82468, 333.98138, 698.6251, 363.71667, 291.61743, 303.37396]
2025-08-07 12:48:03,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [416.0, 857.0, 298.0, 167.0, 134.0, 139.0, 252.0, 154.0, 128.0, 128.0]
2025-08-07 12:48:03,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 37 minutes, 4 seconds)
2025-08-07 12:49:39,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:49:45,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1185.61206 ± 541.636
2025-08-07 12:49:45,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1283.5295, 1006.2875, 178.65369, 2286.3442, 883.054, 1580.255, 725.5698, 973.27167, 1446.284, 1492.8722]
2025-08-07 12:49:45,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [531.0, 391.0, 98.0, 1000.0, 385.0, 651.0, 325.0, 383.0, 582.0, 645.0]
2025-08-07 12:49:45,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (1185.61) for latency MM1Queue_a033_s075
2025-08-07 12:49:45,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 35 minutes, 23 seconds)
2025-08-07 12:51:16,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:51:22,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1216.73865 ± 342.799
2025-08-07 12:51:22,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [935.16364, 1340.2162, 1422.0835, 1079.0321, 1192.6716, 1932.6727, 1405.1926, 656.9629, 1357.6027, 845.7879]
2025-08-07 12:51:22,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [364.0, 518.0, 623.0, 451.0, 542.0, 765.0, 586.0, 274.0, 533.0, 336.0]
2025-08-07 12:51:22,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (1216.74) for latency MM1Queue_a033_s075
2025-08-07 12:51:22,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 34 minutes, 6 seconds)
2025-08-07 12:52:52,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:52:58,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1208.04858 ± 426.744
2025-08-07 12:52:58,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [2125.3406, 1200.7938, 1473.5864, 1583.6924, 989.95135, 855.90826, 790.00586, 1126.1029, 580.71655, 1354.3876]
2025-08-07 12:52:58,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 538.0, 600.0, 685.0, 457.0, 367.0, 338.0, 468.0, 317.0, 556.0]
2025-08-07 12:52:58,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 32 minutes, 30 seconds)
2025-08-07 12:54:31,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:54:40,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1728.51245 ± 568.589
2025-08-07 12:54:40,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [826.0168, 2363.9685, 2453.5557, 2420.0356, 1491.0868, 1371.3744, 1108.4, 1640.5193, 1345.2974, 2264.8718]
2025-08-07 12:54:40,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [349.0, 1000.0, 976.0, 1000.0, 615.0, 559.0, 498.0, 636.0, 560.0, 1000.0]
2025-08-07 12:54:40,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (1728.51) for latency MM1Queue_a033_s075
2025-08-07 12:54:40,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 31 minutes, 6 seconds)
2025-08-07 12:56:10,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:56:15,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 976.26318 ± 371.860
2025-08-07 12:56:15,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1312.983, 1244.1436, 1160.7126, 1473.5408, 316.59082, 1296.1763, 479.45764, 1015.0812, 657.2791, 806.6666]
2025-08-07 12:56:15,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [576.0, 536.0, 528.0, 701.0, 186.0, 536.0, 297.0, 428.0, 282.0, 427.0]
2025-08-07 12:56:15,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 29 minutes, 31 seconds)
2025-08-07 12:57:53,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:57:57,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 932.39697 ± 529.235
2025-08-07 12:57:57,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [599.37146, 732.1333, 902.31775, 1058.7201, 2306.6475, 163.22475, 754.2658, 800.1214, 787.4047, 1219.763]
2025-08-07 12:57:57,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [248.0, 325.0, 409.0, 440.0, 982.0, 90.0, 308.0, 347.0, 345.0, 482.0]
2025-08-07 12:57:57,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 27 minutes, 54 seconds)
2025-08-07 12:59:28,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:59:33,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1168.50269 ± 592.487
2025-08-07 12:59:33,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [241.54024, 1458.95, 2508.879, 1236.7905, 742.7824, 1624.8644, 582.29926, 1192.3165, 1086.467, 1010.13806]
2025-08-07 12:59:33,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [117.0, 600.0, 1000.0, 487.0, 281.0, 627.0, 247.0, 439.0, 439.0, 376.0]
2025-08-07 12:59:33,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 26 minutes, 10 seconds)
2025-08-07 13:01:06,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:01:16,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2053.03809 ± 486.529
2025-08-07 13:01:16,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1713.6855, 2156.4006, 1262.3148, 2248.34, 2341.0066, 2432.996, 2424.3638, 2559.216, 1103.758, 2288.2988]
2025-08-07 13:01:16,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [684.0, 859.0, 502.0, 958.0, 1000.0, 977.0, 983.0, 997.0, 450.0, 914.0]
2025-08-07 13:01:16,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (2053.04) for latency MM1Queue_a033_s075
2025-08-07 13:01:16,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 24 minutes, 53 seconds)
2025-08-07 13:02:47,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:02:54,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1440.96948 ± 821.162
2025-08-07 13:02:54,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [2466.8003, 2401.3972, 319.88852, 1093.4553, 1209.1791, 1638.0884, 2107.2524, 199.80963, 677.7405, 2296.083]
2025-08-07 13:02:54,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 146.0, 447.0, 470.0, 686.0, 852.0, 105.0, 278.0, 926.0]
2025-08-07 13:02:54,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 23 minutes, 2 seconds)
2025-08-07 13:04:24,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:04:31,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1412.01306 ± 937.638
2025-08-07 13:04:31,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [830.2617, 2591.2163, 160.1058, 2524.3525, 1601.9308, 1098.7771, 2052.8086, 714.95416, 10.185005, 2535.539]
2025-08-07 13:04:31,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [379.0, 1000.0, 217.0, 1000.0, 607.0, 437.0, 774.0, 280.0, 20.0, 1000.0]
2025-08-07 13:04:31,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 21 minutes, 27 seconds)
2025-08-07 13:06:06,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:06:14,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1788.68579 ± 763.680
2025-08-07 13:06:14,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1339.4696, 2459.0518, 734.91626, 2287.0037, 1640.9501, 2454.0432, 778.9457, 2680.555, 2633.5896, 878.3325]
2025-08-07 13:06:14,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [518.0, 1000.0, 295.0, 1000.0, 643.0, 1000.0, 348.0, 983.0, 995.0, 364.0]
2025-08-07 13:06:14,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 19 minutes, 51 seconds)
2025-08-07 13:07:47,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:07:55,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1675.50415 ± 867.834
2025-08-07 13:07:55,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [2395.8967, 2432.5854, 2311.9102, 640.4311, 2494.994, 2142.745, 1391.024, 156.60025, 520.659, 2268.1943]
2025-08-07 13:07:55,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 897.0, 265.0, 1000.0, 855.0, 532.0, 121.0, 220.0, 1000.0]
2025-08-07 13:07:55,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 23 seconds)
2025-08-07 13:09:30,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:09:39,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1915.17896 ± 834.326
2025-08-07 13:09:39,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1208.1549, 604.7978, 336.4601, 2491.5613, 2527.0159, 1845.3735, 2550.4937, 2488.009, 2585.833, 2514.0903]
2025-08-07 13:09:39,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [460.0, 248.0, 148.0, 1000.0, 1000.0, 676.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 13:09:39,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 46 seconds)
2025-08-07 13:11:08,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:11:17,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2038.08923 ± 700.780
2025-08-07 13:11:17,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [2466.8774, 2463.4524, 1314.429, 2461.9673, 2365.6702, 2471.0188, 314.29828, 2479.1143, 1585.2472, 2458.8162]
2025-08-07 13:11:17,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 505.0, 1000.0, 1000.0, 1000.0, 136.0, 1000.0, 589.0, 1000.0]
2025-08-07 13:11:17,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 6 seconds)
2025-08-07 13:12:51,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:13:00,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1925.68713 ± 612.504
2025-08-07 13:13:00,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [2510.4968, 1090.1366, 1554.7183, 725.67554, 2575.7651, 1654.9803, 2083.2683, 2163.4175, 2514.203, 2384.2083]
2025-08-07 13:13:00,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 409.0, 589.0, 319.0, 1000.0, 630.0, 855.0, 817.0, 1000.0, 1000.0]
2025-08-07 13:13:00,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 34 seconds)
2025-08-07 13:14:35,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:14:43,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1834.75037 ± 848.214
2025-08-07 13:14:43,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [2608.2126, 2468.5464, 1863.9912, 548.22205, 2734.2446, 1121.6384, 2424.6946, 1738.3728, 2542.1536, 297.42764]
2025-08-07 13:14:43,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 896.0, 702.0, 220.0, 1000.0, 395.0, 1000.0, 624.0, 1000.0, 135.0]
2025-08-07 13:14:43,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 52 seconds)
2025-08-07 13:16:13,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:16:21,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2027.03418 ± 775.422
2025-08-07 13:16:21,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [2655.3704, 985.7738, 932.3758, 1170.5681, 2582.2605, 2649.6682, 2769.9785, 2718.6868, 1252.9054, 2552.7534]
2025-08-07 13:16:21,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 411.0, 321.0, 449.0, 1000.0, 1000.0, 1000.0, 1000.0, 454.0, 1000.0]
2025-08-07 13:16:21,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 8 seconds)
2025-08-07 13:17:54,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:18:01,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1694.71021 ± 1042.218
2025-08-07 13:18:01,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [2757.8296, 343.46024, 870.182, 1237.7625, 2615.698, 2727.3704, 1068.2668, 2642.2236, 7.233404, 2677.0747]
2025-08-07 13:18:01,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 155.0, 305.0, 454.0, 1000.0, 1000.0, 381.0, 1000.0, 18.0, 1000.0]
2025-08-07 13:18:01,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 22 seconds)
2025-08-07 13:19:39,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:19:45,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1282.48767 ± 1027.622
2025-08-07 13:19:45,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [937.15985, 3.955833, 1612.6547, 279.26505, 2668.9448, 648.24774, 2613.5186, 162.86804, 1097.0275, 2801.2349]
2025-08-07 13:19:45,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [335.0, 15.0, 577.0, 137.0, 1000.0, 252.0, 1000.0, 91.0, 446.0, 1000.0]
2025-08-07 13:19:45,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 45 seconds)
2025-08-07 13:21:19,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:21:28,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2182.63525 ± 763.716
2025-08-07 13:21:28,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [2622.7988, 2687.1912, 2528.738, 708.70264, 2642.0664, 1120.1489, 2810.0103, 1287.7716, 2673.3303, 2745.5945]
2025-08-07 13:21:28,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 260.0, 1000.0, 414.0, 1000.0, 509.0, 1000.0, 1000.0]
2025-08-07 13:21:28,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (2182.64) for latency MM1Queue_a033_s075
2025-08-07 13:21:28,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 5 seconds)
2025-08-07 13:22:58,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:23:07,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1944.34253 ± 1023.601
2025-08-07 13:23:07,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [681.2552, 2461.0967, 171.36049, 2674.666, 327.9564, 2683.6267, 2624.5476, 2572.2827, 2629.4602, 2617.174]
2025-08-07 13:23:07,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [255.0, 897.0, 95.0, 1000.0, 142.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 13:23:07,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 21 seconds)
2025-08-07 13:24:37,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:24:43,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1547.31958 ± 981.287
2025-08-07 13:24:43,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [696.8148, 1686.7362, 720.3684, 333.0134, 1025.9008, 2680.8945, 344.72864, 2654.7908, 2655.8179, 2674.1296]
2025-08-07 13:24:43,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [234.0, 682.0, 302.0, 145.0, 326.0, 1000.0, 150.0, 1000.0, 1000.0, 1000.0]
2025-08-07 13:24:43,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 40 seconds)
2025-08-07 13:26:14,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:26:21,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1715.26245 ± 874.911
2025-08-07 13:26:21,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1490.0902, 2579.753, 371.24362, 2023.219, 1244.1144, 2655.5076, 2661.2617, 989.65497, 2660.9302, 476.84985]
2025-08-07 13:26:21,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [541.0, 1000.0, 155.0, 750.0, 463.0, 1000.0, 1000.0, 376.0, 1000.0, 190.0]
2025-08-07 13:26:21,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1251 [DEBUG]: Training session finished
