2025-08-07 10:19:58,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc5-walker2d/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:19:58,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc5-walker2d/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:19:58,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x1497c59afe10>}
2025-08-07 10:19:58,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1111 [DEBUG]: using device: cuda
2025-08-07 10:19:58,917 baseline-bpql-noiseperc5-walker2d:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 10:19:58,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1133 [INFO]: Creating new trainer
2025-08-07 10:19:58,935 baseline-bpql-noiseperc5-walker2d:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=113, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 10:19:58,935 baseline-bpql-noiseperc5-walker2d:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 10:19:59,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1194 [DEBUG]: Starting training session...
2025-08-07 10:19:59,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 1/100
2025-08-07 10:21:31,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:21:32,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: -0.64295 ± 2.327
2025-08-07 10:21:32,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [-1.9431845, -4.0349016, -2.4129696, 4.5528016, -0.95240843, -0.18038584, 0.6274863, -3.1259127, 0.62527424, 0.4147218]
2025-08-07 10:21:32,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [46.0, 48.0, 46.0, 40.0, 44.0, 52.0, 40.0, 45.0, 46.0, 49.0]
2025-08-07 10:21:32,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (-0.64) for latency MM1Queue_a033_s075
2025-08-07 10:21:32,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 32 minutes, 13 seconds)
2025-08-07 10:23:11,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:23:12,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: -0.10288 ± 19.401
2025-08-07 10:23:12,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [-12.3158, 11.576128, 12.272317, -32.852406, 20.386978, -9.126013, 15.983404, -33.53022, 16.181469, 10.395372]
2025-08-07 10:23:12,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [104.0, 79.0, 79.0, 121.0, 55.0, 100.0, 74.0, 138.0, 50.0, 70.0]
2025-08-07 10:23:12,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (-0.10) for latency MM1Queue_a033_s075
2025-08-07 10:23:12,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 37 minutes, 5 seconds)
2025-08-07 10:24:51,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:24:52,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 75.12312 ± 56.661
2025-08-07 10:24:52,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [127.427925, 132.97942, 49.424305, 42.027264, 37.134323, 31.707422, 41.18878, 206.92844, 33.70725, 48.70606]
2025-08-07 10:24:52,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [95.0, 92.0, 62.0, 38.0, 36.0, 31.0, 39.0, 122.0, 35.0, 46.0]
2025-08-07 10:24:52,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (75.12) for latency MM1Queue_a033_s075
2025-08-07 10:24:52,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 37 minutes, 38 seconds)
2025-08-07 10:26:31,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:26:32,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 75.92588 ± 77.385
2025-08-07 10:26:32,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [305.03824, 31.393906, 57.54738, 62.211018, 68.46155, 32.357002, 44.437996, 43.56919, 66.71028, 47.53229]
2025-08-07 10:26:32,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [182.0, 38.0, 130.0, 92.0, 75.0, 36.0, 51.0, 53.0, 92.0, 59.0]
2025-08-07 10:26:32,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (75.93) for latency MM1Queue_a033_s075
2025-08-07 10:26:32,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 36 minutes, 58 seconds)
2025-08-07 10:28:12,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:28:14,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 163.64281 ± 100.780
2025-08-07 10:28:14,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [354.6257, 85.30662, 24.65424, 180.36916, 96.044914, 185.58717, 229.65771, 212.81712, 247.11931, 20.246183]
2025-08-07 10:28:14,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [268.0, 168.0, 29.0, 123.0, 114.0, 126.0, 162.0, 148.0, 158.0, 209.0]
2025-08-07 10:28:14,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (163.64) for latency MM1Queue_a033_s075
2025-08-07 10:28:14,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 36 minutes, 33 seconds)
2025-08-07 10:29:54,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:29:56,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 98.50905 ± 98.311
2025-08-07 10:29:56,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [43.443108, 43.42037, 54.636658, 75.05273, 20.20609, 63.199596, 342.501, 230.1608, 44.168236, 68.30185]
2025-08-07 10:29:56,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [102.0, 187.0, 73.0, 201.0, 96.0, 101.0, 273.0, 152.0, 151.0, 111.0]
2025-08-07 10:29:56,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 37 minutes, 53 seconds)
2025-08-07 10:31:34,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:31:36,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 152.80701 ± 89.660
2025-08-07 10:31:36,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [66.65281, 256.36664, 33.887028, 47.49925, 226.0473, 247.29694, 62.51216, 125.74444, 202.79611, 259.26736]
2025-08-07 10:31:36,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [137.0, 168.0, 38.0, 83.0, 160.0, 183.0, 211.0, 138.0, 160.0, 174.0]
2025-08-07 10:31:36,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 36 minutes, 13 seconds)
2025-08-07 10:33:15,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:33:16,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 151.37262 ± 110.155
2025-08-07 10:33:16,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [44.122772, 178.84056, 143.19664, 72.917656, 340.27274, 229.99812, 81.58667, 333.73682, 48.925972, 40.12818]
2025-08-07 10:33:16,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [62.0, 125.0, 131.0, 72.0, 227.0, 139.0, 128.0, 237.0, 67.0, 82.0]
2025-08-07 10:33:16,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 34 minutes, 42 seconds)
2025-08-07 10:34:56,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:34:57,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 113.64699 ± 41.351
2025-08-07 10:34:57,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [120.68767, 121.22689, 44.290462, 67.14832, 164.48033, 121.57236, 62.348663, 130.88754, 124.38737, 179.44028]
2025-08-07 10:34:57,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [119.0, 103.0, 57.0, 81.0, 115.0, 108.0, 72.0, 115.0, 108.0, 132.0]
2025-08-07 10:34:57,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 33 minutes, 11 seconds)
2025-08-07 10:36:37,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:36:38,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 133.49738 ± 63.355
2025-08-07 10:36:38,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [195.26466, 85.98587, 176.67381, 212.2822, 54.14715, 187.63562, 39.266186, 172.9457, 56.649345, 154.12329]
2025-08-07 10:36:38,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [124.0, 114.0, 126.0, 120.0, 62.0, 143.0, 68.0, 133.0, 59.0, 142.0]
2025-08-07 10:36:38,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 31 minutes, 11 seconds)
2025-08-07 10:38:17,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:38:19,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 202.37424 ± 137.144
2025-08-07 10:38:19,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [494.75635, 106.47813, 177.96217, 168.75801, 303.65668, 37.52403, 340.11633, 81.67818, 65.6432, 247.16933]
2025-08-07 10:38:19,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [229.0, 91.0, 117.0, 91.0, 151.0, 37.0, 194.0, 71.0, 58.0, 225.0]
2025-08-07 10:38:19,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (202.37) for latency MM1Queue_a033_s075
2025-08-07 10:38:19,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 29 minutes, 15 seconds)
2025-08-07 10:39:58,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:39:59,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 210.54147 ± 85.519
2025-08-07 10:39:59,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [250.05139, 122.54918, 291.89902, 179.57558, 189.15225, 179.2354, 54.124092, 176.71155, 322.2096, 339.90683]
2025-08-07 10:39:59,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [139.0, 68.0, 147.0, 100.0, 132.0, 104.0, 80.0, 101.0, 173.0, 199.0]
2025-08-07 10:39:59,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (210.54) for latency MM1Queue_a033_s075
2025-08-07 10:39:59,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 27 minutes, 38 seconds)
2025-08-07 10:41:39,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:41:40,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 152.35704 ± 101.425
2025-08-07 10:41:40,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [305.88513, 291.85135, 34.58692, 47.086903, 201.43697, 28.84201, 135.23543, 78.72595, 140.89767, 259.02206]
2025-08-07 10:41:40,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [136.0, 145.0, 38.0, 67.0, 116.0, 32.0, 81.0, 64.0, 71.0, 129.0]
2025-08-07 10:41:40,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 26 minutes, 10 seconds)
2025-08-07 10:43:19,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:43:21,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 279.45718 ± 172.845
2025-08-07 10:43:21,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [274.0468, 132.54166, 134.70876, 352.08963, 90.99126, 128.50702, 701.9991, 390.40817, 320.07333, 269.20602]
2025-08-07 10:43:21,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [140.0, 163.0, 107.0, 155.0, 75.0, 110.0, 403.0, 160.0, 150.0, 120.0]
2025-08-07 10:43:21,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (279.46) for latency MM1Queue_a033_s075
2025-08-07 10:43:21,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 24 minutes, 34 seconds)
2025-08-07 10:45:00,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:45:02,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 172.49326 ± 143.298
2025-08-07 10:45:02,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [120.08468, 134.71117, 191.05103, 186.38643, 77.78244, 30.870007, 162.94048, 29.945335, 556.29517, 234.8659]
2025-08-07 10:45:02,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [62.0, 129.0, 116.0, 133.0, 82.0, 33.0, 339.0, 120.0, 287.0, 155.0]
2025-08-07 10:45:02,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 22 minutes, 53 seconds)
2025-08-07 10:46:42,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:46:44,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 296.08487 ± 169.180
2025-08-07 10:46:44,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [427.9101, 358.4403, 368.81842, 323.43088, 605.7753, 433.31766, 83.786446, 85.759674, 91.391106, 182.21886]
2025-08-07 10:46:44,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [305.0, 195.0, 163.0, 200.0, 331.0, 225.0, 97.0, 110.0, 114.0, 135.0]
2025-08-07 10:46:44,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (296.08) for latency MM1Queue_a033_s075
2025-08-07 10:46:44,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 21 minutes, 27 seconds)
2025-08-07 10:48:24,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:48:25,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 148.69168 ± 97.319
2025-08-07 10:48:25,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [250.22247, 187.86314, 96.50789, 165.27705, 14.846172, 35.060783, 61.157276, 100.09354, 298.79138, 277.09692]
2025-08-07 10:48:25,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [136.0, 123.0, 140.0, 122.0, 22.0, 40.0, 126.0, 141.0, 209.0, 125.0]
2025-08-07 10:48:25,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 20 minutes, 5 seconds)
2025-08-07 10:50:06,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:50:08,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 267.04156 ± 74.257
2025-08-07 10:50:08,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [319.8602, 173.78366, 149.7242, 237.0555, 272.77866, 176.31096, 362.01782, 343.1773, 298.54272, 337.16473]
2025-08-07 10:50:08,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [270.0, 129.0, 110.0, 139.0, 159.0, 121.0, 168.0, 170.0, 128.0, 172.0]
2025-08-07 10:50:08,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 18 minutes, 46 seconds)
2025-08-07 10:51:47,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:51:49,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 207.76746 ± 120.746
2025-08-07 10:51:49,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [96.64519, 106.083466, 440.81924, 69.61982, 289.47598, 129.67126, 350.57867, 109.51501, 289.83002, 195.43576]
2025-08-07 10:51:49,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [121.0, 125.0, 242.0, 110.0, 145.0, 192.0, 209.0, 122.0, 136.0, 97.0]
2025-08-07 10:51:49,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 17 minutes, 4 seconds)
2025-08-07 10:53:30,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:53:32,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 345.06671 ± 181.015
2025-08-07 10:53:32,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [112.21473, 330.28595, 472.1971, 403.22626, 603.2513, 443.68015, 397.13248, 28.28213, 522.28125, 138.11581]
2025-08-07 10:53:32,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [109.0, 176.0, 199.0, 184.0, 277.0, 269.0, 205.0, 30.0, 316.0, 114.0]
2025-08-07 10:53:32,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (345.07) for latency MM1Queue_a033_s075
2025-08-07 10:53:32,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 15 minutes, 54 seconds)
2025-08-07 10:55:12,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:55:14,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 313.55243 ± 183.540
2025-08-07 10:55:14,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [35.109715, 198.92484, 282.0866, 433.99332, 604.1495, 186.66685, 286.62726, 204.06262, 253.03568, 650.8682]
2025-08-07 10:55:14,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [38.0, 194.0, 133.0, 277.0, 280.0, 143.0, 133.0, 106.0, 125.0, 342.0]
2025-08-07 10:55:14,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 14 minutes, 14 seconds)
2025-08-07 10:56:54,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:56:56,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 298.60397 ± 130.862
2025-08-07 10:56:56,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [429.9997, 241.1216, 190.37892, 203.96472, 153.85326, 179.20547, 388.6431, 570.6119, 392.31424, 235.94707]
2025-08-07 10:56:56,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [253.0, 119.0, 94.0, 144.0, 133.0, 93.0, 181.0, 269.0, 162.0, 153.0]
2025-08-07 10:56:56,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 12 minutes, 41 seconds)
2025-08-07 10:58:35,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:58:36,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 268.77710 ± 138.961
2025-08-07 10:58:36,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [200.76152, 188.46706, 335.3312, 28.97083, 162.88846, 479.97906, 240.46748, 340.7024, 207.80731, 502.3957]
2025-08-07 10:58:36,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [133.0, 141.0, 170.0, 32.0, 170.0, 208.0, 165.0, 176.0, 145.0, 224.0]
2025-08-07 10:58:36,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 10 minutes, 28 seconds)
2025-08-07 11:00:17,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:00:19,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 235.34793 ± 91.075
2025-08-07 11:00:19,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [14.091167, 225.24739, 336.0802, 254.13512, 161.82489, 234.12132, 341.09854, 307.91125, 266.86533, 212.1039]
2025-08-07 11:00:19,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 131.0, 153.0, 132.0, 200.0, 123.0, 160.0, 161.0, 142.0, 117.0]
2025-08-07 11:00:19,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 9 minutes, 9 seconds)
2025-08-07 11:01:58,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:01:59,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 253.58443 ± 105.028
2025-08-07 11:01:59,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [121.54546, 278.00702, 262.39307, 219.72072, 385.0337, 366.78867, 320.862, 26.226934, 319.96793, 235.29852]
2025-08-07 11:01:59,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [75.0, 132.0, 132.0, 106.0, 205.0, 186.0, 153.0, 29.0, 128.0, 137.0]
2025-08-07 11:01:59,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 6 minutes, 53 seconds)
2025-08-07 11:03:38,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:03:40,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 205.21957 ± 129.795
2025-08-07 11:03:40,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [29.110186, 251.7972, 34.633327, 253.53671, 287.85266, 188.19943, 448.42676, 266.75897, 30.253233, 261.62744]
2025-08-07 11:03:40,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [34.0, 114.0, 36.0, 120.0, 205.0, 116.0, 199.0, 144.0, 34.0, 130.0]
2025-08-07 11:03:40,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 4 minutes, 48 seconds)
2025-08-07 11:05:19,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:05:21,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 338.48239 ± 174.864
2025-08-07 11:05:21,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [183.37491, 239.81639, 206.51865, 357.66437, 145.33575, 213.70747, 579.99646, 528.94635, 658.17377, 271.2898]
2025-08-07 11:05:21,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [99.0, 107.0, 149.0, 165.0, 136.0, 118.0, 249.0, 309.0, 310.0, 132.0]
2025-08-07 11:05:21,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 3 minutes, 1 second)
2025-08-07 11:07:02,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:07:03,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 240.34860 ± 103.675
2025-08-07 11:07:03,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [313.8175, 272.6097, 60.8, 203.72769, 280.13965, 303.29742, 58.896202, 237.64186, 409.02637, 263.5296]
2025-08-07 11:07:03,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [188.0, 145.0, 61.0, 161.0, 146.0, 158.0, 86.0, 128.0, 197.0, 127.0]
2025-08-07 11:07:03,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 1 minute, 39 seconds)
2025-08-07 11:08:43,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:08:45,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 293.15482 ± 155.537
2025-08-07 11:08:45,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [189.4275, 480.38553, 84.03, 574.38745, 352.6027, 180.87233, 346.94418, 352.9407, 68.22159, 301.73618]
2025-08-07 11:08:45,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [126.0, 266.0, 93.0, 270.0, 220.0, 133.0, 178.0, 160.0, 85.0, 221.0]
2025-08-07 11:08:45,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 59 minutes, 43 seconds)
2025-08-07 11:10:26,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:10:28,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 333.97052 ± 117.907
2025-08-07 11:10:28,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [341.01486, 306.48755, 251.75493, 589.5774, 373.19147, 423.8896, 171.93784, 347.14243, 164.4099, 370.2992]
2025-08-07 11:10:28,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [160.0, 149.0, 128.0, 245.0, 183.0, 219.0, 132.0, 173.0, 118.0, 182.0]
2025-08-07 11:10:28,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 58 minutes, 47 seconds)
2025-08-07 11:12:06,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:12:08,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 350.66663 ± 169.144
2025-08-07 11:12:08,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [314.8794, 719.2887, 311.7147, 197.17476, 154.10121, 189.26025, 325.67764, 591.25366, 333.5335, 369.7825]
2025-08-07 11:12:08,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [130.0, 277.0, 164.0, 142.0, 107.0, 122.0, 138.0, 252.0, 164.0, 207.0]
2025-08-07 11:12:08,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (350.67) for latency MM1Queue_a033_s075
2025-08-07 11:12:08,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 56 minutes, 59 seconds)
2025-08-07 11:13:49,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:13:51,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 316.28824 ± 85.895
2025-08-07 11:13:51,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [269.832, 212.57051, 337.13516, 167.70251, 417.7061, 331.271, 325.0871, 262.53027, 461.7076, 377.34]
2025-08-07 11:13:51,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [136.0, 103.0, 150.0, 119.0, 224.0, 177.0, 158.0, 147.0, 248.0, 210.0]
2025-08-07 11:13:51,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 55 minutes, 37 seconds)
2025-08-07 11:15:31,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:15:33,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 341.88422 ± 102.062
2025-08-07 11:15:33,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [247.82397, 216.5752, 322.91016, 456.7817, 253.85953, 492.61005, 417.4856, 205.03358, 445.9145, 359.8476]
2025-08-07 11:15:33,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [131.0, 123.0, 151.0, 188.0, 171.0, 209.0, 198.0, 120.0, 199.0, 181.0]
2025-08-07 11:15:33,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 53 minutes, 46 seconds)
2025-08-07 11:17:14,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:17:16,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 318.46942 ± 137.539
2025-08-07 11:17:16,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [327.05365, 311.67657, 404.366, 146.46889, 191.83015, 179.19876, 390.1884, 305.57272, 649.7605, 278.57867]
2025-08-07 11:17:16,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [215.0, 158.0, 199.0, 116.0, 134.0, 125.0, 199.0, 161.0, 315.0, 193.0]
2025-08-07 11:17:16,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 52 minutes, 30 seconds)
2025-08-07 11:18:55,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:18:56,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 245.95181 ± 178.516
2025-08-07 11:18:56,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [13.915723, 416.4897, 332.90225, 251.91281, 22.552816, 550.62683, 185.723, 226.94662, 28.131712, 430.31653]
2025-08-07 11:18:56,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 198.0, 172.0, 168.0, 27.0, 259.0, 154.0, 140.0, 31.0, 213.0]
2025-08-07 11:18:56,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 50 minutes)
2025-08-07 11:20:36,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:20:38,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 354.94595 ± 189.347
2025-08-07 11:20:38,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [181.80627, 22.631372, 195.55359, 574.45233, 524.16736, 491.61026, 543.86926, 514.6526, 166.29482, 334.42163]
2025-08-07 11:20:38,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [119.0, 28.0, 132.0, 267.0, 280.0, 208.0, 265.0, 222.0, 117.0, 204.0]
2025-08-07 11:20:38,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (354.95) for latency MM1Queue_a033_s075
2025-08-07 11:20:38,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 48 minutes, 47 seconds)
2025-08-07 11:22:18,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:22:20,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 442.27124 ± 172.907
2025-08-07 11:22:20,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [246.88458, 400.2662, 406.13602, 447.6533, 369.4065, 864.3798, 382.99475, 649.6972, 301.26215, 354.03174]
2025-08-07 11:22:20,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [139.0, 188.0, 197.0, 198.0, 159.0, 361.0, 172.0, 280.0, 145.0, 176.0]
2025-08-07 11:22:20,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (442.27) for latency MM1Queue_a033_s075
2025-08-07 11:22:20,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 46 minutes, 54 seconds)
2025-08-07 11:24:02,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:24:05,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 421.40118 ± 212.974
2025-08-07 11:24:05,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [550.7857, 138.27034, 389.16617, 544.16705, 503.86502, 626.1632, 83.11298, 142.5557, 723.72314, 512.20215]
2025-08-07 11:24:05,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [256.0, 138.0, 192.0, 273.0, 238.0, 303.0, 82.0, 126.0, 345.0, 230.0]
2025-08-07 11:24:05,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 45 minutes, 47 seconds)
2025-08-07 11:25:44,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:25:47,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 476.82480 ± 196.728
2025-08-07 11:25:47,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [430.1295, 23.066645, 576.8333, 388.85712, 595.24225, 333.77725, 634.9715, 395.63635, 690.8971, 698.8369]
2025-08-07 11:25:47,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [222.0, 27.0, 326.0, 169.0, 330.0, 165.0, 304.0, 195.0, 342.0, 365.0]
2025-08-07 11:25:47,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (476.82) for latency MM1Queue_a033_s075
2025-08-07 11:25:47,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 43 minutes, 56 seconds)
2025-08-07 11:27:27,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:27:30,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 595.12390 ± 108.639
2025-08-07 11:27:30,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [789.83716, 609.90186, 523.368, 636.3595, 571.5683, 373.862, 643.9366, 495.16946, 700.0669, 607.1691]
2025-08-07 11:27:30,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [313.0, 319.0, 240.0, 304.0, 268.0, 212.0, 271.0, 231.0, 299.0, 210.0]
2025-08-07 11:27:30,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (595.12) for latency MM1Queue_a033_s075
2025-08-07 11:27:30,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 42 minutes, 49 seconds)
2025-08-07 11:29:11,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:29:13,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 412.10663 ± 69.494
2025-08-07 11:29:13,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [486.20023, 370.3878, 402.52145, 326.42154, 268.68506, 497.6328, 448.15082, 408.05908, 440.8927, 472.11465]
2025-08-07 11:29:13,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [216.0, 178.0, 168.0, 157.0, 127.0, 179.0, 208.0, 190.0, 177.0, 172.0]
2025-08-07 11:29:13,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 41 minutes, 10 seconds)
2025-08-07 11:30:53,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:30:58,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 852.17303 ± 319.999
2025-08-07 11:30:58,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [806.313, 476.71243, 370.91275, 1169.49, 1314.8003, 540.18896, 641.68774, 885.2924, 1206.084, 1110.2493]
2025-08-07 11:30:58,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [517.0, 227.0, 183.0, 507.0, 596.0, 223.0, 265.0, 383.0, 467.0, 489.0]
2025-08-07 11:30:58,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (852.17) for latency MM1Queue_a033_s075
2025-08-07 11:30:58,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 40 minutes, 2 seconds)
2025-08-07 11:32:39,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:32:42,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 566.01019 ± 282.394
2025-08-07 11:32:42,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [978.5175, 307.15762, 413.09933, 340.31375, 1064.3085, 288.12866, 409.65207, 599.1796, 378.12866, 881.6168]
2025-08-07 11:32:42,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [385.0, 160.0, 202.0, 159.0, 458.0, 146.0, 171.0, 244.0, 167.0, 402.0]
2025-08-07 11:32:42,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 38 minutes, 12 seconds)
2025-08-07 11:34:24,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:34:26,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 400.03790 ± 143.832
2025-08-07 11:34:26,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [323.3029, 276.7768, 216.97055, 368.06592, 724.44464, 494.04968, 464.5643, 280.2458, 519.04114, 332.91687]
2025-08-07 11:34:26,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [160.0, 133.0, 106.0, 137.0, 309.0, 199.0, 164.0, 131.0, 253.0, 152.0]
2025-08-07 11:34:26,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 36 minutes, 45 seconds)
2025-08-07 11:36:03,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:36:06,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 620.51093 ± 216.721
2025-08-07 11:36:06,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [558.78033, 657.3786, 495.46976, 699.03595, 1002.12195, 907.13873, 469.90256, 232.91934, 733.123, 449.23926]
2025-08-07 11:36:06,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [259.0, 269.0, 224.0, 304.0, 468.0, 396.0, 198.0, 118.0, 301.0, 167.0]
2025-08-07 11:36:06,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 34 minutes, 32 seconds)
2025-08-07 11:37:47,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:37:50,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 560.57092 ± 259.093
2025-08-07 11:37:50,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [505.6527, 733.8428, 484.91647, 498.31897, 279.23355, 273.9812, 382.08658, 1186.8503, 500.93475, 759.8916]
2025-08-07 11:37:50,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [277.0, 368.0, 235.0, 241.0, 133.0, 122.0, 161.0, 521.0, 225.0, 336.0]
2025-08-07 11:37:50,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 33 minutes, 3 seconds)
2025-08-07 11:39:31,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:39:34,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 533.37170 ± 214.020
2025-08-07 11:39:34,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [743.2432, 497.44827, 591.2662, 123.41234, 359.24133, 788.4767, 529.95355, 316.44504, 536.51495, 847.7148]
2025-08-07 11:39:34,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [289.0, 196.0, 228.0, 99.0, 168.0, 287.0, 227.0, 138.0, 246.0, 317.0]
2025-08-07 11:39:34,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 31 minutes, 10 seconds)
2025-08-07 11:41:13,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:41:15,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 434.45084 ± 158.506
2025-08-07 11:41:15,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [264.47568, 255.96815, 371.76938, 350.728, 607.53577, 731.8196, 426.602, 620.2081, 443.7814, 271.62018]
2025-08-07 11:41:15,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [129.0, 133.0, 161.0, 150.0, 229.0, 310.0, 187.0, 346.0, 201.0, 125.0]
2025-08-07 11:41:15,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 28 minutes, 59 seconds)
2025-08-07 11:42:55,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:42:58,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 622.88867 ± 212.488
2025-08-07 11:42:58,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [615.27686, 739.4358, 305.99622, 558.3312, 516.73566, 854.3425, 939.89935, 782.3279, 668.4243, 248.11655]
2025-08-07 11:42:58,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [238.0, 267.0, 133.0, 222.0, 241.0, 310.0, 410.0, 323.0, 287.0, 119.0]
2025-08-07 11:42:58,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 27 minutes, 4 seconds)
2025-08-07 11:44:40,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:44:42,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 553.87750 ± 240.163
2025-08-07 11:44:42,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [358.13788, 919.9188, 116.805084, 814.8607, 572.1114, 575.4325, 405.0359, 406.36807, 503.0999, 867.00446]
2025-08-07 11:44:42,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [138.0, 414.0, 99.0, 329.0, 231.0, 241.0, 165.0, 182.0, 254.0, 363.0]
2025-08-07 11:44:42,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 26 minutes, 3 seconds)
2025-08-07 11:46:22,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:46:24,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 450.99628 ± 84.236
2025-08-07 11:46:24,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [403.85263, 329.9126, 514.2421, 485.56702, 572.70605, 430.47894, 427.34402, 334.81213, 422.94482, 588.1023]
2025-08-07 11:46:24,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [155.0, 152.0, 201.0, 177.0, 235.0, 158.0, 196.0, 138.0, 159.0, 220.0]
2025-08-07 11:46:24,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 23 minutes, 57 seconds)
2025-08-07 11:48:07,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:48:09,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 482.54843 ± 339.233
2025-08-07 11:48:09,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [161.32211, 400.41788, 25.866695, 280.82938, 136.29604, 620.39014, 855.9318, 1184.1548, 548.502, 611.77325]
2025-08-07 11:48:09,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [88.0, 186.0, 28.0, 125.0, 127.0, 271.0, 330.0, 463.0, 213.0, 246.0]
2025-08-07 11:48:09,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 22 minutes, 26 seconds)
2025-08-07 11:49:46,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:49:50,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 673.45056 ± 273.737
2025-08-07 11:49:50,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1195.4108, 436.23706, 756.7847, 572.0677, 591.6514, 575.9459, 1137.728, 315.6387, 687.9016, 465.13904]
2025-08-07 11:49:50,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [452.0, 237.0, 334.0, 224.0, 238.0, 239.0, 433.0, 139.0, 294.0, 179.0]
2025-08-07 11:49:50,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 20 minutes, 36 seconds)
2025-08-07 11:51:30,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:51:35,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1111.00171 ± 765.807
2025-08-07 11:51:35,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1353.1884, 529.7668, 907.65, 2696.122, 507.75787, 419.01495, 399.6702, 1384.1704, 665.0478, 2247.6287]
2025-08-07 11:51:35,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [478.0, 214.0, 348.0, 1000.0, 204.0, 152.0, 165.0, 493.0, 287.0, 806.0]
2025-08-07 11:51:35,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (1111.00) for latency MM1Queue_a033_s075
2025-08-07 11:51:35,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 19 minutes, 17 seconds)
2025-08-07 11:53:16,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:53:20,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 702.82074 ± 134.842
2025-08-07 11:53:20,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [704.2466, 949.1088, 899.10583, 697.114, 761.25775, 479.36804, 683.83496, 613.42505, 680.83356, 559.9129]
2025-08-07 11:53:20,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [284.0, 347.0, 334.0, 245.0, 365.0, 218.0, 277.0, 236.0, 257.0, 241.0]
2025-08-07 11:53:20,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 17 minutes, 36 seconds)
2025-08-07 11:55:01,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:55:04,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 522.22510 ± 264.068
2025-08-07 11:55:04,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [529.39886, 211.45708, 484.43964, 275.737, 262.27173, 1146.4905, 599.785, 409.21313, 531.53827, 771.9199]
2025-08-07 11:55:04,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [206.0, 126.0, 195.0, 127.0, 128.0, 415.0, 243.0, 175.0, 264.0, 292.0]
2025-08-07 11:55:04,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 16 minutes, 16 seconds)
2025-08-07 11:56:49,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:56:53,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1016.71936 ± 381.834
2025-08-07 11:56:53,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [678.6372, 1571.5408, 1332.3569, 915.6559, 1353.8877, 667.2715, 1231.5597, 683.54443, 372.4905, 1360.2498]
2025-08-07 11:56:53,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [265.0, 561.0, 469.0, 319.0, 631.0, 220.0, 419.0, 286.0, 170.0, 489.0]
2025-08-07 11:56:53,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 15 minutes, 7 seconds)
2025-08-07 11:58:33,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:58:37,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 902.27380 ± 313.134
2025-08-07 11:58:37,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1643.5057, 1219.4589, 776.00024, 655.2182, 875.2624, 612.36304, 748.5239, 650.6336, 719.5362, 1122.2363]
2025-08-07 11:58:37,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [574.0, 508.0, 256.0, 208.0, 377.0, 272.0, 265.0, 270.0, 243.0, 512.0]
2025-08-07 11:58:37,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 13 minutes, 48 seconds)
2025-08-07 12:00:19,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:00:22,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 776.56873 ± 207.225
2025-08-07 12:00:22,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1068.9308, 514.75494, 878.8379, 1147.5972, 585.8161, 787.3155, 577.02826, 792.1744, 849.6729, 563.5589]
2025-08-07 12:00:22,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [381.0, 215.0, 279.0, 416.0, 203.0, 285.0, 190.0, 274.0, 306.0, 187.0]
2025-08-07 12:00:22,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 12 minutes, 3 seconds)
2025-08-07 12:01:58,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:02:02,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 925.40247 ± 401.186
2025-08-07 12:02:02,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [839.7617, 1585.2504, 971.60895, 544.47577, 1012.4773, 891.76715, 883.7848, 255.603, 648.39557, 1620.8997]
2025-08-07 12:02:02,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [341.0, 509.0, 337.0, 227.0, 375.0, 304.0, 331.0, 184.0, 270.0, 623.0]
2025-08-07 12:02:02,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 9 minutes, 35 seconds)
2025-08-07 12:03:43,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:03:48,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1287.70581 ± 503.284
2025-08-07 12:03:48,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1217.3131, 1718.8654, 2452.8647, 1413.2235, 1449.3938, 902.69836, 550.01544, 812.30615, 1203.1698, 1157.207]
2025-08-07 12:03:48,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [456.0, 586.0, 843.0, 444.0, 503.0, 280.0, 224.0, 282.0, 378.0, 430.0]
2025-08-07 12:03:48,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (1287.71) for latency MM1Queue_a033_s075
2025-08-07 12:03:48,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 8 minutes, 10 seconds)
2025-08-07 12:05:28,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:05:30,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 499.27069 ± 81.103
2025-08-07 12:05:30,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [433.434, 483.71048, 569.3139, 429.9494, 474.10617, 383.5839, 501.39026, 453.57364, 620.8839, 642.76184]
2025-08-07 12:05:30,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [155.0, 171.0, 228.0, 155.0, 178.0, 143.0, 171.0, 160.0, 244.0, 240.0]
2025-08-07 12:05:30,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 5 minutes, 29 seconds)
2025-08-07 12:07:14,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:07:16,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 581.34595 ± 263.835
2025-08-07 12:07:16,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [266.54755, 436.05875, 558.5623, 289.96133, 540.79016, 532.14233, 1141.1781, 540.2354, 990.2944, 517.6889]
2025-08-07 12:07:16,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [110.0, 181.0, 185.0, 118.0, 187.0, 190.0, 432.0, 199.0, 388.0, 174.0]
2025-08-07 12:07:16,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 4 minutes, 6 seconds)
2025-08-07 12:08:55,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:08:57,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 627.19366 ± 273.717
2025-08-07 12:08:57,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [853.9031, 747.1569, 315.91507, 221.3053, 329.62396, 1033.396, 694.9249, 492.4403, 577.8561, 1005.41486]
2025-08-07 12:08:57,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [332.0, 230.0, 129.0, 100.0, 130.0, 372.0, 227.0, 189.0, 197.0, 334.0]
2025-08-07 12:08:57,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 1 minute, 50 seconds)
2025-08-07 12:10:36,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:10:39,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 742.33630 ± 326.211
2025-08-07 12:10:39,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [677.9744, 246.9638, 459.5262, 548.0803, 901.78613, 1535.9822, 858.39575, 597.93146, 776.1813, 820.5422]
2025-08-07 12:10:39,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [241.0, 112.0, 176.0, 203.0, 347.0, 559.0, 298.0, 216.0, 343.0, 310.0]
2025-08-07 12:10:39,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 24 seconds)
2025-08-07 12:12:21,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:12:27,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1243.44653 ± 673.478
2025-08-07 12:12:27,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [2489.9675, 934.1149, 1203.0298, 1233.9691, 538.51556, 1002.7601, 1025.5763, 2559.8862, 659.6394, 787.0054]
2025-08-07 12:12:27,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [881.0, 349.0, 398.0, 436.0, 206.0, 384.0, 367.0, 1000.0, 247.0, 287.0]
2025-08-07 12:12:27,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 58 minutes, 47 seconds)
2025-08-07 12:14:09,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:14:14,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1060.16284 ± 388.595
2025-08-07 12:14:14,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [370.3185, 1904.3784, 1021.05743, 681.1836, 1103.0693, 1056.1305, 1116.2933, 982.21796, 1443.1251, 923.85535]
2025-08-07 12:14:14,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [144.0, 763.0, 351.0, 267.0, 435.0, 306.0, 358.0, 354.0, 544.0, 353.0]
2025-08-07 12:14:14,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 57 minutes, 35 seconds)
2025-08-07 12:15:58,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:16:03,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1279.67139 ± 407.099
2025-08-07 12:16:03,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1421.8282, 1425.278, 1844.092, 904.09424, 1545.5819, 2009.4116, 915.74054, 926.13025, 975.57214, 828.98566]
2025-08-07 12:16:03,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [498.0, 472.0, 617.0, 334.0, 509.0, 675.0, 332.0, 318.0, 336.0, 301.0]
2025-08-07 12:16:03,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 56 minutes, 12 seconds)
2025-08-07 12:17:39,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:17:44,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1408.66443 ± 625.449
2025-08-07 12:17:44,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [486.52795, 783.6798, 1823.4564, 1110.7281, 1128.9521, 1310.7726, 2910.5557, 1582.1929, 1357.1567, 1592.6235]
2025-08-07 12:17:44,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [193.0, 286.0, 638.0, 408.0, 407.0, 458.0, 918.0, 602.0, 459.0, 559.0]
2025-08-07 12:17:44,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (1408.66) for latency MM1Queue_a033_s075
2025-08-07 12:17:44,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 54 minutes, 27 seconds)
2025-08-07 12:19:27,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:19:34,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1760.82654 ± 989.831
2025-08-07 12:19:34,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [839.7457, 3173.871, 691.7649, 2470.041, 1084.8873, 1209.7319, 1293.2281, 754.2325, 3129.175, 2961.5884]
2025-08-07 12:19:34,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [287.0, 1000.0, 255.0, 789.0, 371.0, 388.0, 501.0, 266.0, 1000.0, 981.0]
2025-08-07 12:19:34,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (1760.83) for latency MM1Queue_a033_s075
2025-08-07 12:19:34,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 53 minutes, 30 seconds)
2025-08-07 12:21:16,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:21:26,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2415.46729 ± 692.907
2025-08-07 12:21:26,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1647.997, 3518.0315, 1723.279, 2901.8674, 1881.9191, 3021.442, 2977.6943, 1810.9253, 1652.1392, 3019.3794]
2025-08-07 12:21:26,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [551.0, 1000.0, 556.0, 1000.0, 667.0, 1000.0, 1000.0, 668.0, 480.0, 879.0]
2025-08-07 12:21:26,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (2415.47) for latency MM1Queue_a033_s075
2025-08-07 12:21:26,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 52 minutes, 6 seconds)
2025-08-07 12:23:06,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:23:12,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1514.88965 ± 898.034
2025-08-07 12:23:12,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [2712.3577, 2796.904, 1574.8074, 904.7227, 1059.7759, 384.08432, 2824.0496, 1351.5996, 1141.7883, 398.8068]
2025-08-07 12:23:12,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 550.0, 339.0, 388.0, 151.0, 889.0, 485.0, 503.0, 155.0]
2025-08-07 12:23:12,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 50 minutes, 15 seconds)
2025-08-07 12:24:57,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:25:03,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1544.26526 ± 1006.684
2025-08-07 12:25:03,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [2689.831, 3121.4463, 449.8486, 979.1563, 265.02954, 1613.4321, 1481.9727, 456.48795, 1435.1495, 2950.2976]
2025-08-07 12:25:03,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [923.0, 1000.0, 159.0, 370.0, 114.0, 549.0, 508.0, 179.0, 466.0, 1000.0]
2025-08-07 12:25:03,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 48 minutes, 36 seconds)
2025-08-07 12:26:39,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:26:50,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2512.49341 ± 861.326
2025-08-07 12:26:50,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [2853.2878, 2983.0786, 183.69913, 2868.5745, 2838.414, 1680.196, 3055.2632, 2915.835, 2787.9158, 2958.6707]
2025-08-07 12:26:50,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 93.0, 1000.0, 1000.0, 610.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:26:50,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (2512.49) for latency MM1Queue_a033_s075
2025-08-07 12:26:50,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 47 minutes, 16 seconds)
2025-08-07 12:28:33,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:28:45,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3003.16064 ± 861.034
2025-08-07 12:28:45,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3293.6997, 3282.1362, 3343.3538, 437.88376, 3194.1284, 3502.837, 3210.772, 3391.393, 3128.1326, 3247.269]
2025-08-07 12:28:45,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 169.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:28:45,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (3003.16) for latency MM1Queue_a033_s075
2025-08-07 12:28:45,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 45 minutes, 50 seconds)
2025-08-07 12:30:29,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:30:41,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3018.92505 ± 405.689
2025-08-07 12:30:41,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3098.9998, 3286.5195, 3117.167, 3131.8108, 3117.1108, 3177.6367, 3176.3828, 3106.614, 3165.0454, 1811.9636]
2025-08-07 12:30:41,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 516.0]
2025-08-07 12:30:41,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (3018.93) for latency MM1Queue_a033_s075
2025-08-07 12:30:41,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 44 minutes, 22 seconds)
2025-08-07 12:32:14,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:32:25,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2943.48682 ± 849.878
2025-08-07 12:32:25,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3300.3855, 3332.1736, 827.4882, 3342.8574, 3300.3335, 3425.768, 3390.346, 1769.9376, 3405.0715, 3340.5056]
2025-08-07 12:32:25,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 247.0, 1000.0, 1000.0, 1000.0, 1000.0, 512.0, 1000.0, 1000.0]
2025-08-07 12:32:25,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 42 minutes, 22 seconds)
2025-08-07 12:34:14,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:34:25,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3071.57837 ± 887.579
2025-08-07 12:34:25,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3310.0823, 446.1138, 3594.163, 3384.9268, 3272.146, 3276.7175, 3245.7632, 3221.7144, 3687.656, 3276.5002]
2025-08-07 12:34:25,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 160.0, 1000.0, 1000.0, 977.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:34:25,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (3071.58) for latency MM1Queue_a033_s075
2025-08-07 12:34:25,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 41 minutes, 10 seconds)
2025-08-07 12:36:01,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:36:10,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2827.28906 ± 675.061
2025-08-07 12:36:10,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [2251.044, 3493.8066, 3460.292, 2281.3674, 1770.5883, 2349.0093, 3604.3555, 3451.6528, 2196.2646, 3414.512]
2025-08-07 12:36:10,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [643.0, 984.0, 1000.0, 617.0, 500.0, 635.0, 1000.0, 1000.0, 608.0, 913.0]
2025-08-07 12:36:10,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 39 minutes, 14 seconds)
2025-08-07 12:37:57,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:38:07,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2972.86670 ± 532.588
2025-08-07 12:38:07,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3420.3052, 2208.503, 3413.5142, 2438.0708, 3048.4841, 3423.4712, 2533.175, 3505.5713, 2211.3906, 3526.181]
2025-08-07 12:38:07,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 585.0, 1000.0, 693.0, 821.0, 1000.0, 686.0, 1000.0, 579.0, 1000.0]
2025-08-07 12:38:07,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 37 minutes, 29 seconds)
2025-08-07 12:39:44,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:39:55,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2692.32715 ± 811.081
2025-08-07 12:39:55,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [2998.2278, 2709.8167, 3051.817, 281.50552, 2911.5923, 3094.2627, 2851.2383, 3085.6323, 2965.921, 2973.2563]
2025-08-07 12:39:55,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 120.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:39:55,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 35 minutes, 6 seconds)
2025-08-07 12:41:40,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:41:52,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3342.17969 ± 121.367
2025-08-07 12:41:52,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3313.3228, 3226.3103, 3219.7258, 3272.4026, 3547.8618, 3459.4604, 3444.7156, 3365.0847, 3145.2673, 3427.6458]
2025-08-07 12:41:52,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:41:52,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (3342.18) for latency MM1Queue_a033_s075
2025-08-07 12:41:52,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 34 minutes)
2025-08-07 12:43:27,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:43:39,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2735.48389 ± 750.187
2025-08-07 12:43:39,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3084.827, 3020.3855, 3010.3457, 2919.775, 2981.418, 2802.3933, 3159.5059, 2925.4968, 2948.7566, 501.9359]
2025-08-07 12:43:39,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 941.0, 1000.0, 1000.0, 1000.0, 1000.0, 173.0]
2025-08-07 12:43:39,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 31 minutes, 23 seconds)
2025-08-07 12:45:17,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:45:28,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2806.47681 ± 837.695
2025-08-07 12:45:28,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [303.55447, 3076.6348, 3045.5535, 3210.2031, 2969.834, 2959.5222, 3170.1267, 3114.4668, 3131.955, 3082.9163]
2025-08-07 12:45:28,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [128.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:45:28,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 29 minutes, 45 seconds)
2025-08-07 12:47:07,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:47:16,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2939.24121 ± 956.860
2025-08-07 12:47:16,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3750.4285, 3659.253, 3033.7156, 3155.9626, 3705.4097, 3777.4678, 470.8855, 2443.1416, 2996.9133, 2399.2341]
2025-08-07 12:47:16,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 857.0, 864.0, 1000.0, 1000.0, 168.0, 680.0, 862.0, 675.0]
2025-08-07 12:47:16,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 27 minutes, 28 seconds)
2025-08-07 12:48:58,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:49:06,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2218.33936 ± 1548.031
2025-08-07 12:49:06,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3595.6238, 329.37564, 3702.5845, 3257.819, 541.71387, 3533.064, 3507.2388, 255.40477, 3260.8157, 199.75264]
2025-08-07 12:49:06,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 126.0, 1000.0, 1000.0, 182.0, 1000.0, 1000.0, 105.0, 1000.0, 91.0]
2025-08-07 12:49:06,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 25 minutes, 42 seconds)
2025-08-07 12:50:48,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:51:01,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3620.41089 ± 122.269
2025-08-07 12:51:01,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3416.962, 3652.2402, 3755.2925, 3641.9478, 3658.3687, 3530.8635, 3453.0178, 3737.4988, 3555.6917, 3802.2258]
2025-08-07 12:51:01,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:51:01,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (3620.41) for latency MM1Queue_a033_s075
2025-08-07 12:51:01,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 23 minutes, 46 seconds)
2025-08-07 12:52:41,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:52:50,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2642.86743 ± 760.494
2025-08-07 12:52:50,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [2030.449, 1916.7136, 3710.742, 2206.4993, 2020.2151, 2008.1349, 3696.01, 2018.1526, 3209.9575, 3611.7983]
2025-08-07 12:52:50,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [573.0, 546.0, 1000.0, 620.0, 572.0, 557.0, 1000.0, 563.0, 879.0, 938.0]
2025-08-07 12:52:50,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 22 minutes, 3 seconds)
2025-08-07 12:54:31,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:54:42,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3145.36060 ± 926.903
2025-08-07 12:54:42,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3551.5498, 3553.2158, 3492.6436, 3454.0764, 3514.0461, 3449.7505, 380.00055, 3367.3489, 3205.727, 3485.2458]
2025-08-07 12:54:42,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 140.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:54:42,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 20 minutes, 17 seconds)
2025-08-07 12:56:19,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:56:31,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3060.33716 ± 967.724
2025-08-07 12:56:31,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3353.535, 3450.6272, 3561.6292, 3663.2888, 3364.2686, 3233.0898, 3408.5916, 195.92796, 3060.2305, 3312.184]
2025-08-07 12:56:31,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 94.0, 1000.0, 1000.0]
2025-08-07 12:56:31,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 18 minutes, 28 seconds)
2025-08-07 12:58:14,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:58:26,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3204.12891 ± 55.810
2025-08-07 12:58:26,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3210.8186, 3202.4226, 3143.1853, 3220.4836, 3294.5562, 3098.317, 3213.4773, 3219.3774, 3160.3215, 3278.33]
2025-08-07 12:58:26,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:58:26,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 16 minutes, 48 seconds)
2025-08-07 13:00:07,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:00:17,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2985.15137 ± 1418.531
2025-08-07 13:00:17,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3626.1704, 3638.2446, 156.04684, 3693.6887, 3794.1675, 3701.9426, 3693.6213, 144.57758, 3619.2913, 3783.7615]
2025-08-07 13:00:17,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 287.0, 1000.0, 1000.0, 1000.0, 1000.0, 80.0, 1000.0, 1000.0]
2025-08-07 13:00:17,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 14 minutes, 50 seconds)
2025-08-07 13:01:57,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:02:10,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3416.23755 ± 95.914
2025-08-07 13:02:10,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3389.2307, 3451.1274, 3298.1892, 3390.1619, 3448.949, 3327.6694, 3406.2415, 3615.535, 3533.1653, 3302.1042]
2025-08-07 13:02:10,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 13:02:10,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 3 seconds)
2025-08-07 13:03:50,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:04:02,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3570.65308 ± 78.837
2025-08-07 13:04:02,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3517.6108, 3618.9695, 3680.2898, 3599.9556, 3642.324, 3492.161, 3483.335, 3509.4854, 3478.1487, 3684.2546]
2025-08-07 13:04:02,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 13:04:02,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 12 seconds)
2025-08-07 13:05:43,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:05:53,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2918.83252 ± 1193.247
2025-08-07 13:05:53,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3442.9294, 3574.4163, 3508.2114, 166.76874, 3321.4478, 3502.4797, 961.52313, 3540.5706, 3536.0315, 3633.947]
2025-08-07 13:05:53,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 83.0, 1000.0, 1000.0, 276.0, 1000.0, 1000.0, 1000.0]
2025-08-07 13:05:53,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 22 seconds)
2025-08-07 13:07:34,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:07:46,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3309.16724 ± 41.270
2025-08-07 13:07:46,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3251.6086, 3381.381, 3368.02, 3281.484, 3249.771, 3317.8735, 3321.889, 3304.304, 3323.1682, 3292.1738]
2025-08-07 13:07:46,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 13:07:46,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 27 seconds)
2025-08-07 13:09:28,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:09:41,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3582.62305 ± 58.555
2025-08-07 13:09:41,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3569.3032, 3559.2407, 3496.3906, 3579.836, 3701.1829, 3591.4814, 3522.5955, 3671.282, 3572.9663, 3561.9504]
2025-08-07 13:09:41,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 13:09:41,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 38 seconds)
2025-08-07 13:11:22,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:11:34,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3390.64136 ± 80.123
2025-08-07 13:11:34,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3353.4348, 3361.9827, 3350.3413, 3365.2122, 3512.519, 3430.3225, 3265.2922, 3522.0522, 3442.1355, 3303.1238]
2025-08-07 13:11:34,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 13:11:34,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 45 seconds)
2025-08-07 13:13:14,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:13:26,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3548.14697 ± 52.019
2025-08-07 13:13:26,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3560.1729, 3653.207, 3455.795, 3563.6162, 3474.9587, 3542.2244, 3546.2493, 3584.8376, 3563.6057, 3536.8035]
2025-08-07 13:13:26,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 13:13:26,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 52 seconds)
2025-08-07 13:15:07,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:15:19,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3662.55127 ± 71.359
2025-08-07 13:15:19,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3636.8887, 3603.6567, 3790.3062, 3660.494, 3721.285, 3552.0845, 3716.0444, 3624.7354, 3587.5105, 3732.506]
2025-08-07 13:15:19,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 13:15:19,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (3662.55) for latency MM1Queue_a033_s075
2025-08-07 13:15:19,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1251 [DEBUG]: Training session finished
