2025-08-07 10:07:46,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc20-hopper/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:07:46,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc20-hopper/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:07:46,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x1491ac2cbb50>}
2025-08-07 10:07:46,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1111 [DEBUG]: using device: cuda
2025-08-07 10:07:46,778 baseline-bpql-noiseperc20-hopper:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 10:07:46,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1133 [INFO]: Creating new trainer
2025-08-07 10:07:46,795 baseline-bpql-noiseperc20-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=59, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-08-07 10:07:46,795 baseline-bpql-noiseperc20-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 10:07:47,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1194 [DEBUG]: Starting training session...
2025-08-07 10:07:47,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 1/100
2025-08-07 10:09:15,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:09:15,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 40.94476 ± 17.918
2025-08-07 10:09:15,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [51.115562, 12.095606, 55.719337, 13.404583, 52.126106, 53.225506, 15.610562, 49.523083, 52.647373, 53.97986]
2025-08-07 10:09:15,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [47.0, 16.0, 37.0, 18.0, 43.0, 47.0, 16.0, 37.0, 35.0, 37.0]
2025-08-07 10:09:15,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (40.94) for latency MM1Queue_a033_s075
2025-08-07 10:09:15,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 25 minutes, 19 seconds)
2025-08-07 10:10:50,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:10:50,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 69.38400 ± 30.075
2025-08-07 10:10:50,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [36.79415, 103.26638, 112.56077, 64.20079, 13.113881, 47.577393, 95.3531, 72.013985, 57.615597, 91.34391]
2025-08-07 10:10:50,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [32.0, 57.0, 62.0, 40.0, 16.0, 37.0, 54.0, 51.0, 44.0, 58.0]
2025-08-07 10:10:50,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (69.38) for latency MM1Queue_a033_s075
2025-08-07 10:10:50,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 29 minutes, 39 seconds)
2025-08-07 10:12:26,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:12:27,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 83.40466 ± 55.609
2025-08-07 10:12:27,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [168.54185, 88.18075, 186.0511, 130.15813, 58.832436, 13.969091, 42.54972, 37.332264, 52.037395, 56.39378]
2025-08-07 10:12:27,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [96.0, 58.0, 96.0, 92.0, 46.0, 16.0, 35.0, 31.0, 39.0, 40.0]
2025-08-07 10:12:27,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (83.40) for latency MM1Queue_a033_s075
2025-08-07 10:12:27,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 30 minutes, 33 seconds)
2025-08-07 10:14:02,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:14:03,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 78.94025 ± 51.086
2025-08-07 10:14:03,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [87.98735, 84.66381, 208.89842, 83.86203, 17.897215, 11.69657, 76.18254, 51.38911, 80.62113, 86.204315]
2025-08-07 10:14:03,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [77.0, 54.0, 99.0, 54.0, 17.0, 13.0, 56.0, 38.0, 53.0, 65.0]
2025-08-07 10:14:03,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 30 minutes, 9 seconds)
2025-08-07 10:15:39,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:15:39,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 76.12990 ± 54.309
2025-08-07 10:15:39,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [40.141624, 51.023693, 112.12147, 46.127243, 42.550934, 139.56735, 17.759596, 45.17992, 65.243164, 201.58395]
2025-08-07 10:15:39,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [58.0, 43.0, 94.0, 48.0, 32.0, 87.0, 26.0, 36.0, 46.0, 116.0]
2025-08-07 10:15:39,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 29 minutes, 28 seconds)
2025-08-07 10:17:15,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:17:15,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 56.89339 ± 41.308
2025-08-07 10:17:15,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [120.31638, 75.65744, 66.6461, 23.193975, 10.009917, 10.804599, 49.104195, 6.1619515, 94.75226, 112.28714]
2025-08-07 10:17:15,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [85.0, 71.0, 71.0, 33.0, 16.0, 13.0, 42.0, 12.0, 61.0, 86.0]
2025-08-07 10:17:15,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 30 minutes, 24 seconds)
2025-08-07 10:18:51,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:18:52,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 90.00864 ± 67.570
2025-08-07 10:18:52,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [60.27464, 55.990532, 68.56687, 16.254818, 47.397224, 235.88829, 163.42453, 13.50666, 148.1173, 90.665504]
2025-08-07 10:18:52,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [42.0, 64.0, 65.0, 24.0, 54.0, 162.0, 93.0, 17.0, 84.0, 68.0]
2025-08-07 10:18:52,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (90.01) for latency MM1Queue_a033_s075
2025-08-07 10:18:52,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 29 minutes, 13 seconds)
2025-08-07 10:20:27,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:20:28,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 136.78439 ± 78.926
2025-08-07 10:20:28,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [94.658585, 99.34993, 28.484413, 65.58047, 179.51695, 101.80893, 120.26499, 326.33847, 175.01791, 176.82323]
2025-08-07 10:20:28,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [58.0, 87.0, 26.0, 57.0, 103.0, 60.0, 67.0, 181.0, 119.0, 102.0]
2025-08-07 10:20:28,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (136.78) for latency MM1Queue_a033_s075
2025-08-07 10:20:28,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 27 minutes, 35 seconds)
2025-08-07 10:22:04,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:22:05,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 100.61700 ± 94.569
2025-08-07 10:22:05,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [76.59306, 272.85303, 13.188235, 215.48512, 203.173, 51.093327, 10.237233, 10.531673, 139.49901, 13.516322]
2025-08-07 10:22:05,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [54.0, 159.0, 17.0, 107.0, 104.0, 40.0, 12.0, 17.0, 102.0, 17.0]
2025-08-07 10:22:05,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 26 minutes, 12 seconds)
2025-08-07 10:23:40,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:23:41,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 112.79472 ± 79.509
2025-08-07 10:23:41,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [211.84914, 11.189757, 95.35464, 156.78258, 70.349, 237.73335, 176.73589, 19.576002, 12.254105, 136.12292]
2025-08-07 10:23:41,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [101.0, 18.0, 52.0, 80.0, 41.0, 140.0, 91.0, 20.0, 16.0, 91.0]
2025-08-07 10:23:41,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 24 minutes, 29 seconds)
2025-08-07 10:25:18,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:25:19,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 121.85254 ± 89.050
2025-08-07 10:25:19,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [331.17117, 186.29889, 71.89255, 177.85872, 121.94475, 8.514327, 92.63036, 106.191826, 107.64928, 14.373463]
2025-08-07 10:25:19,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [146.0, 88.0, 51.0, 101.0, 68.0, 15.0, 70.0, 67.0, 89.0, 15.0]
2025-08-07 10:25:19,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 23 minutes, 23 seconds)
2025-08-07 10:26:54,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:26:55,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 149.63593 ± 134.161
2025-08-07 10:26:55,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [96.58023, 357.44583, 71.294, 253.76056, 204.7243, 98.00072, 12.287329, 10.456656, 381.36957, 10.440042]
2025-08-07 10:26:55,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [77.0, 159.0, 50.0, 112.0, 137.0, 85.0, 14.0, 15.0, 197.0, 16.0]
2025-08-07 10:26:55,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (149.64) for latency MM1Queue_a033_s075
2025-08-07 10:26:55,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 21 minutes, 40 seconds)
2025-08-07 10:28:31,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:28:32,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 129.66278 ± 92.384
2025-08-07 10:28:32,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [327.10132, 168.66531, 119.003784, 84.76061, 12.662247, 58.848133, 177.77419, 8.913692, 125.011536, 213.88707]
2025-08-07 10:28:32,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [157.0, 111.0, 94.0, 62.0, 15.0, 51.0, 120.0, 12.0, 78.0, 134.0]
2025-08-07 10:28:32,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 20 minutes, 27 seconds)
2025-08-07 10:30:08,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:30:09,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 179.60092 ± 128.439
2025-08-07 10:30:09,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [220.20407, 52.78889, 374.502, 365.74692, 15.324448, 321.24216, 90.196495, 193.54427, 79.283325, 83.176704]
2025-08-07 10:30:09,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [133.0, 49.0, 198.0, 202.0, 16.0, 170.0, 56.0, 140.0, 74.0, 48.0]
2025-08-07 10:30:09,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (179.60) for latency MM1Queue_a033_s075
2025-08-07 10:30:09,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 18 minutes, 51 seconds)
2025-08-07 10:31:46,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:31:47,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 151.36655 ± 53.175
2025-08-07 10:31:47,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [94.21849, 130.87755, 205.33813, 258.13226, 156.0437, 95.02275, 131.27881, 208.30835, 140.71828, 93.72719]
2025-08-07 10:31:47,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [63.0, 98.0, 121.0, 141.0, 109.0, 60.0, 105.0, 125.0, 111.0, 53.0]
2025-08-07 10:31:47,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 17 minutes, 47 seconds)
2025-08-07 10:33:23,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:33:24,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 67.12770 ± 58.003
2025-08-07 10:33:24,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [9.423665, 78.715324, 11.488437, 69.55133, 8.688337, 199.54037, 118.75579, 82.48939, 11.945973, 80.67834]
2025-08-07 10:33:24,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [11.0, 62.0, 15.0, 45.0, 11.0, 132.0, 68.0, 53.0, 16.0, 77.0]
2025-08-07 10:33:24,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 15 minutes, 54 seconds)
2025-08-07 10:35:00,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:35:01,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 136.73807 ± 87.002
2025-08-07 10:35:01,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [155.59058, 163.43953, 21.530872, 199.61038, 170.3385, 18.792778, 331.2656, 79.29551, 122.791565, 104.7255]
2025-08-07 10:35:01,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [99.0, 99.0, 21.0, 144.0, 93.0, 18.0, 165.0, 54.0, 83.0, 65.0]
2025-08-07 10:35:01,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 14 minutes, 25 seconds)
2025-08-07 10:36:37,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:36:38,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 142.94852 ± 93.753
2025-08-07 10:36:38,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [201.93713, 109.16677, 250.3166, 11.890303, 83.7567, 233.29453, 168.76576, 68.69228, 15.554729, 286.11038]
2025-08-07 10:36:38,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [138.0, 79.0, 146.0, 17.0, 55.0, 111.0, 112.0, 43.0, 16.0, 129.0]
2025-08-07 10:36:38,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 12 minutes, 43 seconds)
2025-08-07 10:38:14,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:38:15,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 146.91794 ± 86.305
2025-08-07 10:38:15,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [10.521615, 107.27705, 147.82376, 155.78798, 76.60578, 103.40438, 353.2139, 200.21927, 187.03441, 127.29124]
2025-08-07 10:38:15,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 64.0, 87.0, 100.0, 53.0, 72.0, 258.0, 119.0, 104.0, 75.0]
2025-08-07 10:38:15,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 11 minutes, 16 seconds)
2025-08-07 10:39:52,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:39:53,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 114.52177 ± 100.073
2025-08-07 10:39:53,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [158.48535, 12.797967, 99.43143, 100.53447, 10.523166, 118.067505, 164.36404, 103.1297, 366.3153, 11.568709]
2025-08-07 10:39:53,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [85.0, 17.0, 58.0, 88.0, 13.0, 71.0, 89.0, 82.0, 245.0, 16.0]
2025-08-07 10:39:53,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 9 minutes, 27 seconds)
2025-08-07 10:41:28,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:41:29,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 105.50481 ± 113.782
2025-08-07 10:41:29,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [62.19927, 15.895, 12.322862, 96.12722, 174.63576, 364.64032, 11.08692, 10.729428, 63.38999, 244.02136]
2025-08-07 10:41:29,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [37.0, 17.0, 15.0, 67.0, 95.0, 149.0, 14.0, 16.0, 40.0, 102.0]
2025-08-07 10:41:29,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 7 minutes, 43 seconds)
2025-08-07 10:43:06,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:43:07,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 140.82874 ± 83.247
2025-08-07 10:43:07,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [225.39182, 75.00471, 197.6977, 82.96802, 312.10162, 136.71187, 9.811508, 114.284805, 84.49557, 169.81976]
2025-08-07 10:43:07,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [128.0, 49.0, 116.0, 60.0, 129.0, 83.0, 13.0, 73.0, 55.0, 84.0]
2025-08-07 10:43:07,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 6 minutes, 19 seconds)
2025-08-07 10:44:42,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:44:43,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 99.20499 ± 88.956
2025-08-07 10:44:43,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [16.56067, 87.0251, 11.470584, 9.207654, 201.7421, 15.373917, 120.72709, 288.34766, 92.119675, 149.4755]
2025-08-07 10:44:43,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 50.0, 14.0, 15.0, 154.0, 16.0, 90.0, 162.0, 66.0, 79.0]
2025-08-07 10:44:43,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 4 minutes, 31 seconds)
2025-08-07 10:46:19,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:46:20,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 157.23016 ± 75.217
2025-08-07 10:46:20,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [207.34068, 77.132675, 122.879364, 165.90382, 149.48982, 118.925446, 10.734035, 263.412, 260.22943, 196.25424]
2025-08-07 10:46:20,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 54.0, 67.0, 99.0, 94.0, 88.0, 13.0, 175.0, 141.0, 114.0]
2025-08-07 10:46:20,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 2 minutes, 56 seconds)
2025-08-07 10:47:57,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:47:58,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 130.11847 ± 90.681
2025-08-07 10:47:58,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [13.721016, 9.132481, 222.6256, 214.43726, 103.39211, 219.54167, 84.86927, 10.326513, 212.2854, 210.85342]
2025-08-07 10:47:58,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 13.0, 126.0, 106.0, 67.0, 111.0, 62.0, 12.0, 120.0, 113.0]
2025-08-07 10:47:58,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 1 minute, 20 seconds)
2025-08-07 10:49:35,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:49:35,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 104.76167 ± 69.219
2025-08-07 10:49:35,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [126.89112, 86.16472, 12.160524, 143.44986, 72.247826, 114.7666, 258.73364, 10.923486, 70.807175, 151.47177]
2025-08-07 10:49:35,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [79.0, 57.0, 17.0, 77.0, 74.0, 62.0, 136.0, 16.0, 47.0, 80.0]
2025-08-07 10:49:35,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 1 hour, 59 minutes, 59 seconds)
2025-08-07 10:51:13,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:51:13,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 115.54478 ± 71.805
2025-08-07 10:51:13,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [135.52992, 146.9644, 8.493611, 62.783665, 67.739334, 19.006193, 171.02165, 189.14638, 113.41978, 241.34293]
2025-08-07 10:51:13,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [79.0, 96.0, 12.0, 49.0, 42.0, 47.0, 87.0, 109.0, 88.0, 134.0]
2025-08-07 10:51:13,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 58 minutes, 27 seconds)
2025-08-07 10:52:49,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:52:50,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 120.87504 ± 132.430
2025-08-07 10:52:50,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [141.91, 187.60962, 476.84644, 10.175909, 150.12364, 61.63408, 10.80758, 15.689485, 69.41657, 84.53701]
2025-08-07 10:52:50,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 108.0, 199.0, 15.0, 78.0, 47.0, 14.0, 17.0, 42.0, 56.0]
2025-08-07 10:52:50,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 56 minutes, 49 seconds)
2025-08-07 10:54:26,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:54:26,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 125.57336 ± 98.751
2025-08-07 10:54:26,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [147.05064, 13.079648, 278.8004, 10.028309, 158.58322, 12.417847, 140.48772, 298.67636, 66.64577, 129.96358]
2025-08-07 10:54:26,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 16.0, 142.0, 14.0, 98.0, 17.0, 73.0, 179.0, 39.0, 88.0]
2025-08-07 10:54:26,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 55 minutes, 1 second)
2025-08-07 10:56:04,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:56:05,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 105.41679 ± 93.737
2025-08-07 10:56:05,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [13.623045, 98.39714, 282.27386, 97.68268, 12.275957, 70.8105, 257.02835, 47.583607, 162.04222, 12.45051]
2025-08-07 10:56:05,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 72.0, 144.0, 57.0, 16.0, 60.0, 164.0, 45.0, 97.0, 17.0]
2025-08-07 10:56:05,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 53 minutes, 38 seconds)
2025-08-07 10:57:41,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:57:42,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 152.44104 ± 148.263
2025-08-07 10:57:42,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [10.1221285, 10.809245, 115.06832, 492.7356, 150.87335, 8.541215, 143.23964, 128.18962, 351.01974, 113.81156]
2025-08-07 10:57:42,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 17.0, 71.0, 221.0, 86.0, 15.0, 97.0, 73.0, 193.0, 94.0]
2025-08-07 10:57:42,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 51 minutes, 49 seconds)
2025-08-07 10:59:18,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:59:19,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 182.32646 ± 133.941
2025-08-07 10:59:19,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [70.04155, 162.5694, 180.19548, 103.21721, 131.34483, 506.104, 105.640526, 12.997416, 295.13593, 256.01825]
2025-08-07 10:59:19,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [47.0, 116.0, 115.0, 68.0, 94.0, 244.0, 61.0, 17.0, 144.0, 133.0]
2025-08-07 10:59:19,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (182.33) for latency MM1Queue_a033_s075
2025-08-07 10:59:19,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 50 minutes, 4 seconds)
2025-08-07 11:00:55,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:00:57,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 175.32382 ± 144.076
2025-08-07 11:00:57,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [109.71668, 458.95383, 129.09114, 14.635676, 13.953281, 144.09702, 310.4875, 306.40332, 252.97427, 12.925657]
2025-08-07 11:00:57,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [68.0, 210.0, 72.0, 17.0, 15.0, 95.0, 148.0, 153.0, 170.0, 17.0]
2025-08-07 11:00:57,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 48 minutes, 43 seconds)
2025-08-07 11:02:33,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:02:34,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 74.92814 ± 87.592
2025-08-07 11:02:34,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [16.832544, 74.736404, 119.980774, 12.670547, 16.111404, 15.213315, 10.781812, 15.690696, 198.84944, 268.4144]
2025-08-07 11:02:34,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 44.0, 78.0, 16.0, 16.0, 19.0, 14.0, 16.0, 106.0, 152.0]
2025-08-07 11:02:34,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 47 minutes, 15 seconds)
2025-08-07 11:04:10,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:04:11,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 151.76328 ± 99.064
2025-08-07 11:04:11,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [209.28609, 167.6425, 74.23026, 294.34686, 9.185447, 288.7441, 15.760662, 172.50694, 72.041374, 213.88853]
2025-08-07 11:04:11,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [134.0, 121.0, 50.0, 128.0, 11.0, 158.0, 16.0, 89.0, 42.0, 154.0]
2025-08-07 11:04:11,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 45 minutes, 14 seconds)
2025-08-07 11:05:48,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:05:49,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 124.73887 ± 89.227
2025-08-07 11:05:49,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [7.7892137, 237.64996, 102.410126, 8.873229, 228.75435, 195.4657, 11.52331, 123.33804, 105.56881, 226.01595]
2025-08-07 11:05:49,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 132.0, 70.0, 11.0, 126.0, 125.0, 16.0, 70.0, 67.0, 119.0]
2025-08-07 11:05:49,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 43 minutes, 59 seconds)
2025-08-07 11:07:25,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:07:26,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 108.32465 ± 64.242
2025-08-07 11:07:26,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [101.4059, 161.62764, 11.917046, 61.238598, 88.340256, 125.06185, 191.23495, 210.60004, 14.634556, 117.185646]
2025-08-07 11:07:26,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [59.0, 115.0, 15.0, 38.0, 62.0, 66.0, 127.0, 117.0, 17.0, 96.0]
2025-08-07 11:07:26,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 42 minutes, 15 seconds)
2025-08-07 11:09:03,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:09:04,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 114.67682 ± 77.062
2025-08-07 11:09:04,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [138.72879, 13.855984, 157.18105, 156.50891, 245.5195, 168.86977, 11.200883, 12.839603, 165.01862, 77.04499]
2025-08-07 11:09:04,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [84.0, 15.0, 103.0, 97.0, 129.0, 110.0, 16.0, 16.0, 110.0, 48.0]
2025-08-07 11:09:04,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 40 minutes, 45 seconds)
2025-08-07 11:10:42,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:10:43,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 134.07021 ± 103.485
2025-08-07 11:10:43,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [111.4535, 10.2533, 21.420736, 7.4463983, 251.30695, 97.172554, 88.13361, 297.5452, 206.24428, 249.72548]
2025-08-07 11:10:43,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [81.0, 15.0, 40.0, 10.0, 131.0, 66.0, 69.0, 159.0, 163.0, 157.0]
2025-08-07 11:10:43,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 39 minutes, 21 seconds)
2025-08-07 11:12:18,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:12:20,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 207.43315 ± 217.447
2025-08-07 11:12:20,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [10.883345, 300.7379, 12.13322, 325.7871, 17.805067, 614.0356, 74.29761, 10.3923435, 163.77165, 544.4877]
2025-08-07 11:12:20,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 168.0, 14.0, 190.0, 17.0, 226.0, 54.0, 12.0, 142.0, 217.0]
2025-08-07 11:12:20,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (207.43) for latency MM1Queue_a033_s075
2025-08-07 11:12:20,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 37 minutes, 45 seconds)
2025-08-07 11:13:56,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:13:57,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 227.13737 ± 177.500
2025-08-07 11:13:57,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [200.43825, 159.64403, 13.758381, 468.1301, 185.67953, 10.397754, 11.487919, 313.48273, 439.39297, 468.96225]
2025-08-07 11:13:57,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 100.0, 18.0, 185.0, 120.0, 14.0, 13.0, 150.0, 161.0, 185.0]
2025-08-07 11:13:57,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (227.14) for latency MM1Queue_a033_s075
2025-08-07 11:13:57,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 35 minutes, 55 seconds)
2025-08-07 11:15:33,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:15:34,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 153.60632 ± 156.272
2025-08-07 11:15:34,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [158.65718, 100.65721, 10.608709, 528.5059, 265.74228, 80.16454, 14.008505, 285.79788, 12.212601, 79.70831]
2025-08-07 11:15:34,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [113.0, 94.0, 13.0, 195.0, 131.0, 62.0, 16.0, 139.0, 15.0, 56.0]
2025-08-07 11:15:34,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 34 minutes, 16 seconds)
2025-08-07 11:17:10,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:17:11,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 172.41373 ± 142.182
2025-08-07 11:17:11,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [74.290344, 377.31238, 407.60477, 15.623541, 102.0908, 236.43762, 278.2466, 12.94367, 17.246153, 202.34134]
2025-08-07 11:17:11,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [44.0, 154.0, 173.0, 16.0, 72.0, 123.0, 120.0, 16.0, 18.0, 101.0]
2025-08-07 11:17:11,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 32 minutes, 35 seconds)
2025-08-07 11:18:48,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:18:49,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 172.67323 ± 109.939
2025-08-07 11:18:49,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [9.025271, 90.253654, 135.53334, 349.22385, 302.32974, 214.02315, 190.98851, 159.75125, 266.81235, 8.791119]
2025-08-07 11:18:49,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 72.0, 77.0, 189.0, 209.0, 109.0, 92.0, 85.0, 139.0, 11.0]
2025-08-07 11:18:49,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 30 minutes, 47 seconds)
2025-08-07 11:20:26,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:20:27,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 151.59911 ± 107.215
2025-08-07 11:20:27,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [285.93442, 294.1285, 116.41555, 101.358475, 77.39785, 315.90677, 9.736533, 149.99913, 155.58011, 9.533819]
2025-08-07 11:20:27,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [153.0, 158.0, 74.0, 56.0, 53.0, 140.0, 13.0, 74.0, 116.0, 11.0]
2025-08-07 11:20:27,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 29 minutes, 16 seconds)
2025-08-07 11:22:04,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:22:05,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 319.39618 ± 198.929
2025-08-07 11:22:05,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [76.93234, 304.585, 509.36063, 99.02026, 308.38214, 397.66174, 221.85347, 329.26062, 780.65027, 166.25536]
2025-08-07 11:22:05,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [61.0, 144.0, 198.0, 55.0, 151.0, 190.0, 122.0, 168.0, 288.0, 112.0]
2025-08-07 11:22:05,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (319.40) for latency MM1Queue_a033_s075
2025-08-07 11:22:05,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 27 minutes, 55 seconds)
2025-08-07 11:23:42,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:23:43,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 140.65164 ± 82.363
2025-08-07 11:23:43,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [137.61186, 265.41055, 164.82465, 11.083333, 108.90222, 193.40526, 84.179184, 220.4504, 11.069816, 209.57924]
2025-08-07 11:23:43,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 141.0, 83.0, 13.0, 96.0, 120.0, 53.0, 125.0, 12.0, 123.0]
2025-08-07 11:23:43,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 26 minutes, 23 seconds)
2025-08-07 11:25:20,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:25:21,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 164.33157 ± 104.584
2025-08-07 11:25:21,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [303.67416, 143.77097, 85.269905, 300.89386, 121.32005, 56.122284, 90.260864, 224.06158, 14.112999, 303.8291]
2025-08-07 11:25:21,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [163.0, 78.0, 60.0, 164.0, 65.0, 34.0, 51.0, 117.0, 17.0, 145.0]
2025-08-07 11:25:21,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 24 minutes, 46 seconds)
2025-08-07 11:26:58,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:26:59,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 207.34573 ± 130.978
2025-08-07 11:26:59,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [283.28732, 168.52052, 147.71956, 207.58534, 243.11586, 10.63583, 176.6409, 432.44238, 15.914485, 387.59512]
2025-08-07 11:26:59,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [170.0, 106.0, 90.0, 120.0, 154.0, 14.0, 119.0, 210.0, 18.0, 171.0]
2025-08-07 11:26:59,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 23 minutes, 17 seconds)
2025-08-07 11:28:34,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:28:35,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 101.30739 ± 91.058
2025-08-07 11:28:35,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [256.48483, 12.9917, 15.897863, 15.870724, 82.0176, 256.00482, 102.07423, 162.90004, 98.8404, 9.991707]
2025-08-07 11:28:35,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [142.0, 16.0, 17.0, 17.0, 58.0, 149.0, 62.0, 93.0, 61.0, 13.0]
2025-08-07 11:28:35,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 21 minutes, 25 seconds)
2025-08-07 11:30:12,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:30:13,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 88.97690 ± 114.050
2025-08-07 11:30:13,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [11.5591545, 8.362558, 47.77363, 12.590668, 294.53897, 275.17883, 8.745266, 209.30748, 10.12882, 11.583642]
2025-08-07 11:30:13,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 11.0, 30.0, 13.0, 168.0, 164.0, 14.0, 114.0, 12.0, 15.0]
2025-08-07 11:30:13,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 19 minutes, 35 seconds)
2025-08-07 11:31:50,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:31:50,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 170.46953 ± 149.706
2025-08-07 11:31:50,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [266.96106, 133.48227, 10.382971, 123.22684, 118.85507, 16.441332, 215.67157, 97.787605, 161.81335, 560.07324]
2025-08-07 11:31:50,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [133.0, 79.0, 13.0, 71.0, 72.0, 17.0, 125.0, 55.0, 83.0, 198.0]
2025-08-07 11:31:51,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 18 minutes, 3 seconds)
2025-08-07 11:33:27,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:33:28,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 163.68562 ± 139.021
2025-08-07 11:33:28,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [114.36817, 473.72958, 183.46408, 91.36252, 98.764885, 12.571505, 172.23157, 125.890785, 355.22003, 9.253144]
2025-08-07 11:33:28,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [85.0, 178.0, 111.0, 82.0, 63.0, 13.0, 92.0, 103.0, 171.0, 12.0]
2025-08-07 11:33:28,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 16 minutes, 25 seconds)
2025-08-07 11:35:04,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:35:05,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 108.96043 ± 68.089
2025-08-07 11:35:05,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [114.57099, 136.37576, 214.34282, 145.33551, 64.98022, 7.817756, 79.12449, 8.528674, 209.00363, 109.52444]
2025-08-07 11:35:05,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [71.0, 100.0, 109.0, 108.0, 45.0, 12.0, 52.0, 16.0, 121.0, 68.0]
2025-08-07 11:35:05,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 14 minutes, 33 seconds)
2025-08-07 11:36:42,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:36:44,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 216.91574 ± 142.048
2025-08-07 11:36:44,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [194.79286, 295.7279, 454.7296, 343.8067, 7.5701766, 201.88472, 204.63425, 93.36631, 11.565092, 361.07977]
2025-08-07 11:36:44,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [122.0, 139.0, 200.0, 175.0, 12.0, 106.0, 130.0, 74.0, 16.0, 153.0]
2025-08-07 11:36:44,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 13 minutes, 16 seconds)
2025-08-07 11:38:19,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:38:21,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 196.06473 ± 162.614
2025-08-07 11:38:21,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [161.74458, 655.6944, 102.429855, 89.60968, 123.20617, 113.25387, 125.64184, 138.61673, 155.33305, 295.1171]
2025-08-07 11:38:21,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [106.0, 267.0, 87.0, 61.0, 91.0, 61.0, 91.0, 79.0, 113.0, 162.0]
2025-08-07 11:38:21,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 11 minutes, 34 seconds)
2025-08-07 11:39:58,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:39:59,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 111.53564 ± 88.780
2025-08-07 11:39:59,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [14.798433, 71.88976, 15.803312, 163.0039, 279.39603, 189.2894, 210.01772, 12.533939, 67.560265, 91.063675]
2025-08-07 11:39:59,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 52.0, 16.0, 99.0, 140.0, 99.0, 104.0, 16.0, 46.0, 65.0]
2025-08-07 11:39:59,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 9 minutes, 58 seconds)
2025-08-07 11:41:34,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:41:35,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 179.28069 ± 123.160
2025-08-07 11:41:35,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [262.32413, 163.0525, 201.98547, 501.7122, 143.98123, 116.85065, 135.79257, 14.534124, 137.57031, 115.00384]
2025-08-07 11:41:35,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [155.0, 106.0, 101.0, 243.0, 78.0, 66.0, 75.0, 19.0, 91.0, 88.0]
2025-08-07 11:41:35,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 8 minutes, 10 seconds)
2025-08-07 11:43:11,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:43:12,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 102.70955 ± 51.570
2025-08-07 11:43:12,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [124.92052, 91.92186, 139.41415, 14.0469265, 153.67859, 78.44688, 108.82187, 138.9586, 11.529398, 165.35669]
2025-08-07 11:43:12,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 69.0, 87.0, 18.0, 133.0, 53.0, 70.0, 88.0, 17.0, 94.0]
2025-08-07 11:43:12,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 6 minutes, 29 seconds)
2025-08-07 11:44:46,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:44:48,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 169.30843 ± 160.221
2025-08-07 11:44:48,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [8.959152, 10.443621, 113.58608, 143.80035, 156.46497, 601.39886, 253.9472, 150.53015, 82.06637, 171.8874]
2025-08-07 11:44:48,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [11.0, 14.0, 89.0, 99.0, 106.0, 253.0, 152.0, 88.0, 54.0, 109.0]
2025-08-07 11:44:48,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 4 minutes, 31 seconds)
2025-08-07 11:46:23,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:46:24,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 242.47742 ± 210.443
2025-08-07 11:46:24,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [584.0276, 610.7425, 8.74302, 9.49379, 234.92372, 173.90977, 144.22368, 10.314166, 325.99368, 322.40237]
2025-08-07 11:46:24,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [225.0, 260.0, 11.0, 12.0, 129.0, 90.0, 96.0, 13.0, 170.0, 148.0]
2025-08-07 11:46:24,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 2 minutes, 51 seconds)
2025-08-07 11:47:57,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:47:58,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 225.33928 ± 124.955
2025-08-07 11:47:58,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [229.66502, 168.35329, 324.74634, 302.9315, 442.6289, 10.575452, 343.5668, 94.33887, 222.99234, 113.59415]
2025-08-07 11:47:58,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [186.0, 115.0, 183.0, 152.0, 201.0, 14.0, 190.0, 77.0, 119.0, 85.0]
2025-08-07 11:47:58,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 45 seconds)
2025-08-07 11:49:34,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:49:35,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 169.32861 ± 153.661
2025-08-07 11:49:35,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [10.036144, 431.92535, 13.883669, 282.3511, 9.541046, 47.629227, 395.4641, 73.15408, 217.01952, 212.28183]
2025-08-07 11:49:35,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 195.0, 16.0, 154.0, 14.0, 104.0, 196.0, 50.0, 107.0, 127.0]
2025-08-07 11:49:35,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 59 minutes, 9 seconds)
2025-08-07 11:51:08,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:51:10,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 281.94247 ± 183.806
2025-08-07 11:51:10,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [705.68463, 277.99875, 77.99138, 252.40057, 531.4182, 251.75719, 249.11426, 105.083244, 160.43404, 207.54237]
2025-08-07 11:51:10,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [287.0, 170.0, 45.0, 159.0, 264.0, 131.0, 137.0, 72.0, 167.0, 116.0]
2025-08-07 11:51:10,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 57 minutes, 21 seconds)
2025-08-07 11:52:44,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:52:45,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 223.46457 ± 116.399
2025-08-07 11:52:45,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [517.61945, 153.23479, 139.84993, 233.14563, 340.57123, 154.29137, 247.0565, 145.09615, 124.51231, 179.26839]
2025-08-07 11:52:45,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [225.0, 82.0, 112.0, 111.0, 127.0, 89.0, 147.0, 107.0, 79.0, 87.0]
2025-08-07 11:52:45,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 55 minutes, 42 seconds)
2025-08-07 11:54:21,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:54:22,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 129.81642 ± 105.561
2025-08-07 11:54:22,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [201.20116, 11.19391, 272.99023, 9.472317, 11.957487, 159.64572, 9.012614, 245.0785, 126.90394, 250.70828]
2025-08-07 11:54:22,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [128.0, 14.0, 152.0, 12.0, 13.0, 90.0, 14.0, 130.0, 98.0, 130.0]
2025-08-07 11:54:22,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 54 minutes, 7 seconds)
2025-08-07 11:55:55,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:55:56,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 220.24313 ± 137.484
2025-08-07 11:55:56,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [7.536857, 81.77894, 61.913704, 420.27713, 196.54689, 244.59792, 172.82973, 243.50087, 395.19205, 378.25754]
2025-08-07 11:55:56,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [10.0, 56.0, 38.0, 186.0, 92.0, 111.0, 103.0, 115.0, 202.0, 161.0]
2025-08-07 11:55:56,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 52 minutes, 32 seconds)
2025-08-07 11:57:30,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:57:31,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 83.67857 ± 100.177
2025-08-07 11:57:31,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [306.97293, 8.907453, 10.958621, 11.861177, 124.15646, 209.60135, 14.795605, 9.331385, 126.31554, 13.885275]
2025-08-07 11:57:31,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [168.0, 15.0, 13.0, 14.0, 76.0, 134.0, 16.0, 14.0, 79.0, 18.0]
2025-08-07 11:57:31,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 50 minutes, 43 seconds)
2025-08-07 11:59:05,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:59:05,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 125.64525 ± 131.312
2025-08-07 11:59:05,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [434.0437, 97.506035, 130.42259, 286.4692, 134.09904, 17.193819, 120.876205, 11.037036, 13.015599, 11.789143]
2025-08-07 11:59:05,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [182.0, 79.0, 71.0, 133.0, 83.0, 18.0, 70.0, 13.0, 17.0, 14.0]
2025-08-07 11:59:06,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 49 minutes, 9 seconds)
2025-08-07 12:00:40,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:00:41,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 144.40742 ± 116.507
2025-08-07 12:00:41,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [15.361046, 11.926372, 211.72868, 189.2356, 91.91113, 12.210445, 412.40314, 175.26718, 135.88087, 188.14978]
2025-08-07 12:00:41,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 15.0, 134.0, 103.0, 69.0, 15.0, 189.0, 113.0, 102.0, 108.0]
2025-08-07 12:00:41,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 47 minutes, 34 seconds)
2025-08-07 12:02:15,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:02:17,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 225.54338 ± 165.726
2025-08-07 12:02:17,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [346.63943, 438.7759, 96.34054, 170.7186, 210.46222, 117.034386, 117.39494, 167.82373, 576.28864, 13.955396]
2025-08-07 12:02:17,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [194.0, 196.0, 66.0, 102.0, 130.0, 66.0, 95.0, 98.0, 234.0, 16.0]
2025-08-07 12:02:17,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 45 minutes, 54 seconds)
2025-08-07 12:03:50,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:03:51,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 222.86307 ± 109.485
2025-08-07 12:03:51,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [306.47787, 186.00775, 147.54338, 361.56036, 13.391355, 254.08969, 317.14615, 221.20102, 83.00399, 338.209]
2025-08-07 12:03:51,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [132.0, 136.0, 95.0, 174.0, 14.0, 143.0, 151.0, 125.0, 53.0, 182.0]
2025-08-07 12:03:51,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 44 minutes, 20 seconds)
2025-08-07 12:05:26,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:05:27,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 231.29716 ± 124.185
2025-08-07 12:05:27,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [269.25027, 431.47598, 180.42636, 213.44814, 431.11447, 8.598486, 173.3832, 138.59193, 297.84653, 168.83609]
2025-08-07 12:05:27,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [132.0, 170.0, 91.0, 121.0, 236.0, 11.0, 95.0, 76.0, 157.0, 122.0]
2025-08-07 12:05:27,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 42 minutes, 54 seconds)
2025-08-07 12:07:00,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:07:01,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 205.73514 ± 118.677
2025-08-07 12:07:01,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [204.97205, 179.95761, 212.20212, 119.70733, 456.5874, 117.63821, 141.12923, 12.141356, 300.00583, 313.0102]
2025-08-07 12:07:01,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [92.0, 123.0, 140.0, 75.0, 205.0, 83.0, 97.0, 15.0, 142.0, 159.0]
2025-08-07 12:07:01,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 41 minutes, 14 seconds)
2025-08-07 12:08:35,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:08:36,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 244.78171 ± 233.115
2025-08-07 12:08:36,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [419.98212, 131.52786, 16.13232, 289.40442, 190.23349, 140.83136, 855.1023, 203.30943, 190.34476, 10.948861]
2025-08-07 12:08:36,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [162.0, 69.0, 18.0, 144.0, 99.0, 90.0, 293.0, 128.0, 100.0, 13.0]
2025-08-07 12:08:36,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 39 minutes, 37 seconds)
2025-08-07 12:10:10,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:10:11,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 180.65828 ± 168.487
2025-08-07 12:10:11,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [89.16783, 331.5494, 582.66223, 118.550255, 12.138755, 113.39114, 9.836145, 303.03772, 165.41052, 80.83879]
2025-08-07 12:10:11,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [54.0, 167.0, 261.0, 73.0, 20.0, 62.0, 14.0, 180.0, 97.0, 50.0]
2025-08-07 12:10:11,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 37 minutes, 56 seconds)
2025-08-07 12:11:45,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:11:46,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 122.24178 ± 91.392
2025-08-07 12:11:46,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [181.87204, 9.192042, 107.895676, 11.63268, 221.04218, 9.868363, 149.4055, 297.40427, 95.99759, 138.10747]
2025-08-07 12:11:46,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [92.0, 13.0, 70.0, 14.0, 108.0, 13.0, 79.0, 136.0, 61.0, 109.0]
2025-08-07 12:11:46,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 36 minutes, 23 seconds)
2025-08-07 12:13:19,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:13:21,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 265.81046 ± 134.561
2025-08-07 12:13:21,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [264.18387, 171.92126, 428.48547, 11.81809, 422.13766, 325.43207, 149.16115, 290.82065, 160.88423, 433.25995]
2025-08-07 12:13:21,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [137.0, 100.0, 217.0, 15.0, 195.0, 164.0, 86.0, 154.0, 105.0, 214.0]
2025-08-07 12:13:21,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 34 minutes, 42 seconds)
2025-08-07 12:14:55,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:14:56,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 225.42749 ± 182.795
2025-08-07 12:14:56,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [141.10623, 439.77567, 374.38672, 10.471209, 10.369153, 114.23955, 605.5403, 243.62631, 157.90475, 156.85487]
2025-08-07 12:14:56,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 184.0, 176.0, 13.0, 13.0, 68.0, 249.0, 125.0, 87.0, 87.0]
2025-08-07 12:14:56,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 33 minutes, 13 seconds)
2025-08-07 12:16:29,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:16:31,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 274.66568 ± 187.181
2025-08-07 12:16:31,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [663.17194, 88.695435, 400.50024, 425.99658, 243.72618, 169.62418, 192.70949, 411.37512, 7.163424, 143.69402]
2025-08-07 12:16:31,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [303.0, 59.0, 158.0, 221.0, 132.0, 101.0, 95.0, 165.0, 10.0, 86.0]
2025-08-07 12:16:31,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 31 minutes, 36 seconds)
2025-08-07 12:18:04,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:18:05,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 141.18428 ± 119.268
2025-08-07 12:18:05,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [187.78221, 15.100285, 130.54303, 11.341257, 312.9661, 167.93222, 234.54642, 13.241756, 10.563437, 327.82593]
2025-08-07 12:18:05,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [116.0, 17.0, 89.0, 16.0, 147.0, 102.0, 112.0, 17.0, 15.0, 163.0]
2025-08-07 12:18:05,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 30 minutes)
2025-08-07 12:19:38,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:19:39,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 208.67480 ± 153.220
2025-08-07 12:19:39,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [339.32727, 392.09378, 12.994115, 10.288681, 15.527069, 124.6214, 263.49503, 193.16608, 303.3526, 431.88196]
2025-08-07 12:19:39,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [231.0, 201.0, 14.0, 15.0, 17.0, 66.0, 123.0, 105.0, 148.0, 201.0]
2025-08-07 12:19:39,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 28 minutes, 24 seconds)
2025-08-07 12:21:13,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:21:14,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 179.01656 ± 120.367
2025-08-07 12:21:14,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [459.3451, 198.55682, 136.07126, 179.2946, 97.96296, 292.0716, 222.45207, 11.507996, 120.333664, 72.569435]
2025-08-07 12:21:14,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [200.0, 100.0, 78.0, 117.0, 73.0, 138.0, 111.0, 14.0, 77.0, 45.0]
2025-08-07 12:21:14,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 26 minutes, 51 seconds)
2025-08-07 12:22:47,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:22:48,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 160.61778 ± 96.733
2025-08-07 12:22:48,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [169.04326, 85.44092, 117.79469, 161.87598, 77.610985, 110.484, 240.86098, 12.025087, 322.71225, 308.3297]
2025-08-07 12:22:48,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [105.0, 49.0, 79.0, 93.0, 45.0, 87.0, 131.0, 14.0, 144.0, 135.0]
2025-08-07 12:22:48,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 25 minutes, 8 seconds)
2025-08-07 12:24:21,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:24:22,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 207.49776 ± 128.710
2025-08-07 12:24:22,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [440.8684, 131.62625, 290.0423, 12.065571, 169.71422, 168.89288, 384.86432, 97.179695, 271.3151, 108.40878]
2025-08-07 12:24:22,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [222.0, 67.0, 137.0, 14.0, 88.0, 107.0, 197.0, 76.0, 177.0, 65.0]
2025-08-07 12:24:22,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 23 minutes, 35 seconds)
2025-08-07 12:25:55,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:25:57,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 194.37231 ± 156.675
2025-08-07 12:25:57,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [160.16176, 77.82085, 152.75087, 187.14734, 126.611, 17.813345, 141.16591, 219.69966, 232.2964, 628.2561]
2025-08-07 12:25:57,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 60.0, 98.0, 106.0, 91.0, 17.0, 92.0, 136.0, 121.0, 242.0]
2025-08-07 12:25:57,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 22 minutes, 1 second)
2025-08-07 12:27:30,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:27:31,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 177.46640 ± 125.501
2025-08-07 12:27:31,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [124.52989, 185.91045, 246.417, 121.70307, 176.12569, 476.87592, 225.35133, 12.209315, 12.334423, 193.20682]
2025-08-07 12:27:31,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 121.0, 129.0, 82.0, 115.0, 245.0, 115.0, 16.0, 16.0, 98.0]
2025-08-07 12:27:31,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 20 minutes, 26 seconds)
2025-08-07 12:29:04,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:29:05,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 189.89540 ± 224.224
2025-08-07 12:29:05,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [100.05534, 216.51003, 11.196198, 79.763504, 100.63961, 811.5456, 13.499882, 256.9446, 61.196644, 247.60258]
2025-08-07 12:29:05,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [55.0, 105.0, 13.0, 55.0, 62.0, 297.0, 16.0, 142.0, 42.0, 140.0]
2025-08-07 12:29:05,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 18 minutes, 49 seconds)
2025-08-07 12:30:37,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:30:39,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 253.91777 ± 202.026
2025-08-07 12:30:39,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [111.33065, 207.17128, 236.73503, 193.11754, 100.42627, 148.33694, 264.05246, 835.5375, 277.84106, 164.62924]
2025-08-07 12:30:39,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [87.0, 124.0, 118.0, 123.0, 84.0, 112.0, 119.0, 358.0, 147.0, 101.0]
2025-08-07 12:30:39,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 17 minutes, 17 seconds)
2025-08-07 12:32:12,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:32:14,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 176.17830 ± 120.426
2025-08-07 12:32:14,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [222.44554, 261.37067, 115.83916, 198.5775, 402.15222, 9.261572, 147.61049, 90.84916, 304.67523, 9.001458]
2025-08-07 12:32:14,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [133.0, 135.0, 70.0, 139.0, 180.0, 14.0, 102.0, 58.0, 154.0, 13.0]
2025-08-07 12:32:14,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 15 minutes, 42 seconds)
2025-08-07 12:33:47,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:33:48,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 251.55876 ± 249.925
2025-08-07 12:33:48,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [969.6317, 183.04863, 170.23253, 194.3514, 157.3414, 209.8257, 289.85663, 11.033745, 96.07367, 234.19215]
2025-08-07 12:33:48,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [335.0, 124.0, 87.0, 97.0, 98.0, 109.0, 153.0, 15.0, 58.0, 123.0]
2025-08-07 12:33:48,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 8 seconds)
2025-08-07 12:35:21,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:35:22,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 254.96867 ± 173.528
2025-08-07 12:35:22,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [670.7704, 6.1077814, 105.683784, 370.61874, 270.75717, 139.59045, 286.99368, 178.63513, 331.56107, 188.96855]
2025-08-07 12:35:22,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [273.0, 12.0, 61.0, 190.0, 137.0, 70.0, 136.0, 116.0, 157.0, 125.0]
2025-08-07 12:35:22,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 12 minutes, 33 seconds)
2025-08-07 12:36:55,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:36:56,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 217.62891 ± 150.642
2025-08-07 12:36:56,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [107.68085, 316.86935, 18.437363, 449.92285, 261.48508, 466.74283, 11.174711, 183.08286, 200.35379, 160.53938]
2025-08-07 12:36:56,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [79.0, 159.0, 18.0, 220.0, 159.0, 207.0, 17.0, 108.0, 135.0, 96.0]
2025-08-07 12:36:56,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 10 minutes, 59 seconds)
2025-08-07 12:38:35,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:38:36,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 169.07649 ± 131.061
2025-08-07 12:38:36,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [206.5556, 168.14413, 298.21417, 192.03526, 144.71704, 11.443693, 9.809479, 443.4764, 207.28679, 9.0823]
2025-08-07 12:38:36,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [100.0, 108.0, 142.0, 95.0, 81.0, 17.0, 13.0, 198.0, 113.0, 11.0]
2025-08-07 12:38:36,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 32 seconds)
2025-08-07 12:40:21,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:40:22,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 193.51395 ± 73.538
2025-08-07 12:40:22,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [288.32156, 210.84464, 159.83632, 189.18364, 216.27467, 263.23688, 258.96964, 173.03027, 162.45193, 12.990039]
2025-08-07 12:40:22,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [142.0, 102.0, 89.0, 108.0, 120.0, 129.0, 124.0, 99.0, 101.0, 17.0]
2025-08-07 12:40:22,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 8 seconds)
2025-08-07 12:42:06,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:42:07,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 275.15027 ± 239.037
2025-08-07 12:42:07,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [524.5173, 137.25122, 9.664288, 300.58817, 285.30725, 85.28854, 176.62848, 204.39186, 870.41626, 157.44955]
2025-08-07 12:42:07,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [257.0, 83.0, 12.0, 139.0, 138.0, 59.0, 125.0, 101.0, 349.0, 85.0]
2025-08-07 12:42:07,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 39 seconds)
2025-08-07 12:43:52,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:43:53,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 124.91164 ± 85.531
2025-08-07 12:43:53,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [153.18846, 10.595377, 12.532809, 208.06587, 19.39518, 278.15887, 181.68518, 110.61824, 111.13249, 163.74405]
2025-08-07 12:43:53,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [77.0, 15.0, 15.0, 100.0, 18.0, 164.0, 97.0, 73.0, 86.0, 82.0]
2025-08-07 12:43:53,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 6 seconds)
2025-08-07 12:45:37,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:45:38,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 68.78688 ± 66.985
2025-08-07 12:45:38,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [212.14685, 11.41537, 99.59899, 73.1906, 149.36797, 13.248636, 94.06332, 13.682811, 6.8999662, 14.254336]
2025-08-07 12:45:38,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [105.0, 15.0, 67.0, 53.0, 107.0, 16.0, 57.0, 14.0, 10.0, 16.0]
2025-08-07 12:45:38,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 28 seconds)
2025-08-07 12:47:23,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:47:24,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 166.07176 ± 120.100
2025-08-07 12:47:24,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [106.33572, 106.40137, 460.9015, 181.34213, 72.03001, 167.51996, 275.42242, 185.50932, 91.57639, 13.678813]
2025-08-07 12:47:24,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [61.0, 69.0, 198.0, 114.0, 42.0, 87.0, 118.0, 127.0, 61.0, 16.0]
2025-08-07 12:47:24,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 45 seconds)
2025-08-07 12:49:09,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:49:10,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 155.31284 ± 123.906
2025-08-07 12:49:10,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [278.38144, 14.591577, 159.12474, 238.43791, 264.27637, 10.658927, 13.502357, 338.66916, 223.34833, 12.137632]
2025-08-07 12:49:10,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [174.0, 16.0, 87.0, 143.0, 122.0, 17.0, 15.0, 190.0, 114.0, 14.0]
2025-08-07 12:49:10,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1251 [DEBUG]: Training session finished
