2025-08-07 10:03:37,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc10-hopper/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:03:37,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc10-hopper/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:03:37,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14fd98f87e10>}
2025-08-07 10:03:37,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1111 [DEBUG]: using device: cuda
2025-08-07 10:03:37,483 baseline-bpql-noiseperc10-hopper:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 10:03:37,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1133 [INFO]: Creating new trainer
2025-08-07 10:03:37,499 baseline-bpql-noiseperc10-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=59, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-08-07 10:03:37,499 baseline-bpql-noiseperc10-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 10:03:38,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1194 [DEBUG]: Starting training session...
2025-08-07 10:03:38,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 1/100
2025-08-07 10:05:04,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:05:04,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 63.03350 ± 11.525
2025-08-07 10:05:04,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [86.07092, 72.43786, 52.433598, 52.42029, 75.797874, 52.578846, 56.897003, 66.858, 64.366295, 50.474365]
2025-08-07 10:05:04,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [55.0, 51.0, 34.0, 32.0, 50.0, 34.0, 38.0, 48.0, 48.0, 36.0]
2025-08-07 10:05:04,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (63.03) for latency MM1Queue_a033_s075
2025-08-07 10:05:04,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 22 minutes, 30 seconds)
2025-08-07 10:06:37,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:06:38,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 127.76632 ± 43.908
2025-08-07 10:06:38,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [98.69826, 134.05037, 89.05458, 115.80635, 158.37857, 204.99956, 176.70287, 147.29254, 46.462162, 106.21795]
2025-08-07 10:06:38,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [64.0, 88.0, 58.0, 75.0, 85.0, 131.0, 110.0, 95.0, 33.0, 73.0]
2025-08-07 10:06:38,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (127.77) for latency MM1Queue_a033_s075
2025-08-07 10:06:38,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 27 minutes, 5 seconds)
2025-08-07 10:08:10,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:08:11,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 17.29930 ± 3.245
2025-08-07 10:08:11,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [21.06213, 13.806582, 18.184647, 14.524931, 17.261795, 17.443708, 24.275354, 13.07914, 17.945578, 15.409103]
2025-08-07 10:08:11,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [43.0, 41.0, 41.0, 41.0, 43.0, 43.0, 46.0, 39.0, 41.0, 43.0]
2025-08-07 10:08:11,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 26 minutes, 51 seconds)
2025-08-07 10:09:44,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:09:45,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 154.73404 ± 69.256
2025-08-07 10:09:45,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [326.84525, 124.82041, 168.92877, 170.1305, 159.03494, 122.892075, 180.89851, 107.88393, 40.689915, 145.21622]
2025-08-07 10:09:45,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [168.0, 83.0, 113.0, 100.0, 89.0, 79.0, 103.0, 74.0, 31.0, 74.0]
2025-08-07 10:09:45,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (154.73) for latency MM1Queue_a033_s075
2025-08-07 10:09:45,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 26 minutes, 35 seconds)
2025-08-07 10:11:17,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:11:18,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 113.50844 ± 60.981
2025-08-07 10:11:18,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [61.7373, 163.98007, 78.802124, 116.22238, 216.817, 22.068344, 173.5873, 52.943626, 78.662926, 170.26334]
2025-08-07 10:11:18,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 146.0, 81.0, 92.0, 190.0, 29.0, 175.0, 64.0, 70.0, 168.0]
2025-08-07 10:11:18,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 25 minutes, 45 seconds)
2025-08-07 10:12:53,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:12:54,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 138.87048 ± 69.018
2025-08-07 10:12:54,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [146.22885, 43.61679, 245.31526, 112.3827, 51.153763, 66.436226, 148.22702, 249.28882, 169.6652, 156.3902]
2025-08-07 10:12:54,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [96.0, 36.0, 125.0, 103.0, 36.0, 43.0, 85.0, 129.0, 98.0, 94.0]
2025-08-07 10:12:54,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 27 minutes, 13 seconds)
2025-08-07 10:14:25,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:14:26,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 137.36523 ± 96.108
2025-08-07 10:14:26,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [192.02019, 87.629234, 307.22342, 68.22142, 52.73228, 70.0976, 106.60002, 316.10275, 44.560066, 128.4653]
2025-08-07 10:14:26,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [115.0, 75.0, 153.0, 52.0, 47.0, 50.0, 79.0, 128.0, 33.0, 87.0]
2025-08-07 10:14:26,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 25 minutes, 4 seconds)
2025-08-07 10:15:59,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:16:00,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 159.54726 ± 55.402
2025-08-07 10:16:00,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [88.57284, 231.04968, 114.46835, 147.26848, 275.6426, 200.46658, 132.8261, 138.9925, 151.42416, 114.76123]
2025-08-07 10:16:00,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [57.0, 120.0, 74.0, 74.0, 146.0, 104.0, 82.0, 83.0, 89.0, 76.0]
2025-08-07 10:16:00,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (159.55) for latency MM1Queue_a033_s075
2025-08-07 10:16:00,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 23 minutes, 58 seconds)
2025-08-07 10:17:33,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:17:35,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 177.50107 ± 102.193
2025-08-07 10:17:35,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [189.85368, 222.68378, 188.62712, 341.75797, 81.17255, 351.07074, 95.01756, 145.10242, 146.23772, 13.48723]
2025-08-07 10:17:35,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [143.0, 140.0, 151.0, 304.0, 54.0, 207.0, 71.0, 102.0, 97.0, 17.0]
2025-08-07 10:17:35,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (177.50) for latency MM1Queue_a033_s075
2025-08-07 10:17:35,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 22 minutes, 38 seconds)
2025-08-07 10:19:08,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:19:09,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 222.00729 ± 111.724
2025-08-07 10:19:09,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [188.37218, 97.43299, 387.25638, 360.97974, 375.497, 214.55545, 85.57285, 241.01393, 91.59978, 177.79277]
2025-08-07 10:19:09,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [143.0, 64.0, 177.0, 199.0, 187.0, 123.0, 56.0, 207.0, 58.0, 103.0]
2025-08-07 10:19:09,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (222.01) for latency MM1Queue_a033_s075
2025-08-07 10:19:09,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 21 minutes, 15 seconds)
2025-08-07 10:20:42,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:20:43,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 137.63489 ± 65.825
2025-08-07 10:20:43,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [100.571045, 140.08907, 154.20396, 224.38397, 114.24542, 72.929504, 282.0447, 49.131203, 113.80077, 124.94927]
2025-08-07 10:20:43,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 108.0, 125.0, 142.0, 81.0, 63.0, 240.0, 36.0, 88.0, 97.0]
2025-08-07 10:20:43,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 19 minutes, 7 seconds)
2025-08-07 10:22:17,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:22:19,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 201.74759 ± 117.214
2025-08-07 10:22:19,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [306.8489, 483.12378, 281.4348, 120.789894, 72.39158, 121.64791, 142.05124, 157.45943, 127.01551, 204.71274]
2025-08-07 10:22:19,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [205.0, 262.0, 182.0, 82.0, 61.0, 87.0, 107.0, 113.0, 91.0, 124.0]
2025-08-07 10:22:19,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 18 minutes, 38 seconds)
2025-08-07 10:23:50,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:23:52,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 253.09830 ± 164.585
2025-08-07 10:23:52,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [164.73499, 181.97021, 358.0799, 284.59116, 643.22833, 258.45877, 232.07417, 51.761253, 322.13513, 33.948624]
2025-08-07 10:23:52,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 117.0, 158.0, 148.0, 321.0, 107.0, 134.0, 42.0, 148.0, 33.0]
2025-08-07 10:23:52,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (253.10) for latency MM1Queue_a033_s075
2025-08-07 10:23:52,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 16 minutes, 46 seconds)
2025-08-07 10:25:24,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:25:26,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 207.99638 ± 184.653
2025-08-07 10:25:26,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [175.85681, 86.75278, 350.2448, 101.67226, 100.46046, 712.30774, 76.79461, 168.46864, 193.25847, 114.14743]
2025-08-07 10:25:26,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 59.0, 177.0, 67.0, 68.0, 326.0, 57.0, 114.0, 125.0, 75.0]
2025-08-07 10:25:26,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 14 minutes, 59 seconds)
2025-08-07 10:27:00,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:27:01,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 253.26993 ± 111.751
2025-08-07 10:27:01,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [208.11736, 270.08994, 134.9013, 214.13077, 523.6902, 291.613, 198.351, 144.1055, 365.0676, 182.63263]
2025-08-07 10:27:01,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [139.0, 149.0, 84.0, 121.0, 244.0, 167.0, 134.0, 88.0, 170.0, 110.0]
2025-08-07 10:27:01,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (253.27) for latency MM1Queue_a033_s075
2025-08-07 10:27:01,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 13 minutes, 43 seconds)
2025-08-07 10:28:34,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:28:35,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 245.79532 ± 161.192
2025-08-07 10:28:35,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [458.0162, 151.42558, 96.058876, 211.15659, 149.65338, 90.99112, 237.47289, 399.90225, 96.585724, 566.6906]
2025-08-07 10:28:35,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [206.0, 87.0, 69.0, 129.0, 100.0, 59.0, 130.0, 161.0, 66.0, 184.0]
2025-08-07 10:28:35,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 12 minutes, 5 seconds)
2025-08-07 10:30:09,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:30:10,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 226.35527 ± 88.655
2025-08-07 10:30:10,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [406.01984, 126.57497, 107.99941, 166.62254, 293.69125, 260.73047, 227.93404, 251.90932, 288.3159, 133.75487]
2025-08-07 10:30:10,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [149.0, 82.0, 69.0, 104.0, 137.0, 138.0, 116.0, 122.0, 130.0, 70.0]
2025-08-07 10:30:10,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 10 minutes, 27 seconds)
2025-08-07 10:31:42,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:31:44,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 298.25259 ± 256.512
2025-08-07 10:31:44,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [200.7429, 622.28955, 88.31623, 85.38447, 118.94754, 114.00736, 138.19789, 220.86931, 558.0167, 835.7539]
2025-08-07 10:31:44,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [113.0, 251.0, 66.0, 60.0, 80.0, 71.0, 95.0, 110.0, 228.0, 281.0]
2025-08-07 10:31:44,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (298.25) for latency MM1Queue_a033_s075
2025-08-07 10:31:44,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 9 minutes, 6 seconds)
2025-08-07 10:33:18,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:33:20,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 297.45721 ± 193.479
2025-08-07 10:33:20,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [15.494924, 291.66022, 91.88122, 360.94174, 639.3425, 404.7826, 108.84027, 338.6353, 554.6751, 168.31813]
2025-08-07 10:33:20,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 128.0, 68.0, 143.0, 218.0, 161.0, 61.0, 140.0, 207.0, 92.0]
2025-08-07 10:33:20,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 7 minutes, 56 seconds)
2025-08-07 10:34:53,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:34:54,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 276.19543 ± 202.078
2025-08-07 10:34:54,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [62.917908, 216.84863, 453.54608, 289.85477, 119.1681, 223.48753, 298.57007, 325.83395, 16.23545, 755.49176]
2025-08-07 10:34:54,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [59.0, 109.0, 211.0, 136.0, 88.0, 111.0, 144.0, 146.0, 18.0, 269.0]
2025-08-07 10:34:54,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 6 minutes, 10 seconds)
2025-08-07 10:36:27,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:36:29,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 322.30371 ± 180.242
2025-08-07 10:36:29,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [326.12695, 321.7998, 453.24518, 646.8336, 146.82527, 546.11536, 238.48904, 84.27276, 372.2457, 87.08344]
2025-08-07 10:36:29,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [158.0, 199.0, 238.0, 265.0, 80.0, 267.0, 114.0, 57.0, 165.0, 69.0]
2025-08-07 10:36:29,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (322.30) for latency MM1Queue_a033_s075
2025-08-07 10:36:29,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 4 minutes, 46 seconds)
2025-08-07 10:38:03,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:38:05,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 428.81747 ± 206.044
2025-08-07 10:38:05,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [401.4805, 437.06223, 167.89954, 551.09265, 676.46893, 695.4017, 531.3517, 567.2704, 122.38691, 137.76027]
2025-08-07 10:38:05,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [150.0, 177.0, 90.0, 206.0, 326.0, 259.0, 184.0, 193.0, 77.0, 82.0]
2025-08-07 10:38:05,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (428.82) for latency MM1Queue_a033_s075
2025-08-07 10:38:05,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 3 minutes, 25 seconds)
2025-08-07 10:39:36,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:39:38,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 438.04517 ± 305.035
2025-08-07 10:39:38,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [260.9876, 133.2207, 581.2819, 412.48026, 398.59003, 112.607285, 699.9927, 397.30585, 1181.0258, 202.95955]
2025-08-07 10:39:38,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [127.0, 79.0, 218.0, 186.0, 157.0, 71.0, 299.0, 159.0, 437.0, 105.0]
2025-08-07 10:39:38,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (438.05) for latency MM1Queue_a033_s075
2025-08-07 10:39:38,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 1 minute, 43 seconds)
2025-08-07 10:41:12,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:41:14,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 369.68900 ± 269.663
2025-08-07 10:41:14,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [111.0146, 218.78299, 295.1029, 848.8303, 272.17715, 105.88033, 389.9495, 918.3535, 242.99281, 293.8057]
2025-08-07 10:41:14,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [70.0, 134.0, 158.0, 295.0, 138.0, 72.0, 169.0, 318.0, 128.0, 143.0]
2025-08-07 10:41:14,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 7 seconds)
2025-08-07 10:42:48,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:42:50,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 455.74908 ± 241.332
2025-08-07 10:42:50,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [472.5783, 949.76117, 647.58813, 574.8699, 136.58055, 632.48724, 427.6699, 149.96236, 283.87616, 282.11722]
2025-08-07 10:42:50,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [209.0, 450.0, 246.0, 260.0, 88.0, 250.0, 188.0, 90.0, 146.0, 126.0]
2025-08-07 10:42:50,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (455.75) for latency MM1Queue_a033_s075
2025-08-07 10:42:50,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 1 hour, 58 minutes, 59 seconds)
2025-08-07 10:44:22,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:44:24,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 344.70746 ± 359.252
2025-08-07 10:44:24,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [405.35425, 188.31029, 136.93274, 220.60611, 1374.68, 130.48775, 308.5815, 133.15865, 418.1055, 130.85757]
2025-08-07 10:44:24,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [183.0, 99.0, 78.0, 123.0, 615.0, 82.0, 165.0, 75.0, 189.0, 73.0]
2025-08-07 10:44:24,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 1 hour, 57 minutes, 15 seconds)
2025-08-07 10:46:00,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:46:03,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 514.37988 ± 369.714
2025-08-07 10:46:03,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [270.13248, 78.45538, 391.63837, 1269.958, 986.3363, 702.41284, 697.27936, 371.81943, 243.37561, 132.3911]
2025-08-07 10:46:03,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [116.0, 52.0, 188.0, 571.0, 412.0, 329.0, 329.0, 161.0, 123.0, 90.0]
2025-08-07 10:46:03,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (514.38) for latency MM1Queue_a033_s075
2025-08-07 10:46:03,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 56 minutes, 19 seconds)
2025-08-07 10:47:34,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:47:37,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 444.41763 ± 340.928
2025-08-07 10:47:37,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1261.6683, 185.59958, 284.20862, 105.88341, 157.21776, 539.93744, 339.2201, 792.93097, 557.3195, 220.19058]
2025-08-07 10:47:37,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [493.0, 105.0, 162.0, 71.0, 94.0, 230.0, 157.0, 324.0, 229.0, 123.0]
2025-08-07 10:47:37,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 54 minutes, 46 seconds)
2025-08-07 10:49:10,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:49:12,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 469.41455 ± 296.160
2025-08-07 10:49:12,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [207.65504, 790.6484, 418.5917, 383.33716, 404.57523, 833.06757, 325.0581, 78.491684, 1036.2607, 216.46014]
2025-08-07 10:49:12,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 337.0, 192.0, 172.0, 207.0, 286.0, 160.0, 55.0, 397.0, 99.0]
2025-08-07 10:49:12,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 53 minutes, 10 seconds)
2025-08-07 10:50:45,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:50:48,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 715.47748 ± 279.881
2025-08-07 10:50:48,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1020.0642, 821.34955, 229.5515, 880.4439, 892.0906, 769.6905, 741.1642, 327.97333, 1075.4894, 396.9581]
2025-08-07 10:50:48,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [414.0, 286.0, 133.0, 343.0, 334.0, 261.0, 250.0, 180.0, 374.0, 160.0]
2025-08-07 10:50:48,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (715.48) for latency MM1Queue_a033_s075
2025-08-07 10:50:48,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 51 minutes, 34 seconds)
2025-08-07 10:52:24,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:52:27,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 511.98203 ± 425.979
2025-08-07 10:52:27,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1160.5397, 345.4213, 130.07352, 317.7823, 1393.5637, 379.66537, 94.50793, 775.02, 359.38605, 163.86012]
2025-08-07 10:52:27,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [425.0, 155.0, 85.0, 164.0, 502.0, 167.0, 56.0, 340.0, 183.0, 88.0]
2025-08-07 10:52:27,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 50 minutes, 54 seconds)
2025-08-07 10:53:57,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:53:59,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 412.71500 ± 314.686
2025-08-07 10:53:59,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1270.3517, 181.76006, 166.41483, 313.52704, 347.886, 642.08777, 294.6216, 323.50726, 184.53384, 402.4597]
2025-08-07 10:53:59,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [500.0, 103.0, 89.0, 149.0, 149.0, 260.0, 144.0, 153.0, 95.0, 175.0]
2025-08-07 10:53:59,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 47 minutes, 57 seconds)
2025-08-07 10:55:34,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:55:37,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 569.36499 ± 344.995
2025-08-07 10:55:37,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [501.26227, 586.6095, 428.95685, 133.14964, 980.1983, 1016.8767, 138.60242, 1106.36, 206.53217, 595.10223]
2025-08-07 10:55:37,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [208.0, 244.0, 175.0, 76.0, 373.0, 373.0, 81.0, 446.0, 94.0, 235.0]
2025-08-07 10:55:37,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 47 minutes, 13 seconds)
2025-08-07 10:57:11,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:57:12,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 357.03577 ± 240.878
2025-08-07 10:57:12,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [231.45894, 100.67279, 530.2065, 187.7841, 232.81131, 406.36237, 806.3095, 226.6959, 735.81396, 112.242355]
2025-08-07 10:57:12,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [105.0, 63.0, 217.0, 111.0, 121.0, 173.0, 303.0, 110.0, 280.0, 62.0]
2025-08-07 10:57:12,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 45 minutes, 41 seconds)
2025-08-07 10:58:45,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:58:48,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 672.05688 ± 422.951
2025-08-07 10:58:48,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [137.30609, 1366.1123, 624.34375, 584.7112, 671.8791, 769.64594, 764.73096, 231.20827, 1407.0793, 163.55215]
2025-08-07 10:58:48,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 523.0, 229.0, 218.0, 249.0, 302.0, 309.0, 109.0, 508.0, 91.0]
2025-08-07 10:58:48,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 43 minutes, 50 seconds)
2025-08-07 11:00:21,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:00:23,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 415.23886 ± 352.313
2025-08-07 11:00:23,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [191.79817, 99.35348, 697.3197, 123.77362, 384.06033, 108.6668, 514.17163, 90.32976, 1192.7544, 750.1608]
2025-08-07 11:00:23,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [96.0, 63.0, 229.0, 81.0, 180.0, 73.0, 209.0, 55.0, 500.0, 328.0]
2025-08-07 11:00:23,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 41 minutes, 37 seconds)
2025-08-07 11:01:56,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:01:59,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 468.78296 ± 264.130
2025-08-07 11:01:59,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [281.21375, 365.58646, 99.603615, 339.81296, 711.5316, 706.4882, 935.6734, 94.91847, 541.21027, 611.7907]
2025-08-07 11:01:59,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [121.0, 149.0, 60.0, 152.0, 251.0, 284.0, 331.0, 71.0, 215.0, 244.0]
2025-08-07 11:01:59,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 40 minutes, 36 seconds)
2025-08-07 11:03:34,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:03:36,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 446.15469 ± 421.403
2025-08-07 11:03:36,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [131.56645, 438.74796, 806.4168, 1556.6653, 279.61978, 390.24698, 76.8674, 414.65033, 122.39942, 244.36603]
2025-08-07 11:03:36,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [76.0, 193.0, 309.0, 616.0, 129.0, 169.0, 59.0, 168.0, 72.0, 120.0]
2025-08-07 11:03:36,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 39 minutes, 7 seconds)
2025-08-07 11:05:08,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:05:10,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 238.46866 ± 244.096
2025-08-07 11:05:10,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [580.41925, 814.98395, 13.4264765, 188.43048, 83.27794, 74.04112, 118.4496, 181.5552, 248.22028, 81.88226]
2025-08-07 11:05:10,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [226.0, 332.0, 17.0, 99.0, 53.0, 61.0, 71.0, 88.0, 134.0, 58.0]
2025-08-07 11:05:10,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 37 minutes, 3 seconds)
2025-08-07 11:06:41,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:06:44,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 576.60608 ± 414.851
2025-08-07 11:06:44,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [418.93686, 106.85356, 94.14948, 464.2528, 303.41055, 925.9416, 255.95967, 1004.87335, 773.3577, 1418.3254]
2025-08-07 11:06:44,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [172.0, 60.0, 66.0, 189.0, 134.0, 377.0, 126.0, 382.0, 277.0, 514.0]
2025-08-07 11:06:44,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 35 minutes, 10 seconds)
2025-08-07 11:08:19,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:08:22,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 512.62903 ± 278.722
2025-08-07 11:08:22,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [223.4053, 380.17157, 652.74866, 219.07281, 379.26227, 156.21457, 1035.9829, 858.6512, 531.9996, 688.78186]
2025-08-07 11:08:22,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [109.0, 169.0, 252.0, 110.0, 172.0, 85.0, 415.0, 331.0, 213.0, 265.0]
2025-08-07 11:08:22,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 34 minutes, 9 seconds)
2025-08-07 11:09:52,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:09:55,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 587.61505 ± 514.775
2025-08-07 11:09:55,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [489.13995, 180.89243, 764.84924, 242.85878, 180.82993, 1460.7855, 1655.9501, 229.0292, 334.32242, 337.49326]
2025-08-07 11:09:55,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [201.0, 97.0, 282.0, 125.0, 94.0, 606.0, 628.0, 105.0, 144.0, 155.0]
2025-08-07 11:09:55,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 32 minutes, 2 seconds)
2025-08-07 11:11:29,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:11:31,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 429.86688 ± 380.746
2025-08-07 11:11:31,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [117.73122, 399.14676, 1393.6036, 601.0385, 394.90546, 182.50752, 379.79855, 134.48314, 685.03625, 10.417776]
2025-08-07 11:11:31,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [78.0, 185.0, 495.0, 216.0, 168.0, 105.0, 179.0, 76.0, 231.0, 15.0]
2025-08-07 11:11:31,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 30 minutes, 10 seconds)
2025-08-07 11:13:00,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:13:03,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 593.64099 ± 277.723
2025-08-07 11:13:03,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1015.2381, 620.57587, 303.83313, 499.29565, 199.7509, 914.2655, 407.2376, 570.8352, 1005.1328, 400.2448]
2025-08-07 11:13:03,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [392.0, 236.0, 143.0, 219.0, 98.0, 376.0, 167.0, 249.0, 404.0, 182.0]
2025-08-07 11:13:03,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 28 minutes, 20 seconds)
2025-08-07 11:14:34,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:14:36,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 508.04181 ± 254.088
2025-08-07 11:14:36,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [187.62816, 547.1269, 691.19794, 756.33466, 662.93756, 647.76843, 107.79657, 367.76096, 886.26624, 225.60071]
2025-08-07 11:14:36,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [102.0, 222.0, 263.0, 275.0, 268.0, 216.0, 71.0, 161.0, 329.0, 119.0]
2025-08-07 11:14:36,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 26 minutes, 39 seconds)
2025-08-07 11:16:08,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:16:11,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 607.51239 ± 533.363
2025-08-07 11:16:11,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [372.8566, 686.92505, 507.69424, 624.70575, 770.6183, 304.68512, 2092.3276, 127.36611, 186.76027, 401.18518]
2025-08-07 11:16:11,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [159.0, 254.0, 226.0, 258.0, 347.0, 137.0, 849.0, 67.0, 103.0, 184.0]
2025-08-07 11:16:11,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 24 minutes, 29 seconds)
2025-08-07 11:17:43,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:17:45,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 474.75751 ± 201.346
2025-08-07 11:17:45,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [535.8909, 633.59827, 701.0731, 265.2169, 326.97098, 217.11382, 681.3625, 576.07605, 660.5197, 149.75314]
2025-08-07 11:17:45,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [193.0, 209.0, 253.0, 125.0, 166.0, 109.0, 258.0, 235.0, 259.0, 76.0]
2025-08-07 11:17:45,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 23 minutes, 4 seconds)
2025-08-07 11:19:17,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:19:21,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 708.51776 ± 496.081
2025-08-07 11:19:21,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [507.06308, 232.79338, 112.92308, 1100.6459, 1742.5376, 479.92178, 740.62604, 452.4933, 382.2749, 1333.8983]
2025-08-07 11:19:21,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [207.0, 118.0, 66.0, 430.0, 640.0, 194.0, 285.0, 186.0, 166.0, 513.0]
2025-08-07 11:19:21,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 21 minutes, 23 seconds)
2025-08-07 11:20:50,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:20:53,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 570.19971 ± 433.600
2025-08-07 11:20:53,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [791.5868, 812.59906, 1643.2257, 777.7499, 214.8243, 140.95274, 182.211, 399.76462, 429.56552, 309.51736]
2025-08-07 11:20:53,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [286.0, 294.0, 638.0, 334.0, 108.0, 85.0, 91.0, 178.0, 203.0, 147.0]
2025-08-07 11:20:53,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 19 minutes, 52 seconds)
2025-08-07 11:22:26,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:22:28,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 572.15906 ± 387.808
2025-08-07 11:22:28,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [332.85614, 634.79865, 1286.7522, 845.4661, 288.50967, 109.8717, 157.6398, 849.15826, 213.24342, 1003.2943]
2025-08-07 11:22:28,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [160.0, 258.0, 514.0, 345.0, 127.0, 68.0, 74.0, 318.0, 114.0, 365.0]
2025-08-07 11:22:28,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 18 minutes, 41 seconds)
2025-08-07 11:24:00,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:24:02,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 350.41077 ± 275.365
2025-08-07 11:24:02,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [188.43251, 786.9578, 149.6129, 173.30022, 469.07092, 13.540764, 836.0393, 157.68204, 179.2521, 550.21906]
2025-08-07 11:24:02,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 327.0, 72.0, 83.0, 187.0, 17.0, 313.0, 104.0, 90.0, 208.0]
2025-08-07 11:24:02,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 16 minutes, 52 seconds)
2025-08-07 11:25:32,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:25:34,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 620.40308 ± 315.655
2025-08-07 11:25:34,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [451.8372, 1019.122, 490.55594, 944.15576, 613.0689, 65.13664, 1011.26666, 721.3649, 720.8606, 166.66248]
2025-08-07 11:25:34,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [185.0, 411.0, 204.0, 338.0, 240.0, 42.0, 378.0, 272.0, 271.0, 89.0]
2025-08-07 11:25:34,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 15 minutes, 7 seconds)
2025-08-07 11:27:05,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:27:08,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 537.14728 ± 236.864
2025-08-07 11:27:08,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [836.1155, 566.8109, 559.88293, 785.28046, 385.05865, 125.24187, 823.80115, 661.13104, 411.12122, 217.02943]
2025-08-07 11:27:08,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [291.0, 219.0, 214.0, 323.0, 163.0, 79.0, 309.0, 245.0, 177.0, 104.0]
2025-08-07 11:27:08,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 13 minutes, 9 seconds)
2025-08-07 11:28:41,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:28:43,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 454.90176 ± 288.641
2025-08-07 11:28:43,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [199.223, 610.88434, 13.740656, 1018.8638, 421.35156, 792.7473, 179.0149, 354.62674, 360.67746, 597.888]
2025-08-07 11:28:43,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [98.0, 252.0, 15.0, 370.0, 177.0, 322.0, 96.0, 148.0, 183.0, 230.0]
2025-08-07 11:28:43,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 12 minutes, 7 seconds)
2025-08-07 11:30:15,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:30:17,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 548.13593 ± 279.309
2025-08-07 11:30:17,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [130.1788, 605.4699, 223.0412, 859.29285, 723.73785, 763.6389, 903.5565, 222.09555, 735.7637, 314.58423]
2025-08-07 11:30:17,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [77.0, 257.0, 122.0, 332.0, 281.0, 311.0, 322.0, 113.0, 291.0, 140.0]
2025-08-07 11:30:17,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 10 minutes, 17 seconds)
2025-08-07 11:31:45,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:31:48,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 622.95978 ± 437.805
2025-08-07 11:31:48,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [247.6704, 386.02255, 719.11847, 706.34863, 852.06696, 1745.5115, 159.36133, 623.9454, 584.35443, 205.19846]
2025-08-07 11:31:48,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [115.0, 162.0, 275.0, 277.0, 337.0, 701.0, 85.0, 230.0, 215.0, 103.0]
2025-08-07 11:31:48,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 8 minutes, 25 seconds)
2025-08-07 11:33:21,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:33:24,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 549.57782 ± 547.381
2025-08-07 11:33:24,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [542.21735, 186.60635, 562.7803, 200.3553, 272.12643, 125.35778, 596.751, 322.8758, 577.97784, 2108.7305]
2025-08-07 11:33:24,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [216.0, 96.0, 228.0, 106.0, 129.0, 91.0, 233.0, 155.0, 234.0, 822.0]
2025-08-07 11:33:24,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 7 minutes, 15 seconds)
2025-08-07 11:34:54,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:34:55,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 310.68793 ± 213.879
2025-08-07 11:34:55,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [124.13851, 188.70291, 758.60474, 144.21619, 188.13846, 384.22726, 382.4364, 618.156, 77.372086, 240.88649]
2025-08-07 11:34:55,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [79.0, 101.0, 295.0, 84.0, 95.0, 170.0, 174.0, 244.0, 48.0, 121.0]
2025-08-07 11:34:55,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 5 minutes, 30 seconds)
2025-08-07 11:36:28,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:36:30,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 579.17212 ± 351.217
2025-08-07 11:36:30,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [332.4633, 929.8592, 256.85382, 1041.9906, 296.66794, 212.31183, 885.55164, 1128.5499, 453.42218, 254.05057]
2025-08-07 11:36:30,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [157.0, 373.0, 117.0, 425.0, 133.0, 101.0, 340.0, 422.0, 182.0, 123.0]
2025-08-07 11:36:30,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 3 minutes, 49 seconds)
2025-08-07 11:38:03,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:38:05,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 438.94635 ± 475.498
2025-08-07 11:38:05,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [154.81677, 788.04285, 370.47702, 113.91169, 392.9659, 277.2168, 115.165665, 336.1816, 1739.1532, 101.53179]
2025-08-07 11:38:05,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [81.0, 327.0, 159.0, 71.0, 164.0, 126.0, 66.0, 144.0, 648.0, 73.0]
2025-08-07 11:38:05,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 2 minutes, 20 seconds)
2025-08-07 11:39:35,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:39:38,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 710.06702 ± 530.363
2025-08-07 11:39:38,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [480.14972, 405.92612, 229.95987, 720.02594, 1706.5438, 305.87177, 1758.9392, 667.70996, 443.17267, 382.37143]
2025-08-07 11:39:38,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [190.0, 165.0, 114.0, 268.0, 659.0, 137.0, 688.0, 258.0, 187.0, 159.0]
2025-08-07 11:39:38,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 1 minute, 4 seconds)
2025-08-07 11:41:08,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:41:11,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 539.83643 ± 562.226
2025-08-07 11:41:11,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [157.09148, 136.39929, 437.81885, 256.09113, 136.76277, 413.63724, 250.93231, 476.15253, 2005.958, 1127.5208]
2025-08-07 11:41:11,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [86.0, 78.0, 178.0, 122.0, 100.0, 190.0, 122.0, 192.0, 743.0, 432.0]
2025-08-07 11:41:11,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 59 minutes, 9 seconds)
2025-08-07 11:42:42,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:42:44,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 479.90659 ± 401.626
2025-08-07 11:42:44,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1057.0697, 899.2015, 393.58063, 943.4775, 119.15863, 928.24396, 190.38797, 116.189644, 12.53106, 139.22495]
2025-08-07 11:42:44,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [402.0, 287.0, 160.0, 358.0, 66.0, 303.0, 101.0, 80.0, 16.0, 85.0]
2025-08-07 11:42:44,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 57 minutes, 46 seconds)
2025-08-07 11:44:15,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:44:18,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 614.18176 ± 405.576
2025-08-07 11:44:18,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [779.57446, 372.49258, 513.02185, 756.13196, 246.72043, 220.5435, 912.6873, 1613.6063, 475.70425, 251.33527]
2025-08-07 11:44:18,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [303.0, 168.0, 200.0, 297.0, 112.0, 102.0, 295.0, 583.0, 192.0, 129.0]
2025-08-07 11:44:18,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 56 minutes, 5 seconds)
2025-08-07 11:45:50,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:45:52,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 399.34412 ± 246.491
2025-08-07 11:45:52,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [220.32576, 408.04535, 130.44212, 508.3471, 281.7221, 143.36406, 952.7033, 657.10693, 475.11508, 216.26924]
2025-08-07 11:45:52,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [124.0, 174.0, 80.0, 206.0, 138.0, 75.0, 372.0, 265.0, 197.0, 116.0]
2025-08-07 11:45:52,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 54 minutes, 29 seconds)
2025-08-07 11:47:22,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:47:26,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 816.11041 ± 723.248
2025-08-07 11:47:26,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [120.37948, 483.1476, 1071.4202, 2670.5303, 183.83101, 758.03094, 414.78232, 1314.4563, 888.83386, 255.69196]
2025-08-07 11:47:26,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 194.0, 410.0, 999.0, 100.0, 251.0, 170.0, 499.0, 306.0, 123.0]
2025-08-07 11:47:26,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (816.11) for latency MM1Queue_a033_s075
2025-08-07 11:47:26,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 52 minutes, 59 seconds)
2025-08-07 11:48:58,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:49:01,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 487.98901 ± 407.840
2025-08-07 11:49:01,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [200.90683, 406.3009, 407.41013, 87.65087, 134.48746, 700.6895, 412.09653, 1441.917, 147.219, 941.21204]
2025-08-07 11:49:01,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [116.0, 172.0, 171.0, 57.0, 73.0, 285.0, 173.0, 556.0, 75.0, 368.0]
2025-08-07 11:49:01,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 51 minutes, 41 seconds)
2025-08-07 11:50:31,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:50:35,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 814.64130 ± 329.864
2025-08-07 11:50:35,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [656.5803, 806.54016, 193.64403, 1260.3641, 1363.6185, 835.21466, 523.4381, 863.4339, 1033.5265, 610.05286]
2025-08-07 11:50:35,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [264.0, 310.0, 100.0, 475.0, 530.0, 329.0, 185.0, 350.0, 418.0, 211.0]
2025-08-07 11:50:35,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 50 minutes, 14 seconds)
2025-08-07 11:52:07,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:52:09,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 379.34421 ± 288.278
2025-08-07 11:52:09,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [230.95593, 605.4364, 496.89865, 162.15837, 889.3468, 318.0282, 142.66916, 804.5543, 130.62476, 12.769621]
2025-08-07 11:52:09,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [111.0, 239.0, 200.0, 79.0, 309.0, 129.0, 88.0, 317.0, 74.0, 17.0]
2025-08-07 11:52:09,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 48 minutes, 40 seconds)
2025-08-07 11:53:42,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:53:45,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 519.00580 ± 390.057
2025-08-07 11:53:45,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [701.09, 237.59467, 962.02057, 428.26, 184.63002, 484.52634, 282.21454, 1450.1833, 253.52553, 206.01329]
2025-08-07 11:53:45,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [259.0, 111.0, 364.0, 172.0, 92.0, 211.0, 126.0, 549.0, 127.0, 106.0]
2025-08-07 11:53:45,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 47 minutes, 17 seconds)
2025-08-07 11:55:14,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:55:17,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 580.24414 ± 454.121
2025-08-07 11:55:17,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [965.42804, 878.3281, 240.93053, 111.09332, 189.5998, 526.28485, 340.0261, 738.5735, 1626.9106, 185.26685]
2025-08-07 11:55:17,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [397.0, 308.0, 117.0, 65.0, 99.0, 213.0, 160.0, 307.0, 552.0, 98.0]
2025-08-07 11:55:17,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 45 minutes, 33 seconds)
2025-08-07 11:56:47,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:56:49,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 476.11371 ± 292.018
2025-08-07 11:56:49,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [201.09222, 476.6937, 992.33887, 432.7701, 719.5494, 124.76105, 210.31516, 401.479, 931.66046, 270.4769]
2025-08-07 11:56:49,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [102.0, 197.0, 319.0, 182.0, 287.0, 85.0, 108.0, 171.0, 316.0, 138.0]
2025-08-07 11:56:49,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 43 minutes, 43 seconds)
2025-08-07 11:58:23,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:58:25,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 596.81421 ± 321.133
2025-08-07 11:58:25,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [156.07819, 979.90094, 656.47894, 708.6112, 836.0623, 442.85153, 148.03983, 239.44283, 700.4763, 1100.1997]
2025-08-07 11:58:25,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [81.0, 355.0, 245.0, 264.0, 284.0, 185.0, 90.0, 116.0, 265.0, 399.0]
2025-08-07 11:58:25,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 42 minutes, 20 seconds)
2025-08-07 11:59:54,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:59:57,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 668.91620 ± 392.445
2025-08-07 11:59:57,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1124.6843, 1392.3182, 344.493, 541.255, 958.581, 291.97534, 956.3931, 564.3848, 379.0363, 136.0412]
2025-08-07 11:59:57,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [415.0, 470.0, 154.0, 204.0, 384.0, 148.0, 328.0, 213.0, 176.0, 85.0]
2025-08-07 11:59:57,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 40 minutes, 33 seconds)
2025-08-07 12:01:35,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:01:38,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 586.25299 ± 296.504
2025-08-07 12:01:38,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [431.938, 193.58276, 666.98376, 145.67133, 425.79318, 1152.3713, 639.24023, 538.257, 720.1838, 948.5081]
2025-08-07 12:01:38,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [179.0, 104.0, 238.0, 80.0, 188.0, 426.0, 258.0, 214.0, 318.0, 328.0]
2025-08-07 12:01:38,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 39 minutes, 24 seconds)
2025-08-07 12:03:04,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:03:06,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 463.73785 ± 276.047
2025-08-07 12:03:06,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [323.36017, 422.8116, 293.73376, 574.8853, 114.714745, 353.23392, 184.21413, 928.07776, 457.48648, 984.8604]
2025-08-07 12:03:06,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [138.0, 179.0, 124.0, 220.0, 67.0, 150.0, 101.0, 344.0, 181.0, 367.0]
2025-08-07 12:03:06,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 37 minutes, 31 seconds)
2025-08-07 12:04:39,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:04:43,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 975.62225 ± 554.997
2025-08-07 12:04:43,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [456.67853, 926.207, 2017.4613, 160.51859, 732.73346, 1499.5396, 989.3229, 1611.5935, 409.9626, 952.20496]
2025-08-07 12:04:43,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [183.0, 299.0, 765.0, 84.0, 264.0, 550.0, 328.0, 590.0, 169.0, 329.0]
2025-08-07 12:04:43,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (975.62) for latency MM1Queue_a033_s075
2025-08-07 12:04:43,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 36 minutes, 17 seconds)
2025-08-07 12:06:12,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:06:15,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 551.55261 ± 334.555
2025-08-07 12:06:15,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [921.86597, 452.91248, 1281.2207, 319.17682, 240.4539, 370.1338, 181.10641, 659.92303, 775.91, 312.82336]
2025-08-07 12:06:15,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [300.0, 185.0, 464.0, 144.0, 116.0, 158.0, 94.0, 231.0, 260.0, 147.0]
2025-08-07 12:06:15,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 34 minutes, 24 seconds)
2025-08-07 12:07:46,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:07:48,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 550.62311 ± 554.510
2025-08-07 12:07:48,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [449.15277, 166.67184, 402.4876, 905.7185, 564.7474, 150.70274, 133.60545, 459.01227, 2069.4817, 204.65097]
2025-08-07 12:07:48,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [191.0, 88.0, 167.0, 341.0, 219.0, 91.0, 78.0, 189.0, 764.0, 110.0]
2025-08-07 12:07:48,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 32 minutes, 59 seconds)
2025-08-07 12:09:19,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:09:22,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 682.16107 ± 541.511
2025-08-07 12:09:22,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [301.8064, 950.32434, 631.9353, 2174.4556, 591.9688, 193.18542, 269.94766, 711.14404, 478.6, 518.24316]
2025-08-07 12:09:22,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [149.0, 311.0, 245.0, 716.0, 237.0, 106.0, 122.0, 275.0, 189.0, 190.0]
2025-08-07 12:09:22,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 30 minutes, 57 seconds)
2025-08-07 12:10:53,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:10:58,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1012.30090 ± 763.989
2025-08-07 12:10:58,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [242.0304, 1701.1224, 1620.2548, 521.3968, 751.3403, 486.75934, 2780.0374, 874.91455, 195.99733, 949.15533]
2025-08-07 12:10:58,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [112.0, 630.0, 615.0, 207.0, 283.0, 198.0, 1000.0, 327.0, 98.0, 305.0]
2025-08-07 12:10:58,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (1012.30) for latency MM1Queue_a033_s075
2025-08-07 12:10:58,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 29 minutes, 52 seconds)
2025-08-07 12:12:30,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:12:33,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 613.23413 ± 241.253
2025-08-07 12:12:33,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [656.0924, 924.305, 213.6886, 446.1035, 761.5867, 954.263, 668.43085, 359.32657, 782.83453, 365.71045]
2025-08-07 12:12:33,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [245.0, 279.0, 109.0, 198.0, 267.0, 316.0, 222.0, 152.0, 260.0, 172.0]
2025-08-07 12:12:33,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 28 minutes, 11 seconds)
2025-08-07 12:14:03,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:14:06,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 652.90344 ± 489.375
2025-08-07 12:14:06,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1256.5048, 612.9997, 787.46765, 147.4153, 477.34842, 99.70373, 303.79282, 1381.3359, 1345.1182, 117.34771]
2025-08-07 12:14:06,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [451.0, 234.0, 305.0, 95.0, 187.0, 58.0, 159.0, 499.0, 507.0, 65.0]
2025-08-07 12:14:06,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 26 minutes, 41 seconds)
2025-08-07 12:15:37,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:15:40,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 615.84393 ± 435.017
2025-08-07 12:15:40,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1470.7471, 535.50134, 11.4987545, 1135.8967, 300.48224, 134.89648, 977.2474, 388.88315, 546.3793, 656.9066]
2025-08-07 12:15:40,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [515.0, 221.0, 16.0, 432.0, 133.0, 71.0, 308.0, 160.0, 217.0, 239.0]
2025-08-07 12:15:40,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 25 minutes, 9 seconds)
2025-08-07 12:17:13,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:17:16,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 689.29944 ± 446.594
2025-08-07 12:17:16,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [248.52356, 992.1346, 340.81152, 1653.4347, 891.6405, 170.0272, 282.29715, 443.80948, 941.3083, 929.00757]
2025-08-07 12:17:16,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [115.0, 348.0, 156.0, 589.0, 305.0, 101.0, 132.0, 170.0, 305.0, 293.0]
2025-08-07 12:17:16,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 23 minutes, 41 seconds)
2025-08-07 12:18:47,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:18:51,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 875.45685 ± 470.708
2025-08-07 12:18:51,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [443.45465, 443.19806, 1363.6667, 770.6476, 530.28687, 189.0288, 1364.5653, 1392.003, 699.8768, 1557.8403]
2025-08-07 12:18:51,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [191.0, 196.0, 510.0, 298.0, 210.0, 103.0, 451.0, 527.0, 266.0, 554.0]
2025-08-07 12:18:51,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 22 minutes, 4 seconds)
2025-08-07 12:20:23,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:20:26,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 943.59900 ± 549.355
2025-08-07 12:20:26,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [768.481, 633.4137, 2386.0117, 1010.06903, 1049.8329, 962.75385, 118.36413, 791.82416, 1052.0087, 663.23145]
2025-08-07 12:20:26,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [276.0, 230.0, 850.0, 315.0, 353.0, 323.0, 72.0, 303.0, 338.0, 241.0]
2025-08-07 12:20:26,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 20 minutes, 31 seconds)
2025-08-07 12:21:58,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:22:00,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 611.84552 ± 373.239
2025-08-07 12:22:00,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1153.4589, 170.9158, 472.4811, 367.03714, 1097.9276, 645.00964, 144.85292, 626.1195, 303.93613, 1136.7166]
2025-08-07 12:22:00,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [433.0, 85.0, 206.0, 167.0, 386.0, 257.0, 85.0, 243.0, 135.0, 437.0]
2025-08-07 12:22:00,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 18 minutes, 59 seconds)
2025-08-07 12:23:31,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:23:34,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 539.81702 ± 309.084
2025-08-07 12:23:34,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [923.07135, 517.1088, 828.60803, 150.47177, 210.21046, 217.07135, 213.59848, 895.3047, 896.6191, 546.1061]
2025-08-07 12:23:34,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [316.0, 209.0, 323.0, 87.0, 113.0, 99.0, 101.0, 289.0, 318.0, 211.0]
2025-08-07 12:23:34,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 17 minutes, 22 seconds)
2025-08-07 12:25:06,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:25:10,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 809.93463 ± 566.567
2025-08-07 12:25:10,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [541.1362, 93.398155, 192.41821, 1092.5737, 328.12845, 375.44736, 1567.288, 904.1333, 1819.1708, 1185.6517]
2025-08-07 12:25:10,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [200.0, 58.0, 101.0, 406.0, 135.0, 171.0, 576.0, 293.0, 658.0, 468.0]
2025-08-07 12:25:10,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 15 minutes, 47 seconds)
2025-08-07 12:26:39,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:26:41,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 696.43109 ± 484.354
2025-08-07 12:26:41,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1733.2479, 130.57065, 945.94403, 215.53973, 586.1368, 996.85535, 286.27737, 1172.6144, 584.59467, 312.5301]
2025-08-07 12:26:41,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [549.0, 79.0, 313.0, 121.0, 235.0, 361.0, 132.0, 396.0, 236.0, 137.0]
2025-08-07 12:26:41,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 7 seconds)
2025-08-07 12:28:14,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:28:17,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 825.05334 ± 696.670
2025-08-07 12:28:17,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [2333.6824, 1244.0804, 1417.2982, 418.43152, 130.99878, 312.65488, 433.81607, 241.61852, 1427.2273, 290.72583]
2025-08-07 12:28:17,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [838.0, 465.0, 526.0, 169.0, 78.0, 158.0, 187.0, 123.0, 514.0, 132.0]
2025-08-07 12:28:17,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 12 minutes, 33 seconds)
2025-08-07 12:29:50,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:29:55,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1106.34705 ± 490.328
2025-08-07 12:29:55,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [691.0791, 1491.9279, 642.25006, 2056.045, 1002.23956, 1110.7864, 1064.0363, 229.38994, 1274.7565, 1500.9596]
2025-08-07 12:29:55,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [275.0, 552.0, 237.0, 739.0, 376.0, 413.0, 397.0, 113.0, 476.0, 480.0]
2025-08-07 12:29:55,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (1106.35) for latency MM1Queue_a033_s075
2025-08-07 12:29:55,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 4 seconds)
2025-08-07 12:31:26,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:31:30,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1046.90955 ± 737.857
2025-08-07 12:31:30,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [143.01671, 940.8671, 1062.5159, 1322.3364, 226.75815, 976.75104, 429.45157, 1506.1156, 1006.43964, 2854.844]
2025-08-07 12:31:30,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [97.0, 326.0, 339.0, 482.0, 108.0, 364.0, 175.0, 479.0, 364.0, 1000.0]
2025-08-07 12:31:30,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 31 seconds)
2025-08-07 12:33:00,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:33:04,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 925.72382 ± 725.679
2025-08-07 12:33:04,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1395.8878, 149.86351, 374.29715, 190.82138, 893.4957, 252.32526, 1328.4697, 2635.005, 871.30115, 1165.7709]
2025-08-07 12:33:04,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [496.0, 83.0, 160.0, 96.0, 353.0, 124.0, 478.0, 957.0, 326.0, 383.0]
2025-08-07 12:33:04,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 7 minutes, 54 seconds)
2025-08-07 12:34:36,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:34:40,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 978.76172 ± 601.522
2025-08-07 12:34:40,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [140.38837, 390.56982, 790.2389, 797.4521, 913.22833, 1201.2653, 1250.3828, 1317.0148, 563.74084, 2423.336]
2025-08-07 12:34:40,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [94.0, 167.0, 301.0, 268.0, 297.0, 373.0, 410.0, 490.0, 212.0, 818.0]
2025-08-07 12:34:40,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 22 seconds)
2025-08-07 12:36:11,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:36:14,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 623.06335 ± 390.154
2025-08-07 12:36:14,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [394.2335, 507.4561, 133.33473, 963.88434, 599.0118, 657.2807, 550.6656, 127.25981, 764.7184, 1532.7888]
2025-08-07 12:36:14,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [173.0, 190.0, 78.0, 359.0, 240.0, 247.0, 213.0, 76.0, 297.0, 559.0]
2025-08-07 12:36:14,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 45 seconds)
2025-08-07 12:37:52,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:37:55,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 662.05969 ± 336.695
2025-08-07 12:37:55,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [899.45123, 1029.5133, 261.27066, 235.89676, 849.33887, 148.26373, 1077.0315, 651.9227, 977.1747, 490.73358]
2025-08-07 12:37:55,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [351.0, 335.0, 116.0, 135.0, 325.0, 89.0, 416.0, 227.0, 379.0, 198.0]
2025-08-07 12:37:55,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 12 seconds)
2025-08-07 12:39:20,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:39:22,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 582.94440 ± 382.711
2025-08-07 12:39:22,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [640.2095, 408.1236, 457.4698, 579.48785, 525.9029, 124.04954, 105.58176, 459.05948, 1145.9934, 1383.5659]
2025-08-07 12:39:22,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [242.0, 169.0, 195.0, 230.0, 208.0, 71.0, 63.0, 201.0, 364.0, 497.0]
2025-08-07 12:39:22,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 34 seconds)
2025-08-07 12:40:58,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:41:01,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 591.23413 ± 288.919
2025-08-07 12:41:01,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [202.4853, 538.3556, 495.01752, 168.64346, 498.35275, 1177.4025, 912.5927, 497.9926, 721.38605, 700.11224]
2025-08-07 12:41:01,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [116.0, 211.0, 205.0, 94.0, 199.0, 441.0, 358.0, 229.0, 279.0, 260.0]
2025-08-07 12:41:01,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1251 [DEBUG]: Training session finished
