2025-08-07 10:07:34,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc15-hopper/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:07:34,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc15-hopper/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:07:34,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14dddf047a10>}
2025-08-07 10:07:34,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1111 [DEBUG]: using device: cuda
2025-08-07 10:07:34,493 baseline-bpql-noiseperc15-hopper:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 10:07:34,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1133 [INFO]: Creating new trainer
2025-08-07 10:07:34,510 baseline-bpql-noiseperc15-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=59, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-08-07 10:07:34,510 baseline-bpql-noiseperc15-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 10:07:35,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1194 [DEBUG]: Starting training session...
2025-08-07 10:07:35,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 1/100
2025-08-07 10:09:02,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:09:02,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 50.96644 ± 19.360
2025-08-07 10:09:02,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [60.990692, 55.376587, 60.502556, 56.238865, 55.644447, 73.4576, 61.2896, 15.349033, 59.098732, 11.716255]
2025-08-07 10:09:02,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [51.0, 46.0, 48.0, 42.0, 51.0, 64.0, 51.0, 17.0, 44.0, 15.0]
2025-08-07 10:09:02,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (50.97) for latency MM1Queue_a033_s075
2025-08-07 10:09:02,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 24 minutes, 17 seconds)
2025-08-07 10:10:36,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:10:37,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 80.71539 ± 67.549
2025-08-07 10:10:37,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [84.37462, 269.51233, 62.75606, 92.17517, 33.3097, 17.548655, 80.67637, 44.22293, 85.71523, 36.862907]
2025-08-07 10:10:37,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [58.0, 161.0, 52.0, 55.0, 29.0, 18.0, 59.0, 40.0, 67.0, 29.0]
2025-08-07 10:10:37,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (80.72) for latency MM1Queue_a033_s075
2025-08-07 10:10:37,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 28 minutes, 33 seconds)
2025-08-07 10:12:11,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:12:11,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 72.79060 ± 39.849
2025-08-07 10:12:11,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [53.448734, 62.46068, 14.374544, 68.83826, 68.96388, 104.800186, 105.79268, 17.541643, 76.5813, 155.1042]
2025-08-07 10:12:11,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [44.0, 50.0, 17.0, 50.0, 46.0, 64.0, 69.0, 20.0, 51.0, 98.0]
2025-08-07 10:12:11,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 28 minutes, 52 seconds)
2025-08-07 10:13:46,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:13:46,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 69.56895 ± 51.332
2025-08-07 10:13:46,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [63.204647, 24.696539, 134.58961, 30.897339, 25.699135, 159.98653, 138.78484, 51.757072, 51.688545, 14.385195]
2025-08-07 10:13:46,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [53.0, 23.0, 86.0, 31.0, 26.0, 101.0, 107.0, 51.0, 44.0, 17.0]
2025-08-07 10:13:46,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 28 minutes, 37 seconds)
2025-08-07 10:15:21,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:15:22,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 82.50931 ± 48.639
2025-08-07 10:15:22,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [145.15054, 14.878597, 72.25699, 56.321827, 12.7959585, 52.800327, 95.10524, 130.77219, 84.17408, 160.83736]
2025-08-07 10:15:22,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 16.0, 64.0, 48.0, 17.0, 40.0, 77.0, 104.0, 67.0, 84.0]
2025-08-07 10:15:22,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (82.51) for latency MM1Queue_a033_s075
2025-08-07 10:15:22,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 27 minutes, 57 seconds)
2025-08-07 10:16:57,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:16:58,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 87.38869 ± 50.143
2025-08-07 10:16:58,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [213.7274, 79.84125, 57.85096, 115.44042, 58.696053, 15.349886, 87.234764, 79.70835, 109.05272, 56.984962]
2025-08-07 10:16:58,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 69.0, 49.0, 95.0, 47.0, 17.0, 83.0, 66.0, 61.0, 43.0]
2025-08-07 10:16:58,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (87.39) for latency MM1Queue_a033_s075
2025-08-07 10:16:58,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 28 minutes, 53 seconds)
2025-08-07 10:18:34,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:18:34,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 78.47835 ± 41.628
2025-08-07 10:18:34,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [93.85651, 163.1993, 100.48084, 78.53874, 8.997848, 121.99823, 44.278156, 67.55765, 45.33053, 60.5457]
2025-08-07 10:18:34,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [70.0, 129.0, 86.0, 76.0, 13.0, 89.0, 36.0, 56.0, 39.0, 41.0]
2025-08-07 10:18:34,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 28 minutes)
2025-08-07 10:20:08,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:20:09,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 107.16481 ± 88.325
2025-08-07 10:20:09,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [168.11446, 12.947959, 175.10648, 61.945774, 51.42738, 143.04288, 70.00369, 311.94376, 9.94384, 67.17184]
2025-08-07 10:20:09,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [109.0, 15.0, 91.0, 60.0, 42.0, 80.0, 42.0, 152.0, 13.0, 48.0]
2025-08-07 10:20:09,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (107.16) for latency MM1Queue_a033_s075
2025-08-07 10:20:09,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 26 minutes, 32 seconds)
2025-08-07 10:21:44,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:21:45,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 107.80031 ± 71.153
2025-08-07 10:21:45,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [90.18354, 12.299031, 243.25961, 133.51675, 85.63452, 41.674606, 80.736984, 101.19662, 227.86221, 61.639202]
2025-08-07 10:21:45,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [50.0, 14.0, 121.0, 93.0, 70.0, 35.0, 58.0, 75.0, 151.0, 48.0]
2025-08-07 10:21:45,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (107.80) for latency MM1Queue_a033_s075
2025-08-07 10:21:45,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 25 minutes, 8 seconds)
2025-08-07 10:23:20,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:23:20,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 118.83189 ± 102.961
2025-08-07 10:23:20,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [129.26866, 94.6081, 280.02228, 13.607879, 16.314625, 15.865591, 87.59306, 309.96072, 189.94711, 51.1308]
2025-08-07 10:23:20,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 76.0, 125.0, 17.0, 17.0, 17.0, 66.0, 182.0, 101.0, 45.0]
2025-08-07 10:23:20,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (118.83) for latency MM1Queue_a033_s075
2025-08-07 10:23:20,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 23 minutes, 28 seconds)
2025-08-07 10:24:55,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:24:56,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 153.93347 ± 51.058
2025-08-07 10:24:56,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [190.31024, 230.77014, 124.43796, 165.23972, 45.120857, 225.39594, 139.21889, 141.79236, 133.14062, 143.908]
2025-08-07 10:24:56,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [121.0, 149.0, 80.0, 98.0, 37.0, 132.0, 95.0, 80.0, 77.0, 85.0]
2025-08-07 10:24:56,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (153.93) for latency MM1Queue_a033_s075
2025-08-07 10:24:56,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 22 minutes, 2 seconds)
2025-08-07 10:26:32,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:26:33,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 82.64449 ± 53.054
2025-08-07 10:26:33,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [13.854445, 98.52747, 50.064667, 10.118776, 66.6983, 129.74152, 115.25819, 68.1292, 196.94385, 77.1085]
2025-08-07 10:26:33,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 62.0, 44.0, 14.0, 46.0, 93.0, 69.0, 45.0, 113.0, 72.0]
2025-08-07 10:26:33,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 20 minutes, 24 seconds)
2025-08-07 10:28:08,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:28:09,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 123.56356 ± 81.494
2025-08-07 10:28:09,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [279.48624, 93.07031, 177.43718, 153.45442, 75.28982, 88.636024, 133.23131, 213.98056, 10.880955, 10.168763]
2025-08-07 10:28:09,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [181.0, 60.0, 118.0, 116.0, 65.0, 53.0, 74.0, 155.0, 21.0, 12.0]
2025-08-07 10:28:09,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 19 minutes, 4 seconds)
2025-08-07 10:29:44,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:29:45,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 139.41309 ± 81.095
2025-08-07 10:29:45,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [329.8396, 155.79169, 65.95278, 100.49919, 14.389431, 98.14692, 107.53291, 189.9182, 167.53308, 164.5271]
2025-08-07 10:29:45,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [200.0, 94.0, 64.0, 73.0, 15.0, 70.0, 81.0, 111.0, 117.0, 105.0]
2025-08-07 10:29:45,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 17 minutes, 41 seconds)
2025-08-07 10:31:20,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:31:21,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 121.57220 ± 75.131
2025-08-07 10:31:21,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [123.02661, 122.26239, 171.16484, 158.21, 276.3289, 15.966974, 66.8125, 107.30615, 9.890621, 164.7531]
2025-08-07 10:31:21,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [85.0, 99.0, 105.0, 80.0, 189.0, 18.0, 42.0, 79.0, 14.0, 115.0]
2025-08-07 10:31:21,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 16 minutes, 15 seconds)
2025-08-07 10:32:57,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:32:58,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 118.18001 ± 90.755
2025-08-07 10:32:58,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [94.11201, 220.33775, 55.428596, 48.07113, 236.48474, 11.594508, 277.39362, 107.25017, 120.94535, 10.182199]
2025-08-07 10:32:58,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [57.0, 155.0, 44.0, 33.0, 159.0, 16.0, 141.0, 80.0, 76.0, 16.0]
2025-08-07 10:32:58,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 14 minutes, 49 seconds)
2025-08-07 10:34:32,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:34:33,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 172.10670 ± 95.737
2025-08-07 10:34:33,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [64.94573, 284.88474, 11.933945, 152.63553, 319.4064, 289.3556, 126.17403, 116.364845, 159.05046, 196.31581]
2025-08-07 10:34:33,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [52.0, 134.0, 16.0, 106.0, 151.0, 137.0, 68.0, 73.0, 115.0, 128.0]
2025-08-07 10:34:33,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (172.11) for latency MM1Queue_a033_s075
2025-08-07 10:34:33,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 12 minutes, 54 seconds)
2025-08-07 10:36:08,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:36:09,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 149.44699 ± 154.493
2025-08-07 10:36:09,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [180.38754, 107.56434, 118.53463, 158.04524, 69.240326, 165.25212, 88.91692, 581.8677, 11.039994, 13.6210575]
2025-08-07 10:36:09,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 58.0, 77.0, 93.0, 53.0, 109.0, 83.0, 218.0, 13.0, 16.0]
2025-08-07 10:36:09,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 11 minutes, 22 seconds)
2025-08-07 10:37:44,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:37:46,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 166.44839 ± 78.885
2025-08-07 10:37:46,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [278.6352, 110.22214, 114.00595, 239.68573, 78.3926, 93.36435, 126.89333, 319.0574, 150.16113, 154.06602]
2025-08-07 10:37:46,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [140.0, 77.0, 80.0, 112.0, 59.0, 61.0, 83.0, 168.0, 96.0, 89.0]
2025-08-07 10:37:46,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 9 minutes, 40 seconds)
2025-08-07 10:39:21,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:39:22,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 106.16543 ± 79.946
2025-08-07 10:39:22,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [76.59915, 282.31174, 67.33887, 74.025635, 112.76352, 68.44349, 168.83202, 14.575414, 186.60716, 10.15732]
2025-08-07 10:39:22,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [44.0, 139.0, 49.0, 46.0, 75.0, 43.0, 117.0, 16.0, 109.0, 13.0]
2025-08-07 10:39:22,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 8 minutes, 11 seconds)
2025-08-07 10:40:57,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:40:58,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 128.80461 ± 152.202
2025-08-07 10:40:58,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [163.69131, 85.524994, 7.4023266, 121.381996, 55.111942, 120.59323, 158.93379, 11.463473, 11.162499, 552.78064]
2025-08-07 10:40:58,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [106.0, 61.0, 12.0, 86.0, 49.0, 90.0, 101.0, 17.0, 15.0, 277.0]
2025-08-07 10:40:58,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 6 minutes, 31 seconds)
2025-08-07 10:42:34,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:42:35,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 155.24239 ± 144.165
2025-08-07 10:42:35,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [71.25803, 432.41602, 160.60283, 425.21133, 144.80681, 73.040565, 104.727234, 15.539607, 111.778854, 13.042553]
2025-08-07 10:42:35,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [58.0, 190.0, 109.0, 204.0, 78.0, 68.0, 72.0, 16.0, 83.0, 16.0]
2025-08-07 10:42:35,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 5 minutes, 12 seconds)
2025-08-07 10:44:10,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:44:12,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 185.45572 ± 126.125
2025-08-07 10:44:12,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [64.88026, 361.1882, 10.627996, 270.12433, 227.65837, 15.425527, 293.31802, 86.51765, 339.74728, 185.06943]
2025-08-07 10:44:12,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [48.0, 195.0, 13.0, 169.0, 173.0, 18.0, 168.0, 71.0, 182.0, 94.0]
2025-08-07 10:44:12,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (185.46) for latency MM1Queue_a033_s075
2025-08-07 10:44:12,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 3 minutes, 49 seconds)
2025-08-07 10:45:48,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:45:49,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 159.28099 ± 122.221
2025-08-07 10:45:49,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [422.58755, 10.800715, 145.57211, 15.572434, 163.81035, 147.06773, 82.99859, 100.60887, 179.17654, 324.61514]
2025-08-07 10:45:49,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [189.0, 13.0, 90.0, 16.0, 95.0, 97.0, 51.0, 60.0, 111.0, 154.0]
2025-08-07 10:45:49,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 2 minutes, 29 seconds)
2025-08-07 10:47:24,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:47:24,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 137.81216 ± 101.060
2025-08-07 10:47:24,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [84.031136, 198.04419, 63.725925, 346.77768, 215.02505, 13.693579, 13.113641, 69.87961, 194.02725, 179.8037]
2025-08-07 10:47:24,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [55.0, 119.0, 40.0, 181.0, 118.0, 16.0, 16.0, 46.0, 116.0, 92.0]
2025-08-07 10:47:24,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 37 seconds)
2025-08-07 10:48:59,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:49:01,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 181.49635 ± 92.342
2025-08-07 10:49:01,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [151.50993, 250.50034, 272.52267, 242.08128, 83.42573, 77.047264, 102.77131, 331.81912, 242.60657, 60.679348]
2025-08-07 10:49:01,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [106.0, 145.0, 166.0, 132.0, 72.0, 54.0, 65.0, 149.0, 142.0, 36.0]
2025-08-07 10:49:01,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 1 hour, 58 minutes, 59 seconds)
2025-08-07 10:50:37,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:50:38,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 153.06410 ± 52.826
2025-08-07 10:50:38,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [250.02367, 204.43343, 61.536385, 100.91334, 207.02713, 115.50337, 155.70311, 143.62389, 134.80153, 157.07506]
2025-08-07 10:50:38,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [133.0, 114.0, 47.0, 76.0, 127.0, 72.0, 90.0, 89.0, 90.0, 106.0]
2025-08-07 10:50:38,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 57 minutes, 32 seconds)
2025-08-07 10:52:13,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:52:14,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 116.47099 ± 63.523
2025-08-07 10:52:14,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [115.38166, 143.09344, 13.658848, 200.37233, 15.900928, 103.93987, 151.36298, 111.30304, 216.37503, 93.321785]
2025-08-07 10:52:14,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [67.0, 84.0, 15.0, 100.0, 17.0, 61.0, 77.0, 78.0, 138.0, 58.0]
2025-08-07 10:52:14,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 55 minutes, 44 seconds)
2025-08-07 10:53:49,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:53:51,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 186.03076 ± 109.310
2025-08-07 10:53:51,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [11.937288, 302.23007, 133.0735, 54.07363, 353.19937, 284.84253, 237.56985, 94.534294, 251.78249, 137.06467]
2025-08-07 10:53:51,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 138.0, 102.0, 38.0, 179.0, 145.0, 134.0, 61.0, 153.0, 89.0]
2025-08-07 10:53:51,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (186.03) for latency MM1Queue_a033_s075
2025-08-07 10:53:51,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 53 minutes, 57 seconds)
2025-08-07 10:55:26,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:55:27,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 142.88602 ± 113.900
2025-08-07 10:55:27,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [88.2817, 10.916019, 15.796342, 170.10913, 148.87799, 174.83516, 425.0312, 112.79058, 67.39879, 214.82307]
2025-08-07 10:55:27,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [55.0, 14.0, 17.0, 117.0, 86.0, 115.0, 179.0, 76.0, 45.0, 125.0]
2025-08-07 10:55:27,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 52 minutes, 35 seconds)
2025-08-07 10:57:02,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:57:04,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 218.32027 ± 142.287
2025-08-07 10:57:04,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [183.32677, 100.24596, 473.01736, 12.253643, 72.51949, 168.54832, 230.14095, 301.64816, 439.3954, 202.1066]
2025-08-07 10:57:04,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [104.0, 55.0, 236.0, 16.0, 59.0, 110.0, 125.0, 141.0, 197.0, 116.0]
2025-08-07 10:57:04,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (218.32) for latency MM1Queue_a033_s075
2025-08-07 10:57:04,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 51 minutes, 3 seconds)
2025-08-07 10:58:38,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:58:39,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 152.94046 ± 55.337
2025-08-07 10:58:39,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [117.756294, 191.77983, 88.96067, 207.75223, 237.32085, 166.1139, 152.87903, 58.184494, 106.88501, 201.77238]
2025-08-07 10:58:39,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [65.0, 124.0, 54.0, 130.0, 116.0, 118.0, 104.0, 39.0, 65.0, 105.0]
2025-08-07 10:58:39,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 49 minutes, 8 seconds)
2025-08-07 11:00:15,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:00:16,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 196.03336 ± 118.512
2025-08-07 11:00:16,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [326.56485, 507.06195, 134.0072, 126.41474, 138.11623, 171.5072, 124.87149, 167.25749, 131.95091, 132.58139]
2025-08-07 11:00:16,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [185.0, 211.0, 70.0, 92.0, 85.0, 106.0, 76.0, 88.0, 92.0, 84.0]
2025-08-07 11:00:16,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 47 minutes, 35 seconds)
2025-08-07 11:01:50,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:01:51,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 185.14172 ± 105.316
2025-08-07 11:01:51,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [14.81208, 255.21123, 273.60248, 394.13272, 202.83223, 109.77149, 148.58534, 238.58328, 146.14368, 67.74262]
2025-08-07 11:01:51,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 130.0, 158.0, 174.0, 114.0, 78.0, 95.0, 135.0, 81.0, 42.0]
2025-08-07 11:01:51,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 45 minutes, 42 seconds)
2025-08-07 11:03:25,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:03:27,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 230.91861 ± 129.345
2025-08-07 11:03:27,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [174.94359, 508.99066, 217.26741, 96.93437, 318.76697, 317.82333, 162.62186, 268.2508, 230.33188, 13.255229]
2025-08-07 11:03:27,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [114.0, 232.0, 115.0, 66.0, 180.0, 153.0, 89.0, 163.0, 115.0, 18.0]
2025-08-07 11:03:27,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (230.92) for latency MM1Queue_a033_s075
2025-08-07 11:03:27,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 43 minutes, 56 seconds)
2025-08-07 11:05:01,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:05:02,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 161.37448 ± 102.498
2025-08-07 11:05:02,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [229.86052, 10.07916, 172.10092, 191.1913, 365.0287, 233.68207, 14.994534, 78.64367, 131.80504, 186.35893]
2025-08-07 11:05:02,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [107.0, 17.0, 96.0, 102.0, 164.0, 124.0, 16.0, 55.0, 93.0, 104.0]
2025-08-07 11:05:02,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 41 minutes, 58 seconds)
2025-08-07 11:06:36,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:06:37,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 196.86270 ± 112.319
2025-08-07 11:06:37,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [123.63369, 201.55893, 113.75158, 376.44308, 80.18717, 191.73767, 332.62558, 324.46274, 209.31436, 14.9121895]
2025-08-07 11:06:37,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [73.0, 101.0, 75.0, 182.0, 49.0, 90.0, 162.0, 165.0, 119.0, 16.0]
2025-08-07 11:06:37,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 40 minutes, 16 seconds)
2025-08-07 11:08:12,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:08:13,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 127.15804 ± 30.947
2025-08-07 11:08:13,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [103.94753, 110.486145, 162.3122, 112.22368, 64.758095, 134.09738, 156.81067, 171.85712, 143.0696, 112.01803]
2025-08-07 11:08:13,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [77.0, 67.0, 95.0, 68.0, 48.0, 87.0, 108.0, 103.0, 101.0, 94.0]
2025-08-07 11:08:13,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 38 minutes, 38 seconds)
2025-08-07 11:09:46,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:09:48,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 222.87613 ± 184.405
2025-08-07 11:09:48,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [163.88911, 10.237491, 216.42365, 288.3707, 151.71632, 297.81314, 683.5272, 298.1005, 9.337239, 109.34583]
2025-08-07 11:09:48,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [109.0, 16.0, 122.0, 143.0, 97.0, 130.0, 306.0, 150.0, 13.0, 70.0]
2025-08-07 11:09:48,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 36 minutes, 53 seconds)
2025-08-07 11:11:21,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:11:22,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 170.92693 ± 151.408
2025-08-07 11:11:22,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [12.641133, 341.478, 303.36472, 133.21565, 9.883933, 119.03158, 11.735151, 163.96774, 126.553925, 487.39746]
2025-08-07 11:11:22,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 139.0, 154.0, 71.0, 16.0, 66.0, 16.0, 91.0, 66.0, 217.0]
2025-08-07 11:11:22,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 35 minutes, 8 seconds)
2025-08-07 11:12:56,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:12:58,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 247.87740 ± 130.723
2025-08-07 11:12:58,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [202.4446, 385.68622, 154.96097, 271.3344, 420.5519, 491.68896, 94.07765, 138.10301, 149.43819, 170.48808]
2025-08-07 11:12:58,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [109.0, 189.0, 124.0, 150.0, 175.0, 216.0, 57.0, 85.0, 95.0, 94.0]
2025-08-07 11:12:58,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (247.88) for latency MM1Queue_a033_s075
2025-08-07 11:12:58,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 33 minutes, 36 seconds)
2025-08-07 11:14:32,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:14:33,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 148.98537 ± 113.804
2025-08-07 11:14:33,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [129.88367, 8.736909, 111.43009, 93.63162, 117.012886, 426.1278, 161.40215, 204.78769, 13.449822, 223.39096]
2025-08-07 11:14:33,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [71.0, 12.0, 60.0, 69.0, 71.0, 207.0, 99.0, 119.0, 16.0, 113.0]
2025-08-07 11:14:33,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 31 minutes, 57 seconds)
2025-08-07 11:16:07,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:16:08,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 232.92390 ± 234.060
2025-08-07 11:16:08,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [354.83517, 273.07013, 172.5423, 877.29974, 202.4575, 10.545601, 123.54358, 122.85408, 79.18821, 112.9028]
2025-08-07 11:16:08,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [174.0, 133.0, 84.0, 340.0, 130.0, 16.0, 75.0, 85.0, 50.0, 80.0]
2025-08-07 11:16:08,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 30 minutes, 19 seconds)
2025-08-07 11:17:42,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:17:43,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 268.62173 ± 192.150
2025-08-07 11:17:43,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [13.995903, 142.76328, 392.9173, 139.63249, 367.5347, 207.87965, 328.31604, 325.23572, 707.70056, 60.241695]
2025-08-07 11:17:43,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 71.0, 173.0, 85.0, 184.0, 135.0, 163.0, 145.0, 311.0, 39.0]
2025-08-07 11:17:43,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (268.62) for latency MM1Queue_a033_s075
2025-08-07 11:17:43,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 28 minutes, 46 seconds)
2025-08-07 11:19:16,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:19:18,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 245.75261 ± 116.738
2025-08-07 11:19:18,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [173.45097, 408.6846, 473.4418, 147.74837, 200.86089, 127.325356, 157.47998, 320.1916, 304.01486, 144.32797]
2025-08-07 11:19:18,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [104.0, 188.0, 226.0, 109.0, 93.0, 87.0, 90.0, 166.0, 144.0, 88.0]
2025-08-07 11:19:18,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 27 minutes, 7 seconds)
2025-08-07 11:20:52,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:20:54,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 187.45946 ± 94.558
2025-08-07 11:20:54,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [181.64275, 135.17322, 12.052312, 399.78595, 213.80106, 246.71011, 115.48833, 164.1542, 216.85904, 188.92776]
2025-08-07 11:20:54,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [127.0, 82.0, 16.0, 189.0, 109.0, 118.0, 81.0, 93.0, 124.0, 103.0]
2025-08-07 11:20:54,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 25 minutes, 41 seconds)
2025-08-07 11:22:26,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:22:27,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 208.94022 ± 103.221
2025-08-07 11:22:27,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [294.90295, 249.51942, 14.618706, 205.83797, 83.208664, 325.54404, 271.15912, 191.51265, 113.547775, 339.551]
2025-08-07 11:22:27,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [140.0, 134.0, 15.0, 116.0, 55.0, 185.0, 149.0, 109.0, 75.0, 149.0]
2025-08-07 11:22:27,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 23 minutes, 49 seconds)
2025-08-07 11:24:01,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:24:02,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 242.54739 ± 171.020
2025-08-07 11:24:02,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [523.26495, 185.87772, 566.8718, 99.78801, 123.95782, 283.11588, 247.29158, 266.8829, 14.377386, 114.04569]
2025-08-07 11:24:02,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [221.0, 97.0, 274.0, 73.0, 89.0, 124.0, 122.0, 122.0, 16.0, 68.0]
2025-08-07 11:24:02,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 22 minutes, 5 seconds)
2025-08-07 11:25:36,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:25:37,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 193.32326 ± 89.659
2025-08-07 11:25:37,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [122.76267, 330.0361, 80.53026, 353.92896, 81.66957, 178.73094, 226.96545, 146.78384, 236.79514, 175.02966]
2025-08-07 11:25:37,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 156.0, 54.0, 162.0, 48.0, 88.0, 124.0, 90.0, 125.0, 118.0]
2025-08-07 11:25:37,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 20 minutes, 31 seconds)
2025-08-07 11:27:10,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:27:11,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 149.47324 ± 84.532
2025-08-07 11:27:11,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [251.49246, 209.3451, 220.66263, 9.980712, 8.075308, 184.61201, 101.0708, 195.75696, 216.51172, 97.22475]
2025-08-07 11:27:11,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [138.0, 104.0, 122.0, 13.0, 12.0, 105.0, 74.0, 105.0, 105.0, 58.0]
2025-08-07 11:27:11,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 18 minutes, 57 seconds)
2025-08-07 11:28:46,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:28:47,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 161.19391 ± 67.760
2025-08-07 11:28:47,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [185.65909, 102.72873, 202.14659, 108.62973, 8.611529, 141.37167, 242.42973, 182.67146, 224.97238, 212.71815]
2025-08-07 11:28:47,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [115.0, 57.0, 96.0, 75.0, 14.0, 77.0, 135.0, 92.0, 112.0, 120.0]
2025-08-07 11:28:47,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 17 minutes, 17 seconds)
2025-08-07 11:30:21,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:30:23,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 369.81793 ± 379.729
2025-08-07 11:30:23,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [273.44232, 478.17908, 1386.6538, 634.9581, 201.41347, 148.12167, 155.67688, 11.293479, 133.85088, 274.58966]
2025-08-07 11:30:23,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [191.0, 208.0, 602.0, 290.0, 111.0, 98.0, 97.0, 16.0, 86.0, 138.0]
2025-08-07 11:30:23,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (369.82) for latency MM1Queue_a033_s075
2025-08-07 11:30:23,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 16 minutes, 7 seconds)
2025-08-07 11:31:56,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:31:57,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 228.85226 ± 137.041
2025-08-07 11:31:57,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [423.77307, 304.8524, 10.202773, 369.99023, 369.40005, 12.316995, 155.96529, 239.0195, 164.41896, 238.5834]
2025-08-07 11:31:57,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [200.0, 145.0, 14.0, 169.0, 188.0, 17.0, 88.0, 147.0, 94.0, 127.0]
2025-08-07 11:31:57,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 14 minutes, 28 seconds)
2025-08-07 11:33:30,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:33:32,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 220.43221 ± 168.862
2025-08-07 11:33:32,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [284.0829, 538.8483, 128.28377, 515.0427, 73.1724, 14.664891, 151.50711, 98.762634, 199.6648, 200.29259]
2025-08-07 11:33:32,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [154.0, 227.0, 80.0, 241.0, 48.0, 17.0, 85.0, 64.0, 107.0, 102.0]
2025-08-07 11:33:32,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 12 minutes, 48 seconds)
2025-08-07 11:35:05,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:35:07,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 200.48666 ± 176.702
2025-08-07 11:35:07,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [17.039474, 235.398, 658.4763, 100.019554, 103.41135, 300.26065, 149.56985, 76.27148, 280.8228, 83.59722]
2025-08-07 11:35:07,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 132.0, 265.0, 81.0, 72.0, 130.0, 91.0, 51.0, 123.0, 64.0]
2025-08-07 11:35:07,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 11 minutes, 16 seconds)
2025-08-07 11:36:40,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:36:42,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 246.47102 ± 144.699
2025-08-07 11:36:42,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [256.8267, 77.65008, 164.71327, 162.35129, 245.53023, 194.11179, 135.45975, 229.42879, 390.2543, 608.3839]
2025-08-07 11:36:42,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [128.0, 45.0, 88.0, 86.0, 116.0, 129.0, 71.0, 114.0, 164.0, 246.0]
2025-08-07 11:36:42,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 9 minutes, 39 seconds)
2025-08-07 11:38:15,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:38:17,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 368.97754 ± 260.148
2025-08-07 11:38:17,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [140.10565, 199.77942, 183.9405, 879.0167, 728.49097, 295.8486, 417.89996, 124.636795, 131.41817, 588.63855]
2025-08-07 11:38:17,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [104.0, 104.0, 90.0, 367.0, 338.0, 167.0, 203.0, 75.0, 83.0, 265.0]
2025-08-07 11:38:17,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 8 minutes, 1 second)
2025-08-07 11:39:53,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:39:54,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 187.57802 ± 90.221
2025-08-07 11:39:54,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [362.7769, 201.55655, 163.5632, 233.20517, 128.27927, 253.76033, 162.22603, 243.6172, 11.783049, 115.01244]
2025-08-07 11:39:54,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [177.0, 109.0, 90.0, 118.0, 79.0, 135.0, 92.0, 110.0, 15.0, 74.0]
2025-08-07 11:39:54,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 6 minutes, 43 seconds)
2025-08-07 11:41:26,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:41:28,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 267.52704 ± 152.222
2025-08-07 11:41:28,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [11.610651, 179.76862, 349.0128, 101.86105, 213.18666, 390.8803, 247.01997, 548.1637, 422.28314, 211.48376]
2025-08-07 11:41:28,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 107.0, 160.0, 59.0, 125.0, 171.0, 141.0, 241.0, 181.0, 126.0]
2025-08-07 11:41:28,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 5 minutes, 5 seconds)
2025-08-07 11:43:01,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:43:03,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 299.91544 ± 206.772
2025-08-07 11:43:03,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [188.03474, 380.6846, 72.26357, 389.22446, 592.57733, 559.4658, 12.421937, 316.49933, 13.227468, 474.75485]
2025-08-07 11:43:03,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [127.0, 166.0, 50.0, 195.0, 238.0, 228.0, 16.0, 171.0, 16.0, 244.0]
2025-08-07 11:43:03,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 3 minutes, 28 seconds)
2025-08-07 11:44:37,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:44:39,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 323.86768 ± 301.013
2025-08-07 11:44:39,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [12.445067, 565.6947, 1035.0208, 91.77146, 187.40489, 195.27249, 226.32751, 155.40704, 619.7953, 149.53754]
2025-08-07 11:44:39,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 273.0, 462.0, 61.0, 93.0, 113.0, 102.0, 94.0, 275.0, 81.0]
2025-08-07 11:44:39,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 2 minutes, 1 second)
2025-08-07 11:46:12,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:46:13,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 262.12564 ± 151.107
2025-08-07 11:46:13,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [218.12624, 180.48892, 79.465355, 273.56995, 672.46875, 312.92804, 270.8193, 240.40636, 227.02419, 145.95935]
2025-08-07 11:46:13,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [124.0, 125.0, 54.0, 133.0, 281.0, 148.0, 144.0, 136.0, 118.0, 102.0]
2025-08-07 11:46:13,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 18 seconds)
2025-08-07 11:47:47,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:47:49,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 317.39902 ± 164.446
2025-08-07 11:47:49,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [347.996, 377.56677, 537.03107, 176.87265, 181.16508, 187.51767, 413.9219, 159.19716, 636.81775, 155.90425]
2025-08-07 11:47:49,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [183.0, 145.0, 216.0, 108.0, 101.0, 101.0, 179.0, 84.0, 260.0, 90.0]
2025-08-07 11:47:49,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 58 minutes, 34 seconds)
2025-08-07 11:49:23,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:49:24,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 253.23340 ± 124.550
2025-08-07 11:49:24,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [492.4231, 317.0162, 182.63452, 475.61005, 198.8295, 160.16336, 160.61607, 177.42685, 227.54, 140.07423]
2025-08-07 11:49:24,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [222.0, 136.0, 92.0, 192.0, 104.0, 100.0, 79.0, 89.0, 137.0, 84.0]
2025-08-07 11:49:24,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 57 minutes, 7 seconds)
2025-08-07 11:50:58,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:51:00,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 335.95297 ± 282.214
2025-08-07 11:51:00,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [179.65044, 141.1269, 457.50897, 193.08896, 90.33842, 963.9666, 164.73022, 740.7852, 111.033134, 317.30072]
2025-08-07 11:51:00,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [90.0, 77.0, 192.0, 119.0, 68.0, 377.0, 88.0, 350.0, 70.0, 135.0]
2025-08-07 11:51:00,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 55 minutes, 38 seconds)
2025-08-07 11:52:35,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:52:37,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 371.59125 ± 189.382
2025-08-07 11:52:37,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [690.5375, 142.4967, 696.4067, 425.2679, 190.44896, 355.604, 216.65729, 278.8094, 482.52066, 237.16309]
2025-08-07 11:52:37,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [294.0, 84.0, 256.0, 206.0, 92.0, 179.0, 142.0, 137.0, 218.0, 111.0]
2025-08-07 11:52:37,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (371.59) for latency MM1Queue_a033_s075
2025-08-07 11:52:37,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 54 minutes, 12 seconds)
2025-08-07 11:54:11,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:54:12,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 258.48859 ± 114.440
2025-08-07 11:54:12,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [162.33371, 192.91525, 323.97607, 199.0386, 163.97676, 236.84193, 270.39832, 133.14601, 379.99, 522.2691]
2025-08-07 11:54:12,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [90.0, 104.0, 143.0, 120.0, 97.0, 146.0, 139.0, 70.0, 194.0, 208.0]
2025-08-07 11:54:12,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 52 minutes, 38 seconds)
2025-08-07 11:55:45,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:55:47,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 388.74652 ± 221.623
2025-08-07 11:55:47,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [689.0253, 174.5168, 172.18848, 266.58365, 145.05461, 442.8698, 428.40405, 860.3651, 402.12662, 306.3308]
2025-08-07 11:55:47,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [253.0, 101.0, 94.0, 184.0, 73.0, 219.0, 207.0, 362.0, 173.0, 168.0]
2025-08-07 11:55:47,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (388.75) for latency MM1Queue_a033_s075
2025-08-07 11:55:47,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 50 minutes, 59 seconds)
2025-08-07 11:57:20,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:57:21,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 198.50052 ± 136.967
2025-08-07 11:57:21,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [126.43246, 171.30219, 225.66548, 185.47029, 11.811874, 186.46379, 245.32362, 116.078964, 565.2892, 151.16736]
2025-08-07 11:57:21,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 102.0, 120.0, 103.0, 14.0, 116.0, 109.0, 71.0, 215.0, 83.0]
2025-08-07 11:57:21,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 49 minutes, 18 seconds)
2025-08-07 11:58:56,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:58:57,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 232.78799 ± 173.783
2025-08-07 11:58:57,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [175.86452, 385.90005, 192.04002, 406.20456, 13.146213, 12.5291195, 112.153145, 212.34357, 221.94728, 595.7516]
2025-08-07 11:58:57,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [98.0, 173.0, 108.0, 171.0, 16.0, 15.0, 75.0, 104.0, 120.0, 245.0]
2025-08-07 11:58:57,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 47 minutes, 44 seconds)
2025-08-07 12:00:31,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:00:33,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 218.09538 ± 134.415
2025-08-07 12:00:33,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [212.7861, 432.2306, 481.7811, 117.3503, 12.578794, 133.76672, 149.28278, 212.67699, 192.22534, 236.27509]
2025-08-07 12:00:33,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [123.0, 174.0, 236.0, 91.0, 14.0, 75.0, 89.0, 96.0, 97.0, 124.0]
2025-08-07 12:00:33,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 45 minutes, 57 seconds)
2025-08-07 12:02:05,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:02:07,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 239.79659 ± 204.429
2025-08-07 12:02:07,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [361.18393, 142.66621, 180.96881, 129.64998, 13.59385, 675.99567, 213.76076, 10.09805, 154.55951, 515.489]
2025-08-07 12:02:07,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [157.0, 79.0, 98.0, 76.0, 17.0, 269.0, 109.0, 16.0, 94.0, 224.0]
2025-08-07 12:02:07,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 44 minutes, 17 seconds)
2025-08-07 12:03:40,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:03:41,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 309.16766 ± 135.689
2025-08-07 12:03:41,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [201.65797, 239.03516, 384.4688, 183.51054, 206.73915, 153.92696, 358.0523, 590.4371, 481.77112, 292.07773]
2025-08-07 12:03:41,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [115.0, 167.0, 181.0, 87.0, 126.0, 91.0, 188.0, 242.0, 206.0, 140.0]
2025-08-07 12:03:41,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 42 minutes, 41 seconds)
2025-08-07 12:05:15,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:05:17,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 251.86369 ± 207.175
2025-08-07 12:05:17,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [549.7767, 13.525836, 162.86588, 133.7123, 13.344716, 239.05627, 166.77475, 492.09888, 139.15543, 608.32623]
2025-08-07 12:05:17,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [255.0, 16.0, 103.0, 82.0, 16.0, 118.0, 105.0, 234.0, 82.0, 209.0]
2025-08-07 12:05:17,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 41 minutes, 12 seconds)
2025-08-07 12:06:50,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:06:52,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 329.27411 ± 142.336
2025-08-07 12:06:52,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [234.2227, 578.2226, 167.92357, 173.26498, 460.33157, 419.76025, 379.93875, 185.97728, 219.75914, 473.34055]
2025-08-07 12:06:52,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [128.0, 226.0, 93.0, 109.0, 200.0, 178.0, 161.0, 113.0, 120.0, 182.0]
2025-08-07 12:06:52,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 39 minutes, 35 seconds)
2025-08-07 12:08:27,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:08:28,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 300.67709 ± 181.602
2025-08-07 12:08:28,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [207.12485, 152.46959, 687.52924, 396.39755, 100.86915, 505.7178, 169.90163, 184.26126, 418.80313, 183.69681]
2025-08-07 12:08:28,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [111.0, 77.0, 276.0, 174.0, 64.0, 201.0, 93.0, 92.0, 178.0, 99.0]
2025-08-07 12:08:28,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 38 minutes, 3 seconds)
2025-08-07 12:10:03,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:10:04,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 248.92973 ± 191.001
2025-08-07 12:10:04,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [529.4307, 14.151327, 257.60986, 546.3461, 173.27098, 187.80612, 120.1581, 160.46983, 16.994463, 483.05975]
2025-08-07 12:10:04,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [230.0, 17.0, 117.0, 225.0, 90.0, 106.0, 92.0, 99.0, 17.0, 225.0]
2025-08-07 12:10:04,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 36 minutes, 35 seconds)
2025-08-07 12:11:37,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:11:39,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 397.27267 ± 249.496
2025-08-07 12:11:39,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [244.40106, 424.8343, 161.6324, 158.24664, 203.64444, 679.9872, 367.8568, 642.5562, 181.65485, 907.9128]
2025-08-07 12:11:39,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [144.0, 188.0, 93.0, 97.0, 117.0, 286.0, 161.0, 255.0, 87.0, 351.0]
2025-08-07 12:11:39,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (397.27) for latency MM1Queue_a033_s075
2025-08-07 12:11:39,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 34 minutes, 59 seconds)
2025-08-07 12:13:12,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:13:14,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 333.48355 ± 198.282
2025-08-07 12:13:14,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [147.44614, 123.291336, 465.4811, 667.8384, 205.5077, 314.7813, 125.04682, 682.89856, 281.14117, 321.4032]
2025-08-07 12:13:14,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [84.0, 77.0, 187.0, 246.0, 97.0, 159.0, 66.0, 281.0, 135.0, 193.0]
2025-08-07 12:13:14,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 33 minutes, 25 seconds)
2025-08-07 12:14:48,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:14:50,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 385.01028 ± 169.804
2025-08-07 12:14:50,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [664.71686, 361.28073, 486.3317, 199.3985, 170.90192, 289.6979, 267.7628, 654.28467, 486.33453, 269.39328]
2025-08-07 12:14:50,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [289.0, 163.0, 214.0, 107.0, 99.0, 146.0, 134.0, 270.0, 211.0, 137.0]
2025-08-07 12:14:50,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 31 minutes, 51 seconds)
2025-08-07 12:16:23,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:16:25,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 279.40845 ± 195.037
2025-08-07 12:16:25,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [218.26271, 437.6425, 16.726475, 622.2149, 125.03436, 363.87772, 491.20465, 163.29636, 11.792433, 344.03223]
2025-08-07 12:16:25,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [107.0, 186.0, 17.0, 250.0, 72.0, 173.0, 222.0, 80.0, 17.0, 163.0]
2025-08-07 12:16:25,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 30 minutes, 9 seconds)
2025-08-07 12:17:59,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:18:00,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 161.45895 ± 109.423
2025-08-07 12:18:00,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [107.103966, 270.93015, 13.231081, 111.24757, 138.75487, 198.65495, 9.620503, 190.28117, 394.1657, 180.59956]
2025-08-07 12:18:00,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 155.0, 16.0, 61.0, 77.0, 108.0, 15.0, 120.0, 181.0, 87.0]
2025-08-07 12:18:00,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 28 minutes, 32 seconds)
2025-08-07 12:19:34,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:19:36,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 315.14658 ± 157.271
2025-08-07 12:19:36,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [433.38028, 627.5585, 229.02982, 197.32822, 167.282, 416.98584, 260.66464, 494.6473, 174.78732, 149.80211]
2025-08-07 12:19:36,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [188.0, 262.0, 117.0, 104.0, 100.0, 183.0, 122.0, 221.0, 112.0, 92.0]
2025-08-07 12:19:36,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 27 minutes, 3 seconds)
2025-08-07 12:21:10,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:21:11,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 268.89508 ± 111.271
2025-08-07 12:21:11,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [403.7076, 225.77187, 301.91412, 273.8374, 377.48767, 100.204094, 180.89429, 450.90344, 251.01659, 123.21362]
2025-08-07 12:21:11,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [180.0, 127.0, 160.0, 142.0, 171.0, 61.0, 109.0, 186.0, 154.0, 81.0]
2025-08-07 12:21:11,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 25 minutes, 26 seconds)
2025-08-07 12:22:45,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:22:46,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 240.47290 ± 172.945
2025-08-07 12:22:46,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [80.20534, 13.4223, 205.56772, 395.67596, 630.4189, 216.39499, 200.87962, 381.79095, 106.53718, 173.83624]
2025-08-07 12:22:46,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [59.0, 17.0, 113.0, 178.0, 238.0, 118.0, 102.0, 180.0, 68.0, 95.0]
2025-08-07 12:22:46,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 23 minutes, 49 seconds)
2025-08-07 12:24:20,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:24:21,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 228.92050 ± 199.196
2025-08-07 12:24:21,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [13.960372, 187.50916, 13.2107315, 190.10114, 245.51149, 116.28308, 516.6692, 104.58535, 664.28455, 237.0901]
2025-08-07 12:24:21,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 101.0, 16.0, 91.0, 138.0, 62.0, 235.0, 59.0, 273.0, 108.0]
2025-08-07 12:24:21,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 22 minutes, 13 seconds)
2025-08-07 12:25:54,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:25:56,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 244.51401 ± 144.009
2025-08-07 12:25:56,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [159.39064, 318.89932, 12.206869, 272.50223, 156.58034, 571.5078, 181.23601, 367.26306, 239.14664, 166.40732]
2025-08-07 12:25:56,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [94.0, 160.0, 15.0, 126.0, 103.0, 239.0, 106.0, 162.0, 113.0, 94.0]
2025-08-07 12:25:56,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 20 minutes, 37 seconds)
2025-08-07 12:27:29,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:27:30,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 220.45584 ± 151.044
2025-08-07 12:27:30,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [438.47586, 114.17391, 99.57818, 16.413929, 172.54514, 180.87575, 387.26477, 489.1481, 146.05762, 160.02516]
2025-08-07 12:27:30,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [184.0, 61.0, 70.0, 18.0, 108.0, 110.0, 165.0, 191.0, 88.0, 85.0]
2025-08-07 12:27:30,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 18 minutes, 58 seconds)
2025-08-07 12:29:05,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:29:06,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 336.39377 ± 243.635
2025-08-07 12:29:06,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [548.77234, 115.49825, 9.844437, 673.86835, 414.11523, 580.7784, 92.124504, 195.91579, 109.434586, 623.5859]
2025-08-07 12:29:06,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [226.0, 73.0, 14.0, 276.0, 177.0, 245.0, 54.0, 95.0, 75.0, 260.0]
2025-08-07 12:29:06,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 17 minutes, 25 seconds)
2025-08-07 12:30:40,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:30:41,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 238.49646 ± 159.531
2025-08-07 12:30:41,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [78.04261, 203.67487, 14.391427, 282.64984, 184.03084, 406.44373, 144.46346, 274.23502, 194.96645, 602.0664]
2025-08-07 12:30:41,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [62.0, 107.0, 17.0, 158.0, 103.0, 174.0, 82.0, 135.0, 114.0, 255.0]
2025-08-07 12:30:41,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 15 minutes, 49 seconds)
2025-08-07 12:32:16,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:32:18,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 243.58972 ± 190.791
2025-08-07 12:32:18,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [208.36739, 154.38211, 141.426, 370.18658, 9.830873, 560.61255, 599.511, 204.33064, 94.70473, 92.545296]
2025-08-07 12:32:18,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [115.0, 88.0, 86.0, 157.0, 13.0, 230.0, 250.0, 114.0, 64.0, 66.0]
2025-08-07 12:32:18,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 18 seconds)
2025-08-07 12:33:50,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:33:52,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 251.99138 ± 194.750
2025-08-07 12:33:52,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [404.2574, 145.86806, 13.870989, 233.3903, 11.598708, 689.29266, 140.78503, 271.84634, 405.70462, 203.29991]
2025-08-07 12:33:52,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [182.0, 76.0, 16.0, 113.0, 16.0, 271.0, 102.0, 115.0, 203.0, 107.0]
2025-08-07 12:33:52,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 12 minutes, 41 seconds)
2025-08-07 12:35:25,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:35:27,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 267.62888 ± 249.578
2025-08-07 12:35:27,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [12.4161415, 239.632, 396.5244, 201.72299, 142.9142, 257.41806, 103.614815, 959.3431, 188.44069, 174.26236]
2025-08-07 12:35:27,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 113.0, 179.0, 112.0, 98.0, 135.0, 73.0, 360.0, 104.0, 95.0]
2025-08-07 12:35:27,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 7 seconds)
2025-08-07 12:37:00,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:37:01,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 192.22987 ± 92.740
2025-08-07 12:37:01,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [166.5415, 240.31033, 392.00333, 108.71666, 208.5242, 10.001453, 163.97949, 194.7754, 234.07318, 203.37318]
2025-08-07 12:37:01,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [82.0, 130.0, 178.0, 87.0, 125.0, 14.0, 83.0, 95.0, 104.0, 111.0]
2025-08-07 12:37:01,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 29 seconds)
2025-08-07 12:38:36,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:38:38,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 228.56299 ± 234.104
2025-08-07 12:38:38,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [94.85776, 161.84004, 184.0537, 166.3551, 179.32927, 15.803235, 212.75208, 204.7341, 154.76534, 911.1391]
2025-08-07 12:38:38,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [72.0, 95.0, 112.0, 97.0, 91.0, 18.0, 112.0, 111.0, 84.0, 364.0]
2025-08-07 12:38:38,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 7 minutes, 56 seconds)
2025-08-07 12:40:11,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:40:12,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 210.24014 ± 158.999
2025-08-07 12:40:12,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [355.47226, 11.864553, 191.68979, 174.65584, 130.41963, 142.44838, 142.41916, 623.82513, 166.18042, 163.42616]
2025-08-07 12:40:12,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [161.0, 16.0, 104.0, 101.0, 82.0, 79.0, 87.0, 256.0, 106.0, 89.0]
2025-08-07 12:40:12,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 19 seconds)
2025-08-07 12:41:45,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:41:46,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 258.01126 ± 182.084
2025-08-07 12:41:46,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [246.62843, 625.4473, 252.01347, 325.22717, 126.15173, 231.76917, 63.756847, 535.4653, 71.79207, 101.86105]
2025-08-07 12:41:46,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [128.0, 244.0, 127.0, 163.0, 71.0, 125.0, 46.0, 198.0, 61.0, 73.0]
2025-08-07 12:41:46,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 44 seconds)
2025-08-07 12:43:20,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:43:21,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 323.82843 ± 163.170
2025-08-07 12:43:21,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [397.15778, 279.54062, 131.60538, 587.4399, 160.81145, 188.28606, 152.57207, 337.77557, 415.4434, 587.652]
2025-08-07 12:43:21,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [183.0, 126.0, 74.0, 231.0, 103.0, 102.0, 89.0, 146.0, 199.0, 234.0]
2025-08-07 12:43:21,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 9 seconds)
2025-08-07 12:44:55,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:44:56,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 233.71909 ± 104.416
2025-08-07 12:44:56,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [225.38417, 288.60886, 440.20813, 249.4928, 108.47397, 228.73294, 171.87376, 191.69962, 359.8062, 72.910286]
2025-08-07 12:44:56,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [116.0, 129.0, 202.0, 122.0, 80.0, 127.0, 94.0, 113.0, 184.0, 48.0]
2025-08-07 12:44:56,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 35 seconds)
2025-08-07 12:46:30,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:46:31,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 215.00021 ± 199.681
2025-08-07 12:46:31,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [11.139512, 14.166733, 342.28235, 166.2151, 174.03906, 167.28102, 250.77791, 161.53055, 744.8111, 117.75882]
2025-08-07 12:46:31,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 17.0, 160.0, 89.0, 107.0, 91.0, 149.0, 99.0, 321.0, 67.0]
2025-08-07 12:46:31,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1251 [DEBUG]: Training session finished
