2025-08-07 09:11:14,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc0-halfcheetah/MM1Queue_a033_s075-bpql-mem16
2025-08-07 09:11:14,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc0-halfcheetah/MM1Queue_a033_s075-bpql-mem16
2025-08-07 09:11:14,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x15154395bb10>}
2025-08-07 09:11:14,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1111 [DEBUG]: using device: cuda
2025-08-07 09:11:14,190 baseline-bpql-noiseperc0-halfcheetah:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 09:11:14,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1133 [INFO]: Creating new trainer
2025-08-07 09:11:14,206 baseline-bpql-noiseperc0-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=113, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 09:11:14,206 baseline-bpql-noiseperc0-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 09:11:15,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1194 [DEBUG]: Starting training session...
2025-08-07 09:11:15,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 1/100
2025-08-07 09:12:50,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:13:01,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: -397.07050 ± 72.055
2025-08-07 09:13:01,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [-362.9846, -465.1791, -441.1814, -369.48123, -322.0912, -396.83362, -495.27127, -494.75656, -355.88483, -267.04163]
2025-08-07 09:13:01,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:13:01,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (-397.07) for latency MM1Queue_a033_s075
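Annotation (not part of the original log): each "Total Reward" line is the mean over the 10 evaluation rollouts, and the ± term matches the population standard deviation (ddof=0) of the "All rewards" list. A minimal sketch reproducing the iteration-1 numbers; that the run used ddof=0 is an inference from the values, and float32 round-off in the logged per-rollout rewards means agreement is to the displayed precision rather than bit-exact.

```python
import math

# Per-rollout rewards as logged for iteration 1.
rewards = [-362.9846, -465.1791, -441.1814, -369.48123, -322.0912,
           -396.83362, -495.27127, -494.75656, -355.88483, -267.04163]

mean = sum(rewards) / len(rewards)
# Population standard deviation (ddof=0) reproduces the logged +/- value.
std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))

print(f"Total Reward: {mean:.2f} \u00b1 {std:.3f}")  # -397.07 ± 72.055
```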
2025-08-07 09:13:01,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 55 minutes, 13 seconds)
2025-08-07 09:14:41,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:14:53,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: -276.18365 ± 48.227
2025-08-07 09:14:53,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [-277.24307, -299.8032, -215.06862, -299.20187, -302.7663, -207.01868, -210.9227, -307.0999, -278.64706, -364.0649]
2025-08-07 09:14:53,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:14:53,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (-276.18) for latency MM1Queue_a033_s075
2025-08-07 09:14:53,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 57 minutes, 39 seconds)
2025-08-07 09:16:32,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:16:44,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: -20.40818 ± 70.762
2025-08-07 09:16:44,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [-51.013317, -118.44518, -77.53125, 2.3531795, 22.67408, 51.920197, 43.18022, -55.478436, 96.835304, -118.57664]
2025-08-07 09:16:44,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:16:44,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (-20.41) for latency MM1Queue_a033_s075
2025-08-07 09:16:44,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 57 minutes, 18 seconds)
2025-08-07 09:18:24,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:18:36,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 144.76089 ± 116.225
2025-08-07 09:18:36,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [42.84888, 237.02324, 206.81804, 191.82042, 60.071686, 26.964083, 260.43713, 151.40086, -59.12824, 329.35278]
2025-08-07 09:18:36,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:18:36,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (144.76) for latency MM1Queue_a033_s075
2025-08-07 09:18:36,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 56 minutes, 20 seconds)
2025-08-07 09:20:16,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:20:28,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 273.15921 ± 157.477
2025-08-07 09:20:28,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [354.32123, 266.3518, 161.48616, 355.72324, 68.81784, 127.21849, 592.14136, 406.63907, 79.90216, 318.9907]
2025-08-07 09:20:28,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:20:28,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (273.16) for latency MM1Queue_a033_s075
2025-08-07 09:20:28,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 54 minutes, 58 seconds)
2025-08-07 09:22:08,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:22:19,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 321.72101 ± 221.735
2025-08-07 09:22:19,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [631.96936, 135.82246, 206.97012, 256.2273, 70.326805, 377.2722, 190.88277, 755.1276, 484.53284, 108.0787]
2025-08-07 09:22:19,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:22:19,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (321.72) for latency MM1Queue_a033_s075
2025-08-07 09:22:19,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 54 minutes, 51 seconds)
2025-08-07 09:23:59,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:24:11,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 455.38696 ± 206.280
2025-08-07 09:24:11,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [548.4536, 267.0, 747.87573, 472.9534, 345.02124, 313.3253, 227.66689, 566.80865, 838.1738, 226.5912]
2025-08-07 09:24:11,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:24:11,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (455.39) for latency MM1Queue_a033_s075
2025-08-07 09:24:11,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 53 minutes, 5 seconds)
2025-08-07 09:25:51,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:26:03,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 934.94092 ± 148.449
2025-08-07 09:26:03,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [987.23175, 958.44446, 812.0886, 829.38586, 886.5677, 872.5976, 782.48663, 1189.9232, 811.0905, 1219.5928]
2025-08-07 09:26:03,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:26:03,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (934.94) for latency MM1Queue_a033_s075
2025-08-07 09:26:03,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 51 minutes, 17 seconds)
2025-08-07 09:27:43,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:27:54,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1213.92151 ± 359.720
2025-08-07 09:27:54,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1104.8667, 1207.6731, 366.30347, 1598.8477, 1453.6229, 1340.2576, 1110.7383, 1213.8052, 1746.8309, 996.269]
2025-08-07 09:27:54,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:27:54,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (1213.92) for latency MM1Queue_a033_s075
2025-08-07 09:27:54,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 49 minutes, 20 seconds)
2025-08-07 09:29:34,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:29:46,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1244.37268 ± 188.950
2025-08-07 09:29:46,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1147.0973, 1263.8141, 1398.4982, 1184.8389, 1279.3022, 1581.8641, 991.9976, 1494.233, 1006.98193, 1095.0999]
2025-08-07 09:29:46,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:29:46,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (1244.37) for latency MM1Queue_a033_s075
2025-08-07 09:29:46,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 47 minutes, 31 seconds)
2025-08-07 09:31:26,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:31:38,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1203.69922 ± 208.739
2025-08-07 09:31:38,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1271.0671, 1295.4415, 766.4706, 1509.7478, 1105.0707, 1414.8767, 1164.8677, 1136.9608, 989.9856, 1382.504]
2025-08-07 09:31:38,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:31:38,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 45 minutes, 37 seconds)
2025-08-07 09:33:18,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:33:29,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1498.61011 ± 412.540
2025-08-07 09:33:29,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1186.6951, 1848.6965, 1112.9906, 1369.7192, 1035.5928, 1930.9053, 1359.9502, 1659.5332, 2354.5334, 1127.484]
2025-08-07 09:33:29,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:33:29,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (1498.61) for latency MM1Queue_a033_s075
2025-08-07 09:33:29,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 43 minutes, 45 seconds)
2025-08-07 09:35:09,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:35:21,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1420.46179 ± 180.260
2025-08-07 09:35:21,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1564.2767, 1268.5563, 1380.1482, 1598.7646, 1519.8616, 1113.2792, 1304.1324, 1736.8407, 1465.2775, 1253.4817]
2025-08-07 09:35:21,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:35:21,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 41 minutes, 50 seconds)
2025-08-07 09:37:01,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:37:12,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1711.94299 ± 452.809
2025-08-07 09:37:12,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1310.8167, 1186.4468, 1756.4126, 1637.3883, 1342.8597, 1718.1198, 1235.9067, 2611.0974, 2287.8037, 2032.5782]
2025-08-07 09:37:12,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:37:12,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (1711.94) for latency MM1Queue_a033_s075
2025-08-07 09:37:12,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 39 minutes, 58 seconds)
2025-08-07 09:38:52,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:39:04,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1756.35913 ± 484.386
2025-08-07 09:39:04,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1918.436, 1171.2249, 1584.6416, 2385.0813, 1470.6569, 1274.4355, 2495.6726, 1155.647, 2331.8318, 1775.9647]
2025-08-07 09:39:04,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:39:04,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (1756.36) for latency MM1Queue_a033_s075
2025-08-07 09:39:04,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 38 minutes, 1 second)
2025-08-07 09:40:44,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:40:55,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1336.17981 ± 454.930
2025-08-07 09:40:55,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1205.9221, 1368.3568, 1523.0651, 1173.6572, 1226.6531, 395.77365, 2332.2095, 1447.223, 1133.0813, 1555.8561]
2025-08-07 09:40:55,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:40:55,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 36 minutes, 6 seconds)
2025-08-07 09:42:35,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:42:47,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1845.85449 ± 491.097
2025-08-07 09:42:47,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1617.8203, 1423.7122, 2556.5115, 1812.0594, 1179.4048, 1885.0721, 1206.8855, 2098.8892, 2707.0667, 1971.1237]
2025-08-07 09:42:47,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:42:47,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (1845.85) for latency MM1Queue_a033_s075
2025-08-07 09:42:47,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 34 minutes, 12 seconds)
2025-08-07 09:44:26,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:44:38,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1562.22192 ± 300.476
2025-08-07 09:44:38,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2154.463, 1319.3333, 1743.1947, 1308.8013, 1261.0314, 1836.89, 1303.534, 1893.8687, 1380.5914, 1420.5103]
2025-08-07 09:44:38,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:44:38,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 32 minutes, 20 seconds)
2025-08-07 09:46:18,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:46:30,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2067.90430 ± 682.824
2025-08-07 09:46:30,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1349.3112, 2991.6255, 2937.528, 2576.2122, 1259.5481, 2301.731, 1966.7611, 1500.7823, 1129.5365, 2666.0054]
2025-08-07 09:46:30,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:46:30,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (2067.90) for latency MM1Queue_a033_s075
2025-08-07 09:46:30,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 30 minutes, 29 seconds)
2025-08-07 09:48:10,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:48:21,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1594.26587 ± 383.195
2025-08-07 09:48:21,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1757.5433, 1131.6052, 1669.2633, 1162.5835, 1523.6752, 1770.8622, 2329.538, 1213.7345, 2075.9958, 1307.8574]
2025-08-07 09:48:21,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:48:21,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 28 minutes, 39 seconds)
2025-08-07 09:50:01,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:50:13,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2092.51172 ± 609.925
2025-08-07 09:50:13,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1834.532, 3145.246, 1188.3431, 2258.0588, 2088.6738, 1735.8086, 3093.6458, 1568.6824, 1630.9059, 2381.221]
2025-08-07 09:50:13,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:50:13,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (2092.51) for latency MM1Queue_a033_s075
2025-08-07 09:50:13,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 26 minutes, 49 seconds)
2025-08-07 09:51:53,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:52:04,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1570.14746 ± 221.534
2025-08-07 09:52:04,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1904.9226, 1788.1356, 1521.1768, 1599.1012, 1814.7798, 1201.5941, 1486.5347, 1231.9071, 1650.0963, 1503.2257]
2025-08-07 09:52:04,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:52:04,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 25 minutes, 2 seconds)
2025-08-07 09:53:44,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:53:56,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2665.11035 ± 703.340
2025-08-07 09:53:56,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2336.6235, 1328.8907, 1675.1787, 3222.3416, 3434.004, 2559.882, 3454.1912, 3373.0813, 2501.41, 2765.5007]
2025-08-07 09:53:56,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:53:56,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (2665.11) for latency MM1Queue_a033_s075
2025-08-07 09:53:56,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 23 minutes, 10 seconds)
2025-08-07 09:55:36,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:55:47,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1884.15796 ± 548.685
2025-08-07 09:55:47,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2522.0757, 1432.8561, 1604.503, 3124.8127, 2317.5217, 1590.3275, 1470.3196, 1546.976, 1815.7135, 1416.475]
2025-08-07 09:55:47,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:55:47,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 21 minutes, 19 seconds)
2025-08-07 09:57:27,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:57:39,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2011.54102 ± 962.937
2025-08-07 09:57:39,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [141.55215, 1567.2341, 2618.8442, 3241.8162, 1664.2668, 1643.2745, 3138.0613, 3235.485, 1462.3927, 1402.4827]
2025-08-07 09:57:39,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:57:39,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 19 minutes, 27 seconds)
2025-08-07 09:59:19,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:59:30,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2127.15503 ± 892.839
2025-08-07 09:59:30,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3325.936, 1532.9698, 3221.985, 3688.9832, 1250.9191, 1641.4994, 1607.3646, 1207.8353, 2296.1648, 1497.8936]
2025-08-07 09:59:30,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:59:30,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 17 minutes, 33 seconds)
2025-08-07 10:01:10,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:01:22,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2236.24585 ± 611.699
2025-08-07 10:01:22,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1663.4678, 2615.0254, 1524.468, 1861.6449, 2304.066, 1760.6096, 3436.851, 1712.7177, 2453.9612, 3029.6492]
2025-08-07 10:01:22,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:01:22,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 15 minutes, 41 seconds)
2025-08-07 10:03:02,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:03:14,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1783.05798 ± 695.264
2025-08-07 10:03:14,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1224.7374, 1229.5304, 1610.7562, 1535.3044, 1738.5603, 1210.1902, 3654.209, 1455.2288, 2052.0278, 2120.0369]
2025-08-07 10:03:14,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:03:14,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 13 minutes, 50 seconds)
2025-08-07 10:04:53,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:05:05,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2105.05176 ± 1267.826
2025-08-07 10:05:05,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2735.255, 1425.9548, 4086.3438, 384.88904, 4352.7886, 1223.4255, 975.9639, 1933.2037, 1205.2347, 2727.4597]
2025-08-07 10:05:05,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:05:05,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 11 minutes, 59 seconds)
2025-08-07 10:06:45,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:06:57,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2082.35400 ± 752.185
2025-08-07 10:06:57,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1907.962, 1651.5668, 3380.991, 1309.4102, 3218.2505, 1386.6394, 1455.706, 2937.34, 1579.8973, 1995.7788]
2025-08-07 10:06:57,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:06:57,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 10 minutes, 6 seconds)
2025-08-07 10:08:37,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:08:48,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2305.89648 ± 812.262
2025-08-07 10:08:48,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3413.2637, 1523.5363, 2203.5344, 1831.6533, 1480.364, 2337.0486, 1696.7803, 4163.3003, 2209.5166, 2199.9685]
2025-08-07 10:08:48,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:08:48,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 8 minutes, 15 seconds)
2025-08-07 10:10:28,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:10:39,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3216.86572 ± 1098.304
2025-08-07 10:10:39,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4225.8486, 3918.8699, 1370.6577, 4010.5562, 1405.8741, 3160.9224, 2955.8132, 2432.3845, 4402.164, 4285.567]
2025-08-07 10:10:39,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:10:39,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (3216.87) for latency MM1Queue_a033_s075
2025-08-07 10:10:39,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 6 minutes, 18 seconds)
2025-08-07 10:12:20,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:12:32,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2436.41528 ± 1121.483
2025-08-07 10:12:32,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1807.3627, 2631.57, 4436.5835, 4243.072, 612.41156, 2930.3994, 1668.3906, 1873.1686, 1809.0922, 2352.1035]
2025-08-07 10:12:32,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:12:32,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 4 minutes, 41 seconds)
2025-08-07 10:14:12,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:14:24,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2987.97583 ± 1021.432
2025-08-07 10:14:24,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2541.3718, 3350.9678, 2426.4302, 2513.3032, 3820.061, 1520.3098, 4023.0266, 1332.9569, 4379.792, 3971.5383]
2025-08-07 10:14:24,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:14:24,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 2 minutes, 54 seconds)
2025-08-07 10:16:04,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:16:16,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1955.32837 ± 782.672
2025-08-07 10:16:16,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2868.1233, 1559.1084, 1408.4891, 1581.4121, 1696.1653, 1712.702, 2079.9792, 1261.5017, 3914.7292, 1471.074]
2025-08-07 10:16:16,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:16:16,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 1 minute, 11 seconds)
2025-08-07 10:17:56,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:18:08,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2199.27979 ± 900.004
2025-08-07 10:18:08,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2603.2498, 1168.4207, 1507.1241, 1791.578, 1610.6614, 2403.9563, 1231.5498, 3032.2817, 2402.9624, 4241.012]
2025-08-07 10:18:08,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:18:08,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 59 minutes, 26 seconds)
2025-08-07 10:19:48,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:20:00,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3673.43164 ± 970.833
2025-08-07 10:20:00,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4354.077, 4176.026, 4370.7554, 4104.0503, 1608.3838, 3670.3572, 3638.082, 4347.5117, 2022.4235, 4442.6475]
2025-08-07 10:20:00,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:20:00,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (3673.43) for latency MM1Queue_a033_s075
2025-08-07 10:20:00,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 57 minutes, 43 seconds)
2025-08-07 10:21:40,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:21:52,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4058.59521 ± 1415.425
2025-08-07 10:21:52,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4695.856, 4917.9487, 4715.9126, 4706.511, 1164.378, 4884.58, 4819.485, 1320.1243, 4932.7466, 4428.408]
2025-08-07 10:21:52,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:21:52,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (4058.60) for latency MM1Queue_a033_s075
2025-08-07 10:21:52,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 55 minutes, 43 seconds)
2025-08-07 10:23:32,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:23:44,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3218.84253 ± 1166.308
2025-08-07 10:23:44,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4090.8074, 1734.076, 4245.015, 1352.5494, 3900.4849, 4202.4453, 4020.3967, 4027.4998, 1382.2466, 3232.9065]
2025-08-07 10:23:44,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:23:44,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 53 minutes, 51 seconds)
2025-08-07 10:25:24,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:25:36,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2238.67432 ± 778.548
2025-08-07 10:25:36,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1710.445, 2182.227, 3018.309, 2403.5608, 1623.4521, 1736.0365, 4216.9043, 1771.1113, 2083.4146, 1641.2831]
2025-08-07 10:25:36,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:25:36,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 51 minutes, 56 seconds)
2025-08-07 10:27:16,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:27:27,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3565.70361 ± 1219.676
2025-08-07 10:27:27,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4568.6313, 1447.3531, 3695.9531, 3833.4292, 1773.614, 2124.5857, 4643.313, 4412.2666, 4772.02, 4385.8687]
2025-08-07 10:27:27,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:27:27,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 50 minutes, 2 seconds)
2025-08-07 10:29:08,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:29:19,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3547.72729 ± 1249.393
2025-08-07 10:29:19,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1312.5789, 4524.2246, 4080.3477, 4303.9727, 4451.462, 4437.0264, 2211.0671, 1526.8009, 4588.1187, 4041.674]
2025-08-07 10:29:19,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:29:19,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 48 minutes, 9 seconds)
2025-08-07 10:31:00,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:31:11,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3713.35400 ± 1090.054
2025-08-07 10:31:11,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4982.4165, 3016.8298, 4606.4287, 3698.1846, 4953.832, 2008.4534, 2613.0251, 3156.9136, 2865.9504, 5231.5054]
2025-08-07 10:31:11,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:31:11,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 46 minutes, 17 seconds)
2025-08-07 10:32:52,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:33:03,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4109.12988 ± 1474.250
2025-08-07 10:33:03,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5191.4873, 1697.3821, 5286.385, 5041.7554, 1868.2323, 5036.0923, 5082.8394, 5133.043, 2046.42, 4707.661]
2025-08-07 10:33:03,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:33:03,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (4109.13) for latency MM1Queue_a033_s075
2025-08-07 10:33:03,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 44 minutes, 26 seconds)
2025-08-07 10:34:44,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:34:55,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3626.19922 ± 1211.174
2025-08-07 10:34:55,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4698.6465, 4544.681, 2274.6628, 2039.0402, 4642.065, 3183.9485, 4749.957, 3866.1284, 4758.143, 1504.7179]
2025-08-07 10:34:55,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:34:55,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 42 minutes, 37 seconds)
2025-08-07 10:36:36,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:36:47,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5020.69824 ± 67.106
2025-08-07 10:36:47,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5072.9805, 4965.5947, 4997.2563, 5000.8296, 5035.655, 4970.4644, 5097.45, 5130.8623, 5045.3325, 4890.5547]
2025-08-07 10:36:47,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:36:47,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (5020.70) for latency MM1Queue_a033_s075
2025-08-07 10:36:47,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 40 minutes, 48 seconds)
2025-08-07 10:38:28,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:38:39,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4788.62695 ± 594.330
2025-08-07 10:38:39,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5177.0117, 5002.419, 5022.6924, 4911.4346, 3042.5347, 5027.96, 5015.3066, 4837.1616, 4739.293, 5110.4517]
2025-08-07 10:38:39,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:38:39,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 38 minutes, 56 seconds)
2025-08-07 10:40:20,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:40:31,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3974.92310 ± 1413.381
2025-08-07 10:40:31,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5338.5083, 5131.016, 4036.9119, 1907.1703, 2346.1768, 5122.7095, 5139.3057, 1683.3322, 3708.1145, 5335.988]
2025-08-07 10:40:31,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:40:31,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 37 minutes, 6 seconds)
2025-08-07 10:42:12,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:42:23,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4244.75684 ± 1436.101
2025-08-07 10:42:23,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5084.9985, 4775.8193, 5350.4556, 3534.5894, 5395.2393, 5276.064, 1785.5735, 5076.801, 4828.6055, 1339.4253]
2025-08-07 10:42:23,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:42:23,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 35 minutes, 10 seconds)
2025-08-07 10:44:02,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:44:14,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4295.35059 ± 1287.480
2025-08-07 10:44:14,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5361.2656, 4842.58, 4124.5645, 5323.6826, 4706.8667, 2895.4514, 3611.6582, 5423.1235, 5372.9775, 1291.3331]
2025-08-07 10:44:14,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:44:14,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 33 minutes, 4 seconds)
2025-08-07 10:45:53,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:46:04,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4076.03857 ± 1026.208
2025-08-07 10:46:04,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2549.1702, 5135.1035, 5409.5, 4182.523, 3982.205, 2250.9216, 5012.489, 3264.1008, 4680.194, 4294.181]
2025-08-07 10:46:04,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:46:04,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 30 minutes, 55 seconds)
2025-08-07 10:47:43,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:47:54,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3855.55859 ± 1414.121
2025-08-07 10:47:54,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2010.7297, 2225.2703, 5156.272, 4640.289, 4868.257, 5041.451, 2319.0916, 5126.7456, 5176.25, 1991.23]
2025-08-07 10:47:54,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:47:54,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 28 minutes, 47 seconds)
2025-08-07 10:49:33,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:49:44,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4444.16699 ± 943.141
2025-08-07 10:49:44,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4862.8726, 5304.17, 4301.5664, 4844.48, 5225.8086, 2386.438, 5065.4043, 4474.1704, 5025.7144, 2951.0466]
2025-08-07 10:49:44,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:49:44,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 26 minutes, 31 seconds)
2025-08-07 10:51:22,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:51:34,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4270.86328 ± 1479.819
2025-08-07 10:51:34,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5416.7207, 4725.0405, 4655.8486, 1379.0596, 4867.8403, 5021.0723, 5388.3965, 4694.36, 5224.1357, 1336.1608]
2025-08-07 10:51:34,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:51:34,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 24 minutes, 23 seconds)
2025-08-07 10:53:12,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:53:23,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4647.11768 ± 1243.499
2025-08-07 10:53:23,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5429.191, 4598.285, 5485.3877, 2226.3013, 2192.1702, 5114.869, 5385.84, 5277.22, 5279.74, 5482.1675]
2025-08-07 10:53:23,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:53:23,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 22 minutes, 26 seconds)
2025-08-07 10:55:02,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:55:13,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4657.26660 ± 1435.513
2025-08-07 10:55:13,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5497.528, 4872.154, 1411.5836, 5447.2417, 5460.0923, 2255.3608, 5360.3813, 5308.929, 5552.976, 5406.416]
2025-08-07 10:55:13,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:55:13,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 20 minutes, 30 seconds)
2025-08-07 10:56:52,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:57:03,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4416.62744 ± 1298.800
2025-08-07 10:57:03,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4892.461, 5449.4185, 2290.6313, 5355.196, 5186.003, 5396.3213, 2036.4745, 3123.2466, 5214.5747, 5221.948]
2025-08-07 10:57:03,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:57:03,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 18 minutes, 37 seconds)
2025-08-07 10:58:41,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:58:53,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5008.88184 ± 237.847
2025-08-07 10:58:53,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4521.9414, 4999.1035, 5365.1616, 5100.403, 4829.0684, 5261.527, 4849.981, 4853.4863, 5101.774, 5206.376]
2025-08-07 10:58:53,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:58:53,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 16 minutes, 50 seconds)
2025-08-07 11:00:31,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:00:43,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5180.58691 ± 818.903
2025-08-07 11:00:43,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5405.3555, 5492.841, 2738.3582, 5286.2705, 5560.2085, 5520.089, 5482.73, 5538.2285, 5303.3955, 5478.399]
2025-08-07 11:00:43,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:00:43,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (5180.59) for latency MM1Queue_a033_s075
2025-08-07 11:00:43,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 15 minutes, 2 seconds)
2025-08-07 11:02:21,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:02:32,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4383.62500 ± 1329.655
2025-08-07 11:02:32,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2609.0256, 5026.9575, 4686.5093, 5179.0264, 3527.4792, 5610.9106, 5469.1855, 5475.7656, 1449.8168, 4801.573]
2025-08-07 11:02:32,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:02:32,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 13 minutes, 8 seconds)
2025-08-07 11:04:10,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:04:21,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3812.58789 ± 1768.168
2025-08-07 11:04:21,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5558.15, 5481.2754, 4566.2744, 1346.6272, 5193.238, 4989.6074, 1258.6716, 1307.9304, 3121.831, 5302.2734]
2025-08-07 11:04:21,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:04:21,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 11 minutes, 14 seconds)
2025-08-07 11:05:59,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:06:10,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5106.00098 ± 936.300
2025-08-07 11:06:10,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5519.0273, 5296.2603, 4574.232, 2434.9927, 5629.689, 5576.727, 5534.154, 5485.9185, 5533.0596, 5475.948]
2025-08-07 11:06:10,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:06:10,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 9 minutes, 19 seconds)
2025-08-07 11:07:48,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:07:59,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5191.35596 ± 806.479
2025-08-07 11:07:59,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5607.043, 2950.1968, 5518.8296, 5501.3486, 5486.2725, 5596.2383, 5596.3247, 5522.461, 4543.066, 5591.7773]
2025-08-07 11:07:59,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:07:59,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (5191.36) for latency MM1Queue_a033_s075
2025-08-07 11:07:59,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 7 minutes, 25 seconds)
2025-08-07 11:09:38,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:09:49,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3586.53052 ± 1660.093
2025-08-07 11:09:49,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4054.4468, 1395.6335, 3116.555, 4920.3438, 1302.4379, 5549.0234, 3192.478, 5277.88, 1472.8939, 5583.6113]
2025-08-07 11:09:49,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:09:49,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 5 minutes, 34 seconds)
2025-08-07 11:11:27,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:11:38,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4793.21191 ± 1120.277
2025-08-07 11:11:38,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1780.0886, 4246.044, 5478.7437, 4177.5986, 5598.1396, 4925.739, 5523.8247, 5387.879, 5316.142, 5497.9243]
2025-08-07 11:11:38,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:11:38,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 3 minutes, 42 seconds)
2025-08-07 11:13:16,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:13:27,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4591.47217 ± 1308.077
2025-08-07 11:13:27,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5403.9824, 5243.131, 2421.9958, 5221.573, 4365.6396, 5183.1465, 1714.7838, 5504.8594, 5507.0796, 5348.5312]
2025-08-07 11:13:27,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:13:27,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 1 minute, 54 seconds)
2025-08-07 11:15:05,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:15:17,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5293.84277 ± 634.575
2025-08-07 11:15:17,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5607.918, 5491.9863, 5562.1494, 3419.946, 5633.859, 5276.102, 5470.6826, 5394.6763, 5425.707, 5655.4004]
2025-08-07 11:15:17,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:15:17,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (5293.84) for latency MM1Queue_a033_s075
2025-08-07 11:15:17,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 6 seconds)
2025-08-07 11:16:55,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:17:06,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5565.75146 ± 55.692
2025-08-07 11:17:06,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5545.2812, 5537.732, 5659.7334, 5570.9644, 5565.679, 5643.671, 5591.4995, 5498.477, 5576.5747, 5467.9033]
2025-08-07 11:17:06,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:17:06,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (5565.75) for latency MM1Queue_a033_s075
2025-08-07 11:17:06,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 58 minutes, 17 seconds)
2025-08-07 11:18:44,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:18:55,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5108.04443 ± 1057.748
2025-08-07 11:18:55,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5357.1616, 5609.294, 5555.756, 5396.944, 1944.7574, 5366.4053, 5530.4487, 5439.4297, 5507.7764, 5372.474]
2025-08-07 11:18:55,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:18:55,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 56 minutes, 24 seconds)
2025-08-07 11:20:33,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:20:44,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5191.58301 ± 859.162
2025-08-07 11:20:44,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2681.9177, 5528.473, 5017.1826, 5551.161, 5298.667, 5307.7153, 5684.31, 5569.2266, 5638.0757, 5639.0986]
2025-08-07 11:20:44,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:20:44,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 54 minutes, 35 seconds)
2025-08-07 11:22:22,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:22:33,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4842.96729 ± 1312.019
2025-08-07 11:22:33,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5565.348, 5510.6094, 5457.955, 5643.5083, 5452.511, 2527.6465, 5507.8633, 5540.9385, 5274.996, 1948.2964]
2025-08-07 11:22:33,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:22:33,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 52 minutes, 46 seconds)
2025-08-07 11:24:11,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:24:23,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4368.58496 ± 1270.237
2025-08-07 11:24:23,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2407.9246, 3545.2861, 5715.768, 5381.2393, 5561.3174, 5483.981, 5732.0747, 3844.3174, 3370.2625, 2643.6763]
2025-08-07 11:24:23,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:24:23,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 50 minutes, 57 seconds)
2025-08-07 11:26:00,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:26:12,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5564.53711 ± 170.283
2025-08-07 11:26:12,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5680.623, 5656.983, 5595.364, 5724.137, 5573.3896, 5553.146, 5633.3804, 5083.6953, 5623.5146, 5521.138]
2025-08-07 11:26:12,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:26:12,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 49 minutes, 7 seconds)
2025-08-07 11:27:50,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:28:01,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4963.64014 ± 1441.370
2025-08-07 11:28:01,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5689.23, 5675.817, 5577.2505, 5557.241, 5400.9707, 5757.4375, 5801.226, 5835.6777, 2922.2295, 1419.3207]
2025-08-07 11:28:01,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:28:01,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 47 minutes, 18 seconds)
2025-08-07 11:29:39,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:29:50,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5434.72559 ± 221.979
2025-08-07 11:29:50,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5352.1606, 5483.713, 4820.1357, 5595.513, 5459.263, 5520.856, 5398.5776, 5498.1396, 5662.1323, 5556.7617]
2025-08-07 11:29:50,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:29:50,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 45 minutes, 28 seconds)
2025-08-07 11:31:28,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:31:39,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4709.65918 ± 1181.101
2025-08-07 11:31:39,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5461.921, 5326.56, 5483.8145, 2329.643, 4998.485, 5376.8013, 4693.734, 2467.5933, 5490.981, 5467.0635]
2025-08-07 11:31:39,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:31:39,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 43 minutes, 38 seconds)
2025-08-07 11:33:17,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:33:28,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5443.13330 ± 349.936
2025-08-07 11:33:28,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5228.1216, 5693.428, 5565.326, 5563.2944, 5604.4946, 5720.4556, 5572.5864, 5440.5127, 5574.3354, 4468.7734]
2025-08-07 11:33:28,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:33:28,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 41 minutes, 48 seconds)
2025-08-07 11:35:06,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:35:17,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4705.86523 ± 1585.610
2025-08-07 11:35:17,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5667.141, 5612.507, 891.3869, 5504.665, 5641.343, 4591.0474, 5448.4863, 5539.8506, 5707.654, 2454.5737]
2025-08-07 11:35:17,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:35:17,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 39 minutes, 58 seconds)
2025-08-07 11:36:55,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:37:06,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4975.31836 ± 1068.327
2025-08-07 11:37:06,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5535.6646, 5725.5825, 3179.8843, 5124.215, 5616.2085, 5596.1567, 5602.691, 5461.425, 2579.963, 5331.394]
2025-08-07 11:37:06,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:37:06,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 38 minutes, 9 seconds)
2025-08-07 11:38:44,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:38:55,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3924.84326 ± 1584.600
2025-08-07 11:38:55,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5733.54, 3058.8477, 4862.492, 5388.6914, 1340.5442, 2688.255, 5437.9263, 5498.3647, 1701.4012, 3538.3696]
2025-08-07 11:38:55,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:38:55,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 36 minutes, 20 seconds)
2025-08-07 11:40:33,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:40:44,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5541.82471 ± 76.843
2025-08-07 11:40:44,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5623.15, 5517.519, 5340.143, 5624.663, 5554.5166, 5534.019, 5553.661, 5571.608, 5587.8247, 5511.141]
2025-08-07 11:40:44,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:40:44,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 34 minutes, 31 seconds)
2025-08-07 11:42:22,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:42:33,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5090.01855 ± 932.174
2025-08-07 11:42:33,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5714.7417, 5242.83, 5539.7246, 4449.087, 5669.4805, 2535.8352, 5107.035, 5768.863, 5210.169, 5662.4214]
2025-08-07 11:42:33,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:42:33,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 32 minutes, 42 seconds)
2025-08-07 11:44:11,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:44:22,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5078.97900 ± 1092.573
2025-08-07 11:44:22,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5444.6787, 5449.0815, 1839.5531, 5672.386, 5277.0337, 5481.308, 5033.8525, 5476.1943, 5578.8403, 5536.8623]
2025-08-07 11:44:22,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:44:22,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 30 minutes, 54 seconds)
2025-08-07 11:45:59,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:46:10,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4839.97559 ± 934.356
2025-08-07 11:46:10,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3106.3071, 3625.6467, 5287.203, 5138.906, 3603.7393, 5684.2764, 5340.2437, 5477.4062, 5530.3203, 5605.7085]
2025-08-07 11:46:10,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:46:10,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 29 minutes, 1 second)
2025-08-07 11:47:46,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:47:58,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5627.40918 ± 64.545
2025-08-07 11:47:58,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5570.559, 5609.3037, 5737.4595, 5696.189, 5584.386, 5705.564, 5518.123, 5641.326, 5595.331, 5615.853]
2025-08-07 11:47:58,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:47:58,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1226 [INFO]: New best (5627.41) for latency MM1Queue_a033_s075
2025-08-07 11:47:58,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 27 minutes, 7 seconds)
2025-08-07 11:49:35,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:49:46,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4971.32129 ± 1266.587
2025-08-07 11:49:46,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5425.0874, 1281.9076, 5409.6724, 5550.54, 5682.2627, 5473.5693, 5268.432, 5550.9243, 4531.9253, 5538.894]
2025-08-07 11:49:46,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:49:46,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 25 minutes, 17 seconds)
2025-08-07 11:51:23,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:51:34,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5018.06787 ± 991.244
2025-08-07 11:51:34,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5453.0337, 2674.4573, 5554.266, 5605.457, 5611.7817, 3483.6465, 5572.1143, 5531.6475, 5439.761, 5254.5103]
2025-08-07 11:51:34,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:51:34,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 23 minutes, 26 seconds)
2025-08-07 11:53:11,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:53:22,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5058.51660 ± 1098.811
2025-08-07 11:53:22,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4060.491, 5641.007, 5716.3438, 5134.2314, 5698.734, 2091.7507, 5661.197, 5361.3774, 5722.2915, 5497.743]
2025-08-07 11:53:22,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:53:22,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 21 minutes, 35 seconds)
2025-08-07 11:54:59,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:55:10,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5416.16162 ± 426.853
2025-08-07 11:55:10,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5684.7666, 5702.979, 5815.611, 5563.8477, 5120.2485, 5424.061, 5225.2437, 4303.744, 5711.765, 5609.3516]
2025-08-07 11:55:10,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:55:10,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 19 minutes, 48 seconds)
2025-08-07 11:56:47,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:56:58,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5151.53760 ± 1164.659
2025-08-07 11:56:58,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5376.6235, 5577.147, 5483.7764, 1667.2627, 5677.2935, 5454.012, 5547.041, 5616.875, 5482.934, 5632.411]
2025-08-07 11:56:58,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:56:58,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 18 minutes, 1 second)
2025-08-07 11:58:35,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:58:46,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5157.97021 ± 1256.059
2025-08-07 11:58:46,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5428.0117, 5647.3267, 5657.0557, 5534.8276, 5603.748, 5493.81, 5762.946, 1399.6838, 5524.6025, 5527.696]
2025-08-07 11:58:46,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:58:46,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 16 minutes, 12 seconds)
2025-08-07 12:00:23,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:00:35,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5559.10449 ± 181.278
2025-08-07 12:00:35,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5674.593, 5689.7573, 5068.336, 5708.287, 5491.3535, 5708.0596, 5537.4077, 5603.3706, 5609.062, 5500.8213]
2025-08-07 12:00:35,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:00:35,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 14 minutes, 24 seconds)
2025-08-07 12:02:11,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:02:23,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4858.65137 ± 1349.488
2025-08-07 12:02:23,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5614.983, 1407.8586, 3185.0078, 5106.7935, 5513.996, 5653.044, 5491.1826, 5442.87, 5657.075, 5513.7026]
2025-08-07 12:02:23,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:02:23,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 36 seconds)
2025-08-07 12:03:59,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:04:11,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5613.58691 ± 124.970
2025-08-07 12:04:11,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5707.2627, 5740.293, 5583.8965, 5683.2925, 5735.458, 5340.4526, 5706.5, 5596.44, 5592.3013, 5449.974]
2025-08-07 12:04:11,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:04:11,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 48 seconds)
2025-08-07 12:05:48,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:05:59,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5020.89111 ± 1210.084
2025-08-07 12:05:59,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5678.102, 3502.4517, 5680.9644, 5149.1934, 5242.7124, 1973.733, 5677.994, 5772.331, 5685.292, 5846.136]
2025-08-07 12:05:59,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:05:59,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes)
2025-08-07 12:07:35,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:07:47,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5278.51221 ± 1303.748
2025-08-07 12:07:47,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5772.942, 5761.249, 5738.0376, 1376.98, 5698.2793, 5450.751, 5778.584, 5733.654, 5699.401, 5775.242]
2025-08-07 12:07:47,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:07:47,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 12 seconds)
2025-08-07 12:09:23,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:09:34,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4823.42920 ± 1094.916
2025-08-07 12:09:34,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5853.5225, 4573.771, 3268.3223, 3556.0066, 5543.588, 5688.453, 5613.234, 5491.9316, 2908.289, 5737.175]
2025-08-07 12:09:34,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:09:35,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 23 seconds)
2025-08-07 12:11:11,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:11:23,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4088.31445 ± 1934.029
2025-08-07 12:11:23,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5595.586, 3931.391, 5700.3276, 1260.2687, 5348.9224, 1171.6917, 5700.2, 1262.0626, 5664.9717, 5247.721]
2025-08-07 12:11:23,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:11:23,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 36 seconds)
2025-08-07 12:12:59,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:13:11,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4898.00781 ± 1415.422
2025-08-07 12:13:11,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3112.294, 5711.9175, 5676.978, 5544.2383, 1311.1062, 5607.4414, 5016.329, 5706.3877, 5667.5107, 5625.8706]
2025-08-07 12:13:11,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:13:11,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 48 seconds)
2025-08-07 12:14:47,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:14:59,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5569.29736 ± 113.563
2025-08-07 12:14:59,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5568.9355, 5629.214, 5536.1475, 5507.838, 5607.056, 5798.2085, 5562.159, 5352.322, 5664.3286, 5466.7603]
2025-08-07 12:14:59,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:14:59,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-halfcheetah):1251 [DEBUG]: Training session finished
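The `Total Reward: … ± …` figures can be recomputed directly from the listed per-episode rewards. A quick check against the final evaluation (iteration 100) — the use of the population standard deviation (ddof=0, numpy's default) is an assumption, but it reproduces the logged spread:

```python
import math

# Per-episode rewards from the final evaluation block above.
rewards = [5568.9355, 5629.214, 5536.1475, 5507.838, 5607.056,
           5798.2085, 5562.159, 5352.322, 5664.3286, 5466.7603]

mean = sum(rewards) / len(rewards)
# Population standard deviation (divide by n, not n-1); this matches
# the logged "± 113.563", suggesting the logger uses numpy's default.
std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
```

The recomputed mean agrees with the logged 5569.29736 up to float32 rounding, confirming the ± value is a standard deviation over the 10 evaluation episodes rather than, say, a confidence interval.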
