2025-08-07 09:21:56,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc10-halfcheetah/MM1Queue_a033_s075-bpql-mem16
2025-08-07 09:21:56,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc10-halfcheetah/MM1Queue_a033_s075-bpql-mem16
2025-08-07 09:21:56,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x15029b6e34d0>}
2025-08-07 09:21:56,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1111 [DEBUG]: using device: cuda
2025-08-07 09:21:56,844 baseline-bpql-noiseperc10-halfcheetah:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 09:21:56,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1133 [INFO]: Creating new trainer
2025-08-07 09:21:56,861 baseline-bpql-noiseperc10-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=113, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 09:21:56,861 baseline-bpql-noiseperc10-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 09:21:57,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1194 [DEBUG]: Starting training session...
2025-08-07 09:21:57,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 1/100
2025-08-07 09:23:35,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:23:46,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: -323.54376 ± 25.186
2025-08-07 09:23:46,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [-367.2763, -306.17877, -335.09637, -327.7046, -299.6937, -294.35645, -289.384, -336.10703, -321.066, -358.57416]
2025-08-07 09:23:46,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:23:46,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (-323.54) for latency MM1Queue_a033_s075
2025-08-07 09:23:46,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 59 minutes, 18 seconds)
2025-08-07 09:25:28,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:25:41,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: -250.50081 ± 59.410
2025-08-07 09:25:41,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [-301.59677, -295.37817, -288.37152, -287.0399, -268.40497, -186.12819, -265.8082, -264.2356, -98.80256, -249.2421]
2025-08-07 09:25:41,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:25:41,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (-250.50) for latency MM1Queue_a033_s075
2025-08-07 09:25:41,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 2 minutes, 37 seconds)
2025-08-07 09:27:24,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:27:35,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: -103.33655 ± 51.325
2025-08-07 09:27:35,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [-75.011215, -99.59073, -148.7715, -102.78787, -105.640816, -200.2264, -13.841394, -75.17509, -54.926056, -157.39436]
2025-08-07 09:27:35,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:27:35,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (-103.34) for latency MM1Queue_a033_s075
2025-08-07 09:27:35,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 1 minute, 59 seconds)
2025-08-07 09:29:16,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:29:27,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: -85.53429 ± 42.521
2025-08-07 09:29:27,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [-79.35602, -77.89218, -97.4374, -125.557724, -156.54253, -52.687443, -88.2233, -44.08393, -4.769837, -128.79251]
2025-08-07 09:29:27,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:29:27,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (-85.53) for latency MM1Queue_a033_s075
2025-08-07 09:29:27,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours)
2025-08-07 09:31:06,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:31:19,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 158.77490 ± 175.356
2025-08-07 09:31:19,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [338.98135, 233.81122, 233.66685, -81.907265, 164.50954, 46.18608, -12.102161, 215.28139, 500.40747, -51.085503]
2025-08-07 09:31:19,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:31:19,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (158.77) for latency MM1Queue_a033_s075
2025-08-07 09:31:19,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 57 minutes, 48 seconds)
2025-08-07 09:32:59,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:33:11,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 221.88736 ± 219.046
2025-08-07 09:33:11,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [307.6685, 197.81421, 348.24448, 56.53962, 138.77655, 28.633478, 258.21967, 788.5886, 67.30385, 27.084593]
2025-08-07 09:33:11,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:33:11,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (221.89) for latency MM1Queue_a033_s075
2025-08-07 09:33:11,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 57 minutes, 8 seconds)
2025-08-07 09:34:49,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:35:00,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 298.50299 ± 365.379
2025-08-07 09:35:00,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [651.1543, 118.72445, 168.92789, 168.0259, 614.9175, 282.80472, 184.5983, 862.35815, -529.8013, 463.3199]
2025-08-07 09:35:00,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:35:00,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (298.50) for latency MM1Queue_a033_s075
2025-08-07 09:35:00,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 53 minutes, 22 seconds)
2025-08-07 09:36:40,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:36:51,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 577.59521 ± 82.119
2025-08-07 09:36:51,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [529.22217, 593.7578, 509.45435, 480.12045, 699.67676, 560.9905, 492.4604, 594.84436, 570.745, 744.6805]
2025-08-07 09:36:51,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:36:51,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (577.60) for latency MM1Queue_a033_s075
2025-08-07 09:36:51,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 50 minutes, 23 seconds)
2025-08-07 09:38:31,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:38:42,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 883.28925 ± 80.982
2025-08-07 09:38:42,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1024.7247, 895.76495, 919.83887, 860.23987, 831.1835, 905.5111, 883.20184, 712.68256, 972.97455, 826.7708]
2025-08-07 09:38:42,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:38:42,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (883.29) for latency MM1Queue_a033_s075
2025-08-07 09:38:42,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 48 minutes, 15 seconds)
2025-08-07 09:40:21,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:40:33,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1044.71301 ± 153.523
2025-08-07 09:40:33,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [967.4644, 971.2236, 1127.2262, 1210.8816, 959.3878, 1190.6357, 786.45654, 1024.2881, 1313.3484, 896.2169]
2025-08-07 09:40:33,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:40:33,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1044.71) for latency MM1Queue_a033_s075
2025-08-07 09:40:33,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 46 minutes, 18 seconds)
2025-08-07 09:42:11,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:42:24,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1030.32202 ± 93.064
2025-08-07 09:42:24,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [964.4189, 964.7975, 1249.6796, 888.1888, 995.2986, 1074.9368, 1085.4667, 992.4967, 1072.1815, 1015.7563]
2025-08-07 09:42:24,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:42:24,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 43 minutes, 51 seconds)
2025-08-07 09:44:01,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:44:12,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1083.23511 ± 76.920
2025-08-07 09:44:12,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1102.6178, 1180.3268, 1082.1161, 1091.954, 1136.5208, 1216.266, 1046.272, 1028.5253, 989.23676, 958.5168]
2025-08-07 09:44:12,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:44:12,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1083.24) for latency MM1Queue_a033_s075
2025-08-07 09:44:12,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 41 minutes, 49 seconds)
2025-08-07 09:45:52,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:46:03,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1278.76819 ± 223.402
2025-08-07 09:46:03,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1093.5294, 1070.5121, 1343.6694, 1550.3641, 1513.6927, 884.0247, 1554.5336, 1325.5532, 1070.0345, 1381.768]
2025-08-07 09:46:03,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:46:03,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1278.77) for latency MM1Queue_a033_s075
2025-08-07 09:46:03,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 40 minutes, 8 seconds)
2025-08-07 09:47:45,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:47:56,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1132.69397 ± 114.469
2025-08-07 09:47:56,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1088.7831, 1057.62, 1091.4487, 1106.4448, 1103.6667, 1073.2151, 1214.9469, 1146.6455, 1438.5972, 1005.57056]
2025-08-07 09:47:56,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:47:56,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 38 minutes, 39 seconds)
2025-08-07 09:49:34,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:49:47,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1178.40820 ± 169.604
2025-08-07 09:49:47,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1234.5796, 1315.2219, 1139.7426, 955.8079, 1289.0577, 1177.56, 1549.1343, 1011.9979, 995.72656, 1115.2535]
2025-08-07 09:49:47,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:49:47,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 36 minutes, 46 seconds)
2025-08-07 09:51:25,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:51:36,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1269.77747 ± 244.158
2025-08-07 09:51:36,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1131.1769, 1142.0751, 1179.8633, 1724.3993, 1116.866, 1064.1869, 1281.6614, 1224.2959, 1759.3058, 1073.944]
2025-08-07 09:51:36,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:51:36,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 34 minutes, 40 seconds)
2025-08-07 09:53:16,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:53:27,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1097.34607 ± 62.613
2025-08-07 09:53:27,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1219.5077, 1106.6016, 1039.776, 1212.9087, 1047.0804, 1056.2772, 1047.2906, 1084.2, 1083.5454, 1076.2737]
2025-08-07 09:53:27,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:53:27,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 33 minutes, 26 seconds)
2025-08-07 09:55:06,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:55:17,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1073.82605 ± 274.638
2025-08-07 09:55:17,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1112.5574, 1519.5754, 1137.7415, 1252.6849, 1114.3041, 1057.3234, 1065.2931, 367.6242, 1146.7482, 964.4096]
2025-08-07 09:55:17,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:55:17,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 31 minutes, 19 seconds)
2025-08-07 09:56:57,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:57:08,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1255.44360 ± 254.125
2025-08-07 09:57:08,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1231.3115, 1148.6149, 1365.9318, 1843.7188, 1053.1893, 1578.0421, 1055.3969, 1161.7168, 1102.862, 1013.65173]
2025-08-07 09:57:08,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:57:08,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 29 minutes, 6 seconds)
2025-08-07 09:58:49,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 09:59:00,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1291.94263 ± 186.550
2025-08-07 09:59:00,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1113.1562, 1288.4518, 1196.6365, 1130.5176, 1023.6277, 1519.5095, 1545.1697, 1390.2711, 1162.2643, 1549.8226]
2025-08-07 09:59:00,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 09:59:00,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1291.94) for latency MM1Queue_a033_s075
2025-08-07 09:59:00,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 27 minutes, 31 seconds)
2025-08-07 10:00:40,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:00:51,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1315.59009 ± 169.055
2025-08-07 10:00:51,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1249.8738, 1040.5396, 1407.5504, 1085.6362, 1468.2432, 1477.9786, 1411.3242, 1508.4927, 1110.5864, 1395.6758]
2025-08-07 10:00:51,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:00:51,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1315.59) for latency MM1Queue_a033_s075
2025-08-07 10:00:51,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 26 minutes, 2 seconds)
2025-08-07 10:02:29,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:02:40,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1260.49292 ± 294.492
2025-08-07 10:02:40,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1111.5654, 1139.4055, 975.5705, 1048.827, 1069.5767, 1319.2434, 1115.4169, 1316.7401, 2030.5552, 1478.0292]
2025-08-07 10:02:40,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:02:40,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 23 minutes, 54 seconds)
2025-08-07 10:04:19,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:04:30,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1453.46997 ± 220.033
2025-08-07 10:04:30,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1775.942, 1285.2722, 1059.8116, 1728.4031, 1358.4192, 1556.5988, 1595.3094, 1520.5892, 1176.9792, 1477.3751]
2025-08-07 10:04:30,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:04:30,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1453.47) for latency MM1Queue_a033_s075
2025-08-07 10:04:30,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 21 minutes, 57 seconds)
2025-08-07 10:06:10,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:06:21,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1287.72888 ± 454.332
2025-08-07 10:06:21,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [120.9762, 1240.1417, 1056.3822, 1493.1807, 1859.3878, 1163.6262, 1652.2887, 1631.5383, 1266.5968, 1393.1699]
2025-08-07 10:06:21,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:06:21,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 20 minutes, 6 seconds)
2025-08-07 10:08:01,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:08:12,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1390.28003 ± 297.681
2025-08-07 10:08:12,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1083.8022, 1289.2007, 1623.8778, 1420.3895, 1119.0353, 1197.3391, 1734.8903, 1256.0165, 1141.991, 2036.2572]
2025-08-07 10:08:12,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:08:12,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 18 minutes, 8 seconds)
2025-08-07 10:09:54,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:10:05,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1481.96240 ± 334.747
2025-08-07 10:10:05,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1327.7151, 1268.8143, 1119.1245, 2404.3506, 1429.1823, 1617.7544, 1283.3751, 1502.1665, 1473.1115, 1394.0299]
2025-08-07 10:10:05,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:10:05,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1481.96) for latency MM1Queue_a033_s075
2025-08-07 10:10:05,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 16 minutes, 38 seconds)
2025-08-07 10:11:42,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:11:55,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1150.27759 ± 377.940
2025-08-07 10:11:55,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1077.3967, 1158.3698, 1086.8484, 1102.5105, 1648.2867, 1517.696, 1146.2423, 1289.5714, 1318.534, 157.3197]
2025-08-07 10:11:55,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:11:55,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 15 minutes)
2025-08-07 10:13:35,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:13:46,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1675.60449 ± 461.395
2025-08-07 10:13:46,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2541.0916, 1260.2533, 1221.3057, 1907.1562, 1646.4039, 1040.9784, 1258.0817, 2032.0018, 1680.9354, 2167.8372]
2025-08-07 10:13:46,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:13:46,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1675.60) for latency MM1Queue_a033_s075
2025-08-07 10:13:46,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 13 minutes, 23 seconds)
2025-08-07 10:15:23,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:15:34,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1353.52222 ± 303.346
2025-08-07 10:15:34,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1467.3531, 678.96185, 1151.3782, 1780.8236, 1367.4258, 1805.387, 1328.2953, 1320.517, 1417.6854, 1217.3942]
2025-08-07 10:15:34,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:15:34,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 10 minutes, 59 seconds)
2025-08-07 10:17:12,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:17:23,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1396.11157 ± 356.237
2025-08-07 10:17:23,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1167.239, 1129.3536, 1166.1875, 1094.7023, 1644.4641, 1254.3922, 2076.9128, 1227.7167, 1184.7548, 2015.3916]
2025-08-07 10:17:23,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:17:23,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 8 minutes, 30 seconds)
2025-08-07 10:19:03,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:19:14,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1295.43713 ± 584.368
2025-08-07 10:19:14,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1290.9807, 1498.4653, 1306.89, 1488.5223, 1655.5802, 1314.0901, 1230.3062, -269.52798, 1276.2036, 2162.861]
2025-08-07 10:19:14,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:19:14,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 6 minutes, 16 seconds)
2025-08-07 10:20:52,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:21:03,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1452.83704 ± 529.749
2025-08-07 10:21:03,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1666.9232, 1369.2394, 1725.0455, 2196.1216, 1189.6476, 1104.052, 281.0493, 1824.8796, 2026.8542, 1144.557]
2025-08-07 10:21:03,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:21:03,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 4 minutes, 14 seconds)
2025-08-07 10:22:43,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:22:54,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1541.16553 ± 398.890
2025-08-07 10:22:54,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1344.908, 1812.848, 1732.7065, 1215.6464, 1151.9948, 2012.4457, 1165.621, 1476.0245, 1145.4458, 2354.0146]
2025-08-07 10:22:54,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:22:54,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 2 minutes, 30 seconds)
2025-08-07 10:24:35,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:24:46,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1447.41882 ± 305.892
2025-08-07 10:24:46,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1403.6044, 1227.6376, 1194.1731, 1206.5782, 1147.6956, 1338.8524, 1880.0248, 2113.6729, 1361.7607, 1600.1892]
2025-08-07 10:24:46,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:24:46,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 1 minute, 15 seconds)
2025-08-07 10:26:26,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:26:38,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1403.60815 ± 221.925
2025-08-07 10:26:38,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1403.3849, 2013.7238, 1161.1227, 1255.6451, 1329.0807, 1425.3203, 1465.0331, 1367.733, 1236.733, 1378.3048]
2025-08-07 10:26:38,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:26:38,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 19 seconds)
2025-08-07 10:28:16,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:28:28,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1260.40417 ± 561.261
2025-08-07 10:28:28,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1049.236, -172.85226, 1158.7039, 1314.3926, 1461.7954, 1155.3527, 1662.5468, 1607.7017, 2099.4998, 1267.6648]
2025-08-07 10:28:28,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:28:28,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 58 minutes, 15 seconds)
2025-08-07 10:30:07,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:30:18,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1442.38049 ± 265.208
2025-08-07 10:30:18,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1671.159, 1340.016, 1272.0011, 1760.6381, 1337.7521, 1429.3279, 2004.058, 1286.6443, 1184.9169, 1137.2909]
2025-08-07 10:30:18,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:30:18,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 56 minutes, 35 seconds)
2025-08-07 10:31:57,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:32:08,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1249.67456 ± 198.920
2025-08-07 10:32:08,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1568.1577, 1182.2213, 1266.0693, 1290.4608, 1400.9324, 1281.7554, 1155.7169, 1290.7188, 750.0396, 1310.6738]
2025-08-07 10:32:08,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:32:08,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 54 minutes, 33 seconds)
2025-08-07 10:33:47,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:33:58,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1493.16382 ± 454.011
2025-08-07 10:33:58,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1155.9904, 1449.9376, 1340.6927, 2570.3474, 1976.6819, 858.692, 1474.1698, 1242.1875, 1597.0443, 1265.8958]
2025-08-07 10:33:58,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:33:58,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 52 minutes, 17 seconds)
2025-08-07 10:35:39,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:35:50,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1639.40234 ± 402.701
2025-08-07 10:35:50,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1830.7129, 1407.6501, 1703.2734, 1163.549, 2307.4453, 1458.7399, 1101.3707, 1726.6848, 1379.1111, 2315.4863]
2025-08-07 10:35:50,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:35:50,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 50 minutes, 16 seconds)
2025-08-07 10:37:28,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:37:39,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1484.73071 ± 331.843
2025-08-07 10:37:39,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1074.0486, 1568.5753, 1654.2865, 2259.3608, 1197.5854, 1485.7365, 1184.137, 1659.2903, 1572.3788, 1191.9082]
2025-08-07 10:37:39,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:37:39,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 48 minutes, 21 seconds)
2025-08-07 10:39:16,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:39:27,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1524.75623 ± 380.806
2025-08-07 10:39:27,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1255.5138, 1824.7283, 1257.5931, 2214.933, 1289.8315, 1460.6064, 1130.7639, 1132.7694, 1546.7511, 2134.071]
2025-08-07 10:39:27,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:39:27,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 46 minutes)
2025-08-07 10:41:06,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:41:17,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1485.33411 ± 400.731
2025-08-07 10:41:17,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1163.8832, 1207.033, 2163.8726, 1754.044, 1256.7703, 2095.184, 1784.1444, 1094.655, 1275.3805, 1058.3737]
2025-08-07 10:41:17,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:41:17,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 44 minutes, 12 seconds)
2025-08-07 10:42:57,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:43:08,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1447.06702 ± 385.660
2025-08-07 10:43:08,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1416.0592, 2388.0725, 1637.4698, 1193.7175, 1179.7588, 1263.864, 1440.1156, 1794.1274, 1036.31, 1121.1737]
2025-08-07 10:43:08,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:43:08,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 42 minutes, 38 seconds)
2025-08-07 10:44:45,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:44:56,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1448.00806 ± 251.379
2025-08-07 10:44:56,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1405.8401, 1756.7257, 1331.6416, 1819.0402, 1626.6332, 1189.966, 1491.7964, 1653.9504, 1073.1333, 1131.3538]
2025-08-07 10:44:56,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:44:56,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 40 minutes, 4 seconds)
2025-08-07 10:46:34,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:46:45,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1652.21021 ± 614.211
2025-08-07 10:46:45,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2073.7336, 1838.4314, 302.90472, 1909.8197, 2516.4888, 1327.4845, 1303.4418, 2037.1287, 1083.4731, 2129.1963]
2025-08-07 10:46:45,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:46:45,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 38 minutes, 13 seconds)
2025-08-07 10:48:24,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:48:37,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1435.87512 ± 323.262
2025-08-07 10:48:37,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1100.0261, 1899.2207, 1125.1306, 1259.3307, 1830.0848, 1900.8533, 1285.0376, 1210.2946, 1636.0736, 1112.6996]
2025-08-07 10:48:37,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:48:37,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 37 minutes, 11 seconds)
2025-08-07 10:50:18,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:50:30,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1587.24475 ± 483.510
2025-08-07 10:50:30,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1151.7839, 1955.4828, 1491.7908, 2700.465, 1193.3402, 1198.5903, 1419.8591, 2036.3186, 1110.192, 1614.6237]
2025-08-07 10:50:30,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:50:30,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 35 minutes, 51 seconds)
2025-08-07 10:52:11,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:52:22,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1516.61499 ± 271.950
2025-08-07 10:52:22,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1357.3295, 1310.1992, 1736.6892, 1581.134, 1725.3013, 1993.2953, 1221.7222, 1252.4323, 1802.6031, 1185.4431]
2025-08-07 10:52:22,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:52:22,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 34 minutes, 10 seconds)
2025-08-07 10:53:59,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:54:10,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1529.39380 ± 340.316
2025-08-07 10:54:10,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1214.1573, 1429.8257, 1405.1562, 1299.4954, 1699.4547, 1439.0157, 1443.633, 1303.7201, 2468.2083, 1591.2712]
2025-08-07 10:54:10,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:54:11,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 32 minutes, 28 seconds)
2025-08-07 10:55:52,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:56:03,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1675.81250 ± 504.618
2025-08-07 10:56:03,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1112.8816, 1880.4435, 2082.4995, 1532.9503, 1629.8069, 2369.4587, 2538.237, 1427.9159, 966.31445, 1217.6182]
2025-08-07 10:56:03,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:56:03,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1675.81) for latency MM1Queue_a033_s075
2025-08-07 10:56:03,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 31 minutes, 10 seconds)
2025-08-07 10:57:43,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:57:55,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1947.52502 ± 527.672
2025-08-07 10:57:55,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2252.9172, 1279.7534, 1170.8237, 2668.4893, 2447.94, 1506.7656, 1731.773, 1863.7399, 1825.1892, 2727.86]
2025-08-07 10:57:55,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:57:55,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1947.53) for latency MM1Queue_a033_s075
2025-08-07 10:57:55,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 29 minutes, 21 seconds)
2025-08-07 10:59:33,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:59:44,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1596.89270 ± 356.313
2025-08-07 10:59:44,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1415.041, 1697.3783, 1215.0854, 2108.9265, 2035.7621, 1430.1356, 1479.4607, 2145.3687, 1202.598, 1239.1708]
2025-08-07 10:59:44,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 10:59:44,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 26 minutes, 48 seconds)
2025-08-07 11:01:21,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:01:34,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1679.05664 ± 358.554
2025-08-07 11:01:34,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1323.6678, 2057.6746, 1570.9993, 1628.7565, 1537.1561, 1554.4984, 1581.151, 2047.3252, 1107.377, 2381.9602]
2025-08-07 11:01:34,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:01:34,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 24 minutes, 41 seconds)
2025-08-07 11:03:14,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:03:25,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1514.81042 ± 410.885
2025-08-07 11:03:25,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2389.3174, 1069.0837, 1351.1685, 1476.2743, 1179.9868, 1831.5386, 2017.4716, 1295.165, 1092.3654, 1445.7338]
2025-08-07 11:03:25,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:03:25,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 23 minutes, 13 seconds)
2025-08-07 11:05:05,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:05:16,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1500.78247 ± 476.410
2025-08-07 11:05:16,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1351.3143, 2839.6057, 1189.0446, 1356.7809, 1219.3312, 1355.8583, 1082.559, 1575.9083, 1685.3665, 1352.0565]
2025-08-07 11:05:16,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:05:16,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 21 minutes, 5 seconds)
2025-08-07 11:06:56,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:07:07,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1656.53455 ± 526.598
2025-08-07 11:07:07,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2004.839, 1746.3164, 1375.0693, 1352.2706, 1581.8053, 1319.1523, 1215.4915, 2281.1367, 923.47437, 2765.7893]
2025-08-07 11:07:07,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:07:07,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 19 minutes, 1 second)
2025-08-07 11:08:45,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:08:56,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1597.27100 ± 523.484
2025-08-07 11:08:56,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2563.6619, 1161.8821, 2343.9277, 1248.3689, 2178.5889, 1079.3071, 1248.8422, 1529.7583, 1449.697, 1168.6749]
2025-08-07 11:08:56,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:08:56,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 17 minutes, 20 seconds)
2025-08-07 11:10:36,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:10:47,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1654.55115 ± 344.428
2025-08-07 11:10:47,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1387.9421, 1777.5833, 1597.7289, 1179.1133, 2395.731, 2035.1215, 1527.162, 1496.4015, 1821.8892, 1326.8397]
2025-08-07 11:10:47,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:10:47,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 15 minutes, 34 seconds)
2025-08-07 11:12:27,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:12:38,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2025.00098 ± 555.349
2025-08-07 11:12:38,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2439.8042, 2963.1013, 1467.752, 1282.4496, 1263.9515, 1562.9146, 2177.3557, 2325.6345, 2325.4836, 2441.563]
2025-08-07 11:12:38,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:12:38,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (2025.00) for latency MM1Queue_a033_s075
2025-08-07 11:12:38,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 13 minutes, 38 seconds)
2025-08-07 11:14:15,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:14:26,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1830.45044 ± 693.520
2025-08-07 11:14:26,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1285.5339, 2675.5415, 1662.891, 2743.0356, 1359.9338, 1466.7363, 1807.4945, 2265.7722, 470.04047, 2567.5251]
2025-08-07 11:14:26,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:14:26,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 11 minutes, 33 seconds)
2025-08-07 11:16:03,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:16:14,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1564.28723 ± 596.116
2025-08-07 11:16:14,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1359.8717, 1187.3143, 1289.0795, 1065.6531, 2166.2334, 1245.0345, 1617.8066, 1173.7217, 3115.2883, 1422.8691]
2025-08-07 11:16:14,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:16:14,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 9 minutes, 21 seconds)
2025-08-07 11:17:53,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:18:04,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1679.96411 ± 522.895
2025-08-07 11:18:04,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1205.1301, 1306.9181, 1716.5753, 2542.2664, 2046.2843, 2546.6257, 1118.1478, 1106.9087, 1400.191, 1810.5917]
2025-08-07 11:18:04,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:18:04,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 7 minutes, 33 seconds)
2025-08-07 11:19:44,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:19:55,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2120.66260 ± 565.360
2025-08-07 11:19:55,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1603.3379, 2476.3171, 2703.6968, 3076.4907, 1341.1208, 2243.1, 2492.6013, 1550.3782, 1455.8602, 2263.7222]
2025-08-07 11:19:55,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:19:55,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (2120.66) for latency MM1Queue_a033_s075
2025-08-07 11:19:55,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 5 minutes, 45 seconds)
2025-08-07 11:21:33,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:21:44,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1828.31616 ± 630.678
2025-08-07 11:21:44,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1426.1195, 1908.7281, 2415.936, 472.2156, 2727.2002, 1376.7438, 2604.1455, 1830.862, 1728.8726, 1792.3407]
2025-08-07 11:21:44,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:21:44,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 3 minutes, 47 seconds)
2025-08-07 11:23:21,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:23:32,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1892.21912 ± 369.575
2025-08-07 11:23:32,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2319.8145, 1938.5518, 1721.7007, 1484.7129, 1549.7529, 2113.1062, 1464.0697, 1601.9282, 2133.6272, 2594.9285]
2025-08-07 11:23:32,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:23:32,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 1 minute, 48 seconds)
2025-08-07 11:25:08,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:25:21,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1837.99744 ± 475.529
2025-08-07 11:25:21,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1869.7931, 1442.458, 1286.2417, 1505.829, 1450.5378, 2173.5583, 2452.2122, 2811.225, 1520.2101, 1867.9086]
2025-08-07 11:25:21,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:25:21,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 7 seconds)
2025-08-07 11:26:57,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:27:08,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1692.04907 ± 423.156
2025-08-07 11:27:08,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1255.6724, 1377.1843, 1342.4774, 2254.2637, 1588.996, 1852.0767, 2655.2468, 1548.8756, 1420.3145, 1625.3828]
2025-08-07 11:27:08,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:27:08,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 58 minutes, 2 seconds)
2025-08-07 11:28:49,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:29:00,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1811.08594 ± 392.947
2025-08-07 11:29:00,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1328.9645, 1342.3326, 1474.4469, 2347.5393, 1646.9292, 2397.5222, 2264.5735, 1980.6311, 1809.3068, 1518.6133]
2025-08-07 11:29:00,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:29:00,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 56 minutes, 19 seconds)
2025-08-07 11:30:40,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:30:53,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1814.06738 ± 454.802
2025-08-07 11:30:53,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2952.8713, 1297.5228, 1834.5726, 1817.6437, 1813.1442, 1975.3622, 1598.6941, 2075.3455, 1397.18, 1378.3374]
2025-08-07 11:30:53,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:30:53,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 54 minutes, 53 seconds)
2025-08-07 11:32:41,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:32:53,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1532.13403 ± 226.902
2025-08-07 11:32:53,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1429.1157, 1597.4456, 1946.9755, 1407.1776, 1345.4772, 1849.0044, 1499.631, 1223.7844, 1711.3799, 1311.3484]
2025-08-07 11:32:53,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:32:53,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 54 minutes, 13 seconds)
2025-08-07 11:34:40,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:34:53,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1696.40979 ± 816.563
2025-08-07 11:34:53,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [-123.2128, 1350.5585, 1790.8667, 1345.0702, 2110.382, 1198.342, 1890.4539, 3023.7864, 2636.4636, 1741.386]
2025-08-07 11:34:53,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:34:53,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 53 minutes, 21 seconds)
2025-08-07 11:36:40,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:36:51,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1664.90979 ± 400.183
2025-08-07 11:36:51,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1866.1003, 1196.0914, 1828.6714, 1707.9135, 1299.2448, 2620.6206, 1445.715, 1234.8458, 1837.2495, 1612.6449]
2025-08-07 11:36:51,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:36:51,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 52 minutes, 28 seconds)
2025-08-07 11:38:38,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:38:50,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2013.48633 ± 513.601
2025-08-07 11:38:50,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1929.7935, 2617.5935, 1997.406, 2535.9429, 1955.2496, 2646.6375, 1498.6178, 1194.1869, 2434.8262, 1324.6091]
2025-08-07 11:38:50,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:38:50,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 51 minutes, 6 seconds)
2025-08-07 11:40:37,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:40:50,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1264.46167 ± 531.811
2025-08-07 11:40:50,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1614.3989, 1105.2993, 1269.589, -220.62074, 1457.6396, 1434.743, 1211.4828, 1505.7668, 1438.5822, 1827.7365]
2025-08-07 11:40:50,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:40:50,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 49 minutes, 42 seconds)
2025-08-07 11:42:36,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:42:47,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1705.20312 ± 438.461
2025-08-07 11:42:47,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1255.8975, 1787.3812, 2714.7798, 1212.8159, 1644.8395, 1169.5742, 1732.3975, 2073.5378, 1621.0079, 1839.8018]
2025-08-07 11:42:47,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:42:47,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 47 minutes, 33 seconds)
2025-08-07 11:44:37,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:44:48,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1530.41724 ± 349.180
2025-08-07 11:44:48,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1505.8362, 2288.5562, 1947.8436, 1485.0143, 1179.2809, 1182.8137, 1620.513, 1246.0825, 1660.647, 1187.5841]
2025-08-07 11:44:48,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:44:48,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 45 minutes, 40 seconds)
2025-08-07 11:46:38,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:46:49,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1751.95862 ± 442.828
2025-08-07 11:46:49,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2273.8965, 1267.135, 2400.12, 1453.1765, 2225.4473, 1373.2496, 1479.3794, 1616.6359, 1224.7646, 2205.782]
2025-08-07 11:46:49,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:46:49,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 43 minutes, 49 seconds)
2025-08-07 11:48:38,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:48:49,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1813.66626 ± 552.304
2025-08-07 11:48:49,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1232.3696, 1992.9147, 2698.5073, 1374.03, 1501.1425, 1699.8392, 1925.8892, 2902.7554, 1231.2001, 1578.015]
2025-08-07 11:48:49,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:48:49,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 41 minutes, 57 seconds)
2025-08-07 11:50:35,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:50:47,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1706.42554 ± 339.527
2025-08-07 11:50:47,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1831.3762, 1180.2673, 1988.9983, 1426.5073, 2413.5571, 1430.2695, 1726.3772, 1796.3285, 1403.7362, 1866.8373]
2025-08-07 11:50:47,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:50:47,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 39 minutes, 46 seconds)
2025-08-07 11:52:35,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:52:48,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2116.15552 ± 559.585
2025-08-07 11:52:48,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1251.4952, 1525.3761, 2384.1697, 2600.6917, 2096.413, 1853.5513, 1649.352, 2854.783, 3038.8945, 1906.8278]
2025-08-07 11:52:48,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:52:48,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 38 minutes, 1 second)
2025-08-07 11:54:34,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:54:45,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1585.52026 ± 374.255
2025-08-07 11:54:45,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1718.578, 1204.873, 1441.7723, 1375.3218, 1285.0377, 1490.4932, 1764.3914, 1628.5333, 1364.98, 2581.223]
2025-08-07 11:54:45,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:54:45,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 35 minutes, 49 seconds)
2025-08-07 11:56:33,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:56:46,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1696.20776 ± 397.670
2025-08-07 11:56:46,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1255.1188, 1655.3271, 1710.8496, 2619.6016, 1716.6178, 1409.1223, 2213.0598, 1383.3633, 1439.9685, 1559.0485]
2025-08-07 11:56:46,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:56:46,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 33 minutes, 50 seconds)
2025-08-07 11:58:33,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:58:44,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2076.77808 ± 455.579
2025-08-07 11:58:44,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2546.0435, 1415.3741, 2305.5554, 2460.5857, 1889.343, 2507.6243, 2700.6887, 1838.4691, 1533.1324, 1570.9648]
2025-08-07 11:58:44,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 11:58:44,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 31 minutes, 45 seconds)
2025-08-07 12:00:34,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:00:47,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2001.83179 ± 757.386
2025-08-07 12:00:47,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1741.5967, 2814.3594, 1704.8105, 1839.1415, 1360.644, 1234.7413, 1264.5095, 3283.7021, 3227.0488, 1547.7645]
2025-08-07 12:00:47,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:00:47,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 30 minutes, 1 second)
2025-08-07 12:02:35,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:02:46,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2209.78149 ± 718.450
2025-08-07 12:02:46,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2677.1843, 3243.3596, 2702.3027, 1302.5724, 1866.7925, 3021.1565, 2085.4944, 1346.6359, 1180.4955, 2671.8179]
2025-08-07 12:02:46,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:02:46,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (2209.78) for latency MM1Queue_a033_s075
2025-08-07 12:02:46,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 27 minutes, 56 seconds)
2025-08-07 12:04:34,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:04:45,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1842.41956 ± 549.560
2025-08-07 12:04:45,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2646.3386, 1250.0354, 1726.0605, 1275.3336, 1723.62, 1597.8878, 2876.9048, 1514.2167, 2369.3562, 1444.4421]
2025-08-07 12:04:45,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:04:45,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 25 minutes, 59 seconds)
2025-08-07 12:06:35,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:06:48,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1601.48193 ± 525.263
2025-08-07 12:06:48,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2231.1008, 1519.6415, 2841.711, 1749.5618, 1156.0204, 1244.2139, 1085.3888, 1201.2068, 1393.9802, 1591.9945]
2025-08-07 12:06:48,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:06:48,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 24 minutes, 4 seconds)
2025-08-07 12:08:34,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:08:46,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1839.80786 ± 477.945
2025-08-07 12:08:46,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1177.5125, 2073.5544, 1432.2334, 1809.8583, 2597.8328, 1368.3896, 1531.7714, 1979.9678, 1747.9655, 2678.9912]
2025-08-07 12:08:46,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:08:46,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 2 seconds)
2025-08-07 12:10:35,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:10:47,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1865.44067 ± 588.732
2025-08-07 12:10:47,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1456.7, 1720.6028, 1116.4784, 2021.9717, 2138.074, 2001.2555, 3335.2136, 2004.0049, 1573.6307, 1286.4736]
2025-08-07 12:10:47,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:10:47,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 59 seconds)
2025-08-07 12:12:37,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:12:48,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1830.28687 ± 665.167
2025-08-07 12:12:48,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1485.5872, 2528.5925, 2637.658, 2342.446, 1698.3525, 686.0684, 1705.9545, 2746.4324, 1324.8429, 1146.9362]
2025-08-07 12:12:48,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:12:48,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 3 seconds)
2025-08-07 12:14:35,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:14:47,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2291.63306 ± 857.389
2025-08-07 12:14:47,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3109.4626, 3498.732, 1543.5337, 1412.8633, 2388.3086, 3308.5667, 1500.0409, 1761.7151, 3177.454, 1215.655]
2025-08-07 12:14:47,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:14:47,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (2291.63) for latency MM1Queue_a033_s075
2025-08-07 12:14:47,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 2 seconds)
2025-08-07 12:16:38,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:16:49,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1802.68848 ± 449.791
2025-08-07 12:16:49,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2687.8962, 2185.5852, 1133.8403, 1259.7523, 2238.338, 1653.8335, 1749.4972, 1540.0049, 1971.5846, 1606.5533]
2025-08-07 12:16:49,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:16:49,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 1 second)
2025-08-07 12:18:36,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:18:47,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1940.12463 ± 595.905
2025-08-07 12:18:47,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1438.6406, 2386.1196, 1278.7512, 2717.1174, 1953.4087, 1377.5029, 1521.174, 2713.4114, 1314.6716, 2700.4482]
2025-08-07 12:18:47,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:18:47,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 1 second)
2025-08-07 12:20:35,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:20:48,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1850.98438 ± 662.980
2025-08-07 12:20:48,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1669.0878, 1573.0574, 2202.6638, 1208.5808, 2581.763, 1147.2513, 1533.3661, 1237.4786, 3339.0908, 2017.504]
2025-08-07 12:20:48,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:20:48,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 1 second)
2025-08-07 12:22:34,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:22:45,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2048.74658 ± 649.747
2025-08-07 12:22:45,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2142.4177, 2716.721, 1223.8367, 2902.3586, 1786.9419, 2182.118, 1412.0767, 1070.1897, 2048.972, 3001.8318]
2025-08-07 12:22:45,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:22:45,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 57 seconds)
2025-08-07 12:24:33,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:24:44,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1430.06165 ± 426.017
2025-08-07 12:24:44,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1213.7115, 1556.0092, 1138.7426, 1731.9309, 1407.9617, 1409.5767, 1638.8967, 1765.7914, 2034.4453, 403.54947]
2025-08-07 12:24:44,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:24:44,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 58 seconds)
2025-08-07 12:26:33,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:26:44,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1980.99683 ± 696.810
2025-08-07 12:26:44,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2995.8804, 1226.9547, 1705.048, 2069.0916, 1424.4323, 1320.2095, 2073.6094, 1162.7881, 2924.3677, 2907.5876]
2025-08-07 12:26:44,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:26:44,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 58 seconds)
2025-08-07 12:28:34,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:28:45,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1969.20728 ± 612.618
2025-08-07 12:28:45,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2182.4382, 3012.8142, 1320.0643, 2328.6155, 1553.5431, 2493.8816, 2681.885, 1199.226, 1455.5212, 1464.0845]
2025-08-07 12:28:45,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:28:45,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 59 seconds)
2025-08-07 12:30:31,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:30:43,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1638.73279 ± 640.820
2025-08-07 12:30:43,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2866.3513, 1626.9961, 529.6288, 2647.582, 1642.166, 1563.6031, 1430.246, 1295.8951, 1214.3402, 1570.5198]
2025-08-07 12:30:43,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 12:30:43,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1251 [DEBUG]: Training session finished
