2025-08-07 10:15:46,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc25-humanoid/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:15:46,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc25-humanoid/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:15:46,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x1496005a7d50>}
2025-08-07 10:15:46,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1111 [DEBUG]: using device: cuda
2025-08-07 10:15:46,915 baseline-bpql-noiseperc25-humanoid:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 10:15:46,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1133 [INFO]: Creating new trainer
2025-08-07 10:15:46,934 baseline-bpql-noiseperc25-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=648, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-08-07 10:15:46,934 baseline-bpql-noiseperc25-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 10:15:48,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1194 [DEBUG]: Starting training session...
2025-08-07 10:15:48,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 1/100
2025-08-07 10:17:37,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:17:37,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 199.72043 ± 55.898
2025-08-07 10:17:37,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [211.21454, 263.50815, 111.66067, 180.61212, 191.1331, 106.10459, 251.04675, 284.4165, 210.53401, 186.97377]
2025-08-07 10:17:37,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [44.0, 53.0, 22.0, 36.0, 38.0, 21.0, 52.0, 58.0, 43.0, 37.0]
2025-08-07 10:17:37,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1226 [INFO]: New best (199.72) for latency MM1Queue_a033_s075
2025-08-07 10:17:37,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 59 minutes, 47 seconds)
2025-08-07 10:19:33,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:19:34,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 275.27597 ± 112.329
2025-08-07 10:19:34,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [349.13235, 89.37938, 295.74133, 273.7085, 416.1565, 89.215294, 189.42276, 288.19025, 354.22104, 407.59247]
2025-08-07 10:19:34,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 18.0, 58.0, 56.0, 79.0, 18.0, 37.0, 56.0, 65.0, 74.0]
2025-08-07 10:19:34,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1226 [INFO]: New best (275.28) for latency MM1Queue_a033_s075
2025-08-07 10:19:34,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 4 minutes, 27 seconds)
2025-08-07 10:21:31,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:21:32,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 218.01485 ± 88.126
2025-08-07 10:21:32,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [283.4516, 94.63648, 260.77646, 323.2406, 307.0294, 137.56067, 118.73575, 107.65133, 237.071, 309.9952]
2025-08-07 10:21:32,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 19.0, 51.0, 64.0, 57.0, 27.0, 23.0, 21.0, 46.0, 59.0]
2025-08-07 10:21:32,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 5 minutes, 4 seconds)
2025-08-07 10:23:27,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:23:28,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 242.59131 ± 107.868
2025-08-07 10:23:28,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [122.80423, 273.74475, 286.11163, 396.24612, 125.131165, 362.62674, 112.861694, 307.81186, 331.44043, 107.13442]
2025-08-07 10:23:28,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 55.0, 56.0, 74.0, 25.0, 67.0, 22.0, 61.0, 64.0, 21.0]
2025-08-07 10:23:28,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 3 minutes, 54 seconds)
2025-08-07 10:25:24,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:25:25,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 251.15274 ± 163.831
2025-08-07 10:25:25,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [533.81903, 90.85512, 110.85127, 505.1398, 301.32843, 118.559265, 110.13474, 99.5416, 265.9383, 375.35986]
2025-08-07 10:25:25,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 18.0, 22.0, 97.0, 58.0, 23.0, 22.0, 20.0, 58.0, 75.0]
2025-08-07 10:25:25,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 2 minutes, 36 seconds)
2025-08-07 10:27:21,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:27:22,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 197.03801 ± 130.592
2025-08-07 10:27:22,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [477.7177, 101.75446, 83.71365, 118.4015, 235.95363, 320.80258, 331.89996, 89.0347, 116.331024, 94.77088]
2025-08-07 10:27:22,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 20.0, 17.0, 23.0, 50.0, 61.0, 63.0, 18.0, 23.0, 19.0]
2025-08-07 10:27:22,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 3 minutes, 11 seconds)
2025-08-07 10:29:18,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:29:19,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 311.02081 ± 125.208
2025-08-07 10:29:19,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [229.39359, 447.85202, 90.45909, 392.76407, 337.07587, 307.99405, 304.949, 328.97192, 527.49756, 143.25117]
2025-08-07 10:29:19,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [43.0, 84.0, 18.0, 84.0, 67.0, 59.0, 57.0, 65.0, 109.0, 28.0]
2025-08-07 10:29:19,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1226 [INFO]: New best (311.02) for latency MM1Queue_a033_s075
2025-08-07 10:29:19,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 1 minute, 25 seconds)
2025-08-07 10:31:16,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:31:16,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 258.37729 ± 135.273
2025-08-07 10:31:16,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [380.13608, 508.54367, 220.56996, 397.07205, 261.82275, 100.04799, 83.802635, 288.77136, 253.17552, 89.83075]
2025-08-07 10:31:16,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 99.0, 43.0, 86.0, 49.0, 20.0, 17.0, 55.0, 48.0, 18.0]
2025-08-07 10:31:16,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 59 minutes, 19 seconds)
2025-08-07 10:33:13,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:33:14,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 227.95406 ± 129.319
2025-08-07 10:33:14,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [273.8728, 117.28016, 94.665306, 451.94672, 131.72142, 336.83002, 95.46921, 344.73978, 343.72708, 89.28803]
2025-08-07 10:33:14,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [52.0, 23.0, 19.0, 86.0, 26.0, 65.0, 19.0, 65.0, 73.0, 18.0]
2025-08-07 10:33:14,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 57 minutes, 38 seconds)
2025-08-07 10:35:10,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:35:10,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 232.35019 ± 165.581
2025-08-07 10:35:10,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [312.38394, 453.15375, 122.35132, 101.58271, 95.77265, 398.6396, 95.12189, 535.5696, 107.78186, 101.14457]
2025-08-07 10:35:10,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 86.0, 24.0, 20.0, 19.0, 80.0, 19.0, 102.0, 21.0, 20.0]
2025-08-07 10:35:10,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 55 minutes, 38 seconds)
2025-08-07 10:37:07,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:37:08,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 295.99191 ± 117.223
2025-08-07 10:37:08,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [380.64743, 377.37946, 107.61036, 304.1711, 416.2036, 139.98567, 280.20737, 360.5279, 146.81966, 446.36667]
2025-08-07 10:37:08,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 74.0, 21.0, 56.0, 79.0, 27.0, 63.0, 78.0, 28.0, 100.0]
2025-08-07 10:37:08,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 53 minutes, 48 seconds)
2025-08-07 10:39:04,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:39:05,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 141.62564 ± 70.599
2025-08-07 10:39:05,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [240.86292, 130.12741, 101.553055, 128.73152, 94.604546, 311.5418, 83.955315, 107.2984, 95.62173, 121.959755]
2025-08-07 10:39:05,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [47.0, 25.0, 20.0, 25.0, 19.0, 59.0, 17.0, 21.0, 19.0, 24.0]
2025-08-07 10:39:05,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 51 minutes, 39 seconds)
2025-08-07 10:41:02,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:41:03,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 217.23257 ± 180.125
2025-08-07 10:41:03,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [112.04187, 89.15709, 117.114395, 107.577126, 343.3208, 106.68252, 99.562256, 627.92175, 450.69827, 118.249626]
2025-08-07 10:41:03,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 18.0, 23.0, 21.0, 65.0, 21.0, 20.0, 118.0, 95.0, 23.0]
2025-08-07 10:41:03,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 49 minutes, 59 seconds)
2025-08-07 10:42:59,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:42:59,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 196.20676 ± 122.776
2025-08-07 10:42:59,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [130.4641, 83.88051, 337.81332, 278.16068, 247.9281, 119.739105, 111.07901, 462.3407, 88.922165, 101.740036]
2025-08-07 10:42:59,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 17.0, 63.0, 51.0, 48.0, 23.0, 22.0, 89.0, 18.0, 20.0]
2025-08-07 10:42:59,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 47 minutes, 57 seconds)
2025-08-07 10:44:55,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:44:56,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 237.02454 ± 130.688
2025-08-07 10:44:56,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [125.49265, 149.36249, 458.91962, 202.50807, 100.978294, 377.71143, 329.3318, 392.91495, 110.60564, 122.42036]
2025-08-07 10:44:56,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 29.0, 90.0, 42.0, 20.0, 76.0, 70.0, 74.0, 22.0, 24.0]
2025-08-07 10:44:56,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 45 minutes, 53 seconds)
2025-08-07 10:46:53,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:46:53,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 202.56036 ± 140.472
2025-08-07 10:46:53,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [96.50952, 123.363495, 408.14575, 83.56065, 133.81618, 489.78485, 135.74734, 328.07843, 114.272316, 112.32502]
2025-08-07 10:46:53,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 24.0, 75.0, 17.0, 26.0, 92.0, 26.0, 74.0, 22.0, 22.0]
2025-08-07 10:46:53,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 43 minutes, 58 seconds)
2025-08-07 10:48:50,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:48:51,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 259.16479 ± 129.048
2025-08-07 10:48:51,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [304.9477, 393.46915, 402.89142, 136.95367, 323.681, 106.67389, 89.75485, 95.16757, 411.74084, 326.36765]
2025-08-07 10:48:51,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [57.0, 73.0, 77.0, 27.0, 70.0, 21.0, 18.0, 19.0, 77.0, 61.0]
2025-08-07 10:48:51,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 42 minutes, 11 seconds)
2025-08-07 10:50:47,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:50:48,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 231.63403 ± 116.313
2025-08-07 10:50:48,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [299.15463, 95.514175, 290.39124, 83.72236, 324.14722, 266.46143, 107.55601, 335.08585, 414.78937, 99.51809]
2025-08-07 10:50:48,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [54.0, 19.0, 56.0, 17.0, 61.0, 50.0, 21.0, 62.0, 78.0, 20.0]
2025-08-07 10:50:48,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 39 minutes, 58 seconds)
2025-08-07 10:52:44,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:52:45,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 218.01968 ± 119.030
2025-08-07 10:52:45,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [318.97104, 95.26512, 89.68082, 89.27872, 349.60577, 325.63895, 100.4751, 123.4216, 342.16574, 345.69385]
2025-08-07 10:52:45,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 19.0, 18.0, 18.0, 66.0, 61.0, 20.0, 24.0, 63.0, 67.0]
2025-08-07 10:52:45,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 38 minutes, 5 seconds)
2025-08-07 10:54:41,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:54:42,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 332.51962 ± 137.715
2025-08-07 10:54:42,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [580.8615, 325.64914, 145.02715, 403.5794, 301.2489, 292.75378, 408.4188, 478.95905, 293.8059, 94.89291]
2025-08-07 10:54:42,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 61.0, 28.0, 88.0, 54.0, 58.0, 77.0, 89.0, 58.0, 19.0]
2025-08-07 10:54:42,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1226 [INFO]: New best (332.52) for latency MM1Queue_a033_s075
2025-08-07 10:54:42,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 36 minutes, 25 seconds)
2025-08-07 10:56:39,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:56:40,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 240.65378 ± 152.642
2025-08-07 10:56:40,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [474.0655, 89.9019, 428.88223, 88.79045, 100.48283, 269.22437, 326.31726, 420.12094, 95.9775, 112.77468]
2025-08-07 10:56:40,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 18.0, 81.0, 18.0, 20.0, 54.0, 74.0, 89.0, 19.0, 22.0]
2025-08-07 10:56:40,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 34 minutes, 28 seconds)
2025-08-07 10:58:37,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:58:37,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 154.94453 ± 101.998
2025-08-07 10:58:37,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [418.35965, 142.23415, 83.95245, 273.2845, 122.12045, 107.10816, 101.20349, 111.81969, 100.38306, 88.97978]
2025-08-07 10:58:37,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 28.0, 17.0, 56.0, 24.0, 21.0, 20.0, 22.0, 20.0, 18.0]
2025-08-07 10:58:37,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 32 minutes, 29 seconds)
2025-08-07 11:00:35,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:00:35,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 275.73529 ± 135.239
2025-08-07 11:00:35,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [414.43207, 357.8122, 117.60255, 355.28638, 134.34723, 392.26328, 95.619835, 415.1437, 101.02603, 373.81927]
2025-08-07 11:00:35,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 66.0, 23.0, 76.0, 26.0, 70.0, 19.0, 80.0, 20.0, 81.0]
2025-08-07 11:00:35,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 30 minutes, 49 seconds)
2025-08-07 11:02:32,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:02:32,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 213.57259 ± 126.704
2025-08-07 11:02:32,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [89.92228, 101.57209, 361.5873, 382.91617, 290.81363, 95.02746, 402.3936, 214.25226, 101.74419, 95.49682]
2025-08-07 11:02:32,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 20.0, 69.0, 72.0, 54.0, 19.0, 75.0, 41.0, 20.0, 19.0]
2025-08-07 11:02:32,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 28 minutes, 47 seconds)
2025-08-07 11:04:29,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:04:30,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 318.20389 ± 146.563
2025-08-07 11:04:30,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [412.56757, 100.915855, 412.08685, 380.36972, 401.33792, 95.07318, 95.029884, 384.4477, 437.43378, 462.77634]
2025-08-07 11:04:30,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 20.0, 78.0, 72.0, 73.0, 19.0, 19.0, 69.0, 96.0, 94.0]
2025-08-07 11:04:30,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 26 minutes, 58 seconds)
2025-08-07 11:06:27,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:06:27,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 280.46753 ± 207.705
2025-08-07 11:06:27,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [322.20435, 649.25104, 89.907364, 95.497925, 469.86002, 589.51514, 88.79934, 101.295586, 129.50215, 268.84232]
2025-08-07 11:06:27,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 122.0, 18.0, 19.0, 87.0, 113.0, 18.0, 20.0, 25.0, 51.0]
2025-08-07 11:06:27,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 24 minutes, 55 seconds)
2025-08-07 11:08:25,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:08:26,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 298.19794 ± 160.946
2025-08-07 11:08:26,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [462.17978, 360.45782, 315.0945, 228.94833, 89.436775, 88.75933, 89.5692, 503.47552, 323.51837, 520.53973]
2025-08-07 11:08:26,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 66.0, 58.0, 47.0, 18.0, 18.0, 18.0, 100.0, 61.0, 97.0]
2025-08-07 11:08:26,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 23 minutes, 17 seconds)
2025-08-07 11:10:22,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:10:23,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 258.51172 ± 155.758
2025-08-07 11:10:23,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [145.3205, 111.00343, 450.38324, 410.59796, 94.42826, 501.90262, 375.23083, 287.30038, 101.251175, 107.69884]
2025-08-07 11:10:23,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 22.0, 97.0, 92.0, 19.0, 92.0, 73.0, 54.0, 20.0, 21.0]
2025-08-07 11:10:23,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 21 minutes, 3 seconds)
2025-08-07 11:12:20,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:12:21,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 236.01279 ± 147.047
2025-08-07 11:12:21,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [575.53754, 118.44614, 300.76175, 368.56912, 144.96144, 106.88423, 95.34164, 107.41746, 242.70837, 299.50046]
2025-08-07 11:12:21,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 23.0, 56.0, 69.0, 28.0, 21.0, 19.0, 21.0, 45.0, 55.0]
2025-08-07 11:12:21,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 19 minutes, 13 seconds)
2025-08-07 11:14:17,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:14:18,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 260.61139 ± 133.403
2025-08-07 11:14:18,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [479.3611, 324.8061, 94.676636, 118.30838, 372.84543, 375.88437, 124.33318, 359.47256, 106.10881, 250.31734]
2025-08-07 11:14:18,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 62.0, 19.0, 23.0, 78.0, 72.0, 24.0, 67.0, 21.0, 47.0]
2025-08-07 11:14:18,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 17 minutes, 9 seconds)
2025-08-07 11:16:15,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:16:16,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 247.02261 ± 131.047
2025-08-07 11:16:16,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [306.5144, 95.27197, 134.85657, 317.31516, 491.56512, 89.77692, 357.876, 95.66579, 245.38206, 336.00214]
2025-08-07 11:16:16,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [57.0, 19.0, 26.0, 58.0, 90.0, 18.0, 66.0, 19.0, 46.0, 61.0]
2025-08-07 11:16:16,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 15 minutes, 23 seconds)
2025-08-07 11:18:13,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:18:13,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 107.30339 ± 14.567
2025-08-07 11:18:13,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [95.533035, 107.26342, 136.3373, 113.61611, 83.87505, 107.56157, 90.28452, 107.21106, 123.48252, 107.86944]
2025-08-07 11:18:13,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 21.0, 26.0, 22.0, 17.0, 21.0, 18.0, 21.0, 24.0, 21.0]
2025-08-07 11:18:13,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 13 minutes, 4 seconds)
2025-08-07 11:20:10,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:20:11,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 171.48631 ± 112.474
2025-08-07 11:20:11,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [88.79092, 344.23746, 125.27721, 100.54872, 336.63925, 83.808815, 89.23888, 346.23114, 109.665436, 90.425316]
2025-08-07 11:20:11,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 66.0, 24.0, 20.0, 65.0, 17.0, 18.0, 67.0, 22.0, 18.0]
2025-08-07 11:20:11,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 11 minutes, 9 seconds)
2025-08-07 11:22:08,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:22:08,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 289.23343 ± 77.485
2025-08-07 11:22:08,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [283.74847, 337.8557, 356.02924, 414.51212, 288.5889, 226.31625, 289.1717, 326.11194, 257.24957, 112.75043]
2025-08-07 11:22:08,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [54.0, 64.0, 64.0, 78.0, 55.0, 45.0, 62.0, 61.0, 48.0, 22.0]
2025-08-07 11:22:08,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 9 minutes, 18 seconds)
2025-08-07 11:24:05,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:24:06,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 243.52548 ± 138.970
2025-08-07 11:24:06,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [308.0944, 281.9456, 124.33385, 119.02558, 90.46406, 335.14847, 175.99051, 89.26121, 517.67334, 393.31775]
2025-08-07 11:24:06,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 55.0, 24.0, 23.0, 18.0, 76.0, 34.0, 18.0, 95.0, 73.0]
2025-08-07 11:24:06,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 7 minutes, 24 seconds)
2025-08-07 11:26:04,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:26:04,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 182.46729 ± 120.591
2025-08-07 11:26:04,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [83.897446, 150.5464, 392.96814, 416.75192, 252.32312, 101.70491, 83.84899, 101.362305, 112.45131, 128.8183]
2025-08-07 11:26:04,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [17.0, 29.0, 90.0, 76.0, 47.0, 20.0, 17.0, 20.0, 22.0, 25.0]
2025-08-07 11:26:04,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 5 minutes, 25 seconds)
2025-08-07 11:28:02,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:28:03,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 208.28104 ± 131.159
2025-08-07 11:28:03,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [377.61417, 89.11537, 258.6624, 101.66613, 450.06488, 113.23116, 144.9551, 349.3173, 97.054016, 101.12979]
2025-08-07 11:28:03,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 18.0, 48.0, 20.0, 99.0, 22.0, 28.0, 62.0, 19.0, 20.0]
2025-08-07 11:28:03,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 3 minutes, 45 seconds)
2025-08-07 11:29:58,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:29:59,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 260.30988 ± 149.352
2025-08-07 11:29:59,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [106.01173, 500.5622, 112.41222, 435.51193, 362.87576, 164.45624, 408.02908, 88.65475, 130.57736, 294.00742]
2025-08-07 11:29:59,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 104.0, 22.0, 88.0, 66.0, 31.0, 77.0, 18.0, 28.0, 62.0]
2025-08-07 11:29:59,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 1 minute, 38 seconds)
2025-08-07 11:31:56,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:31:57,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 302.09784 ± 136.099
2025-08-07 11:31:57,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [365.6587, 371.92688, 363.58618, 99.683754, 371.94754, 402.30508, 89.21803, 477.36978, 111.57572, 367.70682]
2025-08-07 11:31:57,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 68.0, 68.0, 20.0, 68.0, 75.0, 18.0, 88.0, 22.0, 81.0]
2025-08-07 11:31:57,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 59 minutes, 36 seconds)
2025-08-07 11:33:53,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:33:54,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 230.30266 ± 134.757
2025-08-07 11:33:54,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [119.5679, 407.44354, 285.02518, 328.73328, 83.90884, 113.096306, 96.09902, 359.381, 90.231575, 419.5399]
2025-08-07 11:33:54,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 74.0, 51.0, 60.0, 17.0, 22.0, 19.0, 66.0, 18.0, 74.0]
2025-08-07 11:33:54,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 57 minutes, 35 seconds)
2025-08-07 11:35:51,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:35:52,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 322.36832 ± 188.256
2025-08-07 11:35:52,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [145.14435, 122.69867, 140.15369, 776.3797, 263.796, 491.962, 288.14417, 258.26917, 336.5062, 400.62924]
2025-08-07 11:35:52,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 24.0, 27.0, 156.0, 50.0, 94.0, 58.0, 47.0, 63.0, 88.0]
2025-08-07 11:35:52,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 55 minutes, 42 seconds)
2025-08-07 11:37:49,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:37:50,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 293.72638 ± 217.703
2025-08-07 11:37:50,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [94.74594, 299.96005, 102.458206, 425.36734, 691.9662, 101.79231, 626.66724, 374.24258, 101.495026, 118.56903]
2025-08-07 11:37:50,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 57.0, 20.0, 79.0, 133.0, 20.0, 119.0, 70.0, 20.0, 24.0]
2025-08-07 11:37:50,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 53 minutes, 29 seconds)
2025-08-07 11:39:47,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:39:48,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 236.53621 ± 123.378
2025-08-07 11:39:48,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [292.48972, 300.98898, 336.01282, 95.44631, 301.9712, 284.35956, 102.34424, 461.43927, 94.673965, 95.63569]
2025-08-07 11:39:48,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [53.0, 58.0, 69.0, 19.0, 68.0, 53.0, 20.0, 102.0, 19.0, 19.0]
2025-08-07 11:39:48,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 51 minutes, 48 seconds)
2025-08-07 11:41:45,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:41:46,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 300.16928 ± 152.116
2025-08-07 11:41:46,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [337.72897, 95.859856, 358.30338, 580.4219, 363.33505, 351.7613, 83.61423, 432.10208, 285.90656, 112.65979]
2025-08-07 11:41:46,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 19.0, 64.0, 125.0, 69.0, 63.0, 17.0, 81.0, 53.0, 22.0]
2025-08-07 11:41:46,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 49 minutes, 56 seconds)
2025-08-07 11:43:42,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:43:43,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 282.52933 ± 170.452
2025-08-07 11:43:43,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [88.824844, 415.64725, 88.58976, 416.83707, 530.9093, 295.52536, 101.735634, 267.62485, 105.60604, 513.9931]
2025-08-07 11:43:43,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 75.0, 18.0, 79.0, 100.0, 56.0, 20.0, 52.0, 21.0, 96.0]
2025-08-07 11:43:43,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 47 minutes, 57 seconds)
2025-08-07 11:45:41,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:45:42,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 323.81189 ± 118.980
2025-08-07 11:45:42,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [337.97955, 295.38736, 455.7608, 100.864105, 377.02988, 345.33685, 125.361595, 462.41293, 307.89465, 430.09097]
2025-08-07 11:45:42,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 64.0, 99.0, 20.0, 80.0, 75.0, 24.0, 101.0, 60.0, 79.0]
2025-08-07 11:45:42,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 46 minutes, 12 seconds)
2025-08-07 11:47:38,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:47:39,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 175.64149 ± 110.308
2025-08-07 11:47:39,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [352.12827, 89.11147, 90.21323, 129.08258, 96.27434, 261.29233, 394.11145, 121.30658, 134.02217, 88.87246]
2025-08-07 11:47:39,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 18.0, 18.0, 25.0, 19.0, 48.0, 72.0, 24.0, 26.0, 18.0]
2025-08-07 11:47:39,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 44 minutes, 6 seconds)
2025-08-07 11:49:37,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:49:38,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 291.09811 ± 154.234
2025-08-07 11:49:38,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [88.85329, 371.31805, 107.39702, 387.06738, 510.74338, 341.43164, 126.20028, 118.58598, 456.5468, 402.83707]
2025-08-07 11:49:38,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 67.0, 21.0, 71.0, 95.0, 61.0, 24.0, 23.0, 86.0, 76.0]
2025-08-07 11:49:38,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 42 minutes, 18 seconds)
2025-08-07 11:51:34,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:51:35,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 224.56194 ± 148.660
2025-08-07 11:51:35,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [116.78552, 88.989, 335.12534, 140.21536, 93.335945, 421.3544, 83.94341, 381.59045, 115.85957, 468.42026]
2025-08-07 11:51:35,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 18.0, 61.0, 27.0, 19.0, 78.0, 17.0, 83.0, 23.0, 88.0]
2025-08-07 11:51:35,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 40 minutes, 9 seconds)
2025-08-07 11:53:32,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:53:33,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 258.41180 ± 156.524
2025-08-07 11:53:33,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [289.01663, 106.11973, 342.92743, 586.87933, 373.0321, 84.162125, 341.01483, 95.62541, 89.0264, 276.31406]
2025-08-07 11:53:33,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [53.0, 21.0, 63.0, 110.0, 72.0, 17.0, 67.0, 19.0, 18.0, 51.0]
2025-08-07 11:53:33,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 38 minutes, 15 seconds)
2025-08-07 11:55:30,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:55:31,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 317.58115 ± 176.151
2025-08-07 11:55:31,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [439.9991, 95.59276, 576.28955, 484.72894, 390.53912, 100.370186, 425.22543, 96.2537, 417.0957, 149.71712]
2025-08-07 11:55:31,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 19.0, 110.0, 93.0, 74.0, 20.0, 82.0, 19.0, 76.0, 30.0]
2025-08-07 11:55:31,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 36 minutes, 8 seconds)
2025-08-07 11:57:28,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:57:29,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 294.77319 ± 201.511
2025-08-07 11:57:29,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [88.877365, 88.605644, 658.39496, 294.48907, 88.80339, 376.33112, 298.38275, 366.173, 88.939125, 598.73535]
2025-08-07 11:57:29,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 18.0, 124.0, 60.0, 18.0, 72.0, 58.0, 81.0, 18.0, 116.0]
2025-08-07 11:57:29,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 34 minutes, 27 seconds)
2025-08-07 11:59:26,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:59:26,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 335.75662 ± 191.615
2025-08-07 11:59:26,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [522.60736, 344.38776, 343.79373, 696.18774, 274.70303, 298.04086, 111.85458, 111.507454, 546.88257, 107.60116]
2025-08-07 11:59:26,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 63.0, 63.0, 127.0, 51.0, 54.0, 22.0, 22.0, 105.0, 21.0]
2025-08-07 11:59:26,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1226 [INFO]: New best (335.76) for latency MM1Queue_a033_s075
2025-08-07 11:59:26,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 32 minutes, 13 seconds)
2025-08-07 12:01:24,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:01:25,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 263.20493 ± 177.285
2025-08-07 12:01:25,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [89.69524, 88.98729, 510.66812, 478.41385, 101.61502, 420.77023, 94.572464, 464.00925, 101.22641, 282.09152]
2025-08-07 12:01:25,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 18.0, 97.0, 87.0, 20.0, 79.0, 19.0, 87.0, 20.0, 54.0]
2025-08-07 12:01:25,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 30 minutes, 26 seconds)
2025-08-07 12:03:22,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:03:23,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 283.37503 ± 219.042
2025-08-07 12:03:23,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [829.3333, 345.9234, 401.53317, 83.91181, 112.472145, 102.61226, 398.431, 132.55676, 304.12253, 122.85376]
2025-08-07 12:03:23,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 62.0, 73.0, 17.0, 22.0, 20.0, 73.0, 26.0, 56.0, 24.0]
2025-08-07 12:03:23,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 28 minutes, 30 seconds)
2025-08-07 12:05:19,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:05:20,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 248.62271 ± 159.222
2025-08-07 12:05:20,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [88.73613, 105.771065, 325.41833, 89.14631, 123.180016, 520.7987, 291.77274, 95.36741, 412.48218, 433.55426]
2025-08-07 12:05:20,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 21.0, 72.0, 18.0, 24.0, 114.0, 54.0, 19.0, 74.0, 86.0]
2025-08-07 12:05:20,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 26 minutes, 23 seconds)
2025-08-07 12:07:18,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:07:19,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 277.45099 ± 156.019
2025-08-07 12:07:19,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [420.18124, 534.0277, 115.33859, 411.5447, 291.3415, 89.4923, 88.9121, 349.7943, 107.76878, 366.10867]
2025-08-07 12:07:19,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 104.0, 23.0, 91.0, 57.0, 18.0, 18.0, 75.0, 21.0, 68.0]
2025-08-07 12:07:19,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 24 minutes, 29 seconds)
2025-08-07 12:09:15,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:09:16,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 322.49506 ± 298.361
2025-08-07 12:09:16,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [83.91005, 334.5337, 412.12808, 115.892654, 113.122345, 1098.0282, 479.8957, 89.76556, 95.00823, 402.66595]
2025-08-07 12:09:16,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [17.0, 61.0, 83.0, 23.0, 22.0, 208.0, 102.0, 18.0, 19.0, 85.0]
2025-08-07 12:09:16,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 22 minutes, 35 seconds)
2025-08-07 12:11:14,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:11:15,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 280.73666 ± 135.883
2025-08-07 12:11:15,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [268.5702, 89.86928, 100.22216, 88.687645, 282.62622, 406.1394, 396.22986, 470.7977, 312.0139, 392.21008]
2025-08-07 12:11:15,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [52.0, 18.0, 20.0, 18.0, 57.0, 85.0, 72.0, 100.0, 61.0, 72.0]
2025-08-07 12:11:15,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 20 minutes, 42 seconds)
2025-08-07 12:13:13,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:13:13,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 313.32541 ± 157.727
2025-08-07 12:13:13,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [96.22091, 90.41368, 105.85974, 389.03146, 354.93555, 320.2503, 546.45514, 497.3881, 304.90903, 427.79053]
2025-08-07 12:13:13,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 18.0, 21.0, 71.0, 79.0, 71.0, 101.0, 93.0, 58.0, 78.0]
2025-08-07 12:13:13,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 18 minutes, 46 seconds)
2025-08-07 12:15:09,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:15:10,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 280.55786 ± 153.361
2025-08-07 12:15:10,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [101.72612, 320.72064, 472.88644, 360.5691, 334.42245, 83.63711, 134.71982, 408.64352, 492.2296, 96.02386]
2025-08-07 12:15:10,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 59.0, 87.0, 67.0, 61.0, 17.0, 26.0, 76.0, 92.0, 19.0]
2025-08-07 12:15:10,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 16 minutes, 42 seconds)
2025-08-07 12:17:08,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:17:08,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 282.84485 ± 208.681
2025-08-07 12:17:08,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [578.2119, 83.86848, 397.63275, 95.38065, 672.6006, 342.83246, 355.6552, 101.93261, 104.75114, 95.58257]
2025-08-07 12:17:08,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 17.0, 86.0, 19.0, 124.0, 63.0, 65.0, 20.0, 21.0, 19.0]
2025-08-07 12:17:08,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 14 minutes, 41 seconds)
2025-08-07 12:19:06,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:19:06,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 284.36987 ± 158.327
2025-08-07 12:19:06,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [434.81934, 347.5108, 108.08402, 492.88428, 101.79828, 335.74582, 316.97818, 88.65437, 499.63684, 117.58698]
2025-08-07 12:19:06,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 64.0, 21.0, 94.0, 20.0, 62.0, 60.0, 18.0, 98.0, 23.0]
2025-08-07 12:19:06,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 12 minutes, 45 seconds)
2025-08-07 12:21:04,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:21:05,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 284.80692 ± 170.933
2025-08-07 12:21:05,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [150.50868, 428.89645, 107.19989, 124.16582, 95.412926, 369.20493, 113.343925, 501.97552, 504.20996, 453.15115]
2025-08-07 12:21:05,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 78.0, 21.0, 24.0, 19.0, 69.0, 22.0, 93.0, 92.0, 85.0]
2025-08-07 12:21:05,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 10 minutes, 44 seconds)
2025-08-07 12:23:01,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:23:02,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 289.95758 ± 125.525
2025-08-07 12:23:02,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [324.7541, 465.18332, 443.62393, 306.84622, 327.98935, 370.84445, 102.413765, 89.43239, 312.25513, 156.23323]
2025-08-07 12:23:02,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 87.0, 81.0, 57.0, 64.0, 70.0, 20.0, 18.0, 67.0, 30.0]
2025-08-07 12:23:02,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 8 minutes, 41 seconds)
2025-08-07 12:25:00,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:25:01,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 283.67169 ± 196.891
2025-08-07 12:25:01,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [109.97118, 422.7035, 94.86478, 369.65863, 590.59283, 309.9784, 112.41397, 106.64229, 612.84766, 107.04341]
2025-08-07 12:25:01,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 79.0, 19.0, 66.0, 111.0, 59.0, 22.0, 21.0, 113.0, 21.0]
2025-08-07 12:25:01,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 6 minutes, 55 seconds)
2025-08-07 12:26:56,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:26:57,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 279.55569 ± 186.652
2025-08-07 12:26:57,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [133.83076, 451.91223, 296.12366, 585.16144, 127.18818, 89.59126, 88.90154, 496.65115, 89.24031, 436.9563]
2025-08-07 12:26:57,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 81.0, 54.0, 110.0, 25.0, 18.0, 18.0, 89.0, 18.0, 80.0]
2025-08-07 12:26:57,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 4 minutes, 41 seconds)
2025-08-07 12:28:52,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:28:53,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 294.84268 ± 165.204
2025-08-07 12:28:53,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [492.28058, 114.22186, 398.7808, 378.0281, 130.36012, 505.6212, 483.98294, 118.446686, 90.3961, 236.30858]
2025-08-07 12:28:53,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 22.0, 72.0, 69.0, 25.0, 93.0, 90.0, 23.0, 18.0, 49.0]
2025-08-07 12:28:53,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 2 minutes, 33 seconds)
2025-08-07 12:30:48,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:30:49,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 237.03377 ± 144.825
2025-08-07 12:30:49,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [433.33478, 143.2009, 112.68051, 495.31873, 89.48944, 84.13928, 101.61129, 255.99072, 320.4568, 334.1153]
2025-08-07 12:30:49,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 28.0, 22.0, 94.0, 18.0, 17.0, 20.0, 46.0, 64.0, 74.0]
2025-08-07 12:30:49,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 22 seconds)
2025-08-07 12:32:45,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:32:46,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 273.64752 ± 225.856
2025-08-07 12:32:46,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [323.14142, 107.6631, 680.26935, 84.71134, 117.96081, 391.99777, 679.97955, 95.6641, 101.55757, 153.53024]
2025-08-07 12:32:46,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 21.0, 127.0, 17.0, 23.0, 73.0, 128.0, 19.0, 20.0, 29.0]
2025-08-07 12:32:46,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 58 minutes, 20 seconds)
2025-08-07 12:34:41,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:34:42,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 163.72018 ± 97.857
2025-08-07 12:34:42,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [93.88234, 358.64673, 88.929054, 140.84904, 245.47221, 100.9309, 95.55792, 315.4551, 102.60576, 94.8728]
2025-08-07 12:34:42,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 71.0, 18.0, 28.0, 48.0, 20.0, 19.0, 57.0, 20.0, 19.0]
2025-08-07 12:34:42,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 56 minutes, 9 seconds)
2025-08-07 12:36:38,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:36:39,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 233.04607 ± 145.976
2025-08-07 12:36:39,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [94.83202, 89.608475, 89.04145, 326.6886, 331.41492, 321.44803, 105.60648, 495.05286, 94.389885, 382.37805]
2025-08-07 12:36:39,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 18.0, 18.0, 65.0, 69.0, 72.0, 21.0, 109.0, 19.0, 67.0]
2025-08-07 12:36:39,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 54 minutes, 18 seconds)
2025-08-07 12:38:33,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:38:34,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 294.44855 ± 215.618
2025-08-07 12:38:34,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [113.30608, 88.51635, 378.524, 628.485, 101.068375, 89.250374, 454.633, 112.42408, 320.41028, 657.86774]
2025-08-07 12:38:34,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 18.0, 68.0, 123.0, 20.0, 18.0, 81.0, 22.0, 62.0, 125.0]
2025-08-07 12:38:34,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 52 minutes, 18 seconds)
2025-08-07 12:40:30,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:40:31,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 173.91985 ± 136.436
2025-08-07 12:40:31,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [83.74532, 130.10446, 101.03731, 161.3371, 99.681435, 112.61262, 89.868996, 494.2298, 382.69336, 83.88812]
2025-08-07 12:40:31,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [17.0, 26.0, 20.0, 31.0, 20.0, 22.0, 18.0, 89.0, 70.0, 17.0]
2025-08-07 12:40:31,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 50 minutes, 25 seconds)
2025-08-07 12:42:27,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:42:28,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 318.45728 ± 283.054
2025-08-07 12:42:28,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [138.62355, 83.708015, 299.98978, 95.12986, 1004.09686, 597.5316, 445.46005, 328.79227, 95.57574, 95.66508]
2025-08-07 12:42:28,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 17.0, 67.0, 19.0, 191.0, 129.0, 86.0, 61.0, 19.0, 19.0]
2025-08-07 12:42:28,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 48 minutes, 30 seconds)
2025-08-07 12:44:23,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:44:24,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 294.29230 ± 166.548
2025-08-07 12:44:24,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [256.4074, 402.8886, 511.62057, 310.41672, 468.7348, 107.51883, 139.16243, 95.930984, 532.5289, 117.714195]
2025-08-07 12:44:24,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [48.0, 82.0, 111.0, 61.0, 94.0, 21.0, 27.0, 19.0, 97.0, 23.0]
2025-08-07 12:44:24,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 46 minutes, 34 seconds)
2025-08-07 12:46:19,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:46:20,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 266.97583 ± 216.981
2025-08-07 12:46:20,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [101.01722, 88.99373, 89.28453, 449.36407, 701.65436, 124.76772, 88.990875, 494.4319, 101.0648, 430.1893]
2025-08-07 12:46:20,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 18.0, 18.0, 83.0, 136.0, 24.0, 18.0, 90.0, 20.0, 82.0]
2025-08-07 12:46:20,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 44 minutes, 33 seconds)
2025-08-07 12:48:15,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:48:16,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 310.86414 ± 144.060
2025-08-07 12:48:16,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [413.14865, 488.25247, 401.01193, 113.20622, 400.8463, 89.54278, 355.60938, 90.00508, 404.92273, 352.0959]
2025-08-07 12:48:16,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 88.0, 79.0, 22.0, 72.0, 18.0, 67.0, 18.0, 75.0, 63.0]
2025-08-07 12:48:16,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 42 minutes, 41 seconds)
2025-08-07 12:50:12,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:50:13,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 302.38416 ± 192.365
2025-08-07 12:50:13,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [94.4101, 418.61542, 465.08264, 136.63129, 395.8059, 144.7946, 530.8493, 118.40459, 617.67065, 101.57686]
2025-08-07 12:50:13,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 76.0, 81.0, 26.0, 81.0, 28.0, 95.0, 23.0, 115.0, 20.0]
2025-08-07 12:50:13,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 40 minutes, 45 seconds)
2025-08-07 12:52:09,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:52:09,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 278.96164 ± 155.306
2025-08-07 12:52:09,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [94.09121, 135.20706, 414.96945, 117.32328, 276.2076, 88.95325, 460.8127, 365.0869, 304.0215, 532.9433]
2025-08-07 12:52:09,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 26.0, 77.0, 23.0, 51.0, 18.0, 94.0, 69.0, 67.0, 100.0]
2025-08-07 12:52:09,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 38 minutes, 45 seconds)
2025-08-07 12:54:04,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:54:05,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 336.99814 ± 176.508
2025-08-07 12:54:05,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [669.9417, 116.27998, 298.9186, 362.06854, 102.54464, 457.74985, 107.497826, 469.0625, 348.34668, 437.57123]
2025-08-07 12:54:05,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 23.0, 64.0, 68.0, 20.0, 85.0, 21.0, 89.0, 64.0, 87.0]
2025-08-07 12:54:05,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1226 [INFO]: New best (337.00) for latency MM1Queue_a033_s075
2025-08-07 12:54:05,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 36 minutes, 49 seconds)
2025-08-07 12:56:01,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:56:02,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 304.26257 ± 154.435
2025-08-07 12:56:02,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [89.664986, 585.4227, 471.22446, 366.32443, 90.4363, 296.87262, 125.36815, 351.68423, 335.33374, 330.29385]
2025-08-07 12:56:02,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 114.0, 87.0, 70.0, 18.0, 54.0, 24.0, 67.0, 63.0, 62.0]
2025-08-07 12:56:02,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 34 minutes, 54 seconds)
2025-08-07 12:57:57,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:57:58,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 405.29877 ± 72.143
2025-08-07 12:57:58,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [561.8851, 338.5285, 514.75006, 362.21307, 358.79996, 327.49402, 413.26843, 376.4416, 394.32065, 405.2865]
2025-08-07 12:57:58,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 61.0, 100.0, 65.0, 65.0, 61.0, 74.0, 68.0, 74.0, 73.0]
2025-08-07 12:57:58,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1226 [INFO]: New best (405.30) for latency MM1Queue_a033_s075
2025-08-07 12:57:58,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 32 minutes, 58 seconds)
2025-08-07 12:59:54,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:59:55,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 232.26965 ± 168.686
2025-08-07 12:59:55,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [330.06967, 106.61689, 508.29825, 101.47063, 511.33005, 95.83856, 365.2847, 113.56653, 95.73741, 94.484055]
2025-08-07 12:59:55,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 21.0, 106.0, 20.0, 91.0, 19.0, 66.0, 22.0, 19.0, 19.0]
2025-08-07 12:59:55,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 31 minutes, 1 second)
2025-08-07 13:01:50,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:01:51,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 351.53030 ± 179.348
2025-08-07 13:01:51,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [393.66895, 291.50446, 122.7505, 617.8869, 94.51205, 496.77707, 101.44701, 462.3629, 513.3112, 421.08173]
2025-08-07 13:01:51,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 53.0, 24.0, 135.0, 19.0, 91.0, 20.0, 89.0, 91.0, 84.0]
2025-08-07 13:01:51,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 29 minutes, 5 seconds)
2025-08-07 13:03:47,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:03:47,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 254.62497 ± 162.866
2025-08-07 13:03:47,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [401.2024, 83.78289, 358.15976, 548.67645, 99.7558, 133.50002, 102.46973, 298.14194, 424.0356, 96.52514]
2025-08-07 13:03:47,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 17.0, 63.0, 103.0, 20.0, 26.0, 20.0, 61.0, 82.0, 19.0]
2025-08-07 13:03:47,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 27 minutes, 10 seconds)
2025-08-07 13:05:43,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:05:44,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 293.92392 ± 168.423
2025-08-07 13:05:44,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [493.90128, 83.70814, 460.67078, 89.89279, 393.98773, 284.66638, 554.0448, 143.29272, 312.2172, 122.85722]
2025-08-07 13:05:44,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 17.0, 85.0, 18.0, 74.0, 54.0, 109.0, 28.0, 57.0, 24.0]
2025-08-07 13:05:44,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 25 minutes, 13 seconds)
2025-08-07 13:07:40,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:07:40,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 327.56769 ± 159.085
2025-08-07 13:07:40,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [317.96497, 431.8864, 364.28107, 379.83902, 274.18915, 446.12454, 112.949265, 145.574, 145.48022, 657.3882]
2025-08-07 13:07:40,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 79.0, 67.0, 75.0, 51.0, 82.0, 22.0, 28.0, 28.0, 120.0]
2025-08-07 13:07:40,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 23 minutes, 17 seconds)
2025-08-07 13:09:36,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:09:36,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 280.26425 ± 203.434
2025-08-07 13:09:36,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [89.96728, 95.1733, 113.5219, 342.9357, 330.51758, 95.52931, 107.77585, 518.7107, 415.96515, 692.5458]
2025-08-07 13:09:36,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 19.0, 22.0, 64.0, 66.0, 19.0, 21.0, 108.0, 83.0, 132.0]
2025-08-07 13:09:36,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 19 seconds)
2025-08-07 13:11:34,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:11:35,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 296.58862 ± 170.438
2025-08-07 13:11:35,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [107.8158, 118.97528, 484.75897, 465.9727, 415.29907, 368.54156, 283.6726, 89.53798, 534.9488, 96.36344]
2025-08-07 13:11:35,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 23.0, 88.0, 87.0, 74.0, 66.0, 52.0, 18.0, 102.0, 19.0]
2025-08-07 13:11:35,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 27 seconds)
2025-08-07 13:13:29,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:13:30,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 333.92480 ± 154.712
2025-08-07 13:13:30,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [455.57596, 118.64513, 370.7405, 420.0845, 89.90581, 475.05615, 436.5117, 420.17383, 95.49649, 457.05777]
2025-08-07 13:13:30,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 23.0, 65.0, 92.0, 18.0, 90.0, 94.0, 77.0, 19.0, 85.0]
2025-08-07 13:13:30,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 27 seconds)
2025-08-07 13:15:25,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:15:25,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 179.36476 ± 143.143
2025-08-07 13:15:25,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [83.61388, 94.130615, 95.368515, 273.74057, 89.288506, 451.25723, 90.14902, 436.9932, 89.579544, 89.52646]
2025-08-07 13:15:25,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [17.0, 19.0, 19.0, 53.0, 18.0, 100.0, 18.0, 83.0, 18.0, 18.0]
2025-08-07 13:15:25,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 30 seconds)
2025-08-07 13:17:19,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:17:19,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 282.67776 ± 123.526
2025-08-07 13:17:19,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [106.28172, 329.8958, 88.814766, 361.15787, 398.9065, 401.8472, 110.92302, 297.09195, 411.865, 319.99387]
2025-08-07 13:17:19,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 61.0, 18.0, 65.0, 79.0, 74.0, 22.0, 58.0, 73.0, 60.0]
2025-08-07 13:17:19,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 30 seconds)
2025-08-07 13:19:13,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:19:13,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 194.65604 ± 149.102
2025-08-07 13:19:13,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [94.899765, 100.849174, 89.603615, 398.90637, 493.1675, 94.52885, 118.477715, 100.00843, 359.74133, 96.37759]
2025-08-07 13:19:13,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 20.0, 18.0, 73.0, 91.0, 19.0, 23.0, 20.0, 65.0, 19.0]
2025-08-07 13:19:13,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 32 seconds)
2025-08-07 13:21:07,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:21:08,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 406.49915 ± 195.385
2025-08-07 13:21:08,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [88.701035, 361.5765, 446.78098, 677.25287, 88.65885, 408.3731, 609.36426, 637.3271, 429.00958, 317.9473]
2025-08-07 13:21:08,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 79.0, 88.0, 134.0, 18.0, 83.0, 119.0, 114.0, 81.0, 58.0]
2025-08-07 13:21:08,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1226 [INFO]: New best (406.50) for latency MM1Queue_a033_s075
2025-08-07 13:21:08,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 33 seconds)
2025-08-07 13:23:04,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:23:04,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 275.62128 ± 235.924
2025-08-07 13:23:04,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [809.3742, 95.63281, 313.85233, 119.65671, 597.31647, 359.18213, 107.05554, 122.63329, 123.50664, 108.00244]
2025-08-07 13:23:04,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 19.0, 68.0, 23.0, 118.0, 67.0, 22.0, 24.0, 24.0, 21.0]
2025-08-07 13:23:04,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 39 seconds)
2025-08-07 13:24:57,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:24:58,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 244.27051 ± 185.702
2025-08-07 13:24:58,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [83.73289, 89.63961, 107.905914, 83.87052, 118.74785, 583.83997, 106.878006, 430.67523, 470.19098, 367.2241]
2025-08-07 13:24:58,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [17.0, 18.0, 21.0, 17.0, 23.0, 109.0, 21.0, 93.0, 90.0, 81.0]
2025-08-07 13:24:58,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 43 seconds)
2025-08-07 13:26:51,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:26:52,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 254.55330 ± 123.623
2025-08-07 13:26:52,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [95.823, 405.29053, 340.89587, 348.85315, 398.6578, 294.60718, 96.07503, 134.68188, 324.01938, 106.62909]
2025-08-07 13:26:52,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 78.0, 68.0, 77.0, 72.0, 59.0, 19.0, 26.0, 59.0, 21.0]
2025-08-07 13:26:52,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 48 seconds)
2025-08-07 13:28:45,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:28:46,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 254.38596 ± 173.172
2025-08-07 13:28:46,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [273.4174, 94.741264, 393.75806, 474.26328, 88.85681, 89.16433, 511.75015, 99.86501, 434.25302, 83.7904]
2025-08-07 13:28:46,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [52.0, 19.0, 71.0, 84.0, 18.0, 18.0, 93.0, 20.0, 78.0, 17.0]
2025-08-07 13:28:46,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 54 seconds)
2025-08-07 13:30:38,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:30:39,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 372.46332 ± 244.566
2025-08-07 13:30:39,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [298.38727, 95.077156, 574.447, 335.37384, 535.2113, 101.504974, 269.62558, 864.23456, 83.75826, 567.0132]
2025-08-07 13:30:39,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [53.0, 19.0, 128.0, 63.0, 98.0, 20.0, 48.0, 166.0, 17.0, 105.0]
2025-08-07 13:30:39,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1251 [DEBUG]: Training session finished
