2025-08-07 10:13:36,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc10-humanoid/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:13:36,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc10-humanoid/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:13:36,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x1544689abf50>}
2025-08-07 10:13:36,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1111 [DEBUG]: using device: cuda
2025-08-07 10:13:36,139 baseline-bpql-noiseperc10-humanoid:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 10:13:36,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1133 [INFO]: Creating new trainer
2025-08-07 10:13:36,157 baseline-bpql-noiseperc10-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=648, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
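The `NNTanhRefit` repr prints only its parameters (scale 0.8 and shift -0.4 on all 17 action dimensions), not how they are applied. The sketch below shows one mapping that is consistent with those values. It assumes the refit first rescales the tanh output from [-1, 1] to [0, 1] and then applies `unit * scale + shift`, which lands exactly on [-0.4, 0.4]. That is the action range of Gymnasium's Humanoid environment. Applying scale and shift directly to the raw tanh output would instead give [-1.2, 0.4]. The function name `refit` is hypothetical.

```python
import math

# Per-dimension values printed in the NNTanhRefit repr above (identical for all 17 dims).
SCALE, SHIFT = 0.8, -0.4


def refit(pre_activation: float) -> float:
    """Hypothetical refit: squash with tanh, rescale to [0, 1], then apply scale and shift."""
    unit = (math.tanh(pre_activation) + 1.0) / 2.0  # tanh range [-1, 1] -> [0, 1]
    return unit * SCALE + SHIFT                      # [0, 1] -> [-0.4, 0.4]


# Large negative, zero and large positive pre-activations hit the bounds and the midpoint.
low, mid, high = refit(-50.0), refit(0.0), refit(50.0)
```

Under this assumption, the policy's Gaussian sample is pushed through `tanh` and then this affine map, so every emitted action stays inside the environment's bounds.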
2025-08-07 10:13:36,157 baseline-bpql-noiseperc10-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
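The first-layer widths of the two networks (648 for pi, 393 for q) follow from simple arithmetic if two assumptions hold. First, observations are 376-dimensional, as in Gymnasium's Humanoid-v4; the log itself never prints the observation size. Second, the policy sees the current observation plus one buffered 17-dimensional action for each of the 16 steps of `assumed_delay` from the warning above; the horizon of 24 plays no part. The critic's `NNLayerConcat2` joins one observation with one action. This is a consistency check only: the composition of the policy input is inferred from the widths, not read from the trainer's code.

```python
# Assumption: Humanoid-v4 observation size (not printed in this log).
OBS_DIM = 376
# Matches the 17 outputs of mu_head / log_std_head above.
ACT_DIM = 17
# From the warning "args.assumed_delay != args.horizon: 16 != 24".
ASSUMED_DELAY = 16

# Policy input: observation plus one buffered action per step of assumed delay.
pi_in_features = OBS_DIM + ASSUMED_DELAY * ACT_DIM
# Critic input: observation concatenated with a single action (dim=-1).
q_in_features = OBS_DIM + ACT_DIM

print(pi_in_features, q_in_features)  # 648 393
```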
2025-08-07 10:13:38,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1194 [DEBUG]: Starting training session...
2025-08-07 10:13:38,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 1/100
2025-08-07 10:15:27,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:15:28,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 367.66895 ± 98.552
2025-08-07 10:15:28,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [363.1952, 492.50507, 337.67245, 138.37846, 463.07693, 407.5048, 474.41852, 342.78043, 294.7687, 362.38867]
2025-08-07 10:15:28,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 93.0, 62.0, 27.0, 87.0, 77.0, 89.0, 65.0, 55.0, 67.0]
2025-08-07 10:15:28,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1226 [INFO]: New best (367.67) for latency MM1Queue_a033_s075
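The "Total Reward: mean ± spread" line can be recomputed from the ten per-episode returns on the "All rewards" line. For iteration 1, the spread 98.552 matches the population standard deviation (divide by n), not the sample standard deviation (divide by n - 1, about 103.88). The recomputed mean agrees with the logged 367.66895 up to floating-point rounding. The snippet below is a verification sketch using only the standard library; it is not the trainer's own reporting code.

```python
import statistics

# Per-episode returns from the iteration-1 "All rewards" line.
rewards = [363.1952, 492.50507, 337.67245, 138.37846, 463.07693,
           407.5048, 474.41852, 342.78043, 294.7687, 362.38867]

mean = statistics.fmean(rewards)
population_std = statistics.pstdev(rewards)  # divides by n, matches the logged ± value
sample_std = statistics.stdev(rewards)       # divides by n - 1, shown for contrast

print(f"Total Reward: {mean:.5f} ± {population_std:.3f}")  # Total Reward: 367.66892 ± 98.552
```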
2025-08-07 10:15:28,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 1 minute, 47 seconds)
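The "estimated time remaining" values are consistent with a simple rule: take the mean wall-clock duration of the most recent iterations (up to five), multiply by the number of iterations still to run, and truncate to whole seconds. At iteration 2, one iteration took 110.176 s, and 99 × 110.176 s ≈ 10907 s, which is the logged 3 hours, 1 minute, 47 seconds. At iteration 34, iterations 29 to 33 took 602.56 s in total, and 67 × 120.512 s ≈ 8074 s, which is the logged 2 hours, 14 minutes, 34 seconds. The five-iteration window and the truncation are inferred from the timestamps, not read from the trainer's code. The helpers `eta_seconds` and `format_eta` are hypothetical.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def eta_seconds(iteration_starts, completed, total=100):
    """Hypothetical ETA: mean duration between consecutive iteration starts
    times the number of iterations left, truncated to whole seconds."""
    times = [datetime.strptime(s, LOG_TIME_FORMAT) for s in iteration_starts]
    durations = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
    mean_duration = sum(durations) / len(durations)
    return int(mean_duration * (total - completed))


def format_eta(seconds):
    """Render whole seconds as 'H hours, M minutes, S seconds'."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours} hours, {minutes} minutes, {secs} seconds"


# Start timestamps of iterations 29..34 (a five-iteration window); 33 iterations are done.
starts = [
    "2025-08-07 11:09:29,175",
    "2025-08-07 11:11:29,229",
    "2025-08-07 11:13:29,705",
    "2025-08-07 11:15:30,938",
    "2025-08-07 11:17:31,810",
    "2025-08-07 11:19:31,735",
]
print(format_eta(eta_seconds(starts, completed=33)))  # 2 hours, 14 minutes, 34 seconds
```

The logger also appears to drop zero-valued units, so a result of 10920 s is printed as "3 hours, 2 minutes" at iteration 10. The formatter above always prints all three units.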
2025-08-07 10:17:26,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:17:27,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 258.83575 ± 117.299
2025-08-07 10:17:27,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [186.10431, 405.64627, 101.69761, 129.84064, 305.77966, 178.29941, 321.96494, 396.27023, 417.08768, 145.66669]
2025-08-07 10:17:27,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 75.0, 20.0, 25.0, 57.0, 34.0, 62.0, 74.0, 83.0, 28.0]
2025-08-07 10:17:27,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 7 minutes, 9 seconds)
2025-08-07 10:19:25,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:19:26,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 297.05072 ± 84.993
2025-08-07 10:19:26,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [351.67035, 263.45428, 185.29816, 154.44702, 251.45544, 427.32608, 380.64648, 246.74426, 360.9725, 348.49298]
2025-08-07 10:19:26,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 51.0, 36.0, 30.0, 51.0, 79.0, 69.0, 50.0, 65.0, 66.0]
2025-08-07 10:19:26,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 7 minutes, 47 seconds)
2025-08-07 10:21:25,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:21:27,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 372.68857 ± 154.396
2025-08-07 10:21:27,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [395.48865, 341.00824, 411.10504, 511.08597, 152.84758, 362.3953, 701.03345, 365.83536, 128.33171, 357.75433]
2025-08-07 10:21:27,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 65.0, 75.0, 102.0, 29.0, 68.0, 135.0, 71.0, 25.0, 77.0]
2025-08-07 10:21:27,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1226 [INFO]: New best (372.69) for latency MM1Queue_a033_s075
2025-08-07 10:21:27,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 7 minutes, 35 seconds)
2025-08-07 10:23:26,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:23:26,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 258.17148 ± 84.816
2025-08-07 10:23:26,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [301.2997, 130.06535, 327.73187, 134.37646, 156.5529, 336.21283, 253.90181, 291.0384, 384.35837, 266.17728]
2025-08-07 10:23:26,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 25.0, 63.0, 26.0, 30.0, 65.0, 49.0, 55.0, 76.0, 53.0]
2025-08-07 10:23:26,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 6 minutes, 26 seconds)
2025-08-07 10:25:24,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:25:25,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 395.06906 ± 74.461
2025-08-07 10:25:25,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [429.1138, 425.17148, 521.9028, 375.40872, 291.11078, 250.88367, 372.50226, 397.453, 436.87082, 450.27353]
2025-08-07 10:25:25,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 81.0, 100.0, 69.0, 54.0, 48.0, 71.0, 74.0, 85.0, 83.0]
2025-08-07 10:25:25,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1226 [INFO]: New best (395.07) for latency MM1Queue_a033_s075
2025-08-07 10:25:25,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 7 minutes, 10 seconds)
2025-08-07 10:27:25,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:27:26,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 366.34354 ± 92.907
2025-08-07 10:27:26,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [565.55334, 412.57108, 406.4807, 258.6765, 424.17212, 331.72626, 330.9515, 235.84657, 411.36453, 286.09283]
2025-08-07 10:27:26,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 77.0, 85.0, 51.0, 79.0, 64.0, 61.0, 45.0, 77.0, 55.0]
2025-08-07 10:27:26,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 5 minutes, 48 seconds)
2025-08-07 10:29:25,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:29:27,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 404.99289 ± 105.790
2025-08-07 10:29:27,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [394.90244, 347.0159, 643.6503, 273.68848, 285.98633, 444.94205, 377.4613, 323.6206, 473.6516, 485.00992]
2025-08-07 10:29:27,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 66.0, 133.0, 51.0, 54.0, 81.0, 70.0, 60.0, 87.0, 89.0]
2025-08-07 10:29:27,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1226 [INFO]: New best (404.99) for latency MM1Queue_a033_s075
2025-08-07 10:29:27,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 4 minutes, 7 seconds)
2025-08-07 10:31:26,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:31:27,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 348.61591 ± 90.062
2025-08-07 10:31:27,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [354.7092, 472.07504, 135.92647, 419.38068, 259.83923, 353.8012, 350.12347, 337.25238, 370.31253, 432.7389]
2025-08-07 10:31:27,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 89.0, 26.0, 81.0, 48.0, 68.0, 67.0, 67.0, 71.0, 81.0]
2025-08-07 10:31:27,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 2 minutes)
2025-08-07 10:33:26,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:33:27,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 343.19955 ± 124.672
2025-08-07 10:33:27,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [337.7235, 390.9948, 421.07153, 218.91301, 529.19476, 215.8367, 275.46442, 508.22498, 130.46173, 404.11005]
2025-08-07 10:33:27,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 75.0, 78.0, 42.0, 97.0, 43.0, 52.0, 100.0, 25.0, 75.0]
2025-08-07 10:33:27,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 13 seconds)
2025-08-07 10:35:26,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:35:27,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 309.34818 ± 116.612
2025-08-07 10:35:27,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [179.85829, 456.79218, 411.2807, 124.97261, 118.46427, 318.14957, 363.00082, 369.0856, 401.06335, 350.81412]
2025-08-07 10:35:27,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 97.0, 76.0, 24.0, 23.0, 59.0, 66.0, 71.0, 75.0, 65.0]
2025-08-07 10:35:27,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 58 minutes, 27 seconds)
2025-08-07 10:37:26,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:37:27,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 295.92075 ± 126.653
2025-08-07 10:37:27,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [145.07217, 179.17845, 429.28186, 140.58214, 353.37524, 376.15576, 138.50575, 329.22012, 498.71286, 369.12308]
2025-08-07 10:37:27,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 35.0, 81.0, 27.0, 65.0, 75.0, 27.0, 61.0, 93.0, 69.0]
2025-08-07 10:37:27,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 56 minutes, 8 seconds)
2025-08-07 10:39:26,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:39:27,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 370.21115 ± 100.449
2025-08-07 10:39:27,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [323.2634, 468.78995, 187.39667, 323.72455, 427.38702, 300.76184, 427.0115, 341.3059, 569.7949, 332.67575]
2025-08-07 10:39:27,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 87.0, 37.0, 64.0, 79.0, 57.0, 82.0, 64.0, 119.0, 61.0]
2025-08-07 10:39:27,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 54 minutes, 15 seconds)
2025-08-07 10:41:26,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:41:27,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 278.30167 ± 127.850
2025-08-07 10:41:27,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [140.83179, 146.3244, 488.2092, 338.0684, 375.05188, 108.139755, 393.3362, 357.93457, 133.18819, 301.93222]
2025-08-07 10:41:27,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 28.0, 98.0, 63.0, 69.0, 21.0, 74.0, 65.0, 26.0, 57.0]
2025-08-07 10:41:27,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 52 minutes, 1 second)
2025-08-07 10:43:25,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:43:26,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 361.78433 ± 151.073
2025-08-07 10:43:26,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [350.16837, 179.2092, 537.9227, 152.2626, 482.68088, 394.82162, 501.87994, 124.85877, 363.83224, 530.2068]
2025-08-07 10:43:26,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 35.0, 102.0, 29.0, 89.0, 71.0, 94.0, 24.0, 68.0, 99.0]
2025-08-07 10:43:26,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 49 minutes, 39 seconds)
2025-08-07 10:45:26,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:45:27,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 422.65302 ± 116.497
2025-08-07 10:45:27,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [501.92728, 415.0487, 426.62338, 565.65454, 435.5821, 362.00018, 528.2159, 412.11334, 460.62985, 118.734764]
2025-08-07 10:45:27,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 77.0, 78.0, 105.0, 81.0, 68.0, 117.0, 76.0, 86.0, 23.0]
2025-08-07 10:45:27,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1226 [INFO]: New best (422.65) for latency MM1Queue_a033_s075
2025-08-07 10:45:27,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 48 minutes, 1 second)
2025-08-07 10:47:26,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:47:27,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 315.46149 ± 170.503
2025-08-07 10:47:27,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [436.2957, 139.18893, 461.47998, 101.83501, 391.72946, 133.71281, 488.4441, 302.12302, 580.9443, 118.86168]
2025-08-07 10:47:27,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 27.0, 84.0, 20.0, 81.0, 26.0, 91.0, 58.0, 125.0, 23.0]
2025-08-07 10:47:27,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 46 minutes, 12 seconds)
2025-08-07 10:49:26,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:49:26,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 274.69037 ± 117.288
2025-08-07 10:49:26,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [340.6065, 177.17201, 188.83447, 107.02785, 363.4335, 382.35855, 175.38506, 424.18713, 427.75217, 160.14632]
2025-08-07 10:49:26,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 34.0, 37.0, 21.0, 66.0, 70.0, 34.0, 78.0, 92.0, 31.0]
2025-08-07 10:49:26,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 43 minutes, 43 seconds)
2025-08-07 10:51:26,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:51:28,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 385.46350 ± 123.810
2025-08-07 10:51:28,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [428.88516, 330.1554, 399.4727, 583.08545, 376.0079, 530.48083, 445.7448, 285.7634, 113.772736, 361.26666]
2025-08-07 10:51:28,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 60.0, 74.0, 117.0, 69.0, 101.0, 83.0, 54.0, 22.0, 67.0]
2025-08-07 10:51:28,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 42 minutes, 13 seconds)
2025-08-07 10:53:26,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:53:27,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 399.51492 ± 111.264
2025-08-07 10:53:27,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [363.10666, 404.39014, 135.27315, 345.36172, 458.10788, 390.17258, 536.14435, 445.47528, 554.5256, 362.59183]
2025-08-07 10:53:27,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 75.0, 26.0, 63.0, 85.0, 72.0, 104.0, 86.0, 103.0, 69.0]
2025-08-07 10:53:27,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 40 minutes, 23 seconds)
2025-08-07 10:55:26,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:55:27,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 286.31488 ± 220.082
2025-08-07 10:55:27,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [474.80167, 102.75245, 772.07916, 332.6471, 130.3457, 141.22145, 124.66442, 135.7619, 524.36414, 124.51083]
2025-08-07 10:55:27,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 20.0, 152.0, 71.0, 25.0, 27.0, 24.0, 26.0, 98.0, 24.0]
2025-08-07 10:55:27,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 38 minutes, 7 seconds)
2025-08-07 10:57:27,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:57:28,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 342.71198 ± 201.640
2025-08-07 10:57:28,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [286.8961, 114.18485, 396.27814, 114.1596, 463.06888, 156.80699, 458.15192, 172.33395, 767.9843, 497.25504]
2025-08-07 10:57:28,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [52.0, 22.0, 73.0, 22.0, 88.0, 30.0, 83.0, 33.0, 152.0, 101.0]
2025-08-07 10:57:28,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 36 minutes, 9 seconds)
2025-08-07 10:59:26,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:59:27,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 405.14752 ± 166.633
2025-08-07 10:59:27,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [404.69382, 514.0548, 317.43234, 413.54745, 123.8656, 571.71576, 118.47482, 651.56635, 481.42105, 454.7028]
2025-08-07 10:59:27,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 99.0, 59.0, 75.0, 24.0, 107.0, 23.0, 130.0, 96.0, 96.0]
2025-08-07 10:59:27,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 34 minutes, 17 seconds)
2025-08-07 11:01:27,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:01:28,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 433.47549 ± 133.967
2025-08-07 11:01:28,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [359.18668, 659.71826, 506.9432, 386.95822, 478.99173, 510.81442, 114.740555, 451.55045, 484.07483, 381.7764]
2025-08-07 11:01:28,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 121.0, 91.0, 69.0, 87.0, 94.0, 22.0, 81.0, 103.0, 81.0]
2025-08-07 11:01:28,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1226 [INFO]: New best (433.48) for latency MM1Queue_a033_s075
2025-08-07 11:01:28,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 32 minutes, 5 seconds)
2025-08-07 11:03:27,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:03:29,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 406.32513 ± 194.436
2025-08-07 11:03:29,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [624.2662, 697.70435, 468.9406, 157.22734, 443.98105, 161.71153, 456.7857, 118.8628, 353.06122, 580.7106]
2025-08-07 11:03:29,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 134.0, 86.0, 30.0, 80.0, 31.0, 84.0, 23.0, 64.0, 108.0]
2025-08-07 11:03:29,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 30 minutes, 18 seconds)
2025-08-07 11:05:28,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:05:29,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 371.71445 ± 169.461
2025-08-07 11:05:29,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [384.89117, 140.45023, 113.317474, 760.9594, 380.048, 318.21573, 374.8063, 480.78607, 372.0551, 391.6148]
2025-08-07 11:05:29,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 27.0, 22.0, 154.0, 69.0, 60.0, 68.0, 88.0, 68.0, 72.0]
2025-08-07 11:05:29,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 28 minutes, 32 seconds)
2025-08-07 11:07:28,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:07:29,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 359.95703 ± 170.514
2025-08-07 11:07:29,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [448.62085, 296.52258, 184.10257, 444.84424, 433.81195, 312.1866, 584.77673, 635.8987, 129.02547, 129.78073]
2025-08-07 11:07:29,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 55.0, 35.0, 86.0, 79.0, 56.0, 110.0, 119.0, 25.0, 25.0]
2025-08-07 11:07:29,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 26 minutes, 16 seconds)
2025-08-07 11:09:28,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:09:29,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 342.35248 ± 123.266
2025-08-07 11:09:29,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [367.78873, 155.21275, 357.89465, 470.66635, 459.39716, 463.21844, 149.86032, 434.95175, 378.22305, 186.3114]
2025-08-07 11:09:29,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 30.0, 65.0, 85.0, 85.0, 98.0, 29.0, 78.0, 68.0, 36.0]
2025-08-07 11:09:29,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 24 minutes, 17 seconds)
2025-08-07 11:11:27,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:11:29,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 406.24274 ± 206.154
2025-08-07 11:11:29,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [144.99226, 209.64734, 195.93599, 648.45605, 139.98686, 388.0953, 531.6291, 590.70276, 522.74133, 690.24023]
2025-08-07 11:11:29,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 40.0, 37.0, 122.0, 27.0, 71.0, 113.0, 115.0, 96.0, 130.0]
2025-08-07 11:11:29,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 22 minutes, 11 seconds)
2025-08-07 11:13:28,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:13:29,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 419.03998 ± 161.325
2025-08-07 11:13:29,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [420.38812, 134.53946, 527.3894, 373.67303, 153.98183, 501.47693, 466.74783, 525.0008, 395.3918, 691.8108]
2025-08-07 11:13:29,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 26.0, 99.0, 79.0, 30.0, 92.0, 85.0, 97.0, 73.0, 136.0]
2025-08-07 11:13:29,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 20 minutes, 8 seconds)
2025-08-07 11:15:29,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:15:30,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 388.62808 ± 145.115
2025-08-07 11:15:30,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [443.9948, 436.24658, 525.1879, 327.8794, 491.22806, 96.42462, 489.6444, 394.98715, 150.48186, 530.20575]
2025-08-07 11:15:30,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 93.0, 111.0, 61.0, 102.0, 19.0, 90.0, 83.0, 29.0, 98.0]
2025-08-07 11:15:30,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 18 minutes, 13 seconds)
2025-08-07 11:17:30,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:17:31,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 397.38043 ± 150.457
2025-08-07 11:17:31,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [380.04196, 573.29755, 474.4502, 491.4168, 427.2751, 119.33396, 446.16537, 107.852875, 517.34863, 436.62167]
2025-08-07 11:17:31,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 109.0, 87.0, 90.0, 77.0, 23.0, 88.0, 21.0, 96.0, 91.0]
2025-08-07 11:17:31,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 16 minutes, 29 seconds)
2025-08-07 11:19:30,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:19:31,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 416.06342 ± 116.781
2025-08-07 11:19:31,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [537.21423, 424.38684, 472.7897, 361.85068, 438.8105, 560.1575, 436.7756, 458.48154, 120.06646, 350.10123]
2025-08-07 11:19:31,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 77.0, 89.0, 67.0, 82.0, 104.0, 81.0, 88.0, 23.0, 63.0]
2025-08-07 11:19:31,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 14 minutes, 34 seconds)
2025-08-07 11:21:30,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:21:31,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 450.38370 ± 134.139
2025-08-07 11:21:31,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [382.21872, 469.84827, 569.3481, 602.91016, 329.13345, 452.9714, 611.98346, 553.70404, 171.26776, 360.4519]
2025-08-07 11:21:31,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 84.0, 107.0, 111.0, 64.0, 83.0, 115.0, 101.0, 33.0, 72.0]
2025-08-07 11:21:31,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1226 [INFO]: New best (450.38) for latency MM1Queue_a033_s075
2025-08-07 11:21:31,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 12 minutes, 31 seconds)
2025-08-07 11:23:30,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:23:32,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 417.10449 ± 163.186
2025-08-07 11:23:32,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [125.117, 494.39386, 349.3112, 373.45514, 430.37808, 250.51338, 478.8393, 572.40295, 350.4858, 746.1481]
2025-08-07 11:23:32,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 89.0, 64.0, 70.0, 78.0, 48.0, 89.0, 107.0, 68.0, 142.0]
2025-08-07 11:23:32,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 10 minutes, 30 seconds)
2025-08-07 11:25:30,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:25:31,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 338.22644 ± 152.525
2025-08-07 11:25:31,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [550.49896, 167.08679, 397.6548, 425.02798, 460.60095, 107.28003, 302.8221, 129.38754, 521.8427, 320.06244]
2025-08-07 11:25:31,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 32.0, 73.0, 79.0, 85.0, 21.0, 56.0, 25.0, 94.0, 66.0]
2025-08-07 11:25:31,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 8 minutes, 6 seconds)
2025-08-07 11:27:32,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:27:33,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 483.28305 ± 149.134
2025-08-07 11:27:33,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [175.71658, 539.60645, 445.15735, 554.02515, 409.74313, 573.41077, 382.89145, 766.7632, 408.99365, 576.5228]
2025-08-07 11:27:33,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 101.0, 82.0, 101.0, 76.0, 106.0, 70.0, 145.0, 75.0, 120.0]
2025-08-07 11:27:33,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1226 [INFO]: New best (483.28) for latency MM1Queue_a033_s075
2025-08-07 11:27:33,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 6 minutes, 20 seconds)
2025-08-07 11:29:32,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:29:33,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 329.70947 ± 141.871
2025-08-07 11:29:33,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [488.3184, 124.19126, 370.48376, 97.16419, 403.7812, 397.95245, 404.94827, 134.36017, 457.20383, 418.69113]
2025-08-07 11:29:33,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 24.0, 68.0, 19.0, 73.0, 71.0, 76.0, 26.0, 83.0, 75.0]
2025-08-07 11:29:33,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 4 minutes, 26 seconds)
2025-08-07 11:31:31,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:31:32,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 441.63248 ± 293.451
2025-08-07 11:31:32,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [782.4524, 396.55325, 140.5789, 166.38464, 141.40779, 1006.3173, 96.57841, 522.29614, 618.2119, 545.5438]
2025-08-07 11:31:32,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 73.0, 27.0, 32.0, 27.0, 193.0, 19.0, 96.0, 117.0, 102.0]
2025-08-07 11:31:32,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 2 minutes, 11 seconds)
2025-08-07 11:33:32,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:33:33,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 373.05533 ± 187.877
2025-08-07 11:33:33,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [107.67001, 553.0927, 462.83636, 329.1721, 151.45375, 571.99896, 303.43936, 124.45657, 642.24335, 484.19022]
2025-08-07 11:33:33,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 102.0, 89.0, 62.0, 29.0, 106.0, 66.0, 24.0, 120.0, 88.0]
2025-08-07 11:33:33,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 18 seconds)
2025-08-07 11:35:30,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:35:31,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 499.40756 ± 56.383
2025-08-07 11:35:31,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [518.54425, 548.18274, 501.52518, 612.9305, 521.7648, 418.17178, 504.00974, 426.73132, 503.89188, 438.32327]
2025-08-07 11:35:31,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 101.0, 93.0, 115.0, 95.0, 75.0, 95.0, 78.0, 99.0, 92.0]
2025-08-07 11:35:31,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1226 [INFO]: New best (499.41) for latency MM1Queue_a033_s075
2025-08-07 11:35:31,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 58 minutes, 6 seconds)
2025-08-07 11:37:28,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:37:29,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 343.69904 ± 139.659
2025-08-07 11:37:29,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [418.42636, 479.03128, 119.07005, 416.93863, 428.51794, 399.74594, 129.37096, 150.59799, 456.45694, 438.83408]
2025-08-07 11:37:29,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 102.0, 23.0, 77.0, 79.0, 71.0, 25.0, 29.0, 84.0, 79.0]
2025-08-07 11:37:29,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 55 minutes, 11 seconds)
2025-08-07 11:39:26,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:39:27,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 462.32602 ± 120.166
2025-08-07 11:39:27,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [583.2988, 748.8673, 366.41022, 314.54483, 467.26678, 490.99957, 389.64886, 480.14203, 388.6963, 393.38525]
2025-08-07 11:39:27,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 149.0, 67.0, 58.0, 85.0, 89.0, 71.0, 87.0, 70.0, 71.0]
2025-08-07 11:39:27,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 52 minutes, 45 seconds)
2025-08-07 11:41:24,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:41:25,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 348.47003 ± 160.705
2025-08-07 11:41:25,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [355.762, 388.4781, 488.24704, 103.16828, 181.72156, 369.3414, 109.161545, 476.5301, 393.74924, 618.5408]
2025-08-07 11:41:25,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 69.0, 91.0, 20.0, 35.0, 69.0, 21.0, 100.0, 71.0, 135.0]
2025-08-07 11:41:25,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 50 minutes, 41 seconds)
2025-08-07 11:43:22,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:43:24,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 458.97137 ± 215.487
2025-08-07 11:43:24,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [524.9105, 157.79726, 613.5685, 511.16302, 535.02716, 632.7388, 181.34344, 510.73553, 798.0543, 124.3754]
2025-08-07 11:43:24,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 30.0, 112.0, 100.0, 102.0, 118.0, 35.0, 98.0, 156.0, 24.0]
2025-08-07 11:43:24,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 48 minutes, 17 seconds)
2025-08-07 11:45:21,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:45:22,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 410.10870 ± 162.295
2025-08-07 11:45:22,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [462.0539, 461.26093, 130.56546, 96.00166, 551.97485, 555.7052, 574.019, 505.84216, 347.78806, 415.87598]
2025-08-07 11:45:22,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 97.0, 25.0, 19.0, 116.0, 106.0, 108.0, 109.0, 64.0, 78.0]
2025-08-07 11:45:22,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 46 minutes, 15 seconds)
2025-08-07 11:47:21,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:47:22,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 418.88086 ± 217.194
2025-08-07 11:47:22,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [108.56185, 469.67252, 637.19543, 796.29095, 130.11183, 464.09128, 456.37207, 540.6726, 135.83481, 450.00528]
2025-08-07 11:47:22,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 84.0, 117.0, 158.0, 25.0, 85.0, 96.0, 100.0, 26.0, 83.0]
2025-08-07 11:47:22,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 44 minutes, 46 seconds)
2025-08-07 11:49:19,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:49:21,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 510.34985 ± 198.609
2025-08-07 11:49:21,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [494.83786, 504.88287, 164.89435, 145.62128, 567.2727, 682.51965, 688.3417, 561.32874, 783.58057, 510.21884]
2025-08-07 11:49:21,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 104.0, 32.0, 28.0, 103.0, 127.0, 128.0, 106.0, 160.0, 95.0]
2025-08-07 11:49:21,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1226 [INFO]: New best (510.35) for latency MM1Queue_a033_s075
2025-08-07 11:49:21,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 42 minutes, 56 seconds)
2025-08-07 11:51:18,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:51:19,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 511.80533 ± 165.660
2025-08-07 11:51:19,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [369.78348, 142.56903, 554.2219, 452.52463, 748.3339, 608.0623, 573.9305, 486.09695, 468.2698, 714.26013]
2025-08-07 11:51:19,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 27.0, 118.0, 80.0, 140.0, 115.0, 105.0, 89.0, 84.0, 134.0]
2025-08-07 11:51:19,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1226 [INFO]: New best (511.81) for latency MM1Queue_a033_s075
2025-08-07 11:51:19,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 40 minutes, 59 seconds)
2025-08-07 11:53:16,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:53:18,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 509.32755 ± 325.358
2025-08-07 11:53:18,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [476.76254, 563.3802, 808.7128, 145.89905, 1241.0549, 150.34337, 470.95755, 573.1626, 108.117645, 554.8845]
2025-08-07 11:53:18,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 108.0, 153.0, 28.0, 249.0, 29.0, 86.0, 105.0, 21.0, 102.0]
2025-08-07 11:53:18,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 39 minutes, 1 second)
2025-08-07 11:55:16,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:55:17,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 462.94476 ± 81.458
2025-08-07 11:55:17,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [483.3925, 409.7383, 419.81842, 474.5447, 523.51697, 424.53275, 475.34592, 665.11993, 382.5543, 370.88367]
2025-08-07 11:55:17,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 81.0, 74.0, 102.0, 97.0, 79.0, 87.0, 131.0, 71.0, 68.0]
2025-08-07 11:55:17,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 37 minutes, 17 seconds)
2025-08-07 11:57:15,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:57:16,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 536.55792 ± 86.170
2025-08-07 11:57:16,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [482.47745, 508.83234, 597.34424, 539.59644, 447.81564, 741.0365, 604.7983, 439.57544, 483.00925, 521.0935]
2025-08-07 11:57:16,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 91.0, 109.0, 101.0, 82.0, 142.0, 112.0, 91.0, 87.0, 94.0]
2025-08-07 11:57:16,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1226 [INFO]: New best (536.56) for latency MM1Queue_a033_s075
2025-08-07 11:57:16,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 35 minutes, 8 seconds)
2025-08-07 11:59:16,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:59:17,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 509.26392 ± 97.951
2025-08-07 11:59:17,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [475.03705, 536.4736, 477.69214, 595.04565, 497.5721, 376.217, 616.8365, 391.02002, 698.3344, 428.41043]
2025-08-07 11:59:17,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 103.0, 87.0, 107.0, 90.0, 70.0, 116.0, 71.0, 131.0, 76.0]
2025-08-07 11:59:17,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 33 minutes, 26 seconds)
2025-08-07 12:01:16,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:01:17,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 538.30518 ± 209.667
2025-08-07 12:01:17,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [377.1559, 445.53058, 779.9899, 517.2431, 616.7242, 142.36656, 521.9813, 964.06177, 510.26617, 507.73227]
2025-08-07 12:01:17,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 81.0, 155.0, 105.0, 115.0, 27.0, 98.0, 181.0, 93.0, 93.0]
2025-08-07 12:01:17,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1226 [INFO]: New best (538.31) for latency MM1Queue_a033_s075
2025-08-07 12:01:17,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 31 minutes, 40 seconds)
2025-08-07 12:03:14,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:03:15,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 409.94971 ± 187.211
2025-08-07 12:03:15,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [626.3321, 412.84723, 156.37181, 521.95825, 646.63104, 415.08258, 595.6576, 423.852, 124.6277, 176.13707]
2025-08-07 12:03:15,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 87.0, 30.0, 94.0, 118.0, 76.0, 108.0, 78.0, 24.0, 34.0]
2025-08-07 12:03:15,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 29 minutes, 30 seconds)
2025-08-07 12:05:13,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:05:14,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 393.48599 ± 228.705
2025-08-07 12:05:14,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [134.16695, 124.513885, 697.2417, 573.32404, 507.9675, 135.03491, 435.44855, 572.85315, 108.37337, 645.9357]
2025-08-07 12:05:14,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 24.0, 136.0, 105.0, 91.0, 26.0, 80.0, 103.0, 21.0, 119.0]
2025-08-07 12:05:14,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 27 minutes, 32 seconds)
2025-08-07 12:07:12,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:07:13,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 524.78009 ± 107.916
2025-08-07 12:07:13,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [414.76062, 639.02484, 369.79132, 457.1376, 662.58093, 577.5159, 450.73508, 451.7919, 526.1215, 698.34094]
2025-08-07 12:07:13,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 118.0, 65.0, 80.0, 122.0, 105.0, 97.0, 80.0, 96.0, 126.0]
2025-08-07 12:07:13,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 25 minutes, 34 seconds)
2025-08-07 12:09:12,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:09:14,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 526.82550 ± 158.762
2025-08-07 12:09:14,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [621.0302, 659.9584, 462.963, 477.59604, 493.61835, 642.82654, 660.8621, 686.88916, 423.0274, 139.48392]
2025-08-07 12:09:14,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 123.0, 101.0, 89.0, 87.0, 121.0, 128.0, 126.0, 77.0, 27.0]
2025-08-07 12:09:14,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 23 minutes, 30 seconds)
2025-08-07 12:11:11,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:11:12,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 465.58112 ± 132.762
2025-08-07 12:11:12,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [494.2922, 555.8802, 568.98004, 438.82422, 102.68781, 486.1964, 619.2972, 469.9328, 464.39276, 455.32782]
2025-08-07 12:11:12,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 105.0, 116.0, 96.0, 20.0, 104.0, 113.0, 92.0, 84.0, 83.0]
2025-08-07 12:11:12,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 21 minutes, 20 seconds)
2025-08-07 12:13:11,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:13:12,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 360.87817 ± 221.087
2025-08-07 12:13:12,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [461.89758, 129.06757, 451.80862, 138.95177, 624.62695, 387.16647, 801.7082, 344.07608, 125.03814, 144.4403]
2025-08-07 12:13:12,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 25.0, 87.0, 27.0, 131.0, 77.0, 171.0, 71.0, 24.0, 28.0]
2025-08-07 12:13:12,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 19 minutes, 38 seconds)
2025-08-07 12:15:10,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:15:12,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 638.88287 ± 176.914
2025-08-07 12:15:12,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [849.159, 547.90497, 487.58194, 997.9473, 490.8874, 452.88257, 635.81006, 455.81937, 732.5835, 738.25226]
2025-08-07 12:15:12,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 98.0, 87.0, 189.0, 91.0, 91.0, 137.0, 87.0, 134.0, 143.0]
2025-08-07 12:15:12,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1226 [INFO]: New best (638.88) for latency MM1Queue_a033_s075
2025-08-07 12:15:12,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 17 minutes, 39 seconds)
2025-08-07 12:17:10,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:17:12,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 544.34753 ± 243.282
2025-08-07 12:17:12,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [144.61406, 540.17883, 514.9349, 506.2744, 433.36145, 358.2548, 545.40735, 530.38055, 738.40894, 1131.6598]
2025-08-07 12:17:12,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 96.0, 93.0, 91.0, 79.0, 70.0, 112.0, 96.0, 138.0, 217.0]
2025-08-07 12:17:12,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 15 minutes, 50 seconds)
2025-08-07 12:19:11,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:19:12,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 388.24347 ± 213.266
2025-08-07 12:19:12,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [452.4793, 435.07797, 351.77603, 773.40704, 344.1532, 114.97759, 672.13293, 478.49664, 135.17648, 124.75711]
2025-08-07 12:19:12,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 79.0, 74.0, 143.0, 62.0, 22.0, 121.0, 96.0, 26.0, 24.0]
2025-08-07 12:19:12,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 13 minutes, 46 seconds)
2025-08-07 12:21:10,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:21:11,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 502.55048 ± 143.109
2025-08-07 12:21:11,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [461.29745, 594.58563, 521.38403, 487.01358, 559.9599, 520.084, 751.34686, 507.51108, 144.95795, 477.36447]
2025-08-07 12:21:11,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 106.0, 95.0, 87.0, 102.0, 94.0, 135.0, 94.0, 28.0, 86.0]
2025-08-07 12:21:11,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 11 minutes, 53 seconds)
2025-08-07 12:23:09,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:23:11,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 504.61948 ± 163.441
2025-08-07 12:23:11,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [402.72418, 576.314, 493.27692, 955.91943, 490.28827, 495.3265, 454.96112, 323.92224, 415.2587, 438.20364]
2025-08-07 12:23:11,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 114.0, 87.0, 181.0, 89.0, 110.0, 81.0, 59.0, 75.0, 85.0]
2025-08-07 12:23:11,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 9 minutes, 51 seconds)
2025-08-07 12:25:08,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:25:10,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 517.17358 ± 183.002
2025-08-07 12:25:10,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [320.3091, 568.2326, 709.663, 475.56223, 151.436, 822.16754, 544.2437, 392.7107, 590.0031, 597.4077]
2025-08-07 12:25:10,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 106.0, 138.0, 86.0, 29.0, 165.0, 99.0, 82.0, 112.0, 113.0]
2025-08-07 12:25:10,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 7 minutes, 45 seconds)
2025-08-07 12:27:09,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:27:11,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 552.00983 ± 199.976
2025-08-07 12:27:11,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [608.7755, 413.8187, 188.31136, 475.3787, 571.53485, 319.27765, 632.94995, 879.9331, 808.959, 621.159]
2025-08-07 12:27:11,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 73.0, 36.0, 86.0, 104.0, 67.0, 118.0, 164.0, 163.0, 109.0]
2025-08-07 12:27:11,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 5 minutes, 51 seconds)
2025-08-07 12:29:09,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:29:11,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 644.09265 ± 245.170
2025-08-07 12:29:11,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [808.53357, 837.1278, 639.4578, 516.5112, 141.29393, 591.43964, 1092.196, 659.4924, 411.63388, 743.2396]
2025-08-07 12:29:11,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 167.0, 118.0, 93.0, 27.0, 110.0, 211.0, 119.0, 74.0, 142.0]
2025-08-07 12:29:11,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1226 [INFO]: New best (644.09) for latency MM1Queue_a033_s075
2025-08-07 12:29:11,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 3 minutes, 54 seconds)
2025-08-07 12:31:08,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:31:09,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 534.62976 ± 209.155
2025-08-07 12:31:09,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [735.9921, 755.6627, 612.95636, 534.53937, 646.34875, 139.23544, 675.32385, 517.5766, 146.36818, 582.295]
2025-08-07 12:31:09,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 140.0, 114.0, 119.0, 120.0, 27.0, 120.0, 93.0, 28.0, 108.0]
2025-08-07 12:31:09,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 1 minute, 45 seconds)
2025-08-07 12:33:08,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:33:10,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 679.56311 ± 235.268
2025-08-07 12:33:10,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [655.12537, 588.01245, 560.1278, 661.34155, 479.5301, 1341.9446, 730.2069, 501.032, 721.21533, 557.0959]
2025-08-07 12:33:10,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 105.0, 103.0, 117.0, 86.0, 252.0, 138.0, 91.0, 134.0, 105.0]
2025-08-07 12:33:10,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1226 [INFO]: New best (679.56) for latency MM1Queue_a033_s075
2025-08-07 12:33:10,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 59 minutes, 54 seconds)
2025-08-07 12:35:08,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:35:10,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 581.63251 ± 166.555
2025-08-07 12:35:10,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [438.1598, 514.62866, 337.83008, 570.8229, 827.01447, 872.1845, 672.6751, 499.10883, 419.69412, 664.20667]
2025-08-07 12:35:10,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 94.0, 70.0, 120.0, 156.0, 160.0, 124.0, 91.0, 76.0, 125.0]
2025-08-07 12:35:10,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 58 minutes)
2025-08-07 12:37:08,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:37:10,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 621.62506 ± 231.222
2025-08-07 12:37:10,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [434.50775, 578.28046, 892.9953, 648.78955, 531.23267, 530.13104, 1018.0631, 150.03624, 647.50714, 784.70685]
2025-08-07 12:37:10,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 105.0, 166.0, 117.0, 97.0, 96.0, 205.0, 29.0, 123.0, 151.0]
2025-08-07 12:37:10,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 55 minutes, 57 seconds)
2025-08-07 12:39:07,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:39:08,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 494.30972 ± 75.790
2025-08-07 12:39:08,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [473.21127, 558.3132, 520.3488, 419.989, 340.96844, 522.2147, 586.75116, 586.90344, 423.50958, 510.88776]
2025-08-07 12:39:08,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 122.0, 91.0, 76.0, 74.0, 91.0, 109.0, 125.0, 81.0, 108.0]
2025-08-07 12:39:08,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 53 minutes, 44 seconds)
2025-08-07 12:41:05,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:41:07,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 497.89828 ± 175.049
2025-08-07 12:41:07,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [389.46808, 520.66016, 586.0152, 364.5942, 463.77267, 365.31235, 717.52985, 674.8262, 160.80157, 736.0029]
2025-08-07 12:41:07,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 108.0, 115.0, 71.0, 84.0, 66.0, 132.0, 141.0, 31.0, 146.0]
2025-08-07 12:41:07,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 51 minutes, 48 seconds)
2025-08-07 12:43:04,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:43:05,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 472.08063 ± 282.915
2025-08-07 12:43:05,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [124.57352, 657.366, 729.4311, 777.0415, 161.68613, 518.05164, 114.34467, 864.949, 599.6621, 173.70085]
2025-08-07 12:43:05,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 125.0, 133.0, 140.0, 31.0, 95.0, 22.0, 172.0, 109.0, 33.0]
2025-08-07 12:43:05,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 49 minutes, 36 seconds)
2025-08-07 12:45:03,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:45:04,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 456.24268 ± 246.426
2025-08-07 12:45:04,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [389.34866, 899.256, 548.5783, 700.1594, 505.12653, 119.462296, 161.92786, 530.19116, 119.91297, 588.46387]
2025-08-07 12:45:04,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 171.0, 99.0, 129.0, 94.0, 23.0, 31.0, 105.0, 23.0, 114.0]
2025-08-07 12:45:04,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 47 minutes, 34 seconds)
2025-08-07 12:47:03,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:47:04,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 492.79922 ± 211.996
2025-08-07 12:47:04,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [731.786, 489.32736, 113.56907, 742.58966, 403.23666, 425.63123, 671.5151, 151.69624, 563.49396, 635.1465]
2025-08-07 12:47:04,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 87.0, 22.0, 143.0, 73.0, 79.0, 121.0, 29.0, 101.0, 119.0]
2025-08-07 12:47:04,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 45 minutes, 31 seconds)
2025-08-07 12:49:04,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:49:05,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 570.21594 ± 200.582
2025-08-07 12:49:05,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [491.25537, 748.8712, 902.37915, 551.3519, 161.52599, 579.3412, 473.27295, 798.4331, 412.92355, 582.8052]
2025-08-07 12:49:05,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 138.0, 167.0, 96.0, 31.0, 104.0, 86.0, 161.0, 74.0, 106.0]
2025-08-07 12:49:05,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 43 minutes, 47 seconds)
2025-08-07 12:51:00,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:51:02,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 435.23468 ± 263.471
2025-08-07 12:51:02,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [456.18515, 120.16411, 166.86736, 141.14986, 561.97064, 810.5376, 675.1132, 130.1794, 787.14844, 503.0312]
2025-08-07 12:51:02,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 23.0, 32.0, 27.0, 105.0, 161.0, 127.0, 25.0, 143.0, 89.0]
2025-08-07 12:51:02,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 41 minutes, 39 seconds)
2025-08-07 12:53:01,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:53:04,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 726.85553 ± 292.180
2025-08-07 12:53:04,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [630.95654, 413.8123, 962.06537, 1478.3585, 576.8212, 490.77243, 742.4516, 807.46375, 582.79047, 583.0635]
2025-08-07 12:53:04,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 90.0, 188.0, 271.0, 109.0, 91.0, 140.0, 149.0, 115.0, 106.0]
2025-08-07 12:53:04,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1226 [INFO]: New best (726.86) for latency MM1Queue_a033_s075
2025-08-07 12:53:04,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 39 minutes, 54 seconds)
2025-08-07 12:54:59,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:55:01,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 702.06830 ± 277.484
2025-08-07 12:55:01,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [1011.9056, 718.4875, 807.9932, 693.6558, 1071.1842, 697.09607, 491.63284, 366.84784, 993.33887, 168.54129]
2025-08-07 12:55:01,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [191.0, 135.0, 152.0, 127.0, 192.0, 128.0, 99.0, 70.0, 200.0, 32.0]
2025-08-07 12:55:01,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 37 minutes, 47 seconds)
2025-08-07 12:56:57,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:56:59,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 561.70477 ± 266.875
2025-08-07 12:56:59,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [103.538925, 645.66046, 949.6279, 913.3689, 592.621, 493.0322, 611.9939, 107.10606, 559.4147, 640.6843]
2025-08-07 12:56:59,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 117.0, 180.0, 179.0, 131.0, 97.0, 116.0, 21.0, 121.0, 143.0]
2025-08-07 12:56:59,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 35 minutes, 40 seconds)
2025-08-07 12:58:57,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:58:59,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 470.27090 ± 321.843
2025-08-07 12:58:59,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [124.50939, 572.3085, 745.11475, 521.21814, 151.03217, 152.45172, 509.47714, 1147.586, 114.30575, 664.7057]
2025-08-07 12:58:59,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 125.0, 136.0, 93.0, 29.0, 29.0, 95.0, 213.0, 22.0, 119.0]
2025-08-07 12:58:59,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 33 minutes, 38 seconds)
2025-08-07 13:00:58,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:01:00,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 623.13214 ± 285.193
2025-08-07 13:01:00,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [1219.3811, 965.9047, 474.5227, 508.4338, 660.46124, 119.185265, 440.21054, 531.698, 709.5999, 601.924]
2025-08-07 13:01:00,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [239.0, 178.0, 86.0, 101.0, 119.0, 23.0, 83.0, 95.0, 151.0, 112.0]
2025-08-07 13:01:00,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 31 minutes, 54 seconds)
2025-08-07 13:02:56,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:02:57,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 477.23257 ± 379.827
2025-08-07 13:02:57,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [336.4439, 1345.6249, 451.33804, 156.86111, 747.4749, 442.3316, 146.32422, 136.55656, 868.09705, 141.27356]
2025-08-07 13:02:57,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 245.0, 82.0, 30.0, 139.0, 84.0, 28.0, 26.0, 166.0, 27.0]
2025-08-07 13:02:57,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 29 minutes, 40 seconds)
2025-08-07 13:04:55,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:04:57,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 653.65405 ± 295.206
2025-08-07 13:04:57,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [596.9866, 145.43509, 777.97076, 1068.8801, 657.0497, 548.1265, 851.29504, 969.18646, 146.2041, 775.40607]
2025-08-07 13:04:57,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 28.0, 144.0, 206.0, 119.0, 101.0, 157.0, 176.0, 28.0, 139.0]
2025-08-07 13:04:57,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 27 minutes, 48 seconds)
2025-08-07 13:06:53,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:06:55,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 577.26398 ± 337.617
2025-08-07 13:06:55,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [1280.607, 726.89014, 542.58344, 392.12582, 118.8515, 346.12808, 118.73451, 660.16, 874.77814, 711.78094]
2025-08-07 13:06:55,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [241.0, 138.0, 96.0, 70.0, 23.0, 61.0, 23.0, 124.0, 153.0, 133.0]
2025-08-07 13:06:55,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 25 minutes, 49 seconds)
2025-08-07 13:08:52,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:08:54,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 701.29346 ± 265.150
2025-08-07 13:08:54,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [632.12463, 930.17914, 155.20328, 463.6556, 447.81302, 975.76996, 931.35095, 667.89343, 991.0463, 817.8981]
2025-08-07 13:08:54,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 172.0, 30.0, 83.0, 83.0, 171.0, 202.0, 120.0, 179.0, 148.0]
2025-08-07 13:08:54,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 23 minutes, 49 seconds)
2025-08-07 13:10:53,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:10:55,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 731.16101 ± 355.401
2025-08-07 13:10:55,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [748.56586, 129.54224, 450.20633, 1032.47, 694.32855, 1020.00665, 935.2006, 1200.2871, 150.9086, 950.094]
2025-08-07 13:10:55,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 25.0, 83.0, 187.0, 127.0, 190.0, 184.0, 224.0, 29.0, 176.0]
2025-08-07 13:10:55,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1226 [INFO]: New best (731.16) for latency MM1Queue_a033_s075
2025-08-07 13:10:55,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 48 seconds)
2025-08-07 13:12:52,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:12:54,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 544.96277 ± 219.570
2025-08-07 13:12:54,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [770.53253, 548.4289, 527.40466, 415.76233, 123.64302, 920.5747, 777.67883, 397.29965, 405.8432, 562.4601]
2025-08-07 13:12:54,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [151.0, 97.0, 112.0, 87.0, 24.0, 172.0, 141.0, 85.0, 73.0, 105.0]
2025-08-07 13:12:54,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 52 seconds)
2025-08-07 13:14:51,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:14:53,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 675.76404 ± 282.131
2025-08-07 13:14:53,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [682.0505, 1101.9196, 939.4112, 759.9922, 120.68251, 401.73755, 710.5063, 441.96387, 983.99255, 615.3838]
2025-08-07 13:14:53,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 200.0, 173.0, 140.0, 23.0, 73.0, 133.0, 82.0, 178.0, 110.0]
2025-08-07 13:14:53,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 52 seconds)
2025-08-07 13:16:50,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:16:51,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 575.85370 ± 261.078
2025-08-07 13:16:51,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [461.86877, 341.4853, 628.353, 564.49835, 544.0863, 535.6232, 109.212494, 576.776, 1127.6929, 868.9411]
2025-08-07 13:16:51,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 63.0, 119.0, 110.0, 95.0, 98.0, 21.0, 104.0, 208.0, 164.0]
2025-08-07 13:16:51,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 54 seconds)
2025-08-07 13:18:48,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:18:49,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 615.68457 ± 343.858
2025-08-07 13:18:49,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [487.00058, 524.44604, 97.41919, 507.7024, 467.8375, 1252.59, 472.794, 693.0958, 1235.6564, 418.3036]
2025-08-07 13:18:49,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 92.0, 19.0, 89.0, 88.0, 241.0, 83.0, 125.0, 257.0, 71.0]
2025-08-07 13:18:49,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 52 seconds)
2025-08-07 13:20:47,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:20:49,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 704.04474 ± 253.753
2025-08-07 13:20:49,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [671.94696, 616.04596, 131.0027, 613.0737, 656.1768, 799.7737, 695.4457, 719.0125, 1169.8939, 968.0751]
2025-08-07 13:20:49,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 108.0, 25.0, 114.0, 121.0, 154.0, 127.0, 130.0, 212.0, 178.0]
2025-08-07 13:20:49,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 53 seconds)
2025-08-07 13:22:45,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:22:47,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 687.07922 ± 448.220
2025-08-07 13:22:47,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [145.59485, 1166.3138, 476.05853, 114.22327, 827.51843, 941.9311, 140.38777, 1011.7322, 577.3516, 1469.6813]
2025-08-07 13:22:47,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 211.0, 85.0, 22.0, 149.0, 174.0, 27.0, 185.0, 103.0, 300.0]
2025-08-07 13:22:47,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 53 seconds)
2025-08-07 13:24:43,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:24:45,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 722.73285 ± 278.444
2025-08-07 13:24:45,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [510.3269, 453.34787, 907.4632, 1061.7758, 884.03235, 141.50276, 815.09467, 944.0599, 548.56036, 961.1647]
2025-08-07 13:24:45,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 80.0, 166.0, 194.0, 177.0, 27.0, 152.0, 170.0, 98.0, 183.0]
2025-08-07 13:24:45,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 53 seconds)
2025-08-07 13:26:42,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:26:44,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 514.14124 ± 283.965
2025-08-07 13:26:44,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [590.1791, 1035.2218, 675.02277, 796.95795, 519.9656, 577.5667, 524.65405, 155.66135, 125.59984, 140.58354]
2025-08-07 13:26:44,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 192.0, 125.0, 144.0, 97.0, 109.0, 96.0, 30.0, 24.0, 27.0]
2025-08-07 13:26:44,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 55 seconds)
2025-08-07 13:28:40,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:28:42,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 733.24988 ± 410.450
2025-08-07 13:28:42,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [587.025, 388.431, 1749.9244, 967.4611, 737.7552, 651.4989, 124.88912, 659.279, 541.21967, 925.01556]
2025-08-07 13:28:42,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 71.0, 341.0, 192.0, 136.0, 122.0, 24.0, 119.0, 99.0, 176.0]
2025-08-07 13:28:42,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1226 [INFO]: New best (733.25) for latency MM1Queue_a033_s075
2025-08-07 13:28:42,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 57 seconds)
2025-08-07 13:30:39,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:30:41,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 628.76251 ± 148.856
2025-08-07 13:30:41,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [508.91782, 961.4826, 600.424, 510.7275, 427.66272, 577.88965, 660.4887, 817.6385, 584.2455, 638.1482]
2025-08-07 13:30:41,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 184.0, 104.0, 113.0, 76.0, 116.0, 137.0, 164.0, 126.0, 132.0]
2025-08-07 13:30:41,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 58 seconds)
2025-08-07 13:32:37,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 13:32:38,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 599.25012 ± 402.251
2025-08-07 13:32:38,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [124.44105, 103.22909, 683.7293, 654.94653, 654.68854, 920.82086, 108.24905, 839.5902, 459.9815, 1442.8252]
2025-08-07 13:32:38,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 20.0, 124.0, 114.0, 121.0, 182.0, 21.0, 145.0, 86.0, 270.0]
2025-08-07 13:32:38,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1251 [DEBUG]: Training session finished
