2025-08-07 03:32:38,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc15-humanoid/ExtremeSparseL4U32-bpql-mem32
2025-08-07 03:32:38,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc15-humanoid/ExtremeSparseL4U32-bpql-mem32
2025-08-07 03:32:38,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14c2d2f8f990>}
2025-08-07 03:32:38,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1111 [DEBUG]: using device: cuda
2025-08-07 03:32:38,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1133 [INFO]: Creating new trainer
2025-08-07 03:32:38,349 baseline-bpql-noiseperc15-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=920, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-08-07 03:32:38,349 baseline-bpql-noiseperc15-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 03:32:41,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1194 [DEBUG]: Starting training session...
2025-08-07 03:32:41,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 1/100
2025-08-07 03:34:34,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:34:35,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 152.84735 ± 62.027
2025-08-07 03:34:35,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [160.37428, 89.33055, 318.98108, 118.371, 96.412, 148.68362, 146.01245, 132.66026, 189.36865, 128.27962]
2025-08-07 03:34:35,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 18.0, 61.0, 23.0, 19.0, 28.0, 28.0, 27.0, 36.0, 25.0]
2025-08-07 03:34:35,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1226 [INFO]: New best (152.85) for latency ExtremeSparseL4U32
2025-08-07 03:34:35,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 7 minutes, 33 seconds)
2025-08-07 03:36:36,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:36:36,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 158.32411 ± 55.059
2025-08-07 03:36:36,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [150.24664, 179.49892, 130.00145, 113.81418, 170.46873, 150.83401, 151.97078, 309.01474, 107.23878, 120.152954]
2025-08-07 03:36:36,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 35.0, 25.0, 22.0, 33.0, 29.0, 29.0, 63.0, 21.0, 23.0]
2025-08-07 03:36:36,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1226 [INFO]: New best (158.32) for latency ExtremeSparseL4U32
2025-08-07 03:36:36,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 11 minutes, 48 seconds)
2025-08-07 03:38:38,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:38:39,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 177.30109 ± 86.007
2025-08-07 03:38:39,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [197.83029, 124.30394, 90.66735, 202.0546, 149.25386, 119.70944, 176.09099, 161.72594, 415.17972, 136.1947]
2025-08-07 03:38:39,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 24.0, 18.0, 39.0, 29.0, 23.0, 34.0, 31.0, 76.0, 26.0]
2025-08-07 03:38:39,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1226 [INFO]: New best (177.30) for latency ExtremeSparseL4U32
2025-08-07 03:38:39,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 12 minutes, 32 seconds)
2025-08-07 03:40:40,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:40:40,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 131.72200 ± 21.340
2025-08-07 03:40:40,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [118.784904, 153.33629, 173.95337, 139.98328, 109.727066, 118.15997, 142.51407, 108.42749, 107.7397, 144.59375]
2025-08-07 03:40:40,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 31.0, 34.0, 27.0, 21.0, 23.0, 28.0, 21.0, 21.0, 28.0]
2025-08-07 03:40:40,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 11 minutes, 37 seconds)
2025-08-07 03:42:42,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:42:42,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 160.40494 ± 54.814
2025-08-07 03:42:42,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [135.78496, 141.33282, 176.69206, 156.17854, 173.01237, 118.94885, 310.96075, 130.78954, 157.94878, 102.40064]
2025-08-07 03:42:42,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 27.0, 34.0, 31.0, 34.0, 23.0, 62.0, 25.0, 30.0, 20.0]
2025-08-07 03:42:42,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 10 minutes, 21 seconds)
2025-08-07 03:44:43,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:44:44,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 171.24332 ± 85.875
2025-08-07 03:44:44,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [101.22333, 338.21274, 164.57767, 108.65952, 336.50125, 113.44609, 119.09379, 142.41151, 169.97534, 118.331985]
2025-08-07 03:44:44,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 68.0, 32.0, 21.0, 63.0, 22.0, 23.0, 28.0, 32.0, 23.0]
2025-08-07 03:44:44,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 10 minutes, 46 seconds)
2025-08-07 03:46:44,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:46:45,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 173.59361 ± 71.004
2025-08-07 03:46:45,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [171.24352, 150.56186, 226.1016, 364.35703, 120.18642, 148.6177, 103.01636, 169.83823, 149.25519, 132.75842]
2025-08-07 03:46:45,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 29.0, 44.0, 80.0, 23.0, 29.0, 20.0, 33.0, 29.0, 26.0]
2025-08-07 03:46:45,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 8 minutes, 42 seconds)
2025-08-07 03:48:45,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:48:46,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 156.26984 ± 67.974
2025-08-07 03:48:46,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [159.27072, 132.98166, 156.69849, 195.48807, 342.2756, 95.37538, 118.88103, 102.40951, 122.831184, 136.48671]
2025-08-07 03:48:46,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 26.0, 31.0, 38.0, 63.0, 19.0, 23.0, 20.0, 24.0, 26.0]
2025-08-07 03:48:46,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 6 minutes, 12 seconds)
2025-08-07 03:50:46,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:50:47,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 132.57368 ± 37.237
2025-08-07 03:50:47,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [154.72481, 130.80104, 108.091255, 101.66961, 216.74017, 123.20372, 122.59404, 175.36632, 102.5862, 89.95959]
2025-08-07 03:50:47,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 25.0, 21.0, 20.0, 42.0, 24.0, 24.0, 34.0, 20.0, 18.0]
2025-08-07 03:50:47,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 3 minutes, 56 seconds)
2025-08-07 03:52:47,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:52:48,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 124.83485 ± 22.189
2025-08-07 03:52:48,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [133.77072, 97.21121, 158.30313, 130.82204, 118.47743, 122.383514, 166.10815, 107.4326, 117.64183, 96.19783]
2025-08-07 03:52:48,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 19.0, 31.0, 25.0, 23.0, 24.0, 32.0, 21.0, 23.0, 19.0]
2025-08-07 03:52:48,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 1 minute, 35 seconds)
2025-08-07 03:54:48,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:54:49,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 142.48047 ± 29.141
2025-08-07 03:54:49,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [187.1411, 167.37834, 142.95245, 107.08421, 174.41032, 140.80817, 101.92677, 123.68429, 167.89293, 111.52603]
2025-08-07 03:54:49,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 32.0, 28.0, 21.0, 34.0, 27.0, 20.0, 24.0, 32.0, 22.0]
2025-08-07 03:54:49,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 59 minutes, 24 seconds)
2025-08-07 03:56:49,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:56:49,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 140.41008 ± 35.447
2025-08-07 03:56:49,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [216.98714, 150.0109, 189.72539, 142.83315, 112.93797, 124.01949, 136.76447, 111.33959, 95.4542, 124.02844]
2025-08-07 03:56:49,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [42.0, 29.0, 37.0, 28.0, 22.0, 24.0, 27.0, 22.0, 19.0, 24.0]
2025-08-07 03:56:49,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 57 minutes, 12 seconds)
2025-08-07 03:58:49,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:58:50,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 136.95717 ± 32.419
2025-08-07 03:58:50,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [118.89296, 118.6145, 141.3844, 108.17373, 220.2333, 124.57987, 170.77046, 122.910835, 114.14866, 129.86287]
2025-08-07 03:58:50,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 23.0, 27.0, 21.0, 42.0, 24.0, 33.0, 24.0, 22.0, 25.0]
2025-08-07 03:58:50,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 55 minutes, 7 seconds)
2025-08-07 04:00:50,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:00:50,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 118.81895 ± 20.848
2025-08-07 04:00:50,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [106.60746, 102.61692, 160.66284, 118.871216, 101.95274, 157.29355, 117.1388, 102.640526, 111.92992, 108.47541]
2025-08-07 04:00:50,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 20.0, 31.0, 23.0, 20.0, 30.0, 23.0, 20.0, 22.0, 21.0]
2025-08-07 04:00:50,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 53 minutes, 2 seconds)
2025-08-07 04:02:51,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:02:52,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 254.22287 ± 131.767
2025-08-07 04:02:52,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [273.58945, 309.30374, 165.5561, 163.93808, 119.93925, 369.38275, 455.35806, 96.07196, 129.93727, 459.152]
2025-08-07 04:02:52,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [54.0, 63.0, 32.0, 32.0, 23.0, 69.0, 87.0, 19.0, 25.0, 89.0]
2025-08-07 04:02:52,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1226 [INFO]: New best (254.22) for latency ExtremeSparseL4U32
2025-08-07 04:02:52,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 51 minutes, 4 seconds)
2025-08-07 04:04:52,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:04:52,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 134.99179 ± 20.717
2025-08-07 04:04:52,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [113.51719, 123.8886, 124.55175, 96.50309, 121.07628, 152.93286, 156.81454, 161.12721, 151.58363, 147.92279]
2025-08-07 04:04:52,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 24.0, 24.0, 19.0, 24.0, 30.0, 31.0, 31.0, 29.0, 29.0]
2025-08-07 04:04:52,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 49 minutes, 1 second)
2025-08-07 04:06:50,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:06:51,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 141.54106 ± 35.616
2025-08-07 04:06:51,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [114.66138, 96.7845, 209.8398, 118.5726, 160.84119, 113.8034, 173.22878, 138.22472, 108.74156, 180.71277]
2025-08-07 04:06:51,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 19.0, 40.0, 23.0, 31.0, 22.0, 33.0, 27.0, 21.0, 35.0]
2025-08-07 04:06:51,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 46 minutes, 31 seconds)
2025-08-07 04:08:50,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:08:50,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 192.05081 ± 119.753
2025-08-07 04:08:50,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [181.69606, 379.70453, 102.699005, 468.98032, 119.41432, 141.62883, 133.31651, 125.30321, 112.416855, 155.34866]
2025-08-07 04:08:50,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 79.0, 20.0, 88.0, 23.0, 28.0, 26.0, 24.0, 22.0, 30.0]
2025-08-07 04:08:50,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 44 minutes, 11 seconds)
2025-08-07 04:10:49,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:10:50,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 185.88824 ± 110.783
2025-08-07 04:10:50,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [114.55673, 461.44928, 187.90115, 328.50363, 119.43855, 138.39352, 127.364716, 140.76584, 102.698, 137.81099]
2025-08-07 04:10:50,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 90.0, 36.0, 67.0, 23.0, 27.0, 25.0, 27.0, 20.0, 27.0]
2025-08-07 04:10:50,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 41 minutes, 50 seconds)
2025-08-07 04:12:48,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:12:49,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 151.88765 ± 27.452
2025-08-07 04:12:49,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [108.89177, 170.80292, 119.74512, 162.36963, 145.91527, 160.97998, 151.03275, 150.23032, 135.42632, 213.48247]
2025-08-07 04:12:49,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 33.0, 23.0, 32.0, 28.0, 31.0, 29.0, 29.0, 26.0, 41.0]
2025-08-07 04:12:49,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 39 minutes, 18 seconds)
2025-08-07 04:14:47,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:14:48,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 188.85670 ± 99.229
2025-08-07 04:14:48,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [112.99923, 290.546, 366.70898, 97.21538, 169.82732, 131.55238, 341.3823, 90.68038, 114.37083, 173.28413]
2025-08-07 04:14:48,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 56.0, 71.0, 19.0, 33.0, 25.0, 66.0, 18.0, 22.0, 34.0]
2025-08-07 04:14:48,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 36 minutes, 51 seconds)
2025-08-07 04:16:47,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:16:47,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 155.11275 ± 65.143
2025-08-07 04:16:47,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [344.75406, 155.73792, 112.09889, 145.54236, 113.19591, 130.99667, 135.05045, 132.46204, 119.93889, 161.3504]
2025-08-07 04:16:47,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 30.0, 22.0, 28.0, 22.0, 25.0, 26.0, 26.0, 23.0, 31.0]
2025-08-07 04:16:47,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 35 minutes, 4 seconds)
2025-08-07 04:18:46,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:18:46,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 147.43210 ± 11.701
2025-08-07 04:18:46,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [135.54813, 135.92465, 159.13626, 156.27945, 134.37607, 130.55902, 159.22662, 147.99883, 151.41476, 163.85724]
2025-08-07 04:18:46,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 26.0, 31.0, 30.0, 26.0, 25.0, 31.0, 28.0, 29.0, 31.0]
2025-08-07 04:18:46,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 32 minutes, 55 seconds)
2025-08-07 04:20:45,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:20:45,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 129.24686 ± 21.589
2025-08-07 04:20:45,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [139.98985, 127.466736, 119.17457, 140.37201, 138.81422, 155.18027, 165.01892, 102.059395, 96.25782, 108.13478]
2025-08-07 04:20:45,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 25.0, 23.0, 27.0, 27.0, 30.0, 32.0, 20.0, 19.0, 21.0]
2025-08-07 04:20:45,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 30 minutes, 49 seconds)
2025-08-07 04:22:44,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:22:45,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 151.60356 ± 72.546
2025-08-07 04:22:45,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [134.23347, 108.52705, 102.7885, 180.22968, 125.43279, 101.74945, 168.93709, 355.79645, 118.93692, 119.40432]
2025-08-07 04:22:45,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 21.0, 20.0, 36.0, 24.0, 20.0, 33.0, 66.0, 23.0, 23.0]
2025-08-07 04:22:45,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 29 minutes)
2025-08-07 04:24:43,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:24:44,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 200.67561 ± 152.863
2025-08-07 04:24:44,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [124.48948, 588.0947, 150.91272, 101.33132, 154.25902, 393.98685, 141.90364, 116.04644, 145.92021, 89.811714]
2025-08-07 04:24:44,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 118.0, 29.0, 20.0, 30.0, 79.0, 28.0, 23.0, 28.0, 18.0]
2025-08-07 04:24:44,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 26 minutes, 57 seconds)
2025-08-07 04:26:42,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:26:42,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 179.93271 ± 105.093
2025-08-07 04:26:42,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [209.39941, 107.89886, 108.53712, 183.85596, 171.1489, 114.53939, 112.6781, 124.450325, 191.61607, 475.20294]
2025-08-07 04:26:42,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [41.0, 21.0, 21.0, 36.0, 33.0, 22.0, 22.0, 24.0, 37.0, 92.0]
2025-08-07 04:26:42,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 24 minutes, 45 seconds)
2025-08-07 04:28:40,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:28:41,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 186.84470 ± 98.456
2025-08-07 04:28:41,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [101.62683, 90.92875, 102.1393, 179.15572, 151.38597, 172.24802, 364.59067, 166.26291, 156.24734, 383.86157]
2025-08-07 04:28:41,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 18.0, 20.0, 35.0, 30.0, 34.0, 70.0, 33.0, 30.0, 70.0]
2025-08-07 04:28:41,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 22 minutes, 44 seconds)
2025-08-07 04:30:39,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:30:40,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 135.96878 ± 48.740
2025-08-07 04:30:40,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [102.54287, 90.3597, 97.1647, 124.32684, 195.0028, 119.362434, 129.76596, 144.20033, 254.0962, 102.86585]
2025-08-07 04:30:40,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 18.0, 19.0, 24.0, 38.0, 23.0, 25.0, 28.0, 48.0, 20.0]
2025-08-07 04:30:40,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 20 minutes, 41 seconds)
2025-08-07 04:32:39,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:32:40,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 146.63593 ± 67.313
2025-08-07 04:32:40,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [150.19034, 108.396774, 153.78426, 114.23855, 343.33112, 130.208, 119.59532, 107.342476, 115.02747, 124.245026]
2025-08-07 04:32:40,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 21.0, 30.0, 22.0, 71.0, 25.0, 23.0, 21.0, 22.0, 24.0]
2025-08-07 04:32:40,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 18 minutes, 45 seconds)
2025-08-07 04:34:39,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:34:40,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 140.23201 ± 28.099
2025-08-07 04:34:40,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [156.52223, 110.53199, 168.67003, 140.96953, 124.79304, 101.95285, 131.3933, 135.83954, 203.4248, 128.22284]
2025-08-07 04:34:40,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 22.0, 32.0, 27.0, 24.0, 20.0, 25.0, 26.0, 39.0, 25.0]
2025-08-07 04:34:40,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 17 minutes, 8 seconds)
2025-08-07 04:36:39,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:36:40,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 188.38217 ± 86.556
2025-08-07 04:36:40,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [404.2021, 142.20248, 101.76799, 150.55421, 130.3833, 221.31252, 271.6619, 119.16889, 187.34396, 155.2243]
2025-08-07 04:36:40,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 27.0, 20.0, 29.0, 25.0, 42.0, 54.0, 23.0, 36.0, 30.0]
2025-08-07 04:36:40,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 15 minutes, 29 seconds)
2025-08-07 04:38:39,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:38:40,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 157.56267 ± 42.955
2025-08-07 04:38:40,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [102.790054, 224.23514, 108.252075, 129.53989, 164.15155, 160.21358, 102.96083, 187.41249, 215.21812, 180.853]
2025-08-07 04:38:40,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 44.0, 21.0, 25.0, 32.0, 31.0, 20.0, 37.0, 42.0, 35.0]
2025-08-07 04:38:40,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 13 minutes, 47 seconds)
2025-08-07 04:40:39,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:40:40,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 162.07581 ± 41.523
2025-08-07 04:40:40,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [95.638374, 226.02242, 146.91066, 158.78645, 203.80518, 196.50389, 203.42456, 151.47154, 113.85289, 124.34208]
2025-08-07 04:40:40,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 45.0, 28.0, 30.0, 39.0, 38.0, 41.0, 29.0, 22.0, 24.0]
2025-08-07 04:40:40,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 12 minutes, 3 seconds)
2025-08-07 04:42:40,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:42:40,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 143.99510 ± 50.995
2025-08-07 04:42:40,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [102.460335, 152.98108, 117.31375, 253.1318, 102.54572, 149.90189, 215.44965, 154.21812, 96.85131, 95.097374]
2025-08-07 04:42:40,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 30.0, 23.0, 50.0, 20.0, 29.0, 43.0, 30.0, 19.0, 19.0]
2025-08-07 04:42:40,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 10 minutes, 8 seconds)
2025-08-07 04:44:40,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:44:40,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 126.87551 ± 21.726
2025-08-07 04:44:40,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [101.78147, 139.97844, 106.70643, 122.14494, 109.235275, 141.4604, 177.38243, 138.71564, 112.24689, 119.1033]
2025-08-07 04:44:40,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 27.0, 21.0, 24.0, 21.0, 27.0, 34.0, 27.0, 22.0, 23.0]
2025-08-07 04:44:40,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 8 minutes, 5 seconds)
2025-08-07 04:46:39,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:46:40,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 128.68314 ± 22.653
2025-08-07 04:46:40,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [118.65524, 141.75916, 130.12523, 150.17441, 161.14824, 113.24658, 131.05382, 89.62119, 153.38383, 97.66368]
2025-08-07 04:46:40,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 27.0, 26.0, 29.0, 31.0, 22.0, 25.0, 18.0, 30.0, 19.0]
2025-08-07 04:46:40,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 5 minutes, 56 seconds)
2025-08-07 04:48:39,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:48:40,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 152.41000 ± 32.075
2025-08-07 04:48:40,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [180.8605, 127.903175, 112.842964, 202.90903, 128.82126, 119.472374, 137.92667, 136.99162, 180.64119, 195.73123]
2025-08-07 04:48:40,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 25.0, 22.0, 40.0, 25.0, 23.0, 27.0, 26.0, 35.0, 38.0]
2025-08-07 04:48:40,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 3 minutes, 57 seconds)
2025-08-07 04:50:39,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:50:39,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 142.76888 ± 29.939
2025-08-07 04:50:39,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [124.42942, 188.85617, 129.53737, 183.5278, 107.26989, 157.29181, 102.649055, 140.6853, 118.87862, 174.56326]
2025-08-07 04:50:39,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 37.0, 26.0, 36.0, 21.0, 30.0, 20.0, 28.0, 23.0, 34.0]
2025-08-07 04:50:39,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 1 minute, 55 seconds)
2025-08-07 04:52:39,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:52:40,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 173.33208 ± 42.104
2025-08-07 04:52:40,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [146.90271, 152.98485, 129.4752, 264.1313, 190.09502, 162.06596, 148.5396, 145.33502, 238.9106, 154.88039]
2025-08-07 04:52:40,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 30.0, 25.0, 57.0, 37.0, 31.0, 29.0, 29.0, 48.0, 30.0]
2025-08-07 04:52:40,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 59 minutes, 50 seconds)
2025-08-07 04:54:39,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:54:40,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 152.86691 ± 32.035
2025-08-07 04:54:40,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [161.21265, 172.76808, 177.65598, 129.8921, 138.86903, 119.055786, 102.0479, 131.93651, 185.5865, 209.64459]
2025-08-07 04:54:40,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 34.0, 34.0, 25.0, 27.0, 23.0, 20.0, 26.0, 36.0, 41.0]
2025-08-07 04:54:40,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 57 minutes, 51 seconds)
2025-08-07 04:56:39,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:56:40,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 155.89761 ± 33.342
2025-08-07 04:56:40,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [119.44786, 176.81886, 188.97833, 147.01064, 192.67972, 114.23923, 190.34749, 158.6434, 96.32546, 174.48509]
2025-08-07 04:56:40,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 35.0, 36.0, 29.0, 37.0, 22.0, 37.0, 30.0, 19.0, 35.0]
2025-08-07 04:56:40,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 55 minutes, 58 seconds)
2025-08-07 04:58:39,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:58:39,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 149.24248 ± 29.076
2025-08-07 04:58:39,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [196.53871, 114.36679, 108.67455, 185.96013, 127.66624, 160.11455, 169.51866, 140.31857, 123.5382, 165.72841]
2025-08-07 04:58:39,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [39.0, 22.0, 21.0, 36.0, 25.0, 32.0, 33.0, 27.0, 24.0, 32.0]
2025-08-07 04:58:39,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 53 minutes, 56 seconds)
2025-08-07 05:00:39,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:00:40,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 128.19861 ± 21.039
2025-08-07 05:00:40,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [108.43401, 112.352554, 167.90375, 141.77377, 107.10123, 128.7505, 120.16717, 101.95205, 138.56339, 154.98772]
2025-08-07 05:00:40,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 22.0, 32.0, 28.0, 21.0, 25.0, 24.0, 20.0, 27.0, 30.0]
2025-08-07 05:00:40,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 52 minutes)
2025-08-07 05:02:39,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:02:39,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 147.71167 ± 26.224
2025-08-07 05:02:39,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [102.866035, 145.25955, 145.54306, 146.47589, 199.07, 163.14676, 109.188866, 162.4724, 162.64888, 140.44514]
2025-08-07 05:02:39,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 28.0, 28.0, 28.0, 39.0, 33.0, 21.0, 32.0, 31.0, 27.0]
2025-08-07 05:02:39,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 49 minutes, 54 seconds)
2025-08-07 05:04:39,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:04:39,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 166.65683 ± 35.058
2025-08-07 05:04:39,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [124.040115, 126.30871, 203.06522, 195.98836, 189.50949, 143.28822, 207.62936, 201.0804, 160.92126, 114.737114]
2025-08-07 05:04:39,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 24.0, 39.0, 39.0, 37.0, 28.0, 41.0, 41.0, 31.0, 22.0]
2025-08-07 05:04:39,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 47 minutes, 55 seconds)
2025-08-07 05:06:39,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:06:39,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 149.61307 ± 47.231
2025-08-07 05:06:39,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [96.197426, 120.28217, 108.634476, 222.35274, 122.84915, 150.74503, 113.949394, 243.52634, 140.81972, 176.77422]
2025-08-07 05:06:39,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 23.0, 21.0, 42.0, 24.0, 29.0, 22.0, 50.0, 27.0, 34.0]
2025-08-07 05:06:39,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 45 minutes, 54 seconds)
2025-08-07 05:08:39,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:08:39,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 147.31477 ± 32.343
2025-08-07 05:08:39,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [166.4909, 196.2668, 111.84377, 123.092735, 103.08819, 148.02515, 199.88087, 117.001816, 161.81537, 145.64212]
2025-08-07 05:08:39,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 40.0, 22.0, 24.0, 20.0, 29.0, 39.0, 23.0, 31.0, 28.0]
2025-08-07 05:08:39,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 43 minutes, 56 seconds)
2025-08-07 05:10:38,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:10:38,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 142.16313 ± 25.534
2025-08-07 05:10:38,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [131.09373, 165.28009, 182.64064, 117.85606, 128.90839, 146.13138, 131.20142, 102.016495, 133.83186, 182.67125]
2025-08-07 05:10:38,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 33.0, 35.0, 23.0, 25.0, 28.0, 26.0, 20.0, 26.0, 35.0]
2025-08-07 05:10:38,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 41 minutes, 49 seconds)
2025-08-07 05:12:38,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:12:38,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 127.36383 ± 22.151
2025-08-07 05:12:38,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [90.85818, 129.42398, 117.53438, 167.08528, 152.80435, 130.23286, 118.48947, 95.99268, 140.12653, 131.09058]
2025-08-07 05:12:38,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 25.0, 23.0, 33.0, 30.0, 25.0, 23.0, 19.0, 27.0, 25.0]
2025-08-07 05:12:38,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 39 minutes, 51 seconds)
2025-08-07 05:14:37,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:14:38,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 127.44242 ± 24.140
2025-08-07 05:14:38,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [108.86222, 169.25862, 117.3912, 114.041954, 115.32884, 103.02941, 140.14778, 153.09785, 157.3689, 95.897316]
2025-08-07 05:14:38,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 33.0, 23.0, 22.0, 23.0, 20.0, 27.0, 29.0, 30.0, 19.0]
2025-08-07 05:14:38,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 37 minutes, 45 seconds)
2025-08-07 05:16:37,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:16:38,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 169.20830 ± 65.630
2025-08-07 05:16:38,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [176.69447, 134.51622, 96.5964, 351.56976, 137.70328, 164.94994, 127.09287, 154.95555, 185.21063, 162.79393]
2025-08-07 05:16:38,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 26.0, 19.0, 71.0, 27.0, 32.0, 25.0, 30.0, 39.0, 31.0]
2025-08-07 05:16:38,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 35 minutes, 48 seconds)
2025-08-07 05:18:37,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:18:38,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 136.27336 ± 22.386
2025-08-07 05:18:38,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [151.2985, 122.79773, 119.94748, 132.6926, 166.91129, 125.98031, 136.38359, 90.32598, 168.37859, 148.01752]
2025-08-07 05:18:38,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 24.0, 23.0, 26.0, 33.0, 24.0, 26.0, 18.0, 33.0, 29.0]
2025-08-07 05:18:38,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 33 minutes, 48 seconds)
2025-08-07 05:20:37,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:20:37,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 137.14999 ± 23.829
2025-08-07 05:20:37,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [163.18843, 148.57394, 118.68676, 163.54153, 109.18407, 114.769, 146.4389, 96.11309, 148.24976, 162.7544]
2025-08-07 05:20:37,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 29.0, 23.0, 33.0, 21.0, 22.0, 29.0, 19.0, 29.0, 32.0]
2025-08-07 05:20:37,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 31 minutes, 50 seconds)
2025-08-07 05:22:37,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:22:38,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 175.33549 ± 68.040
2025-08-07 05:22:38,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [180.00214, 102.98028, 328.3811, 153.69698, 257.95566, 144.53526, 208.48761, 136.96759, 101.274895, 139.07346]
2025-08-07 05:22:38,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 20.0, 68.0, 30.0, 54.0, 28.0, 41.0, 27.0, 20.0, 27.0]
2025-08-07 05:22:38,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 29 minutes, 55 seconds)
2025-08-07 05:24:37,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:24:38,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 138.45526 ± 19.803
2025-08-07 05:24:38,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [140.79411, 123.425125, 130.94318, 127.21818, 102.93395, 156.84644, 139.41997, 128.73776, 157.92146, 176.31233]
2025-08-07 05:24:38,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 24.0, 26.0, 25.0, 20.0, 30.0, 27.0, 25.0, 31.0, 34.0]
2025-08-07 05:24:38,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 28 minutes)
2025-08-07 05:26:37,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:26:38,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 134.19211 ± 27.832
2025-08-07 05:26:38,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [179.46793, 128.49002, 161.41093, 101.22894, 129.99664, 172.13306, 103.391045, 131.10117, 138.00893, 96.6925]
2025-08-07 05:26:38,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 25.0, 31.0, 20.0, 25.0, 34.0, 20.0, 25.0, 27.0, 19.0]
2025-08-07 05:26:38,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 25 minutes, 56 seconds)
2025-08-07 05:28:37,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:28:38,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 140.93198 ± 29.253
2025-08-07 05:28:38,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [176.6553, 139.39978, 167.82538, 170.80524, 114.26976, 155.44357, 96.261406, 89.90537, 149.0853, 149.66869]
2025-08-07 05:28:38,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 27.0, 32.0, 33.0, 22.0, 30.0, 19.0, 18.0, 29.0, 29.0]
2025-08-07 05:28:38,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 23 minutes, 57 seconds)
2025-08-07 05:30:37,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:30:37,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 137.12700 ± 26.923
2025-08-07 05:30:37,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [114.25149, 151.78645, 103.09235, 124.47619, 102.50605, 171.31322, 155.43414, 113.57692, 160.35089, 174.48236]
2025-08-07 05:30:37,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 30.0, 20.0, 24.0, 20.0, 33.0, 31.0, 22.0, 31.0, 34.0]
2025-08-07 05:30:37,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 21 minutes, 58 seconds)
2025-08-07 05:32:37,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:32:37,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 140.81850 ± 30.800
2025-08-07 05:32:37,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [139.11555, 187.04915, 196.89891, 114.14189, 102.31065, 134.01112, 107.9813, 160.41954, 146.9003, 119.35652]
2025-08-07 05:32:37,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 37.0, 38.0, 22.0, 20.0, 26.0, 21.0, 31.0, 28.0, 23.0]
2025-08-07 05:32:37,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 19 minutes, 58 seconds)
2025-08-07 05:34:37,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:34:37,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 175.61812 ± 46.481
2025-08-07 05:34:37,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [197.43234, 217.13751, 151.87843, 164.75014, 140.33829, 281.45837, 108.3459, 159.25768, 141.7022, 193.88042]
2025-08-07 05:34:37,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [39.0, 44.0, 29.0, 32.0, 27.0, 56.0, 21.0, 31.0, 27.0, 38.0]
2025-08-07 05:34:37,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 17 minutes, 57 seconds)
2025-08-07 05:36:37,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:36:37,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 132.94713 ± 17.116
2025-08-07 05:36:37,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [119.91285, 159.66823, 143.42017, 107.274376, 123.953896, 136.02596, 116.29991, 150.46686, 119.182434, 153.26677]
2025-08-07 05:36:37,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 31.0, 28.0, 21.0, 24.0, 27.0, 23.0, 29.0, 23.0, 30.0]
2025-08-07 05:36:37,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 15 minutes, 58 seconds)
2025-08-07 05:38:37,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:38:37,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 146.75638 ± 31.845
2025-08-07 05:38:37,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [136.50461, 159.98424, 161.73518, 203.65788, 175.95157, 119.378815, 171.0389, 107.3035, 135.48831, 96.52087]
2025-08-07 05:38:37,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 31.0, 31.0, 39.0, 36.0, 23.0, 33.0, 21.0, 26.0, 19.0]
2025-08-07 05:38:37,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 13 minutes, 58 seconds)
2025-08-07 05:40:37,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:40:37,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 142.97884 ± 27.335
2025-08-07 05:40:37,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [154.27176, 114.41936, 165.83699, 157.68016, 113.41254, 119.13528, 147.11485, 204.59064, 127.65171, 125.674995]
2025-08-07 05:40:37,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 22.0, 32.0, 31.0, 22.0, 23.0, 29.0, 40.0, 25.0, 24.0]
2025-08-07 05:40:37,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 12 minutes)
2025-08-07 05:42:36,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:42:37,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 132.98788 ± 25.368
2025-08-07 05:42:37,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [141.96646, 130.17685, 146.18022, 128.28577, 112.772316, 107.50043, 193.78548, 150.10538, 108.055176, 111.05081]
2025-08-07 05:42:37,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 25.0, 28.0, 25.0, 22.0, 21.0, 38.0, 29.0, 21.0, 22.0]
2025-08-07 05:42:37,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 9 minutes, 56 seconds)
2025-08-07 05:44:36,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:44:37,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 149.56390 ± 30.663
2025-08-07 05:44:37,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [103.16652, 191.51825, 128.6038, 184.97223, 117.780334, 124.878044, 183.1387, 134.60968, 178.28726, 148.68434]
2025-08-07 05:44:37,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 37.0, 25.0, 35.0, 23.0, 24.0, 36.0, 26.0, 35.0, 29.0]
2025-08-07 05:44:37,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 7 minutes, 56 seconds)
2025-08-07 05:46:36,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:46:36,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 140.77032 ± 26.378
2025-08-07 05:46:36,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [139.79659, 175.75566, 112.85566, 164.07195, 175.01587, 135.43687, 108.722466, 101.71879, 130.19826, 164.1311]
2025-08-07 05:46:36,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 34.0, 22.0, 32.0, 35.0, 26.0, 21.0, 20.0, 25.0, 32.0]
2025-08-07 05:46:36,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 5 minutes, 53 seconds)
2025-08-07 05:48:35,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:48:36,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 147.41731 ± 16.546
2025-08-07 05:48:36,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [143.11644, 157.35388, 175.60463, 131.0865, 150.61182, 149.52956, 168.0805, 114.91785, 142.79146, 141.0803]
2025-08-07 05:48:36,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 31.0, 34.0, 26.0, 29.0, 29.0, 33.0, 22.0, 28.0, 27.0]
2025-08-07 05:48:36,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 3 minutes, 51 seconds)
2025-08-07 05:50:35,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:50:36,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 147.17883 ± 34.189
2025-08-07 05:50:36,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [146.94678, 193.11328, 168.36652, 113.593765, 188.99527, 139.10274, 103.22884, 126.99313, 101.878815, 189.56927]
2025-08-07 05:50:36,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 37.0, 33.0, 22.0, 37.0, 27.0, 20.0, 25.0, 20.0, 37.0]
2025-08-07 05:50:36,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 1 minute, 48 seconds)
2025-08-07 05:52:35,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:52:35,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 155.54074 ± 66.401
2025-08-07 05:52:35,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [156.0707, 347.44635, 136.36847, 123.82531, 127.259544, 161.21407, 155.30786, 130.07385, 103.33051, 114.51057]
2025-08-07 05:52:35,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 67.0, 26.0, 24.0, 25.0, 32.0, 30.0, 25.0, 20.0, 22.0]
2025-08-07 05:52:35,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 59 minutes, 49 seconds)
2025-08-07 05:54:36,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:54:37,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 170.63486 ± 59.288
2025-08-07 05:54:37,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [143.48892, 150.57407, 345.3033, 150.17969, 145.97273, 165.95396, 161.93611, 124.904205, 161.93463, 156.10107]
2025-08-07 05:54:37,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 29.0, 68.0, 29.0, 28.0, 32.0, 31.0, 24.0, 31.0, 30.0]
2025-08-07 05:54:37,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 57 minutes, 59 seconds)
2025-08-07 05:56:37,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:56:38,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 172.46707 ± 98.279
2025-08-07 05:56:38,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [168.9978, 90.90213, 398.06802, 123.75103, 153.72592, 90.38639, 90.12534, 154.52258, 317.912, 136.27954]
2025-08-07 05:56:38,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 18.0, 79.0, 24.0, 30.0, 18.0, 18.0, 29.0, 62.0, 26.0]
2025-08-07 05:56:38,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 56 minutes, 7 seconds)
2025-08-07 05:58:38,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:58:38,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 157.20462 ± 61.560
2025-08-07 05:58:38,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [124.21653, 308.70892, 169.09798, 89.382576, 95.32406, 123.89155, 140.46263, 217.11888, 163.06602, 140.77708]
2025-08-07 05:58:38,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 62.0, 33.0, 18.0, 19.0, 24.0, 28.0, 42.0, 32.0, 27.0]
2025-08-07 05:58:39,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 54 minutes, 13 seconds)
2025-08-07 06:00:39,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:00:40,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 139.18687 ± 24.110
2025-08-07 06:00:40,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [114.409996, 166.01001, 130.123, 158.68594, 135.37488, 101.27291, 169.6481, 171.66516, 119.61674, 125.062035]
2025-08-07 06:00:40,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 33.0, 25.0, 32.0, 26.0, 20.0, 33.0, 33.0, 23.0, 24.0]
2025-08-07 06:00:40,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 52 minutes, 22 seconds)
2025-08-07 06:02:40,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:02:40,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 147.39981 ± 83.255
2025-08-07 06:02:40,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [159.0656, 169.93398, 89.30829, 97.0365, 124.87935, 386.37213, 114.5838, 118.143265, 106.752495, 107.9229]
2025-08-07 06:02:40,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 33.0, 18.0, 19.0, 24.0, 74.0, 22.0, 23.0, 21.0, 21.0]
2025-08-07 06:02:40,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 50 minutes, 26 seconds)
2025-08-07 06:04:40,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:04:41,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 202.18452 ± 148.198
2025-08-07 06:04:41,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [102.40565, 441.92276, 109.10094, 144.6872, 96.00455, 128.69102, 174.75847, 112.883514, 539.6695, 171.72162]
2025-08-07 06:04:41,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 84.0, 21.0, 28.0, 19.0, 25.0, 34.0, 22.0, 101.0, 33.0]
2025-08-07 06:04:41,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 48 minutes, 20 seconds)
2025-08-07 06:06:41,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:06:41,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 167.17659 ± 81.325
2025-08-07 06:06:41,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [107.59124, 158.23846, 160.22261, 151.76387, 149.1062, 90.56811, 149.7446, 164.18039, 401.50592, 138.84444]
2025-08-07 06:06:41,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 31.0, 31.0, 29.0, 29.0, 18.0, 29.0, 33.0, 76.0, 28.0]
2025-08-07 06:06:41,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 46 minutes, 16 seconds)
2025-08-07 06:08:41,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:08:42,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 130.62575 ± 23.239
2025-08-07 06:08:42,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [108.239746, 163.32922, 114.01526, 102.29429, 95.6436, 132.54848, 147.38695, 141.80368, 163.30992, 137.68626]
2025-08-07 06:08:42,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 33.0, 22.0, 20.0, 19.0, 26.0, 29.0, 27.0, 31.0, 27.0]
2025-08-07 06:08:42,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 44 minutes, 14 seconds)
2025-08-07 06:10:41,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:10:42,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 135.91953 ± 24.892
2025-08-07 06:10:42,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [102.512375, 163.3598, 124.5573, 134.48128, 103.11682, 156.99667, 168.64688, 108.0052, 133.01582, 164.50298]
2025-08-07 06:10:42,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 32.0, 24.0, 26.0, 20.0, 31.0, 33.0, 21.0, 26.0, 32.0]
2025-08-07 06:10:42,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 42 minutes, 7 seconds)
2025-08-07 06:12:41,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:12:42,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 160.19160 ± 79.333
2025-08-07 06:12:42,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [152.31349, 168.41519, 165.87807, 106.82247, 118.45064, 148.78918, 90.60268, 379.11954, 90.543304, 180.9814]
2025-08-07 06:12:42,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 32.0, 32.0, 21.0, 23.0, 29.0, 18.0, 74.0, 18.0, 35.0]
2025-08-07 06:12:42,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 40 minutes, 6 seconds)
2025-08-07 06:14:41,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:14:42,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 160.78464 ± 62.761
2025-08-07 06:14:42,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [331.76733, 108.57259, 145.28587, 188.44061, 165.30339, 134.11674, 149.01009, 107.30538, 168.56682, 109.4776]
2025-08-07 06:14:42,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 21.0, 28.0, 37.0, 32.0, 26.0, 29.0, 21.0, 33.0, 21.0]
2025-08-07 06:14:42,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 38 minutes, 2 seconds)
2025-08-07 06:16:42,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:16:42,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 129.79297 ± 31.014
2025-08-07 06:16:42,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [140.83281, 165.32047, 107.89854, 157.21172, 193.23836, 119.81137, 107.73787, 107.56324, 95.937386, 102.377975]
2025-08-07 06:16:42,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 33.0, 21.0, 30.0, 37.0, 23.0, 21.0, 21.0, 19.0, 20.0]
2025-08-07 06:16:42,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 36 minutes, 2 seconds)
2025-08-07 06:18:41,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:18:42,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 197.28529 ± 129.502
2025-08-07 06:18:42,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [112.9038, 130.41881, 180.53033, 97.105774, 425.448, 125.09232, 161.27391, 477.48624, 128.81316, 133.78049]
2025-08-07 06:18:42,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 25.0, 35.0, 19.0, 79.0, 24.0, 31.0, 96.0, 25.0, 26.0]
2025-08-07 06:18:42,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 34 minutes, 1 second)
2025-08-07 06:20:42,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:20:43,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 183.49564 ± 102.540
2025-08-07 06:20:43,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [114.41085, 134.31297, 325.40195, 433.54742, 159.58415, 170.1648, 108.67446, 130.67838, 118.32083, 139.86066]
2025-08-07 06:20:43,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 26.0, 66.0, 89.0, 32.0, 34.0, 21.0, 25.0, 23.0, 27.0]
2025-08-07 06:20:43,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 32 minutes, 3 seconds)
2025-08-07 06:22:42,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:22:43,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 201.85437 ± 128.374
2025-08-07 06:22:43,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [117.59747, 120.297104, 114.30602, 119.923325, 133.93294, 379.41394, 102.86969, 478.4383, 306.74686, 145.01822]
2025-08-07 06:22:43,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 23.0, 22.0, 23.0, 26.0, 68.0, 20.0, 99.0, 59.0, 28.0]
2025-08-07 06:22:43,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 30 minutes, 3 seconds)
2025-08-07 06:24:43,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:24:43,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 147.19583 ± 33.367
2025-08-07 06:24:43,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [176.47148, 161.7571, 101.619446, 103.28669, 146.7277, 117.23243, 198.99854, 134.63611, 136.96677, 194.2619]
2025-08-07 06:24:43,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 31.0, 20.0, 20.0, 28.0, 23.0, 38.0, 26.0, 26.0, 37.0]
2025-08-07 06:24:43,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 28 minutes, 4 seconds)
2025-08-07 06:26:43,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:26:44,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 192.11014 ± 99.484
2025-08-07 06:26:44,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [162.55235, 143.67918, 143.01012, 143.84058, 354.0763, 167.04543, 158.16295, 122.46802, 418.1471, 108.11932]
2025-08-07 06:26:44,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 28.0, 28.0, 28.0, 70.0, 32.0, 31.0, 24.0, 78.0, 21.0]
2025-08-07 06:26:44,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 26 minutes, 4 seconds)
2025-08-07 06:28:44,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:28:45,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 158.67868 ± 99.735
2025-08-07 06:28:45,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [96.31693, 153.47546, 159.90338, 125.707405, 95.98599, 113.86279, 115.35381, 164.24968, 112.575165, 449.35626]
2025-08-07 06:28:45,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 30.0, 31.0, 24.0, 19.0, 22.0, 22.0, 32.0, 22.0, 82.0]
2025-08-07 06:28:45,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 24 minutes, 5 seconds)
2025-08-07 06:30:44,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:30:45,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 128.44675 ± 27.556
2025-08-07 06:30:45,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [119.07805, 102.55322, 123.14334, 170.94206, 113.533585, 146.93124, 84.39587, 122.363106, 177.81628, 123.71063]
2025-08-07 06:30:45,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 20.0, 24.0, 33.0, 22.0, 28.0, 17.0, 24.0, 35.0, 24.0]
2025-08-07 06:30:45,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 4 seconds)
2025-08-07 06:32:45,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:32:45,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 165.63100 ± 89.361
2025-08-07 06:32:45,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [171.16847, 185.77542, 100.9771, 135.97845, 113.76585, 107.09391, 415.557, 194.25203, 117.70235, 114.03927]
2025-08-07 06:32:45,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 36.0, 20.0, 26.0, 22.0, 21.0, 83.0, 39.0, 23.0, 22.0]
2025-08-07 06:32:45,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes, 3 seconds)
2025-08-07 06:34:45,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:34:46,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 117.56819 ± 19.631
2025-08-07 06:34:46,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [107.01448, 107.032036, 112.49347, 108.099266, 119.09095, 119.22092, 96.881615, 167.67776, 102.241066, 135.93028]
2025-08-07 06:34:46,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 21.0, 22.0, 21.0, 23.0, 23.0, 19.0, 33.0, 20.0, 26.0]
2025-08-07 06:34:46,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 4 seconds)
2025-08-07 06:36:45,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:36:46,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 177.75536 ± 118.941
2025-08-07 06:36:46,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [149.69798, 95.547066, 96.91799, 166.99512, 154.59981, 155.64713, 526.8532, 149.49808, 161.6809, 120.11621]
2025-08-07 06:36:46,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 19.0, 19.0, 33.0, 30.0, 30.0, 97.0, 29.0, 31.0, 24.0]
2025-08-07 06:36:46,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 3 seconds)
2025-08-07 06:38:45,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:38:46,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 134.29973 ± 17.080
2025-08-07 06:38:46,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [162.64685, 143.54524, 133.37952, 108.00269, 143.72424, 125.29944, 114.065506, 136.86375, 156.79109, 118.67894]
2025-08-07 06:38:46,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 28.0, 26.0, 21.0, 29.0, 24.0, 22.0, 26.0, 30.0, 23.0]
2025-08-07 06:38:46,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 1 second)
2025-08-07 06:40:45,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:40:46,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 147.64366 ± 29.003
2025-08-07 06:40:46,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [141.9107, 175.63628, 168.87498, 171.59442, 117.91989, 107.15001, 107.03554, 166.20457, 189.60391, 130.50638]
2025-08-07 06:40:46,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 34.0, 33.0, 33.0, 23.0, 21.0, 21.0, 32.0, 37.0, 25.0]
2025-08-07 06:40:46,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 1 second)
2025-08-07 06:42:46,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:42:46,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 212.43530 ± 132.458
2025-08-07 06:42:46,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [114.58991, 190.01146, 529.63837, 219.92844, 142.96605, 119.43276, 105.77577, 393.29028, 140.31427, 168.4058]
2025-08-07 06:42:46,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 36.0, 103.0, 42.0, 28.0, 23.0, 21.0, 74.0, 27.0, 33.0]
2025-08-07 06:42:46,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 1 second)
2025-08-07 06:44:46,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:44:47,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 134.18961 ± 25.660
2025-08-07 06:44:47,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [136.11238, 145.43057, 143.70406, 114.50299, 150.26332, 106.44835, 96.16009, 191.532, 120.00597, 137.73631]
2025-08-07 06:44:47,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 28.0, 28.0, 22.0, 29.0, 21.0, 19.0, 37.0, 23.0, 27.0]
2025-08-07 06:44:47,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes)
2025-08-07 06:46:47,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:46:47,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 162.13651 ± 89.046
2025-08-07 06:46:47,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [168.28014, 97.06653, 106.96359, 134.91624, 114.363914, 169.89952, 418.856, 122.42542, 124.628845, 163.96501]
2025-08-07 06:46:47,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 19.0, 21.0, 26.0, 22.0, 33.0, 78.0, 24.0, 24.0, 32.0]
2025-08-07 06:46:47,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes)
2025-08-07 06:48:46,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:48:47,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 129.80327 ± 17.716
2025-08-07 06:48:47,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [119.72314, 141.53365, 113.03714, 145.85963, 102.73014, 109.182846, 145.8256, 129.41705, 160.54669, 130.1767]
2025-08-07 06:48:47,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 27.0, 22.0, 28.0, 20.0, 21.0, 28.0, 25.0, 31.0, 25.0]
2025-08-07 06:48:47,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes)
2025-08-07 06:50:47,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:50:47,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 122.77262 ± 19.772
2025-08-07 06:50:47,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [157.1059, 114.640594, 108.43393, 160.73335, 130.10902, 102.259155, 120.12057, 101.820145, 118.71563, 113.7878]
2025-08-07 06:50:47,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 22.0, 21.0, 32.0, 25.0, 20.0, 23.0, 20.0, 24.0, 22.0]
2025-08-07 06:50:47,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes)
2025-08-07 06:52:47,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:52:47,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 131.29993 ± 20.481
2025-08-07 06:52:47,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [96.261116, 111.99124, 103.4457, 151.80525, 135.76347, 150.71587, 161.84445, 130.42337, 139.49501, 131.25385]
2025-08-07 06:52:47,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 22.0, 20.0, 29.0, 26.0, 29.0, 31.0, 25.0, 28.0, 25.0]
2025-08-07 06:52:47,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1251 [DEBUG]: Training session finished
