2025-05-13 09:06:38,736 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mda-highdim-mem24
2025-05-13 09:06:38,736 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mda-highdim-mem24
2025-05-13 09:06:38,736 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14fb7d24e590>}
2025-05-13 09:06:38,736 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:38,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1133 [INFO]: Creating new trainer
2025-05-13 09:06:38,759 baseline-bpql-mda-noisy-humanoid:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-05-13 09:06:38,759 baseline-bpql-mda-noisy-humanoid:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:38,768 baseline-bpql-mda-noisy-humanoid:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(17, 512, batch_first=True)
)
2025-05-13 09:06:39,788 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:39,789 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 1/100
2025-05-13 09:11:14,639 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:11:16,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 347.68689 ± 161.329
2025-05-13 09:11:16,120 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [261.09323, 135.64972, 440.9179, 517.57794, 354.27142, 151.32805, 582.85315, 553.433, 166.47289, 313.2715]
2025-05-13 09:11:16,120 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [52.0, 26.0, 85.0, 102.0, 68.0, 29.0, 118.0, 116.0, 32.0, 62.0]
2025-05-13 09:11:16,120 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (347.69) for latency ExtremeClogL1U23
2025-05-13 09:11:16,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 7 hours, 35 minutes, 57 seconds)
2025-05-13 09:16:01,475 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:16:02,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 261.57071 ± 56.752
2025-05-13 09:16:02,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [318.24786, 152.34372, 239.15666, 292.6079, 284.88962, 302.47427, 171.65193, 235.29088, 307.5174, 311.52698]
2025-05-13 09:16:02,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 29.0, 47.0, 58.0, 54.0, 60.0, 33.0, 47.0, 60.0, 61.0]
2025-05-13 09:16:02,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 7 hours, 39 minutes, 34 seconds)
2025-05-13 09:20:48,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:20:49,518 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 314.68192 ± 127.083
2025-05-13 09:20:49,519 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [145.77882, 430.4336, 140.75706, 284.2886, 314.48932, 495.07788, 375.57712, 140.36552, 452.34537, 367.70587]
2025-05-13 09:20:49,519 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 84.0, 27.0, 57.0, 61.0, 98.0, 73.0, 27.0, 85.0, 74.0]
2025-05-13 09:20:49,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 7 hours, 37 minutes, 54 seconds)
2025-05-13 09:25:36,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:25:37,812 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 338.36752 ± 75.501
2025-05-13 09:25:37,812 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [305.8506, 164.53769, 377.3539, 383.3699, 274.3642, 302.91837, 445.33005, 397.2867, 353.0064, 379.65756]
2025-05-13 09:25:37,812 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 32.0, 72.0, 74.0, 53.0, 57.0, 81.0, 72.0, 68.0, 73.0]
2025-05-13 09:25:37,816 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 7 hours, 35 minutes, 12 seconds)
2025-05-13 09:30:22,988 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:30:24,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 323.92987 ± 100.164
2025-05-13 09:30:24,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [129.47765, 134.31464, 336.70755, 405.91193, 378.35626, 413.93747, 338.71796, 365.51596, 405.26013, 331.099]
2025-05-13 09:30:24,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 26.0, 66.0, 78.0, 74.0, 81.0, 65.0, 69.0, 79.0, 64.0]
2025-05-13 09:30:24,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 7 hours, 31 minutes, 6 seconds)
2025-05-13 09:35:11,680 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:35:12,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 307.05450 ± 119.997
2025-05-13 09:35:12,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [485.7758, 360.6247, 129.91187, 410.31656, 129.45374, 146.05359, 379.1937, 336.6582, 330.16418, 362.3928]
2025-05-13 09:35:12,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 76.0, 25.0, 75.0, 25.0, 28.0, 70.0, 63.0, 64.0, 69.0]
2025-05-13 09:35:12,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 7 hours, 30 minutes, 11 seconds)
2025-05-13 09:39:59,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:40:00,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 382.86176 ± 211.051
2025-05-13 09:40:00,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [140.3714, 322.91074, 370.96094, 305.1833, 342.12082, 192.95749, 593.58295, 192.1086, 488.35886, 880.0626]
2025-05-13 09:40:00,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 64.0, 69.0, 59.0, 62.0, 37.0, 115.0, 37.0, 96.0, 180.0]
2025-05-13 09:40:00,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (382.86) for latency ExtremeClogL1U23
2025-05-13 09:40:00,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 7 hours, 25 minutes, 49 seconds)
2025-05-13 09:44:47,674 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:44:49,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 382.08154 ± 179.190
2025-05-13 09:44:49,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [135.66289, 165.90506, 487.8155, 150.80174, 417.5155, 465.9431, 572.51385, 368.27795, 703.588, 352.79205]
2025-05-13 09:44:49,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 32.0, 90.0, 29.0, 76.0, 85.0, 107.0, 75.0, 132.0, 71.0]
2025-05-13 09:44:49,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 7 hours, 21 minutes, 31 seconds)
2025-05-13 09:49:36,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:49:38,228 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 384.53229 ± 164.058
2025-05-13 09:49:38,228 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [529.44867, 444.93323, 135.7796, 134.19229, 429.84634, 531.06604, 508.32812, 578.5478, 176.19408, 376.98676]
2025-05-13 09:49:38,228 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 83.0, 26.0, 26.0, 80.0, 100.0, 100.0, 113.0, 34.0, 76.0]
2025-05-13 09:49:38,228 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (384.53) for latency ExtremeClogL1U23
2025-05-13 09:49:38,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 7 hours, 16 minutes, 55 seconds)
2025-05-13 09:54:26,059 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:54:27,621 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 363.64178 ± 193.452
2025-05-13 09:54:27,621 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [471.49777, 134.1053, 150.60643, 403.81732, 674.9868, 129.75699, 559.3564, 390.26712, 551.08075, 170.94289]
2025-05-13 09:54:27,621 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 26.0, 29.0, 84.0, 141.0, 25.0, 110.0, 74.0, 103.0, 33.0]
2025-05-13 09:54:27,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 7 hours, 12 minutes, 59 seconds)
2025-05-13 09:59:16,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:59:18,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 454.29376 ± 206.339
2025-05-13 09:59:18,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [135.0384, 516.11053, 157.416, 529.9915, 757.5249, 568.09894, 628.53973, 177.46855, 516.68036, 556.0687]
2025-05-13 09:59:18,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 96.0, 30.0, 108.0, 145.0, 106.0, 120.0, 34.0, 100.0, 104.0]
2025-05-13 09:59:18,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (454.29) for latency ExtremeClogL1U23
2025-05-13 09:59:18,088 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 7 hours, 8 minutes, 43 seconds)
2025-05-13 10:04:05,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:04:06,736 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 342.61835 ± 137.501
2025-05-13 10:04:06,736 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [130.47525, 447.68616, 193.92899, 124.814476, 433.17355, 540.86, 418.77704, 406.90903, 314.48972, 415.06934]
2025-05-13 10:04:06,736 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 90.0, 37.0, 24.0, 80.0, 104.0, 78.0, 81.0, 60.0, 77.0]
2025-05-13 10:04:06,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 7 hours, 4 minutes, 10 seconds)
2025-05-13 10:08:55,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:08:57,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 432.94443 ± 147.400
2025-05-13 10:08:57,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [129.61772, 431.1992, 774.01807, 453.6759, 460.5518, 420.97552, 482.81528, 388.24008, 401.2093, 387.14117]
2025-05-13 10:08:57,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 85.0, 148.0, 95.0, 87.0, 77.0, 90.0, 73.0, 76.0, 71.0]
2025-05-13 10:08:57,029 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 6 hours, 59 minutes, 51 seconds)
2025-05-13 10:13:46,088 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:13:47,410 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 326.43536 ± 142.191
2025-05-13 10:13:47,410 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [156.0091, 468.76532, 198.63028, 483.95056, 480.61008, 140.38216, 145.59564, 457.33716, 378.6939, 354.37952]
2025-05-13 10:13:47,410 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 86.0, 38.0, 90.0, 92.0, 27.0, 28.0, 87.0, 71.0, 66.0]
2025-05-13 10:13:47,416 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 6 hours, 55 minutes, 25 seconds)
2025-05-13 10:18:36,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:18:37,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 327.13251 ± 146.056
2025-05-13 10:18:37,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [436.11584, 359.7395, 193.80058, 514.22186, 471.79666, 438.55536, 431.14636, 129.52666, 134.67242, 161.75003]
2025-05-13 10:18:37,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 69.0, 37.0, 104.0, 95.0, 84.0, 79.0, 25.0, 26.0, 31.0]
2025-05-13 10:18:37,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 6 hours, 50 minutes, 50 seconds)
2025-05-13 10:23:24,349 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:23:26,064 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 405.76904 ± 127.226
2025-05-13 10:23:26,065 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [454.5305, 186.23543, 501.19226, 543.3111, 502.50592, 373.13788, 479.53143, 371.00372, 485.19336, 161.04887]
2025-05-13 10:23:26,065 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 36.0, 94.0, 100.0, 104.0, 72.0, 99.0, 68.0, 94.0, 31.0]
2025-05-13 10:23:26,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 6 hours, 45 minutes, 26 seconds)
2025-05-13 10:28:14,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:28:16,201 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 383.24866 ± 160.512
2025-05-13 10:28:16,201 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [554.21704, 530.16296, 344.09567, 567.42236, 165.7939, 489.57565, 419.68082, 455.68356, 160.46155, 145.39331]
2025-05-13 10:28:16,201 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 100.0, 69.0, 106.0, 32.0, 100.0, 78.0, 86.0, 31.0, 28.0]
2025-05-13 10:28:16,207 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 6 hours, 41 minutes, 1 second)
2025-05-13 10:33:03,808 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:33:05,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 325.03360 ± 162.677
2025-05-13 10:33:05,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [130.11284, 461.96042, 573.93317, 145.75912, 436.72397, 195.60588, 172.65121, 195.1068, 507.2142, 431.2685]
2025-05-13 10:33:05,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 86.0, 106.0, 28.0, 80.0, 37.0, 33.0, 37.0, 94.0, 80.0]
2025-05-13 10:33:05,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 6 hours, 35 minutes, 48 seconds)
2025-05-13 10:37:53,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:37:55,147 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 352.64178 ± 130.619
2025-05-13 10:37:55,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [141.0598, 189.496, 440.33212, 424.4047, 437.99905, 135.55615, 452.06284, 433.93027, 464.60083, 406.97644]
2025-05-13 10:37:55,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 36.0, 83.0, 82.0, 82.0, 26.0, 86.0, 80.0, 88.0, 77.0]
2025-05-13 10:37:55,152 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 6 hours, 30 minutes, 53 seconds)
2025-05-13 10:42:44,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:42:46,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 450.42618 ± 168.077
2025-05-13 10:42:46,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [615.04944, 510.67294, 497.36282, 428.304, 696.7824, 445.4326, 130.29485, 172.04285, 540.2454, 468.07437]
2025-05-13 10:42:46,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 96.0, 103.0, 81.0, 137.0, 81.0, 25.0, 33.0, 100.0, 89.0]
2025-05-13 10:42:46,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 6 hours, 26 minutes, 14 seconds)
2025-05-13 10:47:32,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:47:35,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 503.93555 ± 60.755
2025-05-13 10:47:35,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [496.6941, 531.4709, 491.40298, 468.4126, 450.61768, 501.90735, 651.287, 557.3427, 431.88824, 458.33218]
2025-05-13 10:47:35,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 102.0, 91.0, 86.0, 87.0, 97.0, 123.0, 104.0, 82.0, 86.0]
2025-05-13 10:47:35,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (503.94) for latency ExtremeClogL1U23
2025-05-13 10:47:35,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 6 hours, 21 minutes, 34 seconds)
2025-05-13 10:52:22,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:52:24,802 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 451.28876 ± 123.508
2025-05-13 10:52:24,802 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [542.25037, 430.85437, 437.8193, 386.5521, 584.5168, 130.06972, 560.50073, 435.6989, 478.1287, 526.4966]
2025-05-13 10:52:24,802 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 83.0, 82.0, 82.0, 116.0, 25.0, 106.0, 81.0, 88.0, 100.0]
2025-05-13 10:52:24,809 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 6 hours, 16 minutes, 38 seconds)
2025-05-13 10:57:14,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:57:15,885 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 390.68423 ± 127.278
2025-05-13 10:57:15,885 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [405.64938, 440.8785, 168.38676, 119.402664, 433.49805, 493.82025, 485.74088, 463.5876, 410.07254, 485.80557]
2025-05-13 10:57:15,885 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 81.0, 32.0, 23.0, 82.0, 92.0, 89.0, 95.0, 77.0, 90.0]
2025-05-13 10:57:15,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 6 hours, 12 minutes, 22 seconds)
2025-05-13 11:02:05,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:02:07,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 389.94193 ± 161.788
2025-05-13 11:02:07,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [172.38176, 524.91925, 444.01044, 156.06416, 448.36044, 405.0162, 521.33417, 631.76605, 443.0579, 152.50868]
2025-05-13 11:02:07,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 98.0, 82.0, 30.0, 83.0, 76.0, 96.0, 121.0, 83.0, 29.0]
2025-05-13 11:02:07,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 6 hours, 7 minutes, 52 seconds)
2025-05-13 11:06:54,737 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:06:56,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 398.89642 ± 164.156
2025-05-13 11:06:56,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [540.65796, 382.34708, 140.43025, 428.53342, 526.5183, 483.31702, 531.1514, 602.8535, 171.02638, 182.12883]
2025-05-13 11:06:56,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 75.0, 27.0, 80.0, 99.0, 91.0, 101.0, 112.0, 33.0, 35.0]
2025-05-13 11:06:56,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 6 hours, 2 minutes, 35 seconds)
2025-05-13 11:11:47,087 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:11:48,770 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 410.03156 ± 132.096
2025-05-13 11:11:48,770 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [495.98642, 435.86884, 537.6096, 371.28674, 523.74896, 508.2872, 491.72363, 150.55418, 403.3128, 181.93706]
2025-05-13 11:11:48,770 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 85.0, 100.0, 71.0, 111.0, 94.0, 90.0, 29.0, 75.0, 35.0]
2025-05-13 11:11:48,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 58 minutes, 34 seconds)
2025-05-13 11:16:34,600 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:16:36,264 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 397.39575 ± 175.092
2025-05-13 11:16:36,264 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [146.11137, 586.67053, 652.19965, 400.30222, 506.34195, 145.7904, 464.89767, 504.32834, 170.44997, 396.86545]
2025-05-13 11:16:36,264 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 110.0, 137.0, 75.0, 93.0, 28.0, 85.0, 103.0, 33.0, 75.0]
2025-05-13 11:16:36,271 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 53 minutes, 11 seconds)
2025-05-13 11:21:24,932 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:21:26,912 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 469.69394 ± 132.493
2025-05-13 11:21:26,912 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [561.9664, 517.4884, 514.66815, 381.86185, 534.2889, 130.29794, 496.09476, 413.43774, 501.7703, 645.06494]
2025-05-13 11:21:26,912 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 103.0, 99.0, 71.0, 99.0, 25.0, 94.0, 76.0, 93.0, 132.0]
2025-05-13 11:21:26,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 5 hours, 48 minutes, 14 seconds)
2025-05-13 11:26:16,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:26:18,367 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 379.31754 ± 161.249
2025-05-13 11:26:18,367 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [166.42674, 393.42975, 466.5175, 464.87515, 157.14162, 462.1705, 388.20193, 145.03745, 497.1184, 652.256]
2025-05-13 11:26:18,367 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 74.0, 87.0, 92.0, 30.0, 87.0, 73.0, 28.0, 92.0, 126.0]
2025-05-13 11:26:18,373 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 5 hours, 43 minutes, 24 seconds)
2025-05-13 11:31:04,684 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:31:06,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 359.30673 ± 237.680
2025-05-13 11:31:06,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [139.64966, 757.98676, 135.78525, 461.9424, 151.46594, 436.57144, 161.05933, 484.03973, 130.16766, 734.3992]
2025-05-13 11:31:06,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 153.0, 26.0, 86.0, 29.0, 81.0, 31.0, 91.0, 25.0, 140.0]
2025-05-13 11:31:06,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 5 hours, 38 minutes, 17 seconds)
2025-05-13 11:35:52,947 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:35:54,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 401.13382 ± 130.116
2025-05-13 11:35:54,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [465.95474, 530.8995, 125.616, 449.91617, 432.33322, 494.10687, 175.75159, 457.08124, 481.2569, 398.42184]
2025-05-13 11:35:54,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 99.0, 24.0, 84.0, 79.0, 93.0, 34.0, 85.0, 90.0, 75.0]
2025-05-13 11:35:54,572 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 5 hours, 32 minutes, 31 seconds)
2025-05-13 11:40:42,310 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:40:44,289 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 483.79657 ± 128.057
2025-05-13 11:40:44,289 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [663.86646, 464.67487, 527.8491, 496.81204, 461.34888, 483.86356, 580.7055, 508.57178, 140.43303, 509.84018]
2025-05-13 11:40:44,289 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 88.0, 99.0, 92.0, 89.0, 90.0, 111.0, 97.0, 27.0, 93.0]
2025-05-13 11:40:44,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 5 hours, 28 minutes, 13 seconds)
2025-05-13 11:45:33,652 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:45:35,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 390.93567 ± 171.574
2025-05-13 11:45:35,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [448.09756, 167.13982, 135.32959, 592.28577, 145.25122, 537.68024, 614.1262, 413.66037, 418.13068, 437.65518]
2025-05-13 11:45:35,278 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 32.0, 26.0, 113.0, 28.0, 100.0, 121.0, 76.0, 78.0, 85.0]
2025-05-13 11:45:35,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 5 hours, 23 minutes, 28 seconds)
2025-05-13 11:50:25,284 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:50:26,765 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 368.68149 ± 148.622
2025-05-13 11:50:26,765 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [464.6056, 134.9021, 535.7109, 396.53937, 493.73767, 165.30862, 433.39227, 488.784, 427.2261, 146.60846]
2025-05-13 11:50:26,765 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 26.0, 101.0, 73.0, 92.0, 32.0, 80.0, 92.0, 78.0, 28.0]
2025-05-13 11:50:26,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 5 hours, 18 minutes, 38 seconds)
2025-05-13 11:55:11,961 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:55:13,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 465.39777 ± 128.825
2025-05-13 11:55:13,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [611.60443, 568.0669, 417.7858, 544.54517, 517.93396, 134.92259, 467.43604, 446.29614, 553.5051, 391.8808]
2025-05-13 11:55:13,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 110.0, 82.0, 102.0, 98.0, 26.0, 88.0, 85.0, 112.0, 72.0]
2025-05-13 11:55:13,858 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 5 hours, 13 minutes, 39 seconds)
2025-05-13 12:00:00,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:00:01,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 387.31094 ± 163.295
2025-05-13 12:00:01,628 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [119.603226, 453.34967, 463.66803, 481.5371, 135.71776, 523.6021, 173.08176, 563.50464, 472.57104, 486.4741]
2025-05-13 12:00:01,628 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 84.0, 84.0, 90.0, 26.0, 97.0, 33.0, 103.0, 101.0, 91.0]
2025-05-13 12:00:01,635 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 5 hours, 8 minutes, 42 seconds)
2025-05-13 12:04:50,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:04:52,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 428.40576 ± 215.476
2025-05-13 12:04:52,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [824.0156, 435.08908, 510.75174, 151.84132, 173.16092, 586.7364, 124.282135, 429.5378, 632.4112, 416.23154]
2025-05-13 12:04:52,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [168.0, 83.0, 103.0, 29.0, 33.0, 119.0, 24.0, 81.0, 129.0, 75.0]
2025-05-13 12:04:52,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 5 hours, 4 minutes, 5 seconds)
2025-05-13 12:09:37,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:09:39,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 375.65436 ± 142.156
2025-05-13 12:09:39,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [156.77876, 506.21744, 155.57384, 188.13167, 450.965, 386.3325, 466.91745, 420.9711, 510.98895, 513.66705]
2025-05-13 12:09:39,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 95.0, 30.0, 36.0, 95.0, 71.0, 86.0, 79.0, 94.0, 95.0]
2025-05-13 12:09:39,462 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 58 minutes, 27 seconds)
2025-05-13 12:14:24,902 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:14:26,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 403.73038 ± 180.504
2025-05-13 12:14:26,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [496.30444, 146.40462, 454.77097, 524.23517, 500.69052, 491.7199, 480.78546, 666.9586, 135.67375, 139.76024]
2025-05-13 12:14:26,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 28.0, 83.0, 96.0, 92.0, 90.0, 91.0, 125.0, 26.0, 27.0]
2025-05-13 12:14:26,507 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 52 minutes, 44 seconds)
2025-05-13 12:19:13,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:19:15,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 466.56680 ± 124.613
2025-05-13 12:19:15,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [511.3597, 489.05505, 524.759, 524.76184, 649.3861, 390.30557, 522.55206, 493.5642, 408.9113, 151.01347]
2025-05-13 12:19:15,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 92.0, 97.0, 99.0, 124.0, 74.0, 97.0, 101.0, 77.0, 29.0]
2025-05-13 12:19:15,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 48 minutes, 20 seconds)
2025-05-13 12:24:01,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:24:03,014 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 418.36890 ± 161.048
2025-05-13 12:24:03,014 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [125.13765, 584.40234, 139.67233, 432.77545, 502.37653, 423.00925, 462.23383, 607.3245, 555.30005, 351.45688]
2025-05-13 12:24:03,014 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 109.0, 27.0, 82.0, 93.0, 79.0, 91.0, 118.0, 105.0, 68.0]
2025-05-13 12:24:03,022 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 43 minutes, 28 seconds)
2025-05-13 12:28:48,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:28:49,935 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 313.68256 ± 176.324
2025-05-13 12:28:49,935 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [482.71426, 459.23645, 472.92032, 129.72308, 156.52576, 151.62025, 141.59375, 412.48062, 595.59076, 134.42029]
2025-05-13 12:28:49,935 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 84.0, 86.0, 25.0, 30.0, 29.0, 27.0, 77.0, 110.0, 26.0]
2025-05-13 12:28:49,946 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 37 minutes, 56 seconds)
2025-05-13 12:33:36,796 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:33:38,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 436.38321 ± 169.460
2025-05-13 12:33:38,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [492.17673, 416.80405, 675.42206, 448.8017, 448.2302, 446.7913, 686.21204, 448.16107, 161.77362, 139.45953]
2025-05-13 12:33:38,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 76.0, 140.0, 84.0, 87.0, 82.0, 132.0, 84.0, 31.0, 27.0]
2025-05-13 12:33:38,594 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 4 hours, 33 minutes, 26 seconds)
2025-05-13 12:38:22,666 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:38:24,394 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 424.98495 ± 189.386
2025-05-13 12:38:24,394 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [145.16678, 488.31134, 514.25946, 533.8584, 528.6916, 175.417, 372.2492, 747.6056, 560.62524, 183.66502]
2025-05-13 12:38:24,394 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 93.0, 95.0, 101.0, 99.0, 34.0, 72.0, 143.0, 109.0, 35.0]
2025-05-13 12:38:24,402 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 4 hours, 28 minutes, 24 seconds)
2025-05-13 12:43:10,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:43:12,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 362.01080 ± 178.905
2025-05-13 12:43:12,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [151.2253, 145.81192, 471.39133, 385.37064, 634.2645, 173.2017, 186.91644, 622.5594, 414.24445, 435.1224]
2025-05-13 12:43:12,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 28.0, 88.0, 72.0, 120.0, 33.0, 36.0, 117.0, 79.0, 80.0]
2025-05-13 12:43:12,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 4 hours, 23 minutes, 21 seconds)
2025-05-13 12:47:57,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:47:59,773 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 439.76715 ± 188.984
2025-05-13 12:47:59,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [342.43716, 354.43466, 546.54614, 565.0772, 571.6019, 484.9158, 771.6143, 141.25815, 484.39972, 135.38628]
2025-05-13 12:47:59,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 68.0, 103.0, 104.0, 107.0, 93.0, 146.0, 27.0, 92.0, 26.0]
2025-05-13 12:47:59,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 4 hours, 18 minutes, 36 seconds)
2025-05-13 12:52:45,806 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:52:47,092 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 318.61197 ± 210.405
2025-05-13 12:52:47,092 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [151.68134, 297.63098, 130.60535, 130.56621, 448.51633, 130.14645, 715.3701, 420.56616, 139.87663, 621.15985]
2025-05-13 12:52:47,092 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 57.0, 25.0, 25.0, 84.0, 25.0, 144.0, 78.0, 27.0, 118.0]
2025-05-13 12:52:47,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 4 hours, 13 minutes, 53 seconds)
2025-05-13 12:57:32,284 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:57:33,670 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 343.55969 ± 156.014
2025-05-13 12:57:33,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [181.95125, 431.39655, 539.35815, 185.58081, 125.534546, 524.99225, 156.23111, 361.33148, 488.1594, 441.0615]
2025-05-13 12:57:33,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 80.0, 102.0, 36.0, 24.0, 98.0, 30.0, 68.0, 89.0, 81.0]
2025-05-13 12:57:33,679 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 4 hours, 8 minutes, 44 seconds)
2025-05-13 13:02:21,228 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:02:22,672 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 353.95331 ± 182.386
2025-05-13 13:02:22,672 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [145.84174, 482.69037, 130.41971, 563.20355, 417.22607, 498.01947, 125.03653, 551.04517, 140.32004, 485.7306]
2025-05-13 13:02:22,673 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 90.0, 25.0, 106.0, 78.0, 94.0, 24.0, 108.0, 27.0, 92.0]
2025-05-13 13:02:22,682 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 4 hours, 4 minutes, 30 seconds)
2025-05-13 13:07:06,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:07:07,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 329.57391 ± 173.099
2025-05-13 13:07:07,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [154.9317, 166.9419, 413.4623, 140.41997, 191.551, 505.6685, 611.6967, 466.62256, 162.03, 482.41434]
2025-05-13 13:07:07,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 32.0, 77.0, 27.0, 37.0, 103.0, 115.0, 86.0, 31.0, 100.0]
2025-05-13 13:07:07,599 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 59 minutes, 14 seconds)
2025-05-13 13:11:55,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:11:56,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 444.16098 ± 136.472
2025-05-13 13:11:56,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [534.28345, 506.98682, 696.4247, 520.40735, 414.4336, 471.07224, 150.59859, 438.4069, 374.83984, 334.15646]
2025-05-13 13:11:56,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 96.0, 145.0, 102.0, 78.0, 88.0, 29.0, 81.0, 70.0, 67.0]
2025-05-13 13:11:56,890 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 54 minutes, 43 seconds)
2025-05-13 13:16:42,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:16:44,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 399.24490 ± 126.647
2025-05-13 13:16:44,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [140.92468, 417.3789, 474.6128, 416.0261, 431.73926, 175.99239, 493.9046, 466.6211, 426.98688, 548.2624]
2025-05-13 13:16:44,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 81.0, 89.0, 86.0, 80.0, 34.0, 91.0, 91.0, 82.0, 102.0]
2025-05-13 13:16:44,211 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 49 minutes, 56 seconds)
2025-05-13 13:21:29,885 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:21:31,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 315.56177 ± 133.660
2025-05-13 13:21:31,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [372.32495, 413.03668, 527.0106, 418.50998, 161.02014, 170.67447, 390.37277, 395.9508, 130.31393, 176.4034]
2025-05-13 13:21:31,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 77.0, 96.0, 80.0, 31.0, 33.0, 75.0, 77.0, 25.0, 34.0]
2025-05-13 13:21:31,112 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 45 minutes, 11 seconds)
2025-05-13 13:26:17,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:26:19,551 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 431.78375 ± 112.610
2025-05-13 13:26:19,551 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [407.5958, 415.17688, 191.65039, 427.21735, 353.96448, 473.33273, 421.53113, 456.70074, 664.70807, 505.96005]
2025-05-13 13:26:19,551 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 78.0, 37.0, 79.0, 66.0, 91.0, 80.0, 88.0, 137.0, 110.0]
2025-05-13 13:26:19,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 40 minutes, 19 seconds)
2025-05-13 13:31:06,855 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:31:08,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 328.52087 ± 158.175
2025-05-13 13:31:08,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [130.86667, 140.77803, 512.12933, 447.2173, 338.5161, 468.2194, 160.57619, 509.61536, 139.84811, 437.44217]
2025-05-13 13:31:08,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 27.0, 98.0, 90.0, 64.0, 90.0, 31.0, 102.0, 27.0, 84.0]
2025-05-13 13:31:08,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 36 minutes, 5 seconds)
2025-05-13 13:35:52,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:35:54,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 412.89606 ± 158.973
2025-05-13 13:35:54,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [393.27774, 150.65282, 608.849, 140.08815, 420.78983, 461.1582, 446.7657, 513.69525, 350.29526, 643.38873]
2025-05-13 13:35:54,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 29.0, 116.0, 27.0, 84.0, 88.0, 90.0, 102.0, 66.0, 120.0]
2025-05-13 13:35:54,299 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 30 minutes, 49 seconds)
2025-05-13 13:40:41,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:40:42,404 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 301.46506 ± 164.991
2025-05-13 13:40:42,404 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [372.38345, 405.52405, 140.30064, 405.31992, 511.711, 130.59285, 155.65335, 145.17393, 586.12085, 161.8706]
2025-05-13 13:40:42,404 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 77.0, 27.0, 76.0, 97.0, 25.0, 30.0, 28.0, 112.0, 31.0]
2025-05-13 13:40:42,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 3 hours, 26 minutes, 8 seconds)
2025-05-13 13:45:28,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:45:29,701 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 295.83844 ± 175.062
2025-05-13 13:45:29,701 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [129.1653, 532.3585, 129.69598, 135.59952, 313.95786, 135.40326, 512.3476, 501.3687, 448.57556, 119.91201]
2025-05-13 13:45:29,701 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 99.0, 25.0, 26.0, 61.0, 26.0, 106.0, 94.0, 85.0, 23.0]
2025-05-13 13:45:29,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 3 hours, 21 minutes, 24 seconds)
2025-05-13 13:50:16,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:50:18,075 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 367.49268 ± 180.561
2025-05-13 13:50:18,075 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [150.91524, 151.98697, 515.5344, 152.13571, 528.42206, 499.95844, 587.1266, 419.105, 519.13165, 150.61064]
2025-05-13 13:50:18,075 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 29.0, 95.0, 29.0, 100.0, 91.0, 111.0, 80.0, 95.0, 29.0]
2025-05-13 13:50:18,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 3 hours, 16 minutes, 35 seconds)
2025-05-13 13:55:03,923 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:55:05,567 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 400.11670 ± 154.873
2025-05-13 13:55:05,567 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [426.51688, 574.3735, 471.70786, 530.18024, 524.64166, 172.00406, 178.44318, 404.33234, 542.1732, 176.79382]
2025-05-13 13:55:05,567 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 113.0, 94.0, 97.0, 97.0, 33.0, 34.0, 76.0, 101.0, 34.0]
2025-05-13 13:55:05,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 3 hours, 11 minutes, 38 seconds)
2025-05-13 13:59:53,679 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:59:55,479 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 446.21112 ± 177.769
2025-05-13 13:59:55,479 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [670.6745, 135.03505, 563.4811, 466.85953, 411.00803, 369.79016, 135.2452, 579.211, 516.7074, 614.09973]
2025-05-13 13:59:55,479 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 26.0, 115.0, 85.0, 80.0, 71.0, 26.0, 109.0, 96.0, 114.0]
2025-05-13 13:59:55,487 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 3 hours, 7 minutes, 21 seconds)
2025-05-13 14:04:41,042 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:04:42,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 451.49384 ± 180.740
2025-05-13 14:04:42,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [125.00586, 150.8676, 375.11182, 504.31854, 673.9806, 527.5812, 494.5752, 409.4642, 643.5425, 610.4913]
2025-05-13 14:04:42,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 29.0, 74.0, 96.0, 133.0, 97.0, 99.0, 78.0, 120.0, 121.0]
2025-05-13 14:04:42,912 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 3 hours, 2 minutes, 27 seconds)
2025-05-13 14:09:29,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:09:31,057 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 440.20343 ± 164.849
2025-05-13 14:09:31,057 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [454.98605, 683.40356, 470.0096, 578.34344, 476.90167, 526.44696, 135.65376, 425.26, 146.14069, 504.88818]
2025-05-13 14:09:31,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 129.0, 86.0, 110.0, 91.0, 100.0, 26.0, 79.0, 28.0, 96.0]
2025-05-13 14:09:31,068 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 57 minutes, 46 seconds)
2025-05-13 14:14:18,389 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:14:19,897 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 381.29553 ± 162.179
2025-05-13 14:14:19,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [487.61136, 134.90625, 524.25446, 433.77676, 525.26636, 135.59586, 454.45413, 493.09067, 139.85512, 484.14435]
2025-05-13 14:14:19,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 26.0, 96.0, 80.0, 107.0, 26.0, 87.0, 91.0, 27.0, 88.0]
2025-05-13 14:14:19,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 53 minutes, 1 second)
2025-05-13 14:19:07,484 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:19:09,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 453.89764 ± 104.059
2025-05-13 14:19:09,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [536.6143, 526.263, 435.03867, 181.78174, 505.67978, 447.30768, 363.87, 511.44043, 509.40936, 521.5715]
2025-05-13 14:19:09,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 101.0, 88.0, 35.0, 93.0, 85.0, 75.0, 97.0, 94.0, 98.0]
2025-05-13 14:19:09,401 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 48 minutes, 26 seconds)
2025-05-13 14:23:56,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:23:58,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 563.36279 ± 96.114
2025-05-13 14:23:58,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [397.71182, 666.7931, 594.1213, 427.34235, 487.7105, 555.37646, 652.0218, 704.3255, 596.96796, 551.25775]
2025-05-13 14:23:58,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 128.0, 112.0, 80.0, 92.0, 107.0, 122.0, 132.0, 111.0, 109.0]
2025-05-13 14:23:58,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (563.36) for latency ExtremeClogL1U23
2025-05-13 14:23:58,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 43 minutes, 35 seconds)
2025-05-13 14:28:43,778 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:28:45,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 421.71320 ± 177.065
2025-05-13 14:28:45,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [171.37924, 694.51953, 479.2028, 574.4751, 174.88354, 161.78874, 505.2055, 520.69763, 454.37762, 480.602]
2025-05-13 14:28:45,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 129.0, 89.0, 115.0, 34.0, 31.0, 94.0, 98.0, 85.0, 91.0]
2025-05-13 14:28:45,481 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 38 minutes, 40 seconds)
2025-05-13 14:33:30,587 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:33:32,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 375.32617 ± 238.208
2025-05-13 14:33:32,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [815.0405, 478.89902, 130.09094, 639.40295, 163.01079, 551.25214, 150.75229, 500.86426, 151.49495, 172.45374]
2025-05-13 14:33:32,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [164.0, 89.0, 25.0, 118.0, 31.0, 102.0, 29.0, 93.0, 29.0, 33.0]
2025-05-13 14:33:32,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 33 minutes, 42 seconds)
2025-05-13 14:38:19,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:38:21,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 346.63034 ± 198.215
2025-05-13 14:38:21,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [176.26923, 519.1432, 455.69693, 140.56047, 167.46248, 140.25755, 575.0931, 605.44104, 550.3781, 136.0015]
2025-05-13 14:38:21,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 98.0, 84.0, 27.0, 32.0, 27.0, 105.0, 114.0, 103.0, 26.0]
2025-05-13 14:38:21,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 28 minutes, 55 seconds)
2025-05-13 14:43:07,684 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:43:09,548 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 463.12421 ± 163.700
2025-05-13 14:43:09,548 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [162.8899, 516.2477, 646.2404, 166.02086, 494.46292, 438.122, 632.5276, 442.32004, 590.9361, 541.4746]
2025-05-13 14:43:09,548 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 94.0, 122.0, 32.0, 91.0, 82.0, 120.0, 83.0, 120.0, 99.0]
2025-05-13 14:43:09,557 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 24 minutes)
2025-05-13 14:47:54,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:47:56,829 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 524.25470 ± 75.876
2025-05-13 14:47:56,829 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [491.63666, 506.99396, 522.2187, 584.5963, 442.59326, 491.05865, 678.0239, 464.25262, 437.12732, 624.0452]
2025-05-13 14:47:56,829 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 94.0, 97.0, 117.0, 82.0, 100.0, 126.0, 85.0, 88.0, 118.0]
2025-05-13 14:47:56,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 18 minutes, 59 seconds)
2025-05-13 14:52:42,843 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:52:44,415 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 384.33765 ± 158.823
2025-05-13 14:52:44,416 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [495.92914, 440.08917, 562.73676, 411.52625, 448.97083, 159.92043, 145.4153, 146.40129, 492.6337, 539.7536]
2025-05-13 14:52:44,416 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 86.0, 103.0, 84.0, 84.0, 31.0, 28.0, 28.0, 92.0, 99.0]
2025-05-13 14:52:44,425 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 2 hours, 14 minutes, 18 seconds)
2025-05-13 14:57:31,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:57:33,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 458.64731 ± 127.600
2025-05-13 14:57:33,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [495.51898, 395.38858, 145.97906, 523.9027, 541.66516, 436.10602, 429.8636, 671.7811, 499.24496, 447.02286]
2025-05-13 14:57:33,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 74.0, 28.0, 98.0, 99.0, 81.0, 79.0, 127.0, 99.0, 87.0]
2025-05-13 14:57:33,347 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 2 hours, 9 minutes, 42 seconds)
2025-05-13 15:02:20,152 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:02:21,837 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 400.62234 ± 216.504
2025-05-13 15:02:21,837 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [453.0574, 766.0923, 156.1745, 157.34247, 593.0991, 135.30957, 533.41235, 508.73938, 541.0967, 161.8997]
2025-05-13 15:02:21,837 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 146.0, 30.0, 30.0, 118.0, 26.0, 109.0, 94.0, 111.0, 31.0]
2025-05-13 15:02:21,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 2 hours, 4 minutes, 51 seconds)
2025-05-13 15:07:08,706 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:07:10,323 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 403.97467 ± 161.161
2025-05-13 15:07:10,323 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [155.46046, 548.2846, 565.94244, 550.17444, 446.88428, 170.46739, 411.66196, 176.40962, 507.4677, 506.99368]
2025-05-13 15:07:10,323 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 106.0, 105.0, 103.0, 82.0, 33.0, 78.0, 34.0, 96.0, 95.0]
2025-05-13 15:07:10,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 2 hours, 3 seconds)
2025-05-13 15:11:57,669 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:11:59,362 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 416.71857 ± 196.484
2025-05-13 15:11:59,362 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [465.4288, 606.79486, 428.69284, 146.21841, 139.61568, 519.1749, 166.82672, 480.02316, 452.38647, 762.02344]
2025-05-13 15:11:59,362 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 113.0, 80.0, 28.0, 27.0, 98.0, 32.0, 89.0, 84.0, 143.0]
2025-05-13 15:11:59,375 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 55 minutes, 24 seconds)
2025-05-13 15:16:43,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:16:45,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 456.68390 ± 220.468
2025-05-13 15:16:45,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [477.29343, 130.14732, 642.048, 644.4085, 569.3561, 680.1643, 139.86691, 470.3596, 140.89769, 672.2971]
2025-05-13 15:16:45,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 25.0, 120.0, 121.0, 114.0, 128.0, 27.0, 89.0, 27.0, 125.0]
2025-05-13 15:16:45,834 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 50 minutes, 30 seconds)
2025-05-13 15:21:34,997 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:21:36,790 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 442.26108 ± 156.887
2025-05-13 15:21:36,790 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [428.11456, 595.8181, 424.22528, 161.29472, 484.6827, 620.5829, 481.51465, 145.85075, 561.2376, 519.2896]
2025-05-13 15:21:36,790 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 120.0, 81.0, 31.0, 92.0, 117.0, 90.0, 28.0, 107.0, 97.0]
2025-05-13 15:21:36,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 45 minutes, 51 seconds)
2025-05-13 15:26:21,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:26:22,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 377.47516 ± 203.520
2025-05-13 15:26:22,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [387.04453, 569.1514, 573.9842, 160.5412, 533.23315, 146.3033, 559.04803, 601.66077, 114.007645, 129.77794]
2025-05-13 15:26:22,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 107.0, 105.0, 31.0, 113.0, 28.0, 103.0, 113.0, 22.0, 25.0]
2025-05-13 15:26:22,967 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 40 minutes, 52 seconds)
2025-05-13 15:31:10,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:31:12,644 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 450.81332 ± 209.133
2025-05-13 15:31:12,644 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [559.0078, 125.515076, 428.29047, 171.48004, 178.10194, 619.6842, 751.2564, 653.01465, 515.8491, 505.93375]
2025-05-13 15:31:12,644 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 24.0, 82.0, 33.0, 34.0, 115.0, 143.0, 121.0, 106.0, 106.0]
2025-05-13 15:31:12,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 36 minutes, 9 seconds)
2025-05-13 15:35:59,884 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:36:02,153 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 540.89886 ± 96.962
2025-05-13 15:36:02,154 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [592.19147, 589.3432, 627.49835, 678.39166, 530.36505, 606.8684, 408.32758, 388.7276, 574.1788, 413.09665]
2025-05-13 15:36:02,154 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 114.0, 120.0, 128.0, 98.0, 124.0, 76.0, 79.0, 109.0, 79.0]
2025-05-13 15:36:02,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 31 minutes, 22 seconds)
2025-05-13 15:40:51,484 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:40:53,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 539.75427 ± 52.781
2025-05-13 15:40:53,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [568.686, 496.00064, 492.9389, 512.5886, 568.52826, 533.6987, 535.8892, 619.4929, 621.1481, 448.57144]
2025-05-13 15:40:53,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 94.0, 91.0, 99.0, 106.0, 98.0, 98.0, 116.0, 117.0, 84.0]
2025-05-13 15:40:53,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 26 minutes, 52 seconds)
2025-05-13 15:45:42,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:45:44,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 488.37457 ± 252.039
2025-05-13 15:45:44,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [150.75931, 613.23883, 496.5521, 536.82825, 686.93494, 146.35994, 162.14224, 952.2278, 647.2785, 491.42386]
2025-05-13 15:45:44,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 129.0, 91.0, 101.0, 140.0, 28.0, 31.0, 193.0, 121.0, 94.0]
2025-05-13 15:45:44,229 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 22 minutes, 1 second)
2025-05-13 15:50:33,931 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:50:35,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 474.74933 ± 119.779
2025-05-13 15:50:35,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [535.26605, 145.63182, 546.84094, 483.079, 506.74698, 604.646, 522.5491, 496.02026, 405.01688, 501.69604]
2025-05-13 15:50:35,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 28.0, 101.0, 89.0, 95.0, 118.0, 102.0, 94.0, 75.0, 100.0]
2025-05-13 15:50:35,874 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 17 minutes, 29 seconds)
2025-05-13 15:55:23,653 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:55:25,689 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 484.40411 ± 226.811
2025-05-13 15:55:25,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [140.31696, 166.3609, 640.0812, 511.86325, 156.92963, 638.3241, 772.3634, 544.58374, 691.7961, 581.4218]
2025-05-13 15:55:25,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 32.0, 120.0, 96.0, 30.0, 129.0, 149.0, 100.0, 129.0, 108.0]
2025-05-13 15:55:25,701 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 12 minutes, 39 seconds)
2025-05-13 16:00:15,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:00:17,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 507.71475 ± 197.358
2025-05-13 16:00:17,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [554.63745, 562.30524, 124.559784, 622.3723, 484.20953, 651.0621, 161.54555, 585.4052, 782.975, 548.07526]
2025-05-13 16:00:17,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 104.0, 24.0, 116.0, 88.0, 124.0, 31.0, 114.0, 150.0, 102.0]
2025-05-13 16:00:17,271 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 1 hour, 7 minutes, 54 seconds)
2025-05-13 16:05:07,704 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:05:09,688 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 473.50366 ± 225.128
2025-05-13 16:05:09,688 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [670.11414, 617.3825, 619.9872, 141.30472, 590.76056, 139.55574, 135.72177, 707.2708, 619.2105, 493.7287]
2025-05-13 16:05:09,688 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 118.0, 116.0, 27.0, 111.0, 27.0, 26.0, 135.0, 117.0, 90.0]
2025-05-13 16:05:09,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 1 hour, 3 minutes, 5 seconds)
2025-05-13 16:09:53,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:09:55,897 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 545.00739 ± 201.173
2025-05-13 16:09:55,897 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [735.67523, 663.7122, 540.1107, 155.91895, 638.8136, 560.1384, 754.0914, 552.87067, 670.26996, 178.4726]
2025-05-13 16:09:55,897 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 124.0, 110.0, 30.0, 119.0, 102.0, 162.0, 104.0, 127.0, 34.0]
2025-05-13 16:09:55,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 58 minutes, 4 seconds)
2025-05-13 16:14:33,982 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:14:35,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 420.71957 ± 206.305
2025-05-13 16:14:35,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [151.04109, 422.35535, 171.19656, 426.6448, 570.5422, 529.6116, 166.4153, 447.1432, 855.1993, 467.04645]
2025-05-13 16:14:35,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 78.0, 33.0, 80.0, 108.0, 99.0, 32.0, 84.0, 166.0, 87.0]
2025-05-13 16:14:35,668 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 52 minutes, 47 seconds)
2025-05-13 16:19:08,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:19:09,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 422.64117 ± 181.694
2025-05-13 16:19:09,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [501.12778, 151.54102, 673.09607, 457.48767, 177.12825, 176.45958, 577.0848, 385.74344, 583.6694, 543.07367]
2025-05-13 16:19:09,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 29.0, 127.0, 84.0, 34.0, 34.0, 116.0, 74.0, 109.0, 103.0]
2025-05-13 16:19:09,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 47 minutes, 28 seconds)
2025-05-13 16:23:46,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:23:48,311 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 501.04776 ± 179.426
2025-05-13 16:23:48,311 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [171.17818, 718.91614, 182.51656, 440.84534, 595.382, 624.3527, 615.5032, 468.40558, 641.43854, 551.93945]
2025-05-13 16:23:48,311 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 135.0, 35.0, 82.0, 110.0, 119.0, 113.0, 88.0, 134.0, 107.0]
2025-05-13 16:23:48,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 42 minutes, 19 seconds)
2025-05-13 16:28:54,675 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:28:56,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 447.89420 ± 210.148
2025-05-13 16:28:56,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [618.07324, 629.6806, 537.7858, 453.08698, 130.13748, 140.49596, 474.34753, 171.74803, 723.3631, 600.22296]
2025-05-13 16:28:56,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 117.0, 105.0, 83.0, 25.0, 27.0, 87.0, 33.0, 135.0, 115.0]
2025-05-13 16:28:56,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 38 minutes, 2 seconds)
2025-05-13 16:33:47,737 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:33:49,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 460.84229 ± 263.626
2025-05-13 16:33:49,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [144.48448, 140.85796, 176.98537, 497.00424, 709.74646, 585.5924, 151.68445, 676.8695, 696.4249, 828.7727]
2025-05-13 16:33:49,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 27.0, 34.0, 90.0, 138.0, 110.0, 29.0, 129.0, 134.0, 158.0]
2025-05-13 16:33:50,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 33 minutes, 27 seconds)
2025-05-13 16:39:13,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:39:15,273 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 417.53165 ± 187.124
2025-05-13 16:39:15,273 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [145.34254, 130.30382, 431.8743, 564.2964, 604.5587, 586.0021, 467.69772, 454.88446, 622.719, 167.63725]
2025-05-13 16:39:15,273 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 25.0, 81.0, 103.0, 127.0, 111.0, 89.0, 87.0, 115.0, 32.0]
2025-05-13 16:39:15,284 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 29 minutes, 35 seconds)
2025-05-13 16:44:23,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:44:25,170 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 466.99561 ± 196.330
2025-05-13 16:44:25,170 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [526.616, 657.8518, 380.04776, 613.04236, 530.48315, 172.70667, 374.44244, 509.10498, 124.80139, 780.85913]
2025-05-13 16:44:25,170 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 122.0, 71.0, 114.0, 100.0, 33.0, 75.0, 96.0, 24.0, 148.0]
2025-05-13 16:44:25,239 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 25 minutes, 15 seconds)
2025-05-13 16:49:27,201 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:49:29,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 462.15338 ± 126.627
2025-05-13 16:49:29,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [493.41275, 402.76083, 542.41473, 508.3542, 633.64044, 139.68625, 527.0582, 443.85236, 396.97238, 533.38165]
2025-05-13 16:49:29,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 85.0, 101.0, 109.0, 120.0, 27.0, 96.0, 82.0, 72.0, 102.0]
2025-05-13 16:49:29,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 20 minutes, 32 seconds)
2025-05-13 16:54:33,878 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:54:35,509 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 409.73761 ± 217.007
2025-05-13 16:54:35,509 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [187.92888, 470.65717, 617.7404, 129.70366, 639.8045, 485.7533, 167.83951, 521.59863, 725.38, 150.96985]
2025-05-13 16:54:35,509 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 87.0, 120.0, 25.0, 126.0, 91.0, 32.0, 100.0, 141.0, 29.0]
2025-05-13 16:54:35,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 15 minutes, 23 seconds)
2025-05-13 16:59:44,847 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:59:46,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 520.97046 ± 205.545
2025-05-13 16:59:46,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [752.3748, 156.11557, 535.9955, 597.01013, 772.7533, 576.32605, 523.2037, 140.38293, 500.14645, 655.3962]
2025-05-13 16:59:46,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 30.0, 101.0, 109.0, 154.0, 108.0, 94.0, 27.0, 91.0, 122.0]
2025-05-13 16:59:46,923 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 10 minutes, 22 seconds)
2025-05-13 17:04:51,172 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 17:04:53,041 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 467.78964 ± 209.246
2025-05-13 17:04:53,041 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [172.65297, 630.67035, 601.86786, 403.66684, 189.5203, 604.3424, 604.39496, 151.84187, 575.5965, 743.34265]
2025-05-13 17:04:53,041 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 123.0, 112.0, 75.0, 36.0, 115.0, 114.0, 29.0, 114.0, 143.0]
2025-05-13 17:04:53,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 5 minutes, 7 seconds)
2025-05-13 17:09:56,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 17:09:59,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 560.13550 ± 155.958
2025-05-13 17:09:59,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [771.0781, 501.62057, 601.056, 572.7808, 640.78735, 176.67372, 514.7582, 748.8856, 560.8954, 512.81903]
2025-05-13 17:09:59,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 94.0, 112.0, 107.0, 124.0, 34.0, 95.0, 144.0, 106.0, 99.0]
2025-05-13 17:09:59,501 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1251 [DEBUG]: Training session finished
