2025-05-13 09:06:33,239 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mda-highdim-mem2
2025-05-13 09:06:33,239 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-bpql-mda-highdim-mem2
2025-05-13 09:06:33,239 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14b6a1faee10>}
2025-05-13 09:06:33,240 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:33,244 baseline-bpql-mda-noisy-humanoid:91 [WARNING]: args.assumed_delay != args.horizon: 2 != 24
2025-05-13 09:06:33,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1133 [INFO]: Creating new trainer
2025-05-13 09:06:33,262 baseline-bpql-mda-noisy-humanoid:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-05-13 09:06:33,262 baseline-bpql-mda-noisy-humanoid:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:33,271 baseline-bpql-mda-noisy-humanoid:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(17, 512, batch_first=True)
)
2025-05-13 09:06:34,210 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:34,210 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 1/100
2025-05-13 09:10:54,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:10:55,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 258.46869 ± 45.545
2025-05-13 09:10:55,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [223.64098, 246.58615, 219.16327, 286.094, 286.70703, 217.77327, 264.5086, 256.64255, 370.4236, 213.14757]
2025-05-13 09:10:55,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [41.0, 45.0, 43.0, 53.0, 55.0, 42.0, 52.0, 49.0, 70.0, 40.0]
2025-05-13 09:10:55,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (258.47) for latency ExtremeClogL1U23
2025-05-13 09:10:55,766 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 7 hours, 11 minutes, 34 seconds)
2025-05-13 09:15:28,294 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:15:29,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 313.33801 ± 82.574
2025-05-13 09:15:29,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [323.59222, 307.5245, 351.77814, 392.08685, 442.3047, 351.35995, 263.90536, 176.32384, 173.22128, 351.28354]
2025-05-13 09:15:29,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 56.0, 64.0, 73.0, 82.0, 64.0, 47.0, 34.0, 34.0, 63.0]
2025-05-13 09:15:29,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (313.34) for latency ExtremeClogL1U23
2025-05-13 09:15:29,221 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 7 hours, 16 minutes, 55 seconds)
2025-05-13 09:20:03,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:20:04,027 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 316.05777 ± 47.399
2025-05-13 09:20:04,027 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [389.77573, 286.12994, 331.79355, 325.06073, 284.55176, 380.31516, 294.0174, 340.62555, 310.03082, 218.27705]
2025-05-13 09:20:04,027 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 52.0, 59.0, 58.0, 51.0, 69.0, 53.0, 61.0, 56.0, 47.0]
2025-05-13 09:20:04,027 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (316.06) for latency ExtremeClogL1U23
2025-05-13 09:20:04,034 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 7 hours, 16 minutes, 24 seconds)
2025-05-13 09:24:38,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:24:39,756 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 383.78812 ± 105.567
2025-05-13 09:24:39,756 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [315.7852, 455.41983, 475.31873, 156.71165, 401.96475, 353.28223, 435.41275, 566.2898, 354.24005, 323.4564]
2025-05-13 09:24:39,756 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 86.0, 88.0, 30.0, 76.0, 64.0, 80.0, 101.0, 64.0, 59.0]
2025-05-13 09:24:39,756 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (383.79) for latency ExtremeClogL1U23
2025-05-13 09:24:39,762 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 7 hours, 14 minutes, 13 seconds)
2025-05-13 09:29:14,075 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:29:15,426 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 413.84042 ± 134.707
2025-05-13 09:29:15,426 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [354.91617, 382.8814, 517.281, 295.59543, 438.78674, 378.17722, 768.81995, 274.49603, 364.8911, 362.55908]
2025-05-13 09:29:15,426 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 77.0, 101.0, 60.0, 83.0, 77.0, 149.0, 54.0, 74.0, 75.0]
2025-05-13 09:29:15,426 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (413.84) for latency ExtremeClogL1U23
2025-05-13 09:29:15,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 7 hours, 11 minutes, 3 seconds)
2025-05-13 09:33:47,867 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:33:49,619 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 515.67151 ± 113.603
2025-05-13 09:33:49,620 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [528.7119, 262.43842, 723.04553, 497.58636, 617.7081, 539.6593, 574.0217, 498.19196, 456.15326, 459.1983]
2025-05-13 09:33:49,620 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 60.0, 142.0, 110.0, 134.0, 105.0, 120.0, 94.0, 89.0, 91.0]
2025-05-13 09:33:49,620 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (515.67) for latency ExtremeClogL1U23
2025-05-13 09:33:49,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 7 hours, 10 minutes, 28 seconds)
2025-05-13 09:38:26,099 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:38:27,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 414.59912 ± 135.906
2025-05-13 09:38:27,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [556.32855, 481.27176, 434.28586, 236.65224, 433.5622, 427.40836, 377.97757, 134.33849, 632.3405, 431.82523]
2025-05-13 09:38:27,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 88.0, 80.0, 45.0, 79.0, 78.0, 69.0, 26.0, 131.0, 79.0]
2025-05-13 09:38:27,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 7 hours, 7 minutes, 13 seconds)
2025-05-13 09:42:58,682 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:42:59,967 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 450.80280 ± 60.212
2025-05-13 09:42:59,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [458.10742, 391.9011, 354.28162, 505.2242, 529.6194, 393.42206, 480.19794, 387.84454, 519.8792, 487.5505]
2025-05-13 09:42:59,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 74.0, 65.0, 95.0, 104.0, 72.0, 89.0, 80.0, 99.0, 100.0]
2025-05-13 09:42:59,973 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 7 hours, 1 minute, 57 seconds)
2025-05-13 09:47:33,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:47:34,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 411.30054 ± 31.629
2025-05-13 09:47:34,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [414.9831, 409.1881, 467.58063, 442.40897, 418.28378, 369.45578, 409.19446, 352.11832, 430.99014, 398.80185]
2025-05-13 09:47:34,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 76.0, 88.0, 81.0, 79.0, 68.0, 77.0, 64.0, 80.0, 74.0]
2025-05-13 09:47:34,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 57 minutes, 6 seconds)
2025-05-13 09:52:08,166 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:52:09,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 444.87689 ± 58.008
2025-05-13 09:52:09,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [504.78873, 385.36227, 422.20862, 411.9968, 415.64594, 474.35178, 366.98712, 565.4529, 486.48404, 415.49066]
2025-05-13 09:52:09,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 69.0, 74.0, 73.0, 74.0, 85.0, 66.0, 103.0, 87.0, 73.0]
2025-05-13 09:52:09,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 6 hours, 52 minutes, 12 seconds)
2025-05-13 09:56:40,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:56:42,479 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 535.60205 ± 102.009
2025-05-13 09:56:42,479 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [737.00195, 495.53226, 603.2501, 669.44794, 415.02914, 528.16815, 387.09097, 519.8313, 503.04645, 497.62247]
2025-05-13 09:56:42,479 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [143.0, 89.0, 112.0, 135.0, 77.0, 101.0, 71.0, 96.0, 93.0, 92.0]
2025-05-13 09:56:42,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (535.60) for latency ExtremeClogL1U23
2025-05-13 09:56:42,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 6 hours, 47 minutes, 16 seconds)
2025-05-13 10:01:17,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:01:19,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 525.45105 ± 115.697
2025-05-13 10:01:19,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [660.24066, 612.1328, 623.6464, 326.2163, 446.1094, 625.6982, 608.99585, 339.741, 517.2496, 494.48056]
2025-05-13 10:01:19,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 112.0, 115.0, 71.0, 84.0, 119.0, 118.0, 63.0, 107.0, 93.0]
2025-05-13 10:01:19,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 6 hours, 42 minutes, 22 seconds)
2025-05-13 10:05:50,702 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:05:52,196 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 448.53305 ± 87.545
2025-05-13 10:05:52,196 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [441.8494, 342.5265, 430.54538, 390.42267, 380.27274, 524.58435, 377.24835, 582.5299, 610.58984, 404.76178]
2025-05-13 10:05:52,196 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 75.0, 88.0, 80.0, 76.0, 106.0, 83.0, 107.0, 117.0, 82.0]
2025-05-13 10:05:52,201 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 6 hours, 37 minutes, 56 seconds)
2025-05-13 10:10:26,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:10:28,384 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 498.79248 ± 155.898
2025-05-13 10:10:28,384 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [452.85217, 503.4287, 557.3667, 159.20016, 314.70224, 566.33234, 676.6559, 725.2538, 506.33566, 525.79736]
2025-05-13 10:10:28,384 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 103.0, 105.0, 31.0, 60.0, 105.0, 129.0, 138.0, 96.0, 99.0]
2025-05-13 10:10:28,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 6 hours, 33 minutes, 44 seconds)
2025-05-13 10:15:03,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:15:04,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 490.13702 ± 155.492
2025-05-13 10:15:04,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [609.0353, 485.3945, 566.38983, 413.79642, 354.28928, 829.4943, 527.26044, 398.3083, 220.86827, 496.5331]
2025-05-13 10:15:04,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 92.0, 109.0, 88.0, 68.0, 160.0, 113.0, 76.0, 43.0, 95.0]
2025-05-13 10:15:04,899 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 6 hours, 29 minutes, 42 seconds)
2025-05-13 10:19:37,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:19:39,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 585.94745 ± 143.013
2025-05-13 10:19:39,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [607.9762, 538.89606, 317.29346, 532.7836, 541.8562, 839.43335, 821.73694, 555.47034, 522.0179, 582.0101]
2025-05-13 10:19:39,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 104.0, 56.0, 100.0, 102.0, 162.0, 156.0, 102.0, 97.0, 110.0]
2025-05-13 10:19:39,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (585.95) for latency ExtremeClogL1U23
2025-05-13 10:19:39,057 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 6 hours, 25 minutes, 26 seconds)
2025-05-13 10:24:09,809 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:24:11,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 558.25330 ± 88.180
2025-05-13 10:24:11,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [787.31177, 571.822, 488.7856, 495.0529, 527.392, 541.25055, 469.6312, 600.32416, 601.95807, 499.00464]
2025-05-13 10:24:11,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [163.0, 106.0, 100.0, 93.0, 99.0, 104.0, 89.0, 115.0, 118.0, 92.0]
2025-05-13 10:24:11,639 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 6 hours, 19 minutes, 44 seconds)
2025-05-13 10:28:45,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:28:47,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 545.46405 ± 129.739
2025-05-13 10:28:47,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [443.243, 548.77893, 499.643, 608.19104, 367.6137, 822.7799, 501.48392, 581.3591, 399.30588, 682.242]
2025-05-13 10:28:47,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 98.0, 91.0, 109.0, 67.0, 149.0, 90.0, 105.0, 72.0, 126.0]
2025-05-13 10:28:47,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 6 hours, 15 minutes, 49 seconds)
2025-05-13 10:33:20,692 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:33:22,131 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 476.31308 ± 130.956
2025-05-13 10:33:22,131 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [139.94032, 468.0286, 607.913, 400.99033, 476.16956, 567.8202, 618.3406, 464.8897, 462.57596, 556.4623]
2025-05-13 10:33:22,131 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 82.0, 114.0, 73.0, 89.0, 104.0, 113.0, 88.0, 96.0, 104.0]
2025-05-13 10:33:22,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 6 hours, 10 minutes, 54 seconds)
2025-05-13 10:37:56,886 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:37:58,756 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 579.50763 ± 87.712
2025-05-13 10:37:58,756 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [509.55975, 477.80545, 756.65295, 578.15247, 556.44214, 559.7335, 699.1047, 608.813, 460.53717, 588.27527]
2025-05-13 10:37:58,756 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 92.0, 155.0, 110.0, 104.0, 113.0, 132.0, 117.0, 85.0, 119.0]
2025-05-13 10:37:58,762 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 6 hours, 6 minutes, 21 seconds)
2025-05-13 10:42:31,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:42:33,717 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 597.49091 ± 134.870
2025-05-13 10:42:33,717 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [513.5671, 632.0245, 499.4223, 561.65924, 857.8988, 610.5478, 558.8287, 420.9551, 495.22583, 824.7801]
2025-05-13 10:42:33,717 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 133.0, 101.0, 104.0, 163.0, 127.0, 103.0, 90.0, 109.0, 155.0]
2025-05-13 10:42:33,717 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (597.49) for latency ExtremeClogL1U23
2025-05-13 10:42:33,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 6 hours, 1 minute, 59 seconds)
2025-05-13 10:47:10,273 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:47:11,948 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 517.44733 ± 93.914
2025-05-13 10:47:11,948 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [490.19412, 498.27975, 687.72144, 321.6079, 578.78577, 522.1288, 493.3011, 476.90118, 477.82718, 627.7261]
2025-05-13 10:47:11,948 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 104.0, 130.0, 60.0, 108.0, 97.0, 91.0, 87.0, 101.0, 133.0]
2025-05-13 10:47:11,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 58 minutes, 52 seconds)
2025-05-13 10:51:42,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:51:44,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 508.61176 ± 159.291
2025-05-13 10:51:44,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [180.85907, 425.97116, 504.26645, 463.88742, 603.8209, 798.94165, 597.5791, 429.78806, 422.2917, 658.71185]
2025-05-13 10:51:44,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 76.0, 91.0, 83.0, 111.0, 161.0, 116.0, 76.0, 75.0, 127.0]
2025-05-13 10:51:44,084 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 53 minutes, 24 seconds)
2025-05-13 10:56:18,409 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:56:19,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 491.37103 ± 58.744
2025-05-13 10:56:19,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [476.97806, 483.12335, 461.44498, 435.2482, 521.7675, 443.22028, 520.3254, 556.31866, 610.98553, 404.29816]
2025-05-13 10:56:19,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 86.0, 82.0, 79.0, 92.0, 79.0, 91.0, 99.0, 109.0, 74.0]
2025-05-13 10:56:19,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 48 minutes, 59 seconds)
2025-05-13 11:00:52,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:00:54,105 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 466.72910 ± 220.839
2025-05-13 11:00:54,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [463.24567, 683.7519, 504.9536, 696.03217, 135.02286, 140.43657, 591.2944, 694.6796, 170.17981, 587.6943]
2025-05-13 11:00:54,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 122.0, 94.0, 139.0, 26.0, 27.0, 108.0, 141.0, 33.0, 108.0]
2025-05-13 11:00:54,111 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 43 minutes, 50 seconds)
2025-05-13 11:05:31,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:05:32,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 470.25552 ± 203.786
2025-05-13 11:05:32,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [490.59167, 689.5744, 87.96967, 156.193, 436.76923, 718.0303, 637.2772, 369.70206, 503.4722, 612.9757]
2025-05-13 11:05:32,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 126.0, 19.0, 30.0, 84.0, 131.0, 116.0, 67.0, 92.0, 112.0]
2025-05-13 11:05:32,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 40 minutes, 6 seconds)
2025-05-13 11:10:05,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:10:06,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 525.77283 ± 155.953
2025-05-13 11:10:06,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [618.6848, 517.3585, 592.88666, 615.8174, 568.3289, 558.41754, 508.3983, 624.6427, 72.03755, 581.156]
2025-05-13 11:10:06,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 93.0, 106.0, 110.0, 106.0, 98.0, 101.0, 112.0, 15.0, 103.0]
2025-05-13 11:10:06,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 34 minutes, 34 seconds)
2025-05-13 11:14:41,410 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:14:43,258 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 589.84412 ± 165.770
2025-05-13 11:14:43,258 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [690.42706, 772.55493, 785.53265, 323.8158, 353.29996, 379.2125, 633.59827, 595.15424, 644.8232, 720.02295]
2025-05-13 11:14:43,258 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 146.0, 151.0, 57.0, 67.0, 71.0, 124.0, 113.0, 118.0, 142.0]
2025-05-13 11:14:43,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 5 hours, 31 minutes)
2025-05-13 11:19:17,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:19:19,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 621.89783 ± 147.378
2025-05-13 11:19:19,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [631.77423, 803.1757, 658.173, 812.73065, 569.9203, 493.1961, 625.5638, 338.6225, 794.7363, 491.08603]
2025-05-13 11:19:19,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 155.0, 122.0, 149.0, 112.0, 94.0, 125.0, 61.0, 161.0, 90.0]
2025-05-13 11:19:19,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (621.90) for latency ExtremeClogL1U23
2025-05-13 11:19:19,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 5 hours, 26 minutes, 31 seconds)
2025-05-13 11:23:53,386 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:23:55,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 555.14984 ± 285.796
2025-05-13 11:23:55,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [525.04016, 782.45074, 493.92914, 494.19754, 160.45685, 346.46115, 515.631, 1130.7946, 886.2795, 216.25845]
2025-05-13 11:23:55,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 147.0, 93.0, 93.0, 31.0, 68.0, 102.0, 221.0, 160.0, 41.0]
2025-05-13 11:23:55,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 5 hours, 22 minutes, 13 seconds)
2025-05-13 11:28:32,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:28:34,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 669.55701 ± 209.480
2025-05-13 11:28:34,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [649.31305, 622.06116, 891.34863, 865.0463, 749.3433, 640.4864, 520.81195, 821.4788, 795.6739, 140.00636]
2025-05-13 11:28:34,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 118.0, 164.0, 165.0, 142.0, 120.0, 94.0, 159.0, 150.0, 29.0]
2025-05-13 11:28:34,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (669.56) for latency ExtremeClogL1U23
2025-05-13 11:28:34,105 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 5 hours, 17 minutes, 45 seconds)
2025-05-13 11:33:11,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:33:13,660 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 644.76282 ± 229.824
2025-05-13 11:33:13,660 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [293.4026, 886.9697, 821.9259, 707.59467, 680.7918, 589.3308, 605.931, 937.91547, 186.1733, 737.5929]
2025-05-13 11:33:13,660 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 183.0, 155.0, 131.0, 124.0, 109.0, 111.0, 178.0, 36.0, 135.0]
2025-05-13 11:33:13,667 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 5 hours, 14 minutes, 19 seconds)
2025-05-13 11:37:47,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:37:49,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 576.26404 ± 132.881
2025-05-13 11:37:49,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [630.858, 502.5775, 623.3097, 708.1798, 291.51727, 588.6008, 457.81726, 587.2133, 565.95557, 806.61127]
2025-05-13 11:37:49,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 108.0, 117.0, 131.0, 57.0, 111.0, 91.0, 111.0, 118.0, 158.0]
2025-05-13 11:37:49,779 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 5 hours, 9 minutes, 39 seconds)
2025-05-13 11:42:21,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:42:23,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 593.50684 ± 160.049
2025-05-13 11:42:23,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [792.4801, 699.86884, 316.85425, 458.9593, 617.17694, 732.8189, 504.06345, 536.2534, 444.18976, 832.40356]
2025-05-13 11:42:23,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [148.0, 131.0, 55.0, 89.0, 116.0, 145.0, 93.0, 107.0, 87.0, 151.0]
2025-05-13 11:42:23,738 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 5 hours, 4 minutes, 32 seconds)
2025-05-13 11:46:58,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:46:59,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 379.31494 ± 152.220
2025-05-13 11:46:59,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [517.15643, 161.48991, 454.3576, 460.26874, 340.31372, 493.50406, 204.2679, 135.52733, 592.5946, 433.66885]
2025-05-13 11:46:59,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 31.0, 88.0, 99.0, 70.0, 95.0, 40.0, 26.0, 126.0, 93.0]
2025-05-13 11:46:59,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 59 minutes, 56 seconds)
2025-05-13 11:51:33,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:51:35,520 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 628.98987 ± 221.441
2025-05-13 11:51:35,520 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [616.42615, 682.38727, 528.4127, 516.2939, 154.23442, 812.6813, 519.3777, 1041.0546, 782.3818, 636.64856]
2025-05-13 11:51:35,520 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 129.0, 101.0, 101.0, 32.0, 152.0, 102.0, 198.0, 141.0, 125.0]
2025-05-13 11:51:35,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 54 minutes, 42 seconds)
2025-05-13 11:56:11,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:56:12,983 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 559.54688 ± 158.009
2025-05-13 11:56:12,984 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [170.26575, 604.4553, 522.0806, 661.3957, 488.60135, 491.09894, 560.29517, 680.102, 797.0625, 620.1114]
2025-05-13 11:56:12,984 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 126.0, 107.0, 127.0, 92.0, 99.0, 119.0, 128.0, 147.0, 127.0]
2025-05-13 11:56:12,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 49 minutes, 39 seconds)
2025-05-13 12:00:45,666 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:00:48,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 715.15393 ± 180.431
2025-05-13 12:00:48,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [645.8836, 958.4693, 513.46106, 812.2474, 530.66595, 513.0126, 624.63715, 957.81934, 630.5629, 964.7799]
2025-05-13 12:00:48,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 190.0, 94.0, 155.0, 111.0, 106.0, 115.0, 182.0, 132.0, 198.0]
2025-05-13 12:00:48,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (715.15) for latency ExtremeClogL1U23
2025-05-13 12:00:48,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 44 minutes, 50 seconds)
2025-05-13 12:05:21,572 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:05:23,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 637.07312 ± 118.935
2025-05-13 12:05:23,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [846.84454, 705.48596, 611.7716, 655.6413, 474.74997, 485.5465, 686.22595, 780.4141, 497.7982, 626.2529]
2025-05-13 12:05:23,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 133.0, 111.0, 122.0, 86.0, 88.0, 125.0, 141.0, 89.0, 115.0]
2025-05-13 12:05:23,478 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 40 minutes, 32 seconds)
2025-05-13 12:09:56,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:09:59,013 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 681.90637 ± 103.744
2025-05-13 12:09:59,013 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [866.9157, 827.3418, 596.6161, 673.91425, 519.66644, 629.7008, 748.903, 575.88196, 690.61914, 689.50494]
2025-05-13 12:09:59,013 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [179.0, 169.0, 120.0, 132.0, 105.0, 129.0, 148.0, 117.0, 138.0, 133.0]
2025-05-13 12:09:59,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 35 minutes, 55 seconds)
2025-05-13 12:14:37,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:14:39,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 601.48505 ± 111.278
2025-05-13 12:14:39,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [470.71515, 792.70905, 537.7803, 441.5133, 617.18634, 557.35046, 598.7813, 773.6797, 677.42194, 547.7131]
2025-05-13 12:14:39,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 152.0, 108.0, 82.0, 120.0, 114.0, 112.0, 145.0, 124.0, 108.0]
2025-05-13 12:14:39,029 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 32 minutes, 5 seconds)
2025-05-13 12:19:12,394 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:19:14,854 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 790.16779 ± 153.786
2025-05-13 12:19:14,854 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [823.05457, 970.43463, 1139.6802, 826.6675, 675.45044, 626.4704, 615.10626, 766.6836, 708.9217, 749.2082]
2025-05-13 12:19:14,854 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 181.0, 213.0, 155.0, 128.0, 120.0, 112.0, 142.0, 139.0, 139.0]
2025-05-13 12:19:14,854 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (790.17) for latency ExtremeClogL1U23
2025-05-13 12:19:14,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 27 minutes, 9 seconds)
2025-05-13 12:23:47,087 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:23:49,590 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 797.65234 ± 142.791
2025-05-13 12:23:49,590 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [815.4462, 906.62897, 743.6938, 713.03687, 936.6199, 822.969, 994.98975, 760.6017, 831.5966, 450.94058]
2025-05-13 12:23:49,590 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 172.0, 138.0, 128.0, 178.0, 154.0, 183.0, 150.0, 156.0, 82.0]
2025-05-13 12:23:49,590 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (797.65) for latency ExtremeClogL1U23
2025-05-13 12:23:49,596 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 4 hours, 22 minutes, 29 seconds)
2025-05-13 12:28:23,373 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:28:25,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 650.19843 ± 174.122
2025-05-13 12:28:25,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [693.90656, 1038.2893, 545.259, 637.1055, 578.56555, 773.898, 373.88593, 751.47815, 639.1579, 470.43918]
2025-05-13 12:28:25,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 202.0, 101.0, 128.0, 118.0, 149.0, 72.0, 148.0, 116.0, 94.0]
2025-05-13 12:28:25,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 4 hours, 17 minutes, 59 seconds)
2025-05-13 12:33:03,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:33:06,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 846.99969 ± 458.058
2025-05-13 12:33:06,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [426.66473, 2105.255, 706.71295, 514.2795, 848.9406, 676.4139, 813.24884, 521.07935, 758.284, 1099.1185]
2025-05-13 12:33:06,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 408.0, 152.0, 95.0, 164.0, 127.0, 159.0, 110.0, 141.0, 214.0]
2025-05-13 12:33:06,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (847.00) for latency ExtremeClogL1U23
2025-05-13 12:33:06,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 4 hours, 14 minutes, 19 seconds)
2025-05-13 12:37:42,248 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:37:44,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 833.24042 ± 160.932
2025-05-13 12:37:44,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [726.09015, 712.9071, 1160.2819, 820.9485, 906.8054, 1036.1599, 571.3441, 735.1977, 807.6303, 855.0398]
2025-05-13 12:37:44,899 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 140.0, 226.0, 154.0, 167.0, 201.0, 120.0, 146.0, 148.0, 165.0]
2025-05-13 12:37:44,905 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 4 hours, 9 minutes, 27 seconds)
2025-05-13 12:42:14,960 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:42:17,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 678.68762 ± 280.276
2025-05-13 12:42:17,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [603.9235, 782.10254, 742.4538, 77.09414, 745.3264, 842.5628, 324.32773, 1156.9117, 809.6424, 702.5312]
2025-05-13 12:42:17,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 144.0, 135.0, 16.0, 138.0, 152.0, 58.0, 234.0, 152.0, 128.0]
2025-05-13 12:42:17,052 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 4 hours, 4 minutes, 11 seconds)
2025-05-13 12:46:52,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:46:54,252 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 635.36658 ± 261.151
2025-05-13 12:46:54,252 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [760.5544, 585.44934, 533.0006, 177.14467, 996.90125, 883.67395, 641.3267, 720.26984, 859.77527, 195.56963]
2025-05-13 12:46:54,252 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 109.0, 99.0, 34.0, 182.0, 176.0, 121.0, 132.0, 160.0, 38.0]
2025-05-13 12:46:54,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 4 hours)
2025-05-13 12:51:31,836 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:51:34,375 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 759.44617 ± 137.574
2025-05-13 12:51:34,376 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [823.7452, 816.0225, 884.62537, 531.5845, 951.3654, 780.76404, 801.15027, 845.7431, 539.2923, 620.1696]
2025-05-13 12:51:34,376 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 151.0, 182.0, 96.0, 195.0, 151.0, 164.0, 166.0, 113.0, 128.0]
2025-05-13 12:51:34,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 56 minutes, 6 seconds)
2025-05-13 12:56:08,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:56:11,501 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 795.52039 ± 94.367
2025-05-13 12:56:11,501 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [899.03894, 788.81976, 753.5972, 926.04663, 806.0153, 853.58905, 841.67975, 819.69055, 652.18567, 614.5412]
2025-05-13 12:56:11,501 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [172.0, 145.0, 149.0, 173.0, 150.0, 163.0, 148.0, 151.0, 128.0, 133.0]
2025-05-13 12:56:11,509 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 50 minutes, 52 seconds)
2025-05-13 13:00:42,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:00:45,173 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 786.48206 ± 222.945
2025-05-13 13:00:45,173 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [827.8665, 766.5251, 758.2669, 811.4014, 1387.7012, 566.7877, 752.1041, 533.37244, 808.6562, 652.13904]
2025-05-13 13:00:45,173 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [160.0, 139.0, 142.0, 150.0, 255.0, 112.0, 134.0, 103.0, 147.0, 118.0]
2025-05-13 13:00:45,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 45 minutes, 26 seconds)
2025-05-13 13:05:22,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:05:24,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 774.84680 ± 243.455
2025-05-13 13:05:24,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [679.9476, 480.54865, 1238.2996, 559.40704, 571.6107, 547.57544, 842.167, 987.5263, 1079.782, 761.60376]
2025-05-13 13:05:24,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 104.0, 233.0, 104.0, 100.0, 113.0, 156.0, 178.0, 222.0, 140.0]
2025-05-13 13:05:24,964 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 42 minutes, 3 seconds)
2025-05-13 13:09:55,765 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:09:57,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 627.79919 ± 191.240
2025-05-13 13:09:57,682 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [197.44841, 610.3611, 777.8026, 768.45795, 935.52277, 579.5018, 663.6339, 443.38986, 603.2579, 698.6164]
2025-05-13 13:09:57,682 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 112.0, 144.0, 143.0, 176.0, 108.0, 122.0, 84.0, 111.0, 126.0]
2025-05-13 13:09:57,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 36 minutes, 44 seconds)
2025-05-13 13:14:36,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:14:38,377 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 705.07867 ± 332.726
2025-05-13 13:14:38,377 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [175.68163, 605.4302, 689.16956, 1095.325, 517.6159, 1378.8584, 579.92126, 893.4093, 356.90637, 758.46924]
2025-05-13 13:14:38,377 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 125.0, 119.0, 216.0, 109.0, 279.0, 102.0, 171.0, 65.0, 137.0]
2025-05-13 13:14:38,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 32 minutes, 12 seconds)
2025-05-13 13:19:11,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:19:12,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 581.70856 ± 157.805
2025-05-13 13:19:12,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [670.1242, 517.5692, 790.78766, 633.4572, 699.7451, 795.18427, 282.98834, 475.16412, 411.70395, 540.361]
2025-05-13 13:19:12,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 92.0, 145.0, 115.0, 128.0, 147.0, 49.0, 91.0, 80.0, 108.0]
2025-05-13 13:19:12,833 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 27 minutes, 11 seconds)
2025-05-13 13:23:47,712 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:23:50,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 872.18378 ± 166.720
2025-05-13 13:23:50,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [893.09894, 811.1234, 704.5343, 1283.1661, 750.7845, 803.31146, 825.96936, 1048.7148, 883.45215, 717.6824]
2025-05-13 13:23:50,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 153.0, 147.0, 242.0, 147.0, 173.0, 173.0, 202.0, 185.0, 142.0]
2025-05-13 13:23:50,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (872.18) for latency ExtremeClogL1U23
2025-05-13 13:23:50,601 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 23 minutes, 11 seconds)
2025-05-13 13:28:23,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:28:25,611 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 722.95642 ± 219.811
2025-05-13 13:28:25,611 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [943.5493, 620.18616, 618.50604, 720.9165, 922.9141, 936.0424, 691.99146, 781.3016, 164.48326, 829.67346]
2025-05-13 13:28:25,612 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [175.0, 119.0, 114.0, 137.0, 171.0, 169.0, 125.0, 147.0, 32.0, 153.0]
2025-05-13 13:28:25,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 3 hours, 17 minutes, 53 seconds)
2025-05-13 13:32:59,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:33:02,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 721.68506 ± 250.724
2025-05-13 13:33:02,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [649.0904, 688.21045, 911.3682, 690.1887, 221.6556, 974.6998, 1168.1979, 713.85535, 744.82385, 454.76007]
2025-05-13 13:33:02,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 130.0, 179.0, 142.0, 45.0, 192.0, 237.0, 141.0, 141.0, 85.0]
2025-05-13 13:33:02,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 3 hours, 13 minutes, 48 seconds)
2025-05-13 13:37:38,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:37:41,022 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 774.66730 ± 187.557
2025-05-13 13:37:41,022 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [551.589, 618.16406, 887.0149, 690.5518, 583.19275, 739.0582, 689.3817, 1011.5482, 1168.9307, 807.241]
2025-05-13 13:37:41,022 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 116.0, 165.0, 132.0, 114.0, 137.0, 124.0, 183.0, 219.0, 149.0]
2025-05-13 13:37:41,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 3 hours, 8 minutes, 57 seconds)
2025-05-13 13:42:15,522 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:42:17,068 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 498.90762 ± 328.286
2025-05-13 13:42:17,068 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [640.33673, 377.35916, 580.72186, 1140.5377, 490.17444, 86.38127, 166.95862, 388.82794, 961.87646, 155.90186]
2025-05-13 13:42:17,068 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 69.0, 113.0, 214.0, 96.0, 18.0, 32.0, 72.0, 173.0, 30.0]
2025-05-13 13:42:17,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 3 hours, 4 minutes, 33 seconds)
2025-05-13 13:46:51,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:46:53,598 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 690.27380 ± 272.652
2025-05-13 13:46:53,598 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1206.2643, 816.42053, 787.33514, 781.0139, 72.99063, 661.4968, 595.14276, 478.73706, 765.3527, 737.98456]
2025-05-13 13:46:53,598 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [227.0, 152.0, 141.0, 144.0, 15.0, 123.0, 110.0, 89.0, 143.0, 144.0]
2025-05-13 13:46:53,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 59 minutes, 47 seconds)
2025-05-13 13:51:28,396 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:51:30,400 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 666.05298 ± 277.586
2025-05-13 13:51:30,400 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [541.4665, 1016.2652, 741.95355, 846.2602, 598.56964, 161.04788, 223.51765, 956.9093, 676.07227, 898.4676]
2025-05-13 13:51:30,400 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 195.0, 133.0, 157.0, 105.0, 31.0, 44.0, 184.0, 119.0, 163.0]
2025-05-13 13:51:30,409 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 55 minutes, 24 seconds)
2025-05-13 13:56:05,791 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:56:08,600 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 895.57947 ± 287.651
2025-05-13 13:56:08,600 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1519.2959, 842.7328, 885.6308, 725.9372, 848.5368, 1302.3757, 831.7505, 872.1457, 479.664, 647.72546]
2025-05-13 13:56:08,600 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [284.0, 154.0, 163.0, 137.0, 158.0, 250.0, 153.0, 166.0, 88.0, 127.0]
2025-05-13 13:56:08,600 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (895.58) for latency ExtremeClogL1U23
2025-05-13 13:56:08,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 51 minutes)
2025-05-13 14:00:40,634 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:00:43,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 866.99774 ± 355.768
2025-05-13 14:00:43,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [472.19397, 473.39783, 814.1114, 397.33554, 733.0548, 1004.44434, 1498.227, 1319.3444, 796.4796, 1161.3892]
2025-05-13 14:00:43,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 100.0, 166.0, 88.0, 153.0, 189.0, 289.0, 266.0, 148.0, 231.0]
2025-05-13 14:00:43,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 45 minutes, 54 seconds)
2025-05-13 14:05:23,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:05:25,746 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 748.24780 ± 423.136
2025-05-13 14:05:25,746 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [915.05817, 234.948, 855.4797, 189.87479, 193.46211, 457.7952, 1087.1661, 1389.224, 941.4799, 1217.9901]
2025-05-13 14:05:25,746 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [173.0, 46.0, 178.0, 37.0, 37.0, 87.0, 201.0, 256.0, 177.0, 226.0]
2025-05-13 14:05:25,755 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 42 minutes)
2025-05-13 14:09:57,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:09:59,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 736.60193 ± 227.072
2025-05-13 14:09:59,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [466.1586, 1167.7815, 810.20044, 692.3435, 519.028, 1089.62, 844.2305, 616.28613, 613.5909, 546.77985]
2025-05-13 14:09:59,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 216.0, 156.0, 142.0, 110.0, 205.0, 164.0, 125.0, 127.0, 100.0]
2025-05-13 14:09:59,549 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 37 minutes, 4 seconds)
2025-05-13 14:14:37,375 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:14:40,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 833.52209 ± 252.231
2025-05-13 14:14:40,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [989.1101, 879.2521, 1132.451, 953.4259, 1243.5643, 701.57477, 509.73425, 822.52637, 720.49554, 383.08597]
2025-05-13 14:14:40,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [210.0, 162.0, 209.0, 176.0, 223.0, 151.0, 109.0, 151.0, 130.0, 87.0]
2025-05-13 14:14:40,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 32 minutes, 51 seconds)
2025-05-13 14:19:13,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:19:15,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 872.07068 ± 266.752
2025-05-13 14:19:15,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [885.5016, 1255.7556, 694.13666, 577.3931, 825.6141, 908.5707, 355.99628, 1145.0983, 1190.1912, 882.44916]
2025-05-13 14:19:15,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [167.0, 235.0, 124.0, 103.0, 166.0, 177.0, 75.0, 209.0, 219.0, 163.0]
2025-05-13 14:19:16,010 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 27 minutes, 59 seconds)
2025-05-13 14:23:49,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:23:53,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 991.05310 ± 254.088
2025-05-13 14:23:53,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1243.5175, 690.97906, 589.3594, 1150.5769, 1179.7821, 758.6236, 1243.0123, 1333.8966, 870.6174, 850.1668]
2025-05-13 14:23:53,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [262.0, 131.0, 114.0, 221.0, 221.0, 145.0, 226.0, 244.0, 157.0, 158.0]
2025-05-13 14:23:53,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (991.05) for latency ExtremeClogL1U23
2025-05-13 14:23:53,026 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 23 minutes, 34 seconds)
2025-05-13 14:28:26,548 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:28:30,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 1203.75317 ± 637.246
2025-05-13 14:28:30,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1877.3761, 1137.0393, 880.79407, 301.0542, 782.11383, 1730.037, 594.1248, 2538.749, 936.0882, 1260.156]
2025-05-13 14:28:30,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [351.0, 208.0, 177.0, 56.0, 149.0, 328.0, 128.0, 469.0, 173.0, 243.0]
2025-05-13 14:28:30,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (1203.75) for latency ExtremeClogL1U23
2025-05-13 14:28:30,443 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 18 minutes, 28 seconds)
2025-05-13 14:33:05,726 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:33:09,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 1064.12524 ± 377.083
2025-05-13 14:33:09,422 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [827.3867, 1580.203, 1058.3324, 928.2005, 1136.468, 846.7707, 917.51855, 909.1971, 1902.4495, 534.7254]
2025-05-13 14:33:09,422 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [176.0, 319.0, 217.0, 196.0, 232.0, 180.0, 176.0, 168.0, 368.0, 101.0]
2025-05-13 14:33:09,430 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 14 minutes, 21 seconds)
2025-05-13 14:37:43,005 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:37:46,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 982.07507 ± 228.957
2025-05-13 14:37:46,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [862.0449, 673.60516, 1205.6562, 1186.3341, 1199.2723, 983.2228, 786.6466, 1216.787, 1126.9656, 580.21606]
2025-05-13 14:37:46,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [161.0, 122.0, 227.0, 217.0, 230.0, 179.0, 145.0, 227.0, 214.0, 125.0]
2025-05-13 14:37:46,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 2 hours, 9 minutes, 21 seconds)
2025-05-13 14:42:24,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:42:28,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 1112.77319 ± 347.715
2025-05-13 14:42:28,141 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [705.0453, 946.7618, 1642.5212, 1064.3608, 1003.951, 908.66504, 1711.1975, 608.2746, 1174.5164, 1362.4377]
2025-05-13 14:42:28,141 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 198.0, 327.0, 199.0, 196.0, 174.0, 312.0, 128.0, 219.0, 256.0]
2025-05-13 14:42:28,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 2 hours, 5 minutes, 17 seconds)
2025-05-13 14:47:01,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:47:03,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 886.79700 ± 444.190
2025-05-13 14:47:03,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1851.8868, 718.44904, 1049.6191, 1155.0356, 682.989, 354.25555, 1267.9468, 591.9487, 906.61975, 289.21838]
2025-05-13 14:47:03,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [356.0, 132.0, 195.0, 214.0, 127.0, 67.0, 246.0, 116.0, 175.0, 50.0]
2025-05-13 14:47:03,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 2 hours, 32 seconds)
2025-05-13 14:51:40,111 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:51:42,931 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 890.87805 ± 323.006
2025-05-13 14:51:42,931 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [996.638, 1415.8112, 1019.3504, 648.06836, 643.27057, 939.44464, 1026.3066, 145.62419, 1015.531, 1058.7352]
2025-05-13 14:51:42,931 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [196.0, 257.0, 192.0, 141.0, 137.0, 175.0, 194.0, 28.0, 184.0, 204.0]
2025-05-13 14:51:42,941 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 56 minutes, 2 seconds)
2025-05-13 14:56:21,619 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:56:24,225 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 852.64783 ± 219.239
2025-05-13 14:56:24,225 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [859.5851, 899.3299, 717.0979, 810.25366, 733.5, 1194.5994, 907.38367, 568.92065, 1260.7742, 575.0339]
2025-05-13 14:56:24,225 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [157.0, 165.0, 131.0, 151.0, 127.0, 221.0, 167.0, 103.0, 233.0, 104.0]
2025-05-13 14:56:24,235 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 51 minutes, 35 seconds)
2025-05-13 15:00:54,458 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:00:57,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 913.07910 ± 513.969
2025-05-13 15:00:57,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [811.35565, 1404.2478, 794.6216, 172.08133, 431.27304, 1518.8566, 567.5676, 724.6317, 1938.6316, 767.52356]
2025-05-13 15:00:57,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [150.0, 260.0, 144.0, 33.0, 93.0, 284.0, 116.0, 143.0, 364.0, 143.0]
2025-05-13 15:00:57,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 46 minutes, 39 seconds)
2025-05-13 15:05:32,448 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:05:34,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 832.91974 ± 504.016
2025-05-13 15:05:34,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1857.0956, 290.54175, 853.7076, 924.5023, 931.0631, 170.95534, 169.88593, 1212.9514, 709.6346, 1208.8589]
2025-05-13 15:05:34,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [349.0, 51.0, 155.0, 190.0, 177.0, 33.0, 33.0, 219.0, 128.0, 219.0]
2025-05-13 15:05:34,892 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 41 minutes, 41 seconds)
2025-05-13 15:10:13,736 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:10:16,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 884.34045 ± 493.414
2025-05-13 15:10:16,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [538.5816, 1483.1938, 726.8459, 783.0818, 618.86707, 633.9235, 1954.1462, 1138.824, 124.70236, 841.2379]
2025-05-13 15:10:16,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 272.0, 152.0, 140.0, 132.0, 120.0, 355.0, 209.0, 24.0, 154.0]
2025-05-13 15:10:16,512 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 37 minutes, 28 seconds)
2025-05-13 15:14:53,654 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:14:56,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 947.86993 ± 444.603
2025-05-13 15:14:56,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1036.2299, 503.77612, 904.151, 911.62463, 1346.8608, 177.5125, 1234.7512, 1856.517, 880.9801, 626.29596]
2025-05-13 15:14:56,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [207.0, 98.0, 173.0, 168.0, 248.0, 34.0, 240.0, 338.0, 170.0, 118.0]
2025-05-13 15:14:56,688 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 32 minutes, 54 seconds)
2025-05-13 15:19:30,476 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:19:35,418 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 1481.84766 ± 448.486
2025-05-13 15:19:35,418 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1993.2008, 1879.6965, 1141.2985, 1495.782, 1586.2717, 1754.8942, 1183.3201, 1321.2861, 473.2018, 1989.5248]
2025-05-13 15:19:35,418 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [391.0, 365.0, 213.0, 283.0, 296.0, 334.0, 227.0, 243.0, 96.0, 413.0]
2025-05-13 15:19:35,418 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (1481.85) for latency ExtremeClogL1U23
2025-05-13 15:19:35,429 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 28 minutes, 6 seconds)
2025-05-13 15:24:14,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:24:18,923 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 1417.86353 ± 666.448
2025-05-13 15:24:18,923 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [2895.6218, 1283.9607, 1064.6561, 1812.4668, 729.27594, 636.95935, 1467.1562, 761.257, 2082.0044, 1445.2754]
2025-05-13 15:24:18,923 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [559.0, 248.0, 206.0, 343.0, 153.0, 134.0, 292.0, 138.0, 389.0, 263.0]
2025-05-13 15:24:18,934 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 24 minutes, 5 seconds)
2025-05-13 15:28:51,016 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:28:54,075 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 975.43524 ± 347.423
2025-05-13 15:28:54,075 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [595.42065, 1521.3104, 682.78265, 1366.9183, 331.516, 1119.164, 984.8552, 1261.8594, 1002.2387, 888.2883]
2025-05-13 15:28:54,075 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 284.0, 129.0, 253.0, 63.0, 207.0, 184.0, 231.0, 183.0, 173.0]
2025-05-13 15:28:54,088 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 19 minutes, 17 seconds)
2025-05-13 15:33:28,210 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:33:31,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 995.96155 ± 693.926
2025-05-13 15:33:31,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [651.04565, 697.7368, 576.6741, 571.6419, 729.65326, 589.298, 553.31415, 1156.7693, 2854.267, 1579.2148]
2025-05-13 15:33:31,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 149.0, 121.0, 117.0, 144.0, 125.0, 111.0, 212.0, 526.0, 330.0]
2025-05-13 15:33:31,663 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 14 minutes, 24 seconds)
2025-05-13 15:38:07,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:38:11,781 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 1250.85437 ± 869.736
2025-05-13 15:38:11,781 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [146.16714, 793.39264, 1231.8763, 727.0329, 1112.8658, 2555.3, 629.5535, 1255.0615, 3156.3623, 900.93134]
2025-05-13 15:38:11,781 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 157.0, 237.0, 155.0, 201.0, 475.0, 129.0, 252.0, 610.0, 172.0]
2025-05-13 15:38:11,816 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 9 minutes, 45 seconds)
2025-05-13 15:42:51,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:42:54,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 1139.28003 ± 816.074
2025-05-13 15:42:54,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [196.37965, 3001.2417, 545.30365, 833.78235, 1436.0074, 710.542, 998.8548, 686.5981, 2237.791, 746.3005]
2025-05-13 15:42:54,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [39.0, 571.0, 104.0, 180.0, 279.0, 152.0, 204.0, 147.0, 443.0, 135.0]
2025-05-13 15:42:54,937 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 1 hour, 5 minutes, 18 seconds)
2025-05-13 15:47:30,901 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:47:35,949 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 1543.09497 ± 536.138
2025-05-13 15:47:35,949 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1903.7001, 1959.0706, 2454.3564, 738.5658, 745.1275, 1572.9231, 2095.973, 1341.273, 1288.0118, 1331.9495]
2025-05-13 15:47:35,949 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [366.0, 366.0, 468.0, 155.0, 138.0, 297.0, 396.0, 253.0, 255.0, 264.0]
2025-05-13 15:47:35,949 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (1543.09) for latency ExtremeClogL1U23
2025-05-13 15:47:35,960 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 1 hour, 32 seconds)
2025-05-13 15:52:27,397 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:52:31,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 1012.97473 ± 235.410
2025-05-13 15:52:31,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1265.1177, 1021.8269, 802.554, 1167.052, 753.80054, 881.4733, 1458.3932, 921.1694, 687.1295, 1171.2307]
2025-05-13 15:52:31,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [230.0, 181.0, 163.0, 212.0, 136.0, 162.0, 269.0, 169.0, 126.0, 216.0]
2025-05-13 15:52:31,170 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 56 minutes, 40 seconds)
2025-05-13 15:57:32,962 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:57:37,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 1199.11108 ± 770.671
2025-05-13 15:57:37,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1001.2697, 1953.8691, 1773.6996, 489.28735, 553.15466, 782.0629, 1818.7484, 398.7642, 2737.7297, 482.52463]
2025-05-13 15:57:37,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [210.0, 393.0, 356.0, 85.0, 97.0, 160.0, 375.0, 74.0, 541.0, 86.0]
2025-05-13 15:57:37,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 53 minutes)
2025-05-13 16:02:28,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:02:31,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 1079.68311 ± 381.603
2025-05-13 16:02:31,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [596.161, 1633.0405, 988.3559, 1746.2305, 647.9266, 1069.1038, 990.1689, 1163.2146, 1329.6799, 632.9496]
2025-05-13 16:02:31,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 305.0, 182.0, 323.0, 120.0, 195.0, 187.0, 215.0, 253.0, 116.0]
2025-05-13 16:02:31,651 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 48 minutes, 39 seconds)
2025-05-13 16:06:48,906 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:06:51,958 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 992.55017 ± 496.661
2025-05-13 16:06:51,958 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1738.9468, 623.49207, 1500.3358, 772.8685, 424.98102, 697.33484, 815.60425, 1897.0878, 543.1185, 911.7326]
2025-05-13 16:06:51,958 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [351.0, 115.0, 285.0, 143.0, 73.0, 139.0, 169.0, 362.0, 105.0, 172.0]
2025-05-13 16:06:51,970 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 43 minutes, 6 seconds)
2025-05-13 16:11:12,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:11:16,403 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 1165.39722 ± 723.787
2025-05-13 16:11:16,403 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [632.53613, 590.16907, 950.6801, 565.8521, 2207.0015, 1961.5498, 404.3653, 1362.2574, 2403.871, 575.6892]
2025-05-13 16:11:16,403 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 105.0, 188.0, 119.0, 406.0, 361.0, 78.0, 258.0, 470.0, 112.0]
2025-05-13 16:11:16,414 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 37 minutes, 52 seconds)
2025-05-13 16:15:42,336 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:15:45,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 940.16718 ± 254.374
2025-05-13 16:15:45,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [551.6274, 710.6223, 1144.874, 1236.7977, 1265.6715, 606.5462, 895.82874, 836.9469, 1228.904, 923.8523]
2025-05-13 16:15:45,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 131.0, 212.0, 232.0, 235.0, 115.0, 166.0, 153.0, 225.0, 171.0]
2025-05-13 16:15:45,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 32 minutes, 31 seconds)
2025-05-13 16:20:10,595 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:20:14,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 1275.08765 ± 531.419
2025-05-13 16:20:14,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1176.5023, 2430.1086, 1315.5403, 587.10834, 612.2979, 1133.4955, 1911.2504, 980.7483, 1465.005, 1138.8192]
2025-05-13 16:20:14,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [220.0, 449.0, 240.0, 112.0, 120.0, 226.0, 352.0, 205.0, 272.0, 213.0]
2025-05-13 16:20:14,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 27 minutes, 8 seconds)
2025-05-13 16:24:38,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:24:41,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 975.36884 ± 376.638
2025-05-13 16:24:41,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1204.4069, 1464.438, 129.72755, 961.46985, 1331.297, 1095.708, 1036.8424, 661.0334, 1221.174, 647.5915]
2025-05-13 16:24:41,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [227.0, 263.0, 25.0, 178.0, 244.0, 201.0, 197.0, 131.0, 223.0, 120.0]
2025-05-13 16:24:42,013 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 22 minutes, 10 seconds)
2025-05-13 16:29:40,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:29:44,115 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 1040.05957 ± 365.684
2025-05-13 16:29:44,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1198.7003, 1095.8969, 1264.2941, 769.4217, 780.10657, 1258.0314, 1636.8567, 1089.9237, 199.19197, 1108.1729]
2025-05-13 16:29:44,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [249.0, 210.0, 235.0, 142.0, 147.0, 251.0, 303.0, 201.0, 38.0, 204.0]
2025-05-13 16:29:44,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 18 minutes, 17 seconds)
2025-05-13 16:34:28,388 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:34:31,484 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 839.89709 ± 301.437
2025-05-13 16:34:31,484 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1176.8685, 1280.0854, 722.3833, 176.44577, 632.345, 1041.2096, 700.3093, 823.9494, 799.60077, 1045.7738]
2025-05-13 16:34:31,484 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [225.0, 263.0, 150.0, 34.0, 122.0, 198.0, 141.0, 168.0, 165.0, 214.0]
2025-05-13 16:34:31,564 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 13 minutes, 57 seconds)
2025-05-13 16:39:27,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:39:31,484 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 1108.71680 ± 522.479
2025-05-13 16:39:31,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [1135.9172, 1976.4111, 557.342, 1123.7532, 1796.4342, 1502.5453, 211.03586, 625.8943, 1118.8115, 1039.0222]
2025-05-13 16:39:31,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [215.0, 372.0, 112.0, 211.0, 342.0, 273.0, 44.0, 108.0, 210.0, 191.0]
2025-05-13 16:39:31,550 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 9 minutes, 30 seconds)
2025-05-13 16:44:27,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:44:31,428 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 1170.82971 ± 701.302
2025-05-13 16:44:31,429 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [2550.1484, 488.0359, 555.1835, 2277.2854, 1140.9343, 528.2281, 1023.8179, 491.16748, 1315.465, 1338.0312]
2025-05-13 16:44:31,429 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [499.0, 95.0, 100.0, 441.0, 225.0, 100.0, 185.0, 102.0, 261.0, 258.0]
2025-05-13 16:44:31,443 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 51 seconds)
2025-05-13 16:49:11,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:49:15,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 1160.30798 ± 589.841
2025-05-13 16:49:15,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [888.3002, 1524.9309, 930.4329, 814.4004, 531.6674, 1279.5126, 1579.1174, 784.78253, 2615.596, 654.3399]
2025-05-13 16:49:15,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 294.0, 177.0, 157.0, 115.0, 268.0, 301.0, 153.0, 520.0, 132.0]
2025-05-13 16:49:15,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1251 [DEBUG]: Training session finished
