2025-05-13 09:06:32,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mda-highdim-mem4
2025-05-13 09:06:32,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mda-highdim-mem4
2025-05-13 09:06:32,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x149a390d1d90>}
2025-05-13 09:06:32,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:32,179 baseline-bpql-mda-noisy-humanoid:91 [WARNING]: args.assumed_delay != args.horizon: 4 != 32
2025-05-13 09:06:32,179 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1133 [INFO]: Creating new trainer
2025-05-13 09:06:32,197 baseline-bpql-mda-noisy-humanoid:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-05-13 09:06:32,197 baseline-bpql-mda-noisy-humanoid:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:32,206 baseline-bpql-mda-noisy-humanoid:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(17, 512, batch_first=True)
)
2025-05-13 09:06:33,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:33,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 1/100
2025-05-13 09:11:06,961 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:11:07,666 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 239.29424 ± 68.590
2025-05-13 09:11:07,666 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [279.01395, 300.24182, 304.48276, 299.98074, 141.10107, 164.42723, 176.02937, 276.4606, 305.65274, 145.55191]
2025-05-13 09:11:07,666 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [51.0, 55.0, 56.0, 55.0, 27.0, 32.0, 34.0, 51.0, 56.0, 28.0]
2025-05-13 09:11:07,666 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (239.29) for latency ExtremeSparseL4U32
2025-05-13 09:11:07,673 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 7 hours, 32 minutes, 44 seconds)
2025-05-13 09:15:53,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:15:54,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 435.60971 ± 139.611
2025-05-13 09:15:54,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [395.30652, 434.13678, 417.89603, 375.2682, 591.04816, 456.9392, 727.49927, 423.99228, 160.8794, 373.1314]
2025-05-13 09:15:54,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 97.0, 92.0, 71.0, 128.0, 102.0, 141.0, 85.0, 32.0, 83.0]
2025-05-13 09:15:54,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (435.61) for latency ExtremeSparseL4U32
2025-05-13 09:15:54,706 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 7 hours, 38 minutes, 29 seconds)
2025-05-13 09:20:42,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:20:44,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 465.57758 ± 82.590
2025-05-13 09:20:44,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [334.12448, 531.3286, 361.15952, 492.78754, 525.5989, 515.2808, 605.4182, 379.6208, 484.6621, 425.79504]
2025-05-13 09:20:44,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 114.0, 70.0, 92.0, 98.0, 102.0, 112.0, 80.0, 91.0, 83.0]
2025-05-13 09:20:44,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (465.58) for latency ExtremeSparseL4U32
2025-05-13 09:20:44,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 7 hours, 38 minutes, 26 seconds)
2025-05-13 09:25:28,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:25:29,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 314.68411 ± 41.933
2025-05-13 09:25:29,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [322.0056, 272.682, 291.32053, 319.34763, 323.27548, 409.68698, 345.1168, 262.06433, 332.3802, 268.9616]
2025-05-13 09:25:29,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 51.0, 55.0, 58.0, 59.0, 76.0, 65.0, 50.0, 62.0, 51.0]
2025-05-13 09:25:29,273 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 7 hours, 34 minutes, 23 seconds)
2025-05-13 09:30:15,124 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:30:16,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 414.78143 ± 92.839
2025-05-13 09:30:16,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [463.77258, 176.67293, 477.1935, 396.58884, 423.8694, 515.2269, 495.61996, 409.10925, 343.22128, 446.54025]
2025-05-13 09:30:16,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 34.0, 101.0, 74.0, 80.0, 99.0, 95.0, 82.0, 65.0, 86.0]
2025-05-13 09:30:16,427 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 7 hours, 30 minutes, 39 seconds)
2025-05-13 09:35:02,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:35:03,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 312.24371 ± 106.160
2025-05-13 09:35:03,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [293.07285, 199.91608, 372.26407, 280.74017, 146.33215, 293.30066, 347.8914, 568.55133, 298.74527, 321.62277]
2025-05-13 09:35:03,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 39.0, 79.0, 54.0, 28.0, 57.0, 69.0, 105.0, 56.0, 64.0]
2025-05-13 09:35:03,521 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 7 hours, 29 minutes, 53 seconds)
2025-05-13 09:39:46,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:39:47,476 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 295.73431 ± 109.853
2025-05-13 09:39:47,476 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [462.3584, 243.68053, 243.69307, 299.55035, 265.9728, 178.6108, 181.8887, 190.92052, 490.993, 399.67508]
2025-05-13 09:39:47,476 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 45.0, 45.0, 55.0, 49.0, 35.0, 35.0, 37.0, 99.0, 77.0]
2025-05-13 09:39:47,483 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 7 hours, 24 minutes, 9 seconds)
2025-05-13 09:44:31,743 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:44:32,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 241.44809 ± 40.441
2025-05-13 09:44:32,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [151.06369, 288.55606, 249.228, 221.50278, 281.1276, 218.56725, 283.6051, 210.89822, 242.04735, 267.885]
2025-05-13 09:44:32,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 54.0, 47.0, 43.0, 53.0, 43.0, 52.0, 41.0, 46.0, 50.0]
2025-05-13 09:44:32,454 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 7 hours, 18 minutes, 3 seconds)
2025-05-13 09:49:16,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:49:17,190 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 212.36960 ± 43.076
2025-05-13 09:49:17,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [255.17387, 173.4669, 177.48877, 284.30112, 227.46156, 224.2474, 206.55296, 253.71095, 134.60193, 186.69044]
2025-05-13 09:49:17,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [47.0, 34.0, 34.0, 55.0, 42.0, 42.0, 39.0, 50.0, 26.0, 36.0]
2025-05-13 09:49:17,196 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 7 hours, 13 minutes, 8 seconds)
2025-05-13 09:54:01,246 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:54:02,179 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 321.33160 ± 24.088
2025-05-13 09:54:02,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [315.2176, 326.634, 298.11105, 306.24927, 324.07474, 309.55417, 287.5579, 377.43182, 324.17795, 344.30737]
2025-05-13 09:54:02,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 61.0, 56.0, 57.0, 59.0, 57.0, 54.0, 70.0, 61.0, 63.0]
2025-05-13 09:54:02,187 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 7 hours, 7 minutes, 43 seconds)
2025-05-13 09:58:44,330 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:58:45,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 275.29926 ± 77.558
2025-05-13 09:58:45,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [300.19135, 296.13077, 166.36324, 319.41733, 300.727, 306.8577, 87.68634, 337.9247, 299.75027, 337.94388]
2025-05-13 09:58:45,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [54.0, 54.0, 32.0, 58.0, 55.0, 57.0, 18.0, 64.0, 57.0, 62.0]
2025-05-13 09:58:45,112 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 7 hours, 1 minute, 44 seconds)
2025-05-13 10:03:29,409 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:03:30,588 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 384.10931 ± 23.588
2025-05-13 10:03:30,588 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [375.9756, 388.43494, 350.48032, 362.02014, 356.12466, 406.4476, 411.0504, 419.48785, 368.26733, 402.8041]
2025-05-13 10:03:30,588 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 72.0, 66.0, 67.0, 66.0, 76.0, 77.0, 79.0, 69.0, 77.0]
2025-05-13 10:03:30,596 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 6 hours, 57 minutes, 26 seconds)
2025-05-13 10:08:13,139 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:08:13,913 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 279.80634 ± 43.261
2025-05-13 10:08:13,914 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [331.2575, 276.36713, 178.44511, 324.61935, 256.40613, 285.3425, 290.13058, 298.31302, 241.5553, 315.62692]
2025-05-13 10:08:13,914 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 50.0, 35.0, 60.0, 49.0, 53.0, 56.0, 57.0, 46.0, 57.0]
2025-05-13 10:08:13,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 6 hours, 52 minutes, 13 seconds)
2025-05-13 10:12:56,896 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:12:57,784 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 305.11578 ± 117.695
2025-05-13 10:12:57,784 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [402.46692, 328.75986, 151.72342, 204.55792, 139.52348, 177.75691, 367.08322, 456.96466, 381.0537, 441.26752]
2025-05-13 10:12:57,784 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 64.0, 29.0, 40.0, 27.0, 34.0, 76.0, 88.0, 78.0, 81.0]
2025-05-13 10:12:57,792 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 6 hours, 47 minutes, 14 seconds)
2025-05-13 10:17:38,587 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:17:39,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 184.08974 ± 39.138
2025-05-13 10:17:39,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [243.62985, 229.24962, 207.89915, 121.40069, 165.20125, 185.13667, 129.96596, 209.27544, 198.37723, 150.76143]
2025-05-13 10:17:39,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [48.0, 46.0, 42.0, 24.0, 33.0, 37.0, 26.0, 43.0, 40.0, 29.0]
2025-05-13 10:17:39,123 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 6 hours, 41 minutes, 27 seconds)
2025-05-13 10:22:05,593 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:22:06,220 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 225.46956 ± 51.236
2025-05-13 10:22:06,220 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [276.08527, 228.8906, 165.54158, 275.1358, 255.24615, 164.56285, 277.24402, 165.12126, 281.28098, 165.58723]
2025-05-13 10:22:06,221 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [51.0, 43.0, 32.0, 51.0, 48.0, 32.0, 51.0, 32.0, 53.0, 32.0]
2025-05-13 10:22:06,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 6 hours, 32 minutes, 18 seconds)
2025-05-13 10:26:25,629 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:26:26,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 246.78484 ± 100.752
2025-05-13 10:26:26,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [314.42883, 164.96982, 514.54724, 176.18222, 283.14648, 216.27524, 218.93816, 172.011, 228.40083, 178.94841]
2025-05-13 10:26:26,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 32.0, 98.0, 34.0, 52.0, 41.0, 43.0, 33.0, 43.0, 34.0]
2025-05-13 10:26:26,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 6 hours, 20 minutes, 37 seconds)
2025-05-13 10:30:46,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:30:47,641 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 308.73895 ± 45.374
2025-05-13 10:30:47,641 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [262.0196, 418.17755, 327.2681, 275.89603, 302.47653, 291.13782, 271.10278, 334.6724, 337.57498, 267.06372]
2025-05-13 10:30:47,641 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [50.0, 77.0, 59.0, 52.0, 55.0, 54.0, 50.0, 62.0, 62.0, 51.0]
2025-05-13 10:30:47,648 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 6 hours, 10 minutes, 1 second)
2025-05-13 10:35:04,739 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:35:05,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 116.89253 ± 12.315
2025-05-13 10:35:05,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [124.588486, 82.957054, 125.23761, 119.461075, 123.16381, 118.361336, 123.76017, 118.86903, 108.0112, 124.51544]
2025-05-13 10:35:05,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 17.0, 24.0, 23.0, 24.0, 23.0, 24.0, 23.0, 21.0, 24.0]
2025-05-13 10:35:05,065 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 58 minutes, 21 seconds)
2025-05-13 10:39:26,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:39:26,766 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 80.13226 ± 21.073
2025-05-13 10:39:26,766 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [143.35161, 73.027306, 73.11105, 72.98576, 73.1379, 73.173164, 73.047195, 73.01368, 73.14034, 73.33466]
2025-05-13 10:39:26,766 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0]
2025-05-13 10:39:26,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 48 minutes, 42 seconds)
2025-05-13 10:43:58,677 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:43:58,881 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 72.76534 ± 0.058
2025-05-13 10:43:58,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [72.82475, 72.69692, 72.79647, 72.786224, 72.79552, 72.71067, 72.68458, 72.865364, 72.78436, 72.70841]
2025-05-13 10:43:58,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0]
2025-05-13 10:43:58,887 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 45 minutes, 39 seconds)
2025-05-13 10:48:34,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:48:35,190 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 76.48423 ± 2.192
2025-05-13 10:48:35,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [73.42227, 78.008965, 77.85143, 77.99134, 77.93843, 73.06548, 72.93888, 77.8526, 77.763054, 78.00981]
2025-05-13 10:48:35,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 16.0, 16.0, 16.0, 16.0, 15.0, 15.0, 16.0, 16.0, 16.0]
2025-05-13 10:48:35,201 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 45 minutes, 30 seconds)
2025-05-13 10:53:15,654 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:53:15,901 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 88.18687 ± 5.739
2025-05-13 10:53:15,901 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [88.86995, 83.328995, 83.80166, 94.04511, 94.39156, 78.04606, 88.51269, 88.71614, 97.90661, 84.24986]
2025-05-13 10:53:15,901 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 17.0, 17.0, 19.0, 19.0, 16.0, 18.0, 18.0, 20.0, 17.0]
2025-05-13 10:53:15,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 46 minutes, 3 seconds)
2025-05-13 10:57:57,349 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:57:57,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 73.71383 ± 1.585
2025-05-13 10:57:57,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [73.014046, 73.20578, 73.10059, 78.459984, 73.15602, 73.24093, 73.278496, 73.142075, 73.36349, 73.17691]
2025-05-13 10:57:57,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 15.0, 16.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0]
2025-05-13 10:57:57,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 47 minutes, 41 seconds)
2025-05-13 11:02:11,798 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:02:12,084 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 105.65068 ± 21.302
2025-05-13 11:02:12,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [90.12662, 146.05751, 139.86464, 106.746895, 82.86665, 118.145836, 102.00488, 83.20118, 96.602745, 90.89001]
2025-05-13 11:02:12,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 28.0, 28.0, 21.0, 17.0, 23.0, 20.0, 17.0, 19.0, 18.0]
2025-05-13 11:02:12,090 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 41 minutes, 19 seconds)
2025-05-13 11:06:25,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:06:25,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 96.34666 ± 15.436
2025-05-13 11:06:25,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [100.793175, 90.13798, 106.54417, 121.76772, 83.76302, 67.75631, 84.79512, 116.399864, 90.17461, 101.33465]
2025-05-13 11:06:25,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 18.0, 21.0, 24.0, 17.0, 14.0, 17.0, 23.0, 18.0, 20.0]
2025-05-13 11:06:25,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 32 minutes, 10 seconds)
2025-05-13 11:10:40,565 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:10:40,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 109.13069 ± 11.064
2025-05-13 11:10:40,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [95.55809, 100.1494, 107.36592, 123.6173, 123.622345, 124.142136, 100.548294, 115.38528, 95.57578, 105.342285]
2025-05-13 11:10:40,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 20.0, 21.0, 24.0, 24.0, 24.0, 20.0, 23.0, 19.0, 21.0]
2025-05-13 11:10:40,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 22 minutes, 34 seconds)
2025-05-13 11:15:15,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:15:16,115 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 84.62962 ± 2.730
2025-05-13 11:15:16,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [90.25271, 83.08336, 82.90974, 83.7365, 83.18089, 83.82182, 89.86271, 83.06588, 83.35248, 83.03012]
2025-05-13 11:15:16,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 17.0, 17.0, 17.0, 17.0, 17.0, 18.0, 17.0, 17.0, 17.0]
2025-05-13 11:15:16,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 5 hours, 16 minutes, 51 seconds)
2025-05-13 11:19:49,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:19:49,700 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 90.71045 ± 3.262
2025-05-13 11:19:49,700 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [89.64076, 89.89794, 94.717476, 89.65505, 89.65054, 90.60779, 83.7562, 95.090805, 94.79755, 89.29032]
2025-05-13 11:19:49,700 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 18.0, 19.0, 18.0, 18.0, 18.0, 17.0, 19.0, 19.0, 18.0]
2025-05-13 11:19:49,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 5 hours, 10 minutes, 32 seconds)
2025-05-13 11:24:24,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:24:24,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 134.53867 ± 24.516
2025-05-13 11:24:24,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [122.49605, 139.69263, 111.644775, 135.89604, 123.67492, 124.25287, 139.05553, 112.06032, 202.09224, 134.52142]
2025-05-13 11:24:24,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 27.0, 22.0, 26.0, 24.0, 24.0, 27.0, 22.0, 39.0, 26.0]
2025-05-13 11:24:25,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 5 hours, 11 minutes)
2025-05-13 11:28:58,669 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:28:58,997 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 116.18083 ± 31.254
2025-05-13 11:28:58,997 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [90.71715, 85.08229, 116.03561, 115.80068, 116.0345, 95.68148, 111.472595, 203.21222, 121.79575, 105.976135]
2025-05-13 11:28:58,997 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 17.0, 23.0, 23.0, 23.0, 19.0, 22.0, 39.0, 24.0, 21.0]
2025-05-13 11:28:59,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 5 hours, 11 minutes, 17 seconds)
2025-05-13 11:33:32,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:33:32,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 77.68579 ± 0.118
2025-05-13 11:33:32,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [77.54529, 77.888145, 77.72484, 77.601006, 77.69441, 77.58027, 77.57668, 77.68341, 77.66414, 77.89975]
2025-05-13 11:33:32,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0]
2025-05-13 11:33:32,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 5 hours, 10 minutes, 54 seconds)
2025-05-13 11:38:03,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:38:04,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 92.57393 ± 4.380
2025-05-13 11:38:04,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [89.452354, 90.083176, 90.11477, 89.80554, 89.77997, 89.54581, 90.20101, 100.59947, 100.77794, 95.37922]
2025-05-13 11:38:04,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 20.0, 20.0, 19.0]
2025-05-13 11:38:04,072 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 5 hours, 5 minutes, 30 seconds)
2025-05-13 11:42:35,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:42:35,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 86.56090 ± 24.766
2025-05-13 11:42:35,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [78.007286, 78.096436, 78.15777, 160.84592, 78.173416, 78.19169, 78.25247, 78.18973, 77.979454, 79.714874]
2025-05-13 11:42:35,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [16.0, 16.0, 16.0, 31.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0]
2025-05-13 11:42:35,962 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 5 hours, 34 seconds)
2025-05-13 11:47:08,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:47:08,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 103.68982 ± 37.040
2025-05-13 11:47:08,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [94.53649, 89.49569, 94.77701, 84.76621, 90.00424, 89.74214, 94.98271, 214.40436, 94.59833, 89.59106]
2025-05-13 11:47:08,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 18.0, 19.0, 17.0, 18.0, 18.0, 19.0, 42.0, 19.0, 18.0]
2025-05-13 11:47:08,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 55 minutes, 27 seconds)
2025-05-13 11:51:41,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:51:41,373 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 104.63643 ± 13.436
2025-05-13 11:51:41,373 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [95.92791, 117.19151, 116.2457, 84.42175, 90.46143, 116.61471, 100.634254, 101.09164, 95.30236, 128.47302]
2025-05-13 11:51:41,373 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 23.0, 23.0, 17.0, 18.0, 23.0, 20.0, 20.0, 19.0, 25.0]
2025-05-13 11:51:41,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 50 minutes, 38 seconds)
2025-05-13 11:56:14,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:56:14,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 85.14632 ± 15.639
2025-05-13 11:56:14,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [84.76473, 78.171234, 78.07009, 128.51183, 77.93565, 96.3159, 78.17854, 73.128685, 78.2519, 78.13467]
2025-05-13 11:56:14,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [17.0, 16.0, 16.0, 25.0, 16.0, 19.0, 16.0, 15.0, 16.0, 16.0]
2025-05-13 11:56:14,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 46 minutes, 2 seconds)
2025-05-13 12:00:45,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:00:45,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 77.52755 ± 1.482
2025-05-13 12:00:45,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [77.8869, 73.108154, 78.165146, 78.23604, 77.89366, 78.14983, 78.06874, 77.78783, 77.79648, 78.18273]
2025-05-13 12:00:45,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [16.0, 15.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0]
2025-05-13 12:00:45,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 41 minutes, 26 seconds)
2025-05-13 12:05:19,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:05:19,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 77.96509 ± 4.229
2025-05-13 12:05:19,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [73.005905, 77.81327, 77.71058, 77.73245, 77.844574, 73.0146, 77.64873, 77.83047, 89.34156, 77.70876]
2025-05-13 12:05:19,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 16.0, 16.0, 16.0, 16.0, 15.0, 16.0, 16.0, 18.0, 16.0]
2025-05-13 12:05:19,733 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 37 minutes, 18 seconds)
2025-05-13 12:09:55,087 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:09:55,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 86.19653 ± 4.780
2025-05-13 12:09:55,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [90.5656, 89.86513, 82.95439, 90.52203, 83.50382, 82.96812, 77.8688, 94.77915, 84.65252, 84.28571]
2025-05-13 12:09:55,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 18.0, 17.0, 18.0, 17.0, 17.0, 16.0, 19.0, 17.0, 17.0]
2025-05-13 12:09:55,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 33 minutes, 19 seconds)
2025-05-13 12:14:30,430 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:14:30,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 95.40536 ± 12.054
2025-05-13 12:14:30,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [113.12426, 123.16423, 90.37729, 84.69303, 90.310265, 96.67844, 84.190636, 89.40591, 91.852745, 90.2568]
2025-05-13 12:14:30,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 24.0, 18.0, 17.0, 18.0, 19.0, 17.0, 18.0, 18.0, 18.0]
2025-05-13 12:14:30,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 29 minutes, 17 seconds)
2025-05-13 12:19:05,855 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:19:06,137 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 102.77931 ± 8.643
2025-05-13 12:19:06,137 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [90.0453, 107.45228, 107.17054, 107.9552, 101.30124, 101.45419, 119.41405, 95.39475, 90.14052, 107.465]
2025-05-13 12:19:06,137 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 21.0, 21.0, 21.0, 20.0, 20.0, 23.0, 19.0, 18.0, 21.0]
2025-05-13 12:19:06,144 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 25 minutes, 8 seconds)
2025-05-13 12:23:44,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:23:45,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 221.21103 ± 57.108
2025-05-13 12:23:45,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [290.64044, 150.0967, 154.43877, 261.04764, 219.54189, 233.17682, 167.05006, 298.74414, 280.27814, 157.09607]
2025-05-13 12:23:45,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [53.0, 29.0, 30.0, 49.0, 42.0, 44.0, 33.0, 57.0, 54.0, 30.0]
2025-05-13 12:23:45,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 4 hours, 22 minutes, 2 seconds)
2025-05-13 12:28:23,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:28:23,649 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 164.78888 ± 72.738
2025-05-13 12:28:23,649 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [157.50334, 150.17221, 84.2541, 178.9583, 347.72003, 234.80943, 145.83015, 121.16125, 106.22405, 121.25593]
2025-05-13 12:28:23,649 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 29.0, 17.0, 35.0, 65.0, 46.0, 29.0, 24.0, 21.0, 24.0]
2025-05-13 12:28:23,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 4 hours, 18 minutes, 19 seconds)
2025-05-13 12:32:47,739 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:32:48,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 111.95860 ± 18.946
2025-05-13 12:32:48,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [121.43952, 110.56644, 148.61914, 125.66665, 109.46898, 78.816895, 120.909195, 110.232704, 84.15684, 109.70954]
2025-05-13 12:32:48,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 22.0, 30.0, 25.0, 22.0, 16.0, 24.0, 22.0, 17.0, 22.0]
2025-05-13 12:32:48,056 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 4 hours, 11 minutes, 39 seconds)
2025-05-13 12:37:00,416 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:37:00,625 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 74.85191 ± 2.425
2025-05-13 12:37:00,625 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [72.79055, 72.89808, 77.87026, 72.932755, 77.642, 72.849915, 78.06838, 77.68991, 73.00443, 72.77276]
2025-05-13 12:37:00,625 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 16.0, 15.0, 16.0, 15.0, 16.0, 16.0, 15.0, 15.0]
2025-05-13 12:37:00,632 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 4 hours, 2 minutes, 59 seconds)
2025-05-13 12:41:12,468 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:41:12,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 73.95662 ± 1.986
2025-05-13 12:41:12,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [73.09322, 72.971924, 77.96361, 72.656105, 72.91804, 73.09784, 77.87711, 72.91211, 73.0349, 73.041306]
2025-05-13 12:41:12,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 16.0, 15.0, 15.0, 15.0, 16.0, 15.0, 15.0, 15.0]
2025-05-13 12:41:12,680 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 54 minutes, 21 seconds)
2025-05-13 12:45:23,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:45:24,173 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 73.03728 ± 0.113
2025-05-13 12:45:24,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [73.077255, 72.95504, 73.104744, 73.22339, 73.02961, 73.14342, 72.78018, 73.04164, 73.0064, 73.01116]
2025-05-13 12:45:24,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0]
2025-05-13 12:45:24,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 45 minutes, 10 seconds)
2025-05-13 12:49:33,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:49:33,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 73.06373 ± 0.128
2025-05-13 12:49:33,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [72.98509, 72.89147, 73.23891, 73.250755, 72.94557, 73.058266, 72.95083, 73.199005, 73.153755, 72.96363]
2025-05-13 12:49:33,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0]
2025-05-13 12:49:33,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 35 minutes, 50 seconds)
2025-05-13 12:53:45,947 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:53:46,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 93.28584 ± 4.813
2025-05-13 12:53:46,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [95.31685, 100.85512, 95.481445, 90.07473, 85.13437, 100.61423, 90.50218, 95.17736, 89.95093, 89.75117]
2025-05-13 12:53:46,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 20.0, 19.0, 18.0, 17.0, 20.0, 18.0, 19.0, 18.0, 18.0]
2025-05-13 12:53:46,205 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 29 minutes, 41 seconds)
2025-05-13 12:57:56,766 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:57:56,967 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 72.78861 ± 0.129
2025-05-13 12:57:56,967 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [72.919586, 72.77393, 73.083145, 72.66481, 72.7643, 72.7845, 72.67633, 72.864, 72.71835, 72.63721]
2025-05-13 12:57:56,967 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0]
2025-05-13 12:57:56,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 25 minutes, 12 seconds)
2025-05-13 13:02:07,318 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:02:07,548 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 82.91138 ± 30.188
2025-05-13 13:02:07,549 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [72.886345, 72.95371, 173.4732, 72.8298, 72.71969, 72.70288, 72.88524, 72.97919, 73.10272, 72.58099]
2025-05-13 13:02:07,549 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 34.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0]
2025-05-13 13:02:07,561 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 20 minutes, 46 seconds)
2025-05-13 13:06:17,075 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:06:17,299 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 80.66066 ± 22.926
2025-05-13 13:06:17,299 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [73.165215, 149.4373, 73.07428, 72.807045, 72.85071, 72.94805, 72.92826, 73.2156, 73.16768, 73.01243]
2025-05-13 13:06:17,299 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 29.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0]
2025-05-13 13:06:17,306 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 16 minutes, 19 seconds)
2025-05-13 13:10:26,018 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:10:26,225 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 72.92361 ± 0.152
2025-05-13 13:10:26,225 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [73.08906, 72.963715, 72.65512, 72.94125, 72.96876, 72.932945, 72.90534, 72.98467, 73.14432, 72.65093]
2025-05-13 13:10:26,225 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0]
2025-05-13 13:10:26,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 12 minutes, 7 seconds)
2025-05-13 13:14:34,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:14:35,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 72.96239 ± 0.119
2025-05-13 13:14:35,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [72.99506, 73.05111, 73.09474, 73.08246, 72.9846, 72.83804, 72.97297, 72.95275, 72.67075, 72.98145]
2025-05-13 13:14:35,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0]
2025-05-13 13:14:35,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 7 minutes, 20 seconds)
2025-05-13 13:18:46,692 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:18:46,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 102.84676 ± 40.836
2025-05-13 13:18:46,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [85.093605, 78.026474, 77.7796, 78.001396, 184.7463, 72.94402, 72.97648, 77.67509, 130.51205, 170.71255]
2025-05-13 13:18:46,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [17.0, 16.0, 16.0, 16.0, 36.0, 15.0, 15.0, 16.0, 25.0, 33.0]
2025-05-13 13:18:46,985 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 3 minutes, 20 seconds)
2025-05-13 13:23:00,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:23:00,721 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 80.84066 ± 3.043
2025-05-13 13:23:00,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [83.902405, 78.18788, 78.016716, 84.60314, 78.4685, 85.122, 84.429405, 78.19172, 77.915146, 79.569695]
2025-05-13 13:23:00,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [17.0, 16.0, 16.0, 17.0, 16.0, 17.0, 17.0, 16.0, 16.0, 16.0]
2025-05-13 13:23:00,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 59 minutes, 37 seconds)
2025-05-13 13:27:10,796 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:27:11,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 92.05047 ± 28.342
2025-05-13 13:27:11,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [151.26883, 146.1058, 78.15844, 77.79105, 77.86615, 77.65115, 77.82184, 77.81748, 77.95981, 78.06417]
2025-05-13 13:27:11,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 28.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0]
2025-05-13 13:27:11,066 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 55 minutes, 31 seconds)
2025-05-13 13:31:20,515 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:31:20,739 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 81.92558 ± 23.645
2025-05-13 13:31:20,739 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [73.02131, 73.10909, 73.04674, 73.10194, 73.08354, 77.85256, 72.78593, 72.742714, 152.62239, 77.88973]
2025-05-13 13:31:20,739 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 15.0, 15.0, 15.0, 16.0, 15.0, 15.0, 30.0, 16.0]
2025-05-13 13:31:20,749 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 51 minutes, 27 seconds)
2025-05-13 13:35:29,791 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:35:29,991 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 72.85195 ± 0.078
2025-05-13 13:35:29,991 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [72.87739, 72.9191, 72.917656, 72.93899, 72.75433, 72.80789, 72.92546, 72.82193, 72.86693, 72.68985]
2025-05-13 13:35:29,991 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0]
2025-05-13 13:35:30,000 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 47 minutes, 19 seconds)
2025-05-13 13:39:38,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:39:39,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 72.87623 ± 0.113
2025-05-13 13:39:39,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [72.8028, 72.91501, 73.03621, 72.90112, 72.786766, 72.74403, 73.109314, 72.82321, 72.87496, 72.76885]
2025-05-13 13:39:39,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0]
2025-05-13 13:39:39,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 42 minutes, 46 seconds)
2025-05-13 13:43:47,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:43:47,253 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 75.56374 ± 3.923
2025-05-13 13:43:47,253 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [72.78226, 77.94994, 85.29624, 77.72354, 73.06529, 72.91716, 72.73346, 72.741295, 77.6212, 72.80695]
2025-05-13 13:43:47,253 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 16.0, 17.0, 16.0, 15.0, 15.0, 15.0, 15.0, 16.0, 15.0]
2025-05-13 13:43:47,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 37 minutes, 53 seconds)
2025-05-13 13:47:56,410 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:47:56,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 84.53863 ± 26.726
2025-05-13 13:47:56,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [72.73595, 160.0141, 103.041824, 72.85308, 72.79208, 72.80036, 72.719284, 72.940544, 72.708305, 72.78074]
2025-05-13 13:47:56,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 31.0, 20.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0]
2025-05-13 13:47:56,648 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 33 minutes, 37 seconds)
2025-05-13 13:52:05,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:52:05,914 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 73.04153 ± 0.128
2025-05-13 13:52:05,914 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [72.95497, 73.15577, 73.20302, 73.25504, 72.887985, 72.85159, 72.9504, 73.02036, 73.04773, 73.08845]
2025-05-13 13:52:05,914 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0]
2025-05-13 13:52:05,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 29 minutes, 25 seconds)
2025-05-13 13:56:14,404 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:56:14,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 73.96725 ± 1.899
2025-05-13 13:56:14,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [77.76506, 72.92471, 72.78107, 73.0828, 77.75424, 73.20574, 73.06156, 73.01154, 73.069786, 73.01607]
2025-05-13 13:56:14,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [16.0, 15.0, 15.0, 15.0, 16.0, 15.0, 15.0, 15.0, 15.0, 15.0]
2025-05-13 13:56:14,616 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 25 minutes, 12 seconds)
2025-05-13 14:00:25,141 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:00:25,430 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 107.26146 ± 43.023
2025-05-13 14:00:25,430 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [171.85341, 135.19029, 72.9013, 72.885506, 72.81456, 73.238106, 156.7107, 170.98384, 72.9342, 73.10277]
2025-05-13 14:00:25,430 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 26.0, 15.0, 15.0, 15.0, 15.0, 30.0, 33.0, 15.0, 15.0]
2025-05-13 14:00:25,439 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 21 minutes, 15 seconds)
2025-05-13 14:04:53,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:04:53,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 79.50553 ± 18.287
2025-05-13 14:04:53,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [134.21132, 73.03098, 73.01537, 72.94551, 77.53255, 72.81432, 72.88563, 72.82882, 72.75292, 73.03787]
2025-05-13 14:04:53,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 15.0, 15.0, 15.0, 16.0, 15.0, 15.0, 15.0, 15.0, 15.0]
2025-05-13 14:04:53,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 19 minutes, 15 seconds)
2025-05-13 14:09:23,233 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:09:23,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 73.96686 ± 1.891
2025-05-13 14:09:23,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [73.06208, 72.98924, 73.183495, 72.83467, 73.06122, 72.95306, 73.13338, 72.961044, 77.75561, 77.73474]
2025-05-13 14:09:23,437 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 16.0, 16.0]
2025-05-13 14:09:23,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 17 minutes, 15 seconds)
2025-05-13 14:13:53,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:13:53,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 75.96214 ± 2.402
2025-05-13 14:13:53,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [73.11197, 72.96984, 73.09488, 77.84576, 77.93423, 77.866714, 78.03931, 72.91208, 77.88579, 77.96087]
2025-05-13 14:13:53,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 15.0, 16.0, 16.0, 16.0, 16.0, 15.0, 16.0, 16.0]
2025-05-13 14:13:53,344 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 15 minutes, 6 seconds)
2025-05-13 14:18:23,226 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:18:23,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 75.50437 ± 2.506
2025-05-13 14:18:23,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [78.02335, 72.87498, 72.79878, 77.97388, 78.04924, 73.273926, 73.01181, 78.021996, 73.04444, 77.97128]
2025-05-13 14:18:23,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [16.0, 15.0, 15.0, 16.0, 16.0, 15.0, 15.0, 16.0, 15.0, 16.0]
2025-05-13 14:18:23,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 12 minutes, 52 seconds)
2025-05-13 14:22:53,593 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:22:53,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 88.62199 ± 37.410
2025-05-13 14:22:53,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [77.73486, 77.795525, 200.6673, 77.76765, 72.91709, 72.974976, 77.90697, 77.62202, 77.80795, 73.02559]
2025-05-13 14:22:53,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [16.0, 16.0, 39.0, 16.0, 15.0, 15.0, 16.0, 16.0, 16.0, 15.0]
2025-05-13 14:22:53,847 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 10 minutes, 20 seconds)
2025-05-13 14:27:24,841 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:27:25,056 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 76.45639 ± 2.182
2025-05-13 14:27:25,056 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [77.78455, 78.144585, 73.19721, 77.90088, 73.20049, 77.73337, 77.96315, 77.93892, 72.98827, 77.71248]
2025-05-13 14:27:25,056 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [16.0, 16.0, 15.0, 16.0, 15.0, 16.0, 16.0, 16.0, 15.0, 16.0]
2025-05-13 14:27:25,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 2 hours, 6 minutes, 9 seconds)
2025-05-13 14:31:54,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:31:54,793 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 76.52787 ± 2.194
2025-05-13 14:31:54,794 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [77.80894, 73.31715, 78.05791, 77.74438, 73.18493, 78.12677, 77.88308, 77.99256, 73.04521, 78.11776]
2025-05-13 14:31:54,794 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [16.0, 15.0, 16.0, 16.0, 15.0, 16.0, 16.0, 16.0, 15.0, 16.0]
2025-05-13 14:31:54,802 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 2 hours, 1 minute, 37 seconds)
2025-05-13 14:36:25,641 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:36:25,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 77.96488 ± 0.123
2025-05-13 14:36:25,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [77.77834, 77.79328, 77.88085, 77.90369, 78.13672, 77.98989, 78.146904, 77.93683, 78.01281, 78.069435]
2025-05-13 14:36:25,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0]
2025-05-13 14:36:25,867 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 57 minutes, 13 seconds)
2025-05-13 14:40:55,270 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:40:55,527 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 93.59795 ± 33.067
2025-05-13 14:40:55,527 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [73.129585, 77.89365, 77.93945, 77.91955, 77.54549, 77.57274, 77.80972, 77.747025, 150.48805, 167.93427]
2025-05-13 14:40:55,527 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 29.0, 32.0]
2025-05-13 14:40:55,536 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 52 minutes, 40 seconds)
2025-05-13 14:45:23,997 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:45:24,237 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 87.74465 ± 29.627
2025-05-13 14:45:24,237 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [77.90723, 176.624, 78.04224, 77.763565, 77.83751, 77.73003, 77.96801, 77.80807, 77.85764, 77.908195]
2025-05-13 14:45:24,237 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [16.0, 34.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0]
2025-05-13 14:45:24,246 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 48 minutes, 1 second)
2025-05-13 14:49:51,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:49:51,961 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 74.96936 ± 2.382
2025-05-13 14:49:51,961 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [77.88933, 72.857414, 72.93337, 77.64735, 72.8344, 73.09562, 73.24238, 73.20746, 77.953064, 78.03323]
2025-05-13 14:49:51,961 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [16.0, 15.0, 15.0, 16.0, 15.0, 15.0, 15.0, 15.0, 16.0, 16.0]
2025-05-13 14:49:51,973 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 43 minutes, 15 seconds)
2025-05-13 14:54:19,281 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:54:19,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 89.93698 ± 30.061
2025-05-13 14:54:19,530 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [128.77933, 78.05004, 78.007095, 73.099236, 73.054214, 78.07516, 166.24042, 77.866356, 72.971985, 73.22596]
2025-05-13 14:54:19,530 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 16.0, 16.0, 15.0, 15.0, 16.0, 33.0, 16.0, 15.0, 15.0]
2025-05-13 14:54:19,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 38 minutes, 36 seconds)
2025-05-13 14:58:47,360 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:58:47,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 116.75076 ± 51.919
2025-05-13 14:58:47,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [181.14578, 78.120186, 77.66757, 72.77602, 73.13129, 72.89905, 186.7795, 166.1721, 185.81355, 73.00253]
2025-05-13 14:58:47,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 16.0, 16.0, 15.0, 15.0, 15.0, 36.0, 32.0, 37.0, 15.0]
2025-05-13 14:58:47,701 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 33 minutes, 55 seconds)
2025-05-13 15:03:15,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:03:15,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 74.83018 ± 2.322
2025-05-13 15:03:15,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [72.93621, 72.78548, 77.652504, 72.841034, 77.56049, 73.04807, 77.7203, 72.96184, 77.752426, 73.04343]
2025-05-13 15:03:15,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 16.0, 15.0, 16.0, 15.0, 16.0, 15.0, 16.0, 15.0]
2025-05-13 15:03:15,590 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 29 minutes, 20 seconds)
2025-05-13 15:07:43,543 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:07:43,763 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 79.86653 ± 18.786
2025-05-13 15:07:43,763 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [73.26716, 73.0108, 77.767136, 73.086624, 73.06971, 73.23338, 72.84861, 73.05061, 73.263466, 136.06783]
2025-05-13 15:07:43,763 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 16.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 27.0]
2025-05-13 15:07:43,773 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 24 minutes, 50 seconds)
2025-05-13 15:12:11,223 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:12:11,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 74.01619 ± 1.944
2025-05-13 15:12:11,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [77.70554, 72.90514, 73.2654, 78.08138, 73.02332, 72.86934, 72.945076, 73.19587, 73.031166, 73.13967]
2025-05-13 15:12:11,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [16.0, 15.0, 15.0, 16.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0]
2025-05-13 15:12:11,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 20 minutes, 22 seconds)
2025-05-13 15:16:38,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:16:39,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 73.44537 ± 1.529
2025-05-13 15:16:39,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [72.81178, 73.01447, 73.0662, 73.12856, 72.777985, 78.019844, 72.94958, 72.82959, 72.94174, 72.91393]
2025-05-13 15:16:39,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 15.0, 15.0, 15.0, 16.0, 15.0, 15.0, 15.0, 15.0]
2025-05-13 15:16:39,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 15 minutes, 54 seconds)
2025-05-13 15:20:55,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:20:55,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 94.67127 ± 41.723
2025-05-13 15:20:55,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [73.40391, 73.25121, 78.11295, 77.85884, 77.95686, 77.840096, 73.21458, 208.4269, 133.54726, 73.1001]
2025-05-13 15:20:55,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 16.0, 16.0, 16.0, 16.0, 15.0, 40.0, 26.0, 15.0]
2025-05-13 15:20:55,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 10 minutes, 50 seconds)
2025-05-13 15:25:04,092 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:25:04,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 74.44968 ± 2.189
2025-05-13 15:25:04,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [72.98913, 73.01366, 72.94971, 77.906425, 72.954216, 72.98428, 77.647064, 73.10559, 73.12473, 77.822044]
2025-05-13 15:25:04,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 15.0, 16.0, 15.0, 15.0, 16.0, 15.0, 15.0, 16.0]
2025-05-13 15:25:04,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 5 minutes, 26 seconds)
2025-05-13 15:29:11,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:29:11,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 74.08914 ± 1.993
2025-05-13 15:29:11,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [73.07618, 78.02286, 78.11012, 73.02951, 73.29456, 73.28802, 73.12656, 72.8926, 72.92465, 73.12639]
2025-05-13 15:29:11,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 16.0, 16.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0]
2025-05-13 15:29:11,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 1 hour, 5 seconds)
2025-05-13 15:33:19,207 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:33:19,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 76.02791 ± 2.395
2025-05-13 15:33:19,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [78.025154, 77.91012, 73.21132, 77.866516, 72.96253, 73.03324, 78.22766, 77.76734, 73.18826, 78.08689]
2025-05-13 15:33:19,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [16.0, 16.0, 15.0, 16.0, 15.0, 15.0, 16.0, 16.0, 15.0, 16.0]
2025-05-13 15:33:19,425 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 54 minutes, 56 seconds)
2025-05-13 15:37:28,453 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:37:28,662 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 75.12994 ± 3.275
2025-05-13 15:37:28,662 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [73.25446, 78.07608, 78.01193, 78.16233, 77.933334, 73.319016, 73.22363, 73.12825, 78.12002, 68.07041]
2025-05-13 15:37:28,662 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 16.0, 16.0, 16.0, 16.0, 15.0, 15.0, 15.0, 16.0, 14.0]
2025-05-13 15:37:28,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 49 minutes, 59 seconds)
2025-05-13 15:41:36,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:41:37,194 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 89.80227 ± 28.829
2025-05-13 15:41:37,194 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [73.11104, 78.123474, 77.877914, 67.98142, 155.077, 78.13383, 78.15255, 78.09992, 73.322365, 138.14317]
2025-05-13 15:41:37,194 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 16.0, 16.0, 14.0, 31.0, 16.0, 16.0, 16.0, 15.0, 28.0]
2025-05-13 15:41:37,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 45 minutes, 30 seconds)
2025-05-13 15:45:45,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:45:45,347 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 83.41967 ± 27.944
2025-05-13 15:45:45,348 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [73.00658, 73.29417, 73.03509, 167.04347, 73.10283, 73.03584, 77.945564, 72.72966, 78.038795, 72.96465]
2025-05-13 15:45:45,348 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 15.0, 32.0, 15.0, 15.0, 16.0, 15.0, 16.0, 15.0]
2025-05-13 15:45:45,358 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 41 minutes, 22 seconds)
2025-05-13 15:49:53,293 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:49:53,533 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 81.34836 ± 36.293
2025-05-13 15:49:53,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [190.0704, 68.35763, 68.02441, 68.15757, 73.192795, 68.309975, 68.14409, 68.10541, 73.0236, 68.09766]
2025-05-13 15:49:53,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [37.0, 14.0, 14.0, 14.0, 15.0, 14.0, 14.0, 14.0, 15.0, 14.0]
2025-05-13 15:49:53,543 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 37 minutes, 16 seconds)
2025-05-13 15:54:01,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:54:01,537 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 77.11995 ± 1.898
2025-05-13 15:54:01,537 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [73.37675, 78.08853, 78.0547, 77.9461, 73.28246, 78.18946, 78.0478, 78.28835, 77.93574, 77.98954]
2025-05-13 15:54:01,537 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 16.0, 16.0, 16.0, 15.0, 16.0, 16.0, 16.0, 16.0, 16.0]
2025-05-13 15:54:01,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 33 minutes, 7 seconds)
2025-05-13 15:58:10,527 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:58:10,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 111.79539 ± 49.529
2025-05-13 15:58:10,841 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [188.41685, 67.799484, 154.94334, 155.56061, 185.70595, 73.341576, 73.005455, 73.17803, 73.027504, 72.97509]
2025-05-13 15:58:10,841 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [37.0, 14.0, 30.0, 30.0, 36.0, 15.0, 15.0, 15.0, 15.0, 15.0]
2025-05-13 15:58:10,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 28 minutes, 59 seconds)
2025-05-13 16:02:18,701 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:02:18,926 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 83.23586 ± 25.743
2025-05-13 16:02:18,926 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [73.27884, 77.89665, 68.22002, 159.91556, 77.82741, 73.08398, 73.08541, 73.03469, 78.121635, 77.89446]
2025-05-13 16:02:18,926 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 16.0, 14.0, 31.0, 16.0, 15.0, 15.0, 15.0, 16.0, 16.0]
2025-05-13 16:02:18,936 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 24 minutes, 50 seconds)
2025-05-13 16:06:26,518 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:06:26,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 77.27785 ± 1.768
2025-05-13 16:06:26,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [78.00573, 78.20946, 74.47835, 78.19885, 78.0214, 78.26361, 73.11775, 78.07615, 78.20085, 78.20638]
2025-05-13 16:06:26,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 15.0, 16.0, 16.0, 16.0]
2025-05-13 16:06:26,737 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 20 minutes, 41 seconds)
2025-05-13 16:10:35,105 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:10:35,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 73.67964 ± 1.461
2025-05-13 16:10:35,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [73.09055, 73.32697, 73.16861, 73.15792, 73.27515, 73.20206, 73.17169, 73.14625, 78.05898, 73.19832]
2025-05-13 16:10:35,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 16.0, 15.0]
2025-05-13 16:10:35,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 16 minutes, 33 seconds)
2025-05-13 16:14:42,991 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:14:43,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 73.15833 ± 0.121
2025-05-13 16:14:43,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [73.09676, 73.18377, 73.17984, 73.116776, 73.373665, 73.02814, 73.14395, 72.93591, 73.212875, 73.31153]
2025-05-13 16:14:43,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0]
2025-05-13 16:14:43,202 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 12 minutes, 24 seconds)
2025-05-13 16:18:50,416 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:18:50,643 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 81.44820 ± 25.181
2025-05-13 16:18:50,643 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [73.0478, 73.15129, 73.03408, 73.071884, 73.28648, 73.01474, 72.989494, 73.05981, 72.83677, 156.98953]
2025-05-13 16:18:50,643 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 30.0]
2025-05-13 16:18:50,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 8 minutes, 15 seconds)
2025-05-13 16:22:58,392 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:22:58,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 84.12637 ± 32.921
2025-05-13 16:22:58,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [72.96724, 73.18534, 73.37305, 73.13809, 73.0899, 73.216774, 73.1399, 73.103584, 73.16137, 182.88849]
2025-05-13 16:22:58,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 35.0]
2025-05-13 16:22:58,641 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 7 seconds)
2025-05-13 16:27:06,876 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:27:07,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 74.77742 ± 4.941
2025-05-13 16:27:07,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [73.27604, 72.984215, 73.15813, 73.11474, 73.34332, 72.933044, 72.90986, 89.59322, 73.33142, 73.130165]
2025-05-13 16:27:07,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 18.0, 15.0, 15.0]
2025-05-13 16:27:07,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1251 [DEBUG]: Training session finished
