2025-05-13 09:06:33,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mda-highdim-mem32
2025-05-13 09:06:33,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-bpql-mda-highdim-mem32
2025-05-13 09:06:33,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1499d77cdd90>}
2025-05-13 09:06:33,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:33,723 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1133 [INFO]: Creating new trainer
2025-05-13 09:06:33,740 baseline-bpql-mda-noisy-humanoid:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-05-13 09:06:33,740 baseline-bpql-mda-noisy-humanoid:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:33,749 baseline-bpql-mda-noisy-humanoid:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(17, 512, batch_first=True)
)
2025-05-13 09:06:34,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:34,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 1/100
2025-05-13 09:11:20,707 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:11:21,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 167.20801 ± 21.997
2025-05-13 09:11:21,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [199.9518, 161.26166, 146.30797, 194.77763, 174.31465, 197.12373, 146.14722, 159.98825, 156.15642, 136.05075]
2025-05-13 09:11:21,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [39.0, 31.0, 28.0, 38.0, 34.0, 38.0, 28.0, 31.0, 30.0, 26.0]
2025-05-13 09:11:21,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (167.21) for latency ExtremeSparseL4U32
2025-05-13 09:11:21,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 7 hours, 53 minutes)
2025-05-13 09:16:13,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:16:14,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 202.34245 ± 81.471
2025-05-13 09:16:14,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [150.97722, 318.21527, 170.57497, 176.62321, 145.16428, 208.1534, 151.43932, 396.13693, 165.97206, 140.16786]
2025-05-13 09:16:14,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 64.0, 33.0, 34.0, 28.0, 40.0, 29.0, 84.0, 32.0, 27.0]
2025-05-13 09:16:14,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (202.34) for latency ExtremeSparseL4U32
2025-05-13 09:16:14,016 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 7 hours, 53 minutes, 1 second)
2025-05-13 09:21:06,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:21:07,155 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 155.91940 ± 16.381
2025-05-13 09:21:07,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [156.69623, 181.06612, 146.0249, 161.6027, 149.62712, 175.90974, 145.4038, 161.74509, 119.76669, 161.35168]
2025-05-13 09:21:07,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 35.0, 28.0, 31.0, 29.0, 34.0, 28.0, 31.0, 23.0, 31.0]
2025-05-13 09:21:07,161 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 7 hours, 50 minutes, 6 seconds)
2025-05-13 09:26:00,196 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:26:01,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 264.53452 ± 148.818
2025-05-13 09:26:01,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [241.89273, 190.76463, 540.0536, 124.902916, 165.32556, 451.30554, 165.94508, 463.75455, 155.89874, 145.5019]
2025-05-13 09:26:01,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [47.0, 37.0, 106.0, 24.0, 32.0, 97.0, 32.0, 86.0, 30.0, 28.0]
2025-05-13 09:26:01,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (264.53) for latency ExtremeSparseL4U32
2025-05-13 09:26:01,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 7 hours, 46 minutes, 39 seconds)
2025-05-13 09:30:54,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:30:54,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 153.43994 ± 14.537
2025-05-13 09:30:54,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [140.93715, 150.372, 140.78746, 146.66641, 190.68582, 156.75551, 140.18562, 161.23753, 160.691, 146.08095]
2025-05-13 09:30:54,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 29.0, 27.0, 28.0, 37.0, 30.0, 27.0, 31.0, 31.0, 28.0]
2025-05-13 09:30:54,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 7 hours, 42 minutes, 22 seconds)
2025-05-13 09:35:48,396 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:35:49,317 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 201.28815 ± 99.075
2025-05-13 09:35:49,317 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [169.18631, 160.90984, 149.45291, 151.72241, 424.13184, 370.67218, 145.97397, 139.49866, 151.01869, 150.31465]
2025-05-13 09:35:49,317 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 31.0, 29.0, 29.0, 77.0, 75.0, 28.0, 27.0, 29.0, 29.0]
2025-05-13 09:35:49,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 7 hours, 39 minutes, 55 seconds)
2025-05-13 09:40:43,231 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:40:44,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 289.26517 ± 203.234
2025-05-13 09:40:44,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [566.9826, 156.34055, 157.19331, 471.893, 124.96296, 160.42918, 165.55722, 170.93616, 197.63472, 720.72186]
2025-05-13 09:40:44,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 30.0, 30.0, 89.0, 24.0, 31.0, 32.0, 33.0, 38.0, 147.0]
2025-05-13 09:40:44,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (289.27) for latency ExtremeSparseL4U32
2025-05-13 09:40:44,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 7 hours, 35 minutes, 53 seconds)
2025-05-13 09:45:38,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:45:39,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 228.40646 ± 121.790
2025-05-13 09:45:39,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [183.18987, 154.68898, 181.51553, 170.22504, 141.5117, 182.08029, 481.6081, 174.7895, 155.31148, 459.14432]
2025-05-13 09:45:39,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 30.0, 35.0, 33.0, 27.0, 36.0, 90.0, 34.0, 30.0, 85.0]
2025-05-13 09:45:39,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 7 hours, 31 minutes, 36 seconds)
2025-05-13 09:50:33,037 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:50:34,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 226.92769 ± 119.245
2025-05-13 09:50:34,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [170.33217, 209.827, 156.32349, 170.79808, 521.5405, 190.97554, 152.12009, 388.7075, 145.29727, 163.35518]
2025-05-13 09:50:34,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 40.0, 30.0, 33.0, 108.0, 37.0, 29.0, 77.0, 28.0, 31.0]
2025-05-13 09:50:34,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 7 hours, 26 minutes, 42 seconds)
2025-05-13 09:55:29,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:55:30,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 197.07672 ± 81.180
2025-05-13 09:55:30,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [196.70018, 156.4155, 183.53891, 150.80446, 431.11658, 154.80775, 210.92592, 160.48068, 190.45073, 135.5265]
2025-05-13 09:55:30,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 30.0, 35.0, 29.0, 88.0, 30.0, 41.0, 31.0, 37.0, 26.0]
2025-05-13 09:55:30,914 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 7 hours, 22 minutes, 47 seconds)
2025-05-13 10:00:22,855 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:00:23,710 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 185.68358 ± 75.047
2025-05-13 10:00:23,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [405.91144, 161.16049, 146.70438, 161.86136, 185.75531, 170.25027, 140.5371, 168.22137, 135.39738, 181.03687]
2025-05-13 10:00:23,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 31.0, 28.0, 31.0, 36.0, 33.0, 27.0, 32.0, 26.0, 35.0]
2025-05-13 10:00:23,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 7 hours, 17 minutes, 24 seconds)
2025-05-13 10:05:17,383 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:05:18,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 283.36838 ± 159.531
2025-05-13 10:05:18,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [537.3089, 410.82986, 130.39093, 167.1032, 151.51193, 545.09937, 161.04251, 150.23601, 194.73378, 385.4272]
2025-05-13 10:05:18,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 80.0, 25.0, 32.0, 29.0, 107.0, 31.0, 29.0, 37.0, 73.0]
2025-05-13 10:05:18,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 7 hours, 12 minutes, 23 seconds)
2025-05-13 10:10:14,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:10:15,615 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 240.79251 ± 159.244
2025-05-13 10:10:15,616 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [175.17958, 130.05696, 134.59889, 175.60059, 165.43396, 497.9676, 150.81767, 215.52945, 606.554, 156.18633]
2025-05-13 10:10:15,616 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 25.0, 26.0, 34.0, 32.0, 97.0, 29.0, 42.0, 113.0, 30.0]
2025-05-13 10:10:15,621 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 7 hours, 7 minutes, 59 seconds)
2025-05-13 10:15:08,653 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:15:09,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 171.34796 ± 21.783
2025-05-13 10:15:09,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [140.23924, 176.181, 146.50345, 213.8372, 156.22128, 176.31941, 184.49596, 175.36505, 193.2211, 151.09593]
2025-05-13 10:15:09,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 34.0, 28.0, 41.0, 30.0, 34.0, 36.0, 34.0, 37.0, 29.0]
2025-05-13 10:15:09,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 7 hours, 2 minutes, 55 seconds)
2025-05-13 10:20:03,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:20:04,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 161.67728 ± 18.600
2025-05-13 10:20:04,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [181.6168, 174.88213, 170.69235, 161.78377, 187.19589, 130.47649, 171.77884, 136.26334, 161.22644, 140.85667]
2025-05-13 10:20:04,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 34.0, 33.0, 31.0, 36.0, 25.0, 33.0, 26.0, 31.0, 27.0]
2025-05-13 10:20:04,652 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 6 hours, 57 minutes, 33 seconds)
2025-05-13 10:25:00,744 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:25:01,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 202.78156 ± 171.087
2025-05-13 10:25:01,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [146.14642, 140.58943, 147.1119, 134.75891, 155.9576, 135.80669, 166.39664, 140.24849, 145.46704, 715.3324]
2025-05-13 10:25:01,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 27.0, 28.0, 26.0, 30.0, 26.0, 32.0, 27.0, 28.0, 136.0]
2025-05-13 10:25:01,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 6 hours, 53 minutes, 50 seconds)
2025-05-13 10:29:55,726 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:29:57,074 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 275.44650 ± 138.848
2025-05-13 10:29:57,075 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [318.40256, 166.20491, 435.9974, 524.5805, 129.91862, 453.23007, 219.31537, 140.74455, 181.63237, 184.4388]
2025-05-13 10:29:57,075 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 32.0, 92.0, 102.0, 25.0, 96.0, 42.0, 27.0, 35.0, 36.0]
2025-05-13 10:29:57,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 6 hours, 49 minutes)
2025-05-13 10:34:52,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:34:53,212 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 188.48885 ± 74.506
2025-05-13 10:34:53,212 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [171.4384, 130.74855, 207.8863, 162.20738, 160.53937, 150.53358, 403.7379, 146.81044, 185.25552, 165.7311]
2025-05-13 10:34:53,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 25.0, 40.0, 31.0, 31.0, 29.0, 87.0, 28.0, 36.0, 32.0]
2025-05-13 10:34:53,219 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 6 hours, 43 minutes, 52 seconds)
2025-05-13 10:39:49,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:39:50,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 163.58907 ± 23.212
2025-05-13 10:39:50,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [134.83157, 160.16222, 151.36368, 140.85239, 152.13799, 167.13931, 223.41255, 162.35884, 175.22604, 168.40598]
2025-05-13 10:39:50,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 31.0, 29.0, 27.0, 29.0, 32.0, 43.0, 31.0, 34.0, 32.0]
2025-05-13 10:39:50,088 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 6 hours, 39 minutes, 46 seconds)
2025-05-13 10:44:45,201 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:44:46,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 246.09628 ± 121.144
2025-05-13 10:44:46,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [155.26204, 356.15665, 498.7439, 176.35034, 140.25754, 130.3518, 406.2144, 218.35678, 177.04189, 202.22766]
2025-05-13 10:44:46,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 68.0, 99.0, 34.0, 27.0, 25.0, 81.0, 42.0, 34.0, 39.0]
2025-05-13 10:44:46,362 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 6 hours, 35 minutes, 7 seconds)
2025-05-13 10:49:38,803 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:49:39,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 238.80701 ± 125.216
2025-05-13 10:49:39,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [558.4019, 374.90143, 220.4853, 151.1207, 124.976074, 188.28827, 175.92908, 170.95233, 175.83473, 247.18022]
2025-05-13 10:49:39,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 73.0, 43.0, 29.0, 24.0, 37.0, 34.0, 33.0, 34.0, 50.0]
2025-05-13 10:49:39,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 6 hours, 29 minutes, 15 seconds)
2025-05-13 10:54:34,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:54:35,438 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 160.44228 ± 20.713
2025-05-13 10:54:35,438 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [130.4937, 155.98895, 141.02174, 151.20724, 179.93103, 182.36464, 150.60532, 140.58888, 175.1089, 197.11226]
2025-05-13 10:54:35,438 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 30.0, 27.0, 29.0, 35.0, 35.0, 29.0, 27.0, 34.0, 38.0]
2025-05-13 10:54:35,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 6 hours, 24 minutes, 22 seconds)
2025-05-13 10:59:26,101 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:59:27,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 237.94131 ± 96.391
2025-05-13 10:59:27,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [200.79913, 171.9996, 173.8089, 141.19887, 154.71661, 372.76724, 370.02576, 217.26, 175.19708, 401.63998]
2025-05-13 10:59:27,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [39.0, 33.0, 33.0, 27.0, 30.0, 71.0, 74.0, 43.0, 34.0, 77.0]
2025-05-13 10:59:27,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 6 hours, 18 minutes, 19 seconds)
2025-05-13 11:04:20,520 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:04:21,377 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 189.03728 ± 58.859
2025-05-13 11:04:21,377 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [144.7927, 175.976, 235.52118, 161.33095, 174.7447, 349.8378, 167.25125, 169.1518, 140.76369, 171.00262]
2025-05-13 11:04:21,377 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 34.0, 45.0, 31.0, 34.0, 70.0, 32.0, 33.0, 27.0, 33.0]
2025-05-13 11:04:21,389 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 6 hours, 12 minutes, 43 seconds)
2025-05-13 11:09:12,648 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:09:13,353 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 155.60982 ± 19.613
2025-05-13 11:09:13,353 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [184.39638, 160.78728, 144.29004, 159.52689, 169.66437, 140.12325, 114.44619, 180.73665, 146.4231, 155.704]
2025-05-13 11:09:13,353 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 31.0, 28.0, 31.0, 33.0, 27.0, 22.0, 35.0, 28.0, 30.0]
2025-05-13 11:09:13,360 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 6 hours, 6 minutes, 44 seconds)
2025-05-13 11:14:06,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:14:06,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 184.66452 ± 73.966
2025-05-13 11:14:06,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [135.26904, 135.24371, 124.82383, 166.14435, 169.93417, 186.78545, 396.8124, 161.97398, 175.87337, 193.78493]
2025-05-13 11:14:06,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 26.0, 24.0, 32.0, 33.0, 36.0, 83.0, 31.0, 34.0, 37.0]
2025-05-13 11:14:06,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 6 hours, 1 minute, 52 seconds)
2025-05-13 11:19:00,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:19:01,568 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 193.07541 ± 108.859
2025-05-13 11:19:01,568 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [151.48265, 162.84232, 161.25734, 159.08304, 156.35576, 151.35098, 140.83496, 518.8385, 157.2116, 171.49696]
2025-05-13 11:19:01,568 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 31.0, 31.0, 31.0, 30.0, 29.0, 27.0, 101.0, 30.0, 33.0]
2025-05-13 11:19:01,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 56 minutes, 45 seconds)
2025-05-13 11:23:53,899 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:23:54,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 228.44861 ± 116.171
2025-05-13 11:23:54,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [160.2375, 150.9456, 464.90073, 341.93115, 166.62782, 146.37146, 145.82133, 161.05478, 394.86734, 151.72833]
2025-05-13 11:23:54,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 29.0, 89.0, 72.0, 32.0, 28.0, 28.0, 31.0, 75.0, 29.0]
2025-05-13 11:23:54,960 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 5 hours, 52 minutes, 15 seconds)
2025-05-13 11:28:48,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:28:48,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 189.82767 ± 79.825
2025-05-13 11:28:48,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [169.9937, 129.96634, 156.33621, 196.99272, 141.34114, 423.31982, 171.18533, 174.5304, 160.07394, 174.53697]
2025-05-13 11:28:48,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 25.0, 30.0, 38.0, 27.0, 80.0, 33.0, 34.0, 31.0, 34.0]
2025-05-13 11:28:48,975 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 5 hours, 47 minutes, 19 seconds)
2025-05-13 11:33:41,273 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:33:42,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 290.26672 ± 180.521
2025-05-13 11:33:42,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [214.79591, 170.35236, 162.13936, 155.59708, 650.40265, 490.58667, 213.61021, 166.50421, 145.15495, 533.5236]
2025-05-13 11:33:42,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [41.0, 33.0, 31.0, 30.0, 142.0, 96.0, 42.0, 32.0, 28.0, 99.0]
2025-05-13 11:33:42,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (290.27) for latency ExtremeSparseL4U32
2025-05-13 11:33:42,684 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 5 hours, 42 minutes, 50 seconds)
2025-05-13 11:38:34,944 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:38:36,265 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 275.33026 ± 206.100
2025-05-13 11:38:36,265 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [170.416, 129.83342, 178.68349, 527.57544, 155.35107, 759.2169, 140.31506, 166.92749, 119.70173, 405.28235]
2025-05-13 11:38:36,265 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 25.0, 35.0, 103.0, 30.0, 161.0, 27.0, 32.0, 23.0, 78.0]
2025-05-13 11:38:36,272 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 5 hours, 37 minutes, 56 seconds)
2025-05-13 11:43:32,073 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:43:33,194 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 243.30679 ± 126.388
2025-05-13 11:43:33,194 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [172.6077, 161.754, 175.36623, 135.45901, 176.4649, 480.6712, 454.90945, 353.8206, 176.80296, 145.21156]
2025-05-13 11:43:33,194 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 31.0, 34.0, 26.0, 34.0, 92.0, 86.0, 70.0, 34.0, 28.0]
2025-05-13 11:43:33,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 5 hours, 33 minutes, 34 seconds)
2025-05-13 11:48:24,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:48:25,170 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 222.28674 ± 153.172
2025-05-13 11:48:25,170 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [160.04932, 225.88797, 155.46608, 166.18872, 150.98248, 184.21805, 145.86028, 213.61455, 145.75415, 674.846]
2025-05-13 11:48:25,170 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 43.0, 30.0, 32.0, 29.0, 36.0, 28.0, 41.0, 28.0, 141.0]
2025-05-13 11:48:25,176 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 5 hours, 28 minutes, 20 seconds)
2025-05-13 11:53:17,569 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:53:18,437 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 190.46364 ± 107.795
2025-05-13 11:53:18,437 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [166.73474, 130.24637, 161.58804, 141.09601, 124.76039, 160.04887, 185.66803, 154.4187, 509.48932, 170.58578]
2025-05-13 11:53:18,437 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 25.0, 31.0, 27.0, 24.0, 31.0, 36.0, 30.0, 101.0, 33.0]
2025-05-13 11:53:18,444 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 5 hours, 23 minutes, 16 seconds)
2025-05-13 11:58:11,211 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:58:12,026 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 182.50888 ± 66.763
2025-05-13 11:58:12,026 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [176.25282, 156.0114, 139.85376, 375.78772, 146.46333, 130.43256, 171.3955, 160.32736, 187.65732, 180.90703]
2025-05-13 11:58:12,026 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 30.0, 27.0, 72.0, 28.0, 25.0, 33.0, 31.0, 36.0, 35.0]
2025-05-13 11:58:12,033 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 5 hours, 18 minutes, 21 seconds)
2025-05-13 12:03:04,795 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:03:05,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 227.45120 ± 108.525
2025-05-13 12:03:05,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [160.51956, 130.06721, 401.70242, 394.25565, 378.60083, 154.9787, 175.54964, 181.6237, 160.88364, 136.33061]
2025-05-13 12:03:05,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 25.0, 90.0, 73.0, 69.0, 30.0, 34.0, 35.0, 31.0, 26.0]
2025-05-13 12:03:05,847 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 5 hours, 13 minutes, 30 seconds)
2025-05-13 12:07:59,779 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:08:01,228 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 316.18408 ± 183.406
2025-05-13 12:08:01,228 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [160.31407, 160.08563, 196.62207, 162.15706, 179.63284, 451.6095, 435.4293, 620.10175, 182.58873, 613.2999]
2025-05-13 12:08:01,228 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 31.0, 38.0, 31.0, 35.0, 90.0, 80.0, 114.0, 35.0, 121.0]
2025-05-13 12:08:01,228 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (316.18) for latency ExtremeSparseL4U32
2025-05-13 12:08:01,235 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 5 hours, 8 minutes, 17 seconds)
2025-05-13 12:12:55,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:12:56,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 187.53789 ± 68.690
2025-05-13 12:12:56,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [139.86395, 199.58952, 171.94574, 380.15585, 205.09413, 172.07272, 119.730545, 170.61047, 169.59332, 146.72263]
2025-05-13 12:12:56,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 39.0, 33.0, 70.0, 40.0, 33.0, 23.0, 33.0, 33.0, 28.0]
2025-05-13 12:12:56,143 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 5 hours, 3 minutes, 59 seconds)
2025-05-13 12:17:48,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:17:49,496 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 224.24849 ± 92.832
2025-05-13 12:17:49,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [160.39456, 140.69913, 151.28915, 337.08664, 184.93837, 189.3999, 390.7312, 169.96814, 362.01514, 155.96262]
2025-05-13 12:17:49,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 27.0, 29.0, 64.0, 36.0, 37.0, 80.0, 33.0, 68.0, 30.0]
2025-05-13 12:17:49,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 59 minutes, 6 seconds)
2025-05-13 12:22:43,382 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:22:44,401 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 227.61064 ± 128.085
2025-05-13 12:22:44,401 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [186.88441, 450.28198, 151.67166, 163.46542, 146.33878, 510.63855, 177.82738, 146.00522, 191.36533, 151.62802]
2025-05-13 12:22:44,401 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 83.0, 29.0, 32.0, 28.0, 96.0, 34.0, 28.0, 37.0, 29.0]
2025-05-13 12:22:44,407 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 54 minutes, 28 seconds)
2025-05-13 12:27:37,270 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:27:38,064 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 174.88226 ± 19.894
2025-05-13 12:27:38,064 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [145.7806, 197.35173, 208.22418, 155.80315, 168.21873, 159.52383, 199.94537, 180.93442, 161.7591, 171.28166]
2025-05-13 12:27:38,064 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 38.0, 40.0, 30.0, 32.0, 31.0, 38.0, 35.0, 31.0, 33.0]
2025-05-13 12:27:38,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 49 minutes, 32 seconds)
2025-05-13 12:32:31,453 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:32:32,233 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 172.87494 ± 29.737
2025-05-13 12:32:32,233 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [166.98006, 185.14333, 203.54471, 166.37813, 145.44089, 145.45801, 223.4912, 145.13446, 211.25848, 135.92007]
2025-05-13 12:32:32,233 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 36.0, 39.0, 32.0, 28.0, 28.0, 43.0, 28.0, 41.0, 26.0]
2025-05-13 12:32:32,241 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 44 minutes, 23 seconds)
2025-05-13 12:37:25,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:37:26,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 277.80865 ± 182.222
2025-05-13 12:37:26,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [144.52798, 207.4664, 156.65727, 154.6914, 130.28506, 560.55054, 177.51353, 156.8199, 472.18286, 617.39154]
2025-05-13 12:37:26,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 40.0, 30.0, 30.0, 25.0, 113.0, 34.0, 30.0, 98.0, 116.0]
2025-05-13 12:37:26,708 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 4 hours, 39 minutes, 24 seconds)
2025-05-13 12:42:19,273 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:42:20,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 239.82332 ± 134.216
2025-05-13 12:42:20,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [426.5035, 135.94456, 460.2513, 171.07103, 150.21765, 124.795364, 171.06233, 135.47101, 180.99483, 441.92154]
2025-05-13 12:42:20,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 26.0, 86.0, 33.0, 29.0, 24.0, 33.0, 26.0, 35.0, 92.0]
2025-05-13 12:42:20,387 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 4 hours, 34 minutes, 33 seconds)
2025-05-13 12:47:13,124 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:47:13,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 168.07661 ± 32.476
2025-05-13 12:47:13,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [140.62082, 174.89702, 183.98314, 146.00513, 208.7022, 166.01186, 174.70523, 119.18809, 230.50903, 136.14351]
2025-05-13 12:47:13,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 34.0, 36.0, 28.0, 40.0, 32.0, 34.0, 23.0, 44.0, 26.0]
2025-05-13 12:47:13,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 4 hours, 29 minutes, 24 seconds)
2025-05-13 12:52:07,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:52:08,177 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 201.69136 ± 109.244
2025-05-13 12:52:08,177 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [424.9671, 162.5592, 408.8407, 150.33293, 167.3867, 108.21998, 177.47462, 146.36813, 130.13696, 140.62718]
2025-05-13 12:52:08,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 31.0, 82.0, 29.0, 32.0, 21.0, 34.0, 28.0, 25.0, 27.0]
2025-05-13 12:52:08,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 4 hours, 24 minutes, 37 seconds)
2025-05-13 12:57:01,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:57:02,653 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 325.59149 ± 221.678
2025-05-13 12:57:02,653 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [564.5042, 686.1471, 119.89229, 140.6357, 557.7905, 563.9509, 151.48213, 180.0721, 124.74774, 166.69202]
2025-05-13 12:57:02,653 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 140.0, 23.0, 27.0, 103.0, 108.0, 29.0, 35.0, 24.0, 32.0]
2025-05-13 12:57:02,653 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (325.59) for latency ExtremeSparseL4U32
2025-05-13 12:57:02,661 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 4 hours, 19 minutes, 46 seconds)
2025-05-13 13:01:56,715 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:01:57,661 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 209.24568 ± 106.852
2025-05-13 13:01:57,661 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [140.28723, 179.45859, 141.64305, 426.1977, 146.99246, 129.89513, 180.39307, 171.92488, 161.0595, 414.6053]
2025-05-13 13:01:57,661 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 35.0, 27.0, 79.0, 28.0, 25.0, 35.0, 33.0, 31.0, 79.0]
2025-05-13 13:01:57,669 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 4 hours, 14 minutes, 58 seconds)
2025-05-13 13:06:48,960 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:06:50,381 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 300.48456 ± 189.083
2025-05-13 13:06:50,382 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [376.41028, 154.90317, 424.97583, 154.35887, 655.37524, 130.22888, 596.51697, 166.7116, 189.76863, 155.59595]
2025-05-13 13:06:50,382 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 30.0, 82.0, 30.0, 138.0, 25.0, 109.0, 32.0, 37.0, 30.0]
2025-05-13 13:06:50,388 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 4 hours, 9 minutes, 54 seconds)
2025-05-13 13:11:43,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:11:45,692 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 359.46481 ± 156.858
2025-05-13 13:11:45,692 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [426.12573, 515.2848, 160.4403, 437.72504, 582.8579, 166.66357, 150.77676, 483.75296, 445.88043, 225.14067]
2025-05-13 13:11:45,692 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 96.0, 31.0, 82.0, 117.0, 32.0, 29.0, 101.0, 88.0, 43.0]
2025-05-13 13:11:45,692 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1226 [INFO]: New best (359.46) for latency ExtremeSparseL4U32
2025-05-13 13:11:45,701 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 4 hours, 5 minutes, 18 seconds)
2025-05-13 13:16:37,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:16:39,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 290.52002 ± 217.293
2025-05-13 13:16:39,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [151.6145, 469.95035, 172.40906, 160.89632, 169.85081, 151.14302, 174.9145, 175.82079, 846.8064, 431.7944]
2025-05-13 13:16:39,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 87.0, 33.0, 31.0, 33.0, 29.0, 34.0, 34.0, 163.0, 87.0]
2025-05-13 13:16:39,105 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 4 hours, 15 seconds)
2025-05-13 13:21:30,219 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:21:31,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 189.23593 ± 66.683
2025-05-13 13:21:31,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [155.0282, 156.84108, 178.23933, 161.37206, 183.93437, 183.42682, 159.55736, 171.56442, 155.74335, 386.65237]
2025-05-13 13:21:31,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 30.0, 35.0, 31.0, 36.0, 35.0, 31.0, 33.0, 30.0, 73.0]
2025-05-13 13:21:31,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 54 minutes, 56 seconds)
2025-05-13 13:26:23,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:26:24,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 189.30417 ± 88.883
2025-05-13 13:26:24,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [181.03494, 164.82053, 162.37091, 146.82843, 150.91606, 162.38065, 174.88585, 146.09387, 149.87494, 453.8354]
2025-05-13 13:26:24,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 32.0, 31.0, 28.0, 29.0, 31.0, 34.0, 28.0, 29.0, 87.0]
2025-05-13 13:26:24,429 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 49 minutes, 47 seconds)
2025-05-13 13:31:17,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:31:18,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 305.95316 ± 174.157
2025-05-13 13:31:18,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [549.6828, 141.07634, 172.2267, 203.16826, 156.02344, 153.09846, 206.57738, 460.53485, 623.7798, 393.3635]
2025-05-13 13:31:18,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 27.0, 33.0, 39.0, 30.0, 30.0, 40.0, 93.0, 114.0, 78.0]
2025-05-13 13:31:18,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 45 minutes, 8 seconds)
2025-05-13 13:36:07,779 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:36:08,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 210.69377 ± 118.855
2025-05-13 13:36:08,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [180.41667, 521.4473, 146.34634, 343.56708, 130.40472, 175.20947, 140.15286, 149.87473, 145.34108, 174.17744]
2025-05-13 13:36:08,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 103.0, 28.0, 72.0, 25.0, 34.0, 27.0, 29.0, 28.0, 34.0]
2025-05-13 13:36:08,755 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 39 minutes, 27 seconds)
2025-05-13 13:41:01,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:41:02,414 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 261.10880 ± 140.949
2025-05-13 13:41:02,414 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [179.46951, 135.06094, 167.07826, 156.97829, 542.0966, 190.53067, 189.79385, 409.6203, 181.7352, 458.7242]
2025-05-13 13:41:02,414 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 26.0, 32.0, 30.0, 101.0, 37.0, 37.0, 75.0, 35.0, 91.0]
2025-05-13 13:41:02,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 34 minutes, 37 seconds)
2025-05-13 13:45:53,984 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:45:54,851 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 193.90974 ± 108.466
2025-05-13 13:45:54,851 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [190.51399, 140.81514, 162.1765, 161.58812, 140.60301, 156.19135, 171.02824, 515.88153, 134.61395, 165.68556]
2025-05-13 13:45:54,851 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [37.0, 27.0, 31.0, 31.0, 27.0, 30.0, 33.0, 96.0, 26.0, 32.0]
2025-05-13 13:45:54,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 3 hours, 29 minutes, 48 seconds)
2025-05-13 13:50:46,433 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:50:47,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 244.26956 ± 164.505
2025-05-13 13:50:47,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [176.26282, 182.1438, 180.96579, 451.08374, 157.19173, 150.9939, 186.988, 141.49352, 664.44934, 151.12291]
2025-05-13 13:50:47,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 35.0, 35.0, 91.0, 30.0, 29.0, 36.0, 27.0, 141.0, 29.0]
2025-05-13 13:50:47,593 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 3 hours, 24 minutes, 50 seconds)
2025-05-13 13:55:40,746 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:55:41,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 200.60194 ± 125.353
2025-05-13 13:55:41,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [160.43236, 155.93745, 146.0405, 146.29314, 165.54214, 573.9757, 190.98846, 135.30133, 156.3959, 175.11252]
2025-05-13 13:55:41,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 30.0, 28.0, 28.0, 32.0, 120.0, 37.0, 26.0, 30.0, 34.0]
2025-05-13 13:55:41,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 3 hours, 19 minutes, 56 seconds)
2025-05-13 14:00:34,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:00:35,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 220.28276 ± 104.278
2025-05-13 14:00:35,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [175.88072, 155.22136, 139.47522, 166.75244, 211.87521, 216.17352, 466.3785, 130.62355, 171.57735, 368.86987]
2025-05-13 14:00:35,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 30.0, 27.0, 32.0, 41.0, 42.0, 88.0, 25.0, 33.0, 68.0]
2025-05-13 14:00:35,933 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 3 hours, 15 minutes, 37 seconds)
2025-05-13 14:05:26,540 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:05:27,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 279.68161 ± 134.196
2025-05-13 14:05:27,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [178.87714, 446.51144, 171.22166, 151.20262, 375.7623, 144.33583, 182.5106, 212.00038, 498.21396, 436.18024]
2025-05-13 14:05:27,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 84.0, 33.0, 29.0, 77.0, 28.0, 35.0, 41.0, 104.0, 83.0]
2025-05-13 14:05:27,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 3 hours, 10 minutes, 30 seconds)
2025-05-13 14:10:20,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:10:21,377 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 180.86212 ± 62.380
2025-05-13 14:10:21,378 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [144.99298, 175.7004, 364.37155, 175.63567, 161.32553, 169.16788, 145.99416, 140.43376, 160.23296, 170.7665]
2025-05-13 14:10:21,378 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 34.0, 77.0, 34.0, 31.0, 33.0, 28.0, 27.0, 31.0, 33.0]
2025-05-13 14:10:21,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 3 hours, 5 minutes, 45 seconds)
2025-05-13 14:15:14,198 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:15:14,902 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 158.64417 ± 16.191
2025-05-13 14:15:14,902 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [160.56569, 171.79922, 170.39038, 184.95102, 135.16924, 140.62909, 165.76324, 145.45071, 171.65247, 140.0705]
2025-05-13 14:15:14,902 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 33.0, 33.0, 36.0, 26.0, 27.0, 32.0, 28.0, 33.0, 27.0]
2025-05-13 14:15:14,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 3 hours, 58 seconds)
2025-05-13 14:20:06,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:20:08,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 263.92960 ± 160.991
2025-05-13 14:20:08,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [156.57257, 160.2491, 394.23532, 200.79266, 206.7598, 156.10712, 139.95062, 169.92589, 389.40765, 665.2951]
2025-05-13 14:20:08,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 31.0, 75.0, 39.0, 40.0, 30.0, 27.0, 33.0, 74.0, 130.0]
2025-05-13 14:20:08,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 55 minutes, 57 seconds)
2025-05-13 14:25:00,682 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:25:01,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 214.13412 ± 88.535
2025-05-13 14:25:01,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [156.46063, 190.61551, 150.64961, 204.42557, 163.9543, 385.9704, 175.02582, 161.90166, 161.32031, 391.0175]
2025-05-13 14:25:01,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 37.0, 29.0, 39.0, 32.0, 75.0, 34.0, 31.0, 31.0, 74.0]
2025-05-13 14:25:01,666 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 51 minutes)
2025-05-13 14:29:52,672 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:29:53,398 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 164.86021 ± 10.717
2025-05-13 14:29:53,398 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [181.3211, 180.55617, 154.8565, 155.82828, 164.64688, 150.51819, 175.57307, 155.39262, 160.51694, 169.39247]
2025-05-13 14:29:53,398 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 35.0, 30.0, 30.0, 32.0, 29.0, 34.0, 30.0, 31.0, 33.0]
2025-05-13 14:29:53,405 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 46 minutes, 5 seconds)
2025-05-13 14:34:46,381 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:34:47,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 181.06302 ± 77.159
2025-05-13 14:34:47,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [396.09476, 155.84573, 229.41223, 151.00827, 145.60109, 166.28033, 140.74597, 109.25482, 150.59308, 165.79408]
2025-05-13 14:34:47,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 30.0, 44.0, 29.0, 28.0, 32.0, 27.0, 21.0, 29.0, 32.0]
2025-05-13 14:34:47,198 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 41 minutes, 14 seconds)
2025-05-13 14:39:39,153 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:39:39,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 169.59776 ± 22.406
2025-05-13 14:39:39,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [170.03955, 153.89023, 171.56863, 165.96379, 145.49878, 203.42427, 177.89671, 211.07996, 161.25041, 135.36545]
2025-05-13 14:39:39,905 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 30.0, 33.0, 32.0, 28.0, 39.0, 34.0, 41.0, 31.0, 26.0]
2025-05-13 14:39:39,913 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 36 minutes, 16 seconds)
2025-05-13 14:44:33,208 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:44:34,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 259.08490 ± 143.996
2025-05-13 14:44:34,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [419.0875, 140.63632, 516.0781, 156.68625, 489.09824, 139.7788, 195.76183, 191.27693, 170.96648, 171.47832]
2025-05-13 14:44:34,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 27.0, 105.0, 30.0, 101.0, 27.0, 38.0, 37.0, 33.0, 33.0]
2025-05-13 14:44:34,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 31 minutes, 31 seconds)
2025-05-13 14:49:26,878 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:49:27,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 181.78702 ± 66.048
2025-05-13 14:49:27,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [125.278336, 162.26553, 171.78484, 184.4051, 172.15717, 156.34105, 151.07771, 145.44174, 175.25046, 373.86816]
2025-05-13 14:49:27,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 31.0, 33.0, 36.0, 33.0, 30.0, 29.0, 28.0, 34.0, 73.0]
2025-05-13 14:49:27,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 26 minutes, 36 seconds)
2025-05-13 14:54:19,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:54:20,808 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 290.85675 ± 201.270
2025-05-13 14:54:20,808 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [757.09674, 174.54018, 182.44096, 204.74734, 182.13649, 160.53104, 161.7657, 130.29288, 393.57056, 561.44556]
2025-05-13 14:54:20,808 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [157.0, 34.0, 35.0, 39.0, 35.0, 31.0, 31.0, 25.0, 75.0, 111.0]
2025-05-13 14:54:20,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 21 minutes, 50 seconds)
2025-05-13 14:59:13,344 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:59:14,087 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 164.83345 ± 25.763
2025-05-13 14:59:14,087 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [155.67618, 202.22803, 155.75182, 150.64586, 160.70346, 134.9393, 166.29123, 205.21863, 191.95006, 124.92993]
2025-05-13 14:59:14,087 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 39.0, 30.0, 29.0, 31.0, 26.0, 32.0, 40.0, 37.0, 24.0]
2025-05-13 14:59:14,105 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 2 hours, 16 minutes, 54 seconds)
2025-05-13 15:04:05,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:04:06,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 233.47493 ± 158.008
2025-05-13 15:04:06,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [198.42079, 176.7742, 474.32214, 135.10284, 150.42847, 134.85936, 170.90288, 155.89006, 608.01514, 130.03352]
2025-05-13 15:04:06,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 34.0, 90.0, 26.0, 29.0, 26.0, 33.0, 30.0, 123.0, 25.0]
2025-05-13 15:04:06,154 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 2 hours, 11 minutes, 57 seconds)
2025-05-13 15:08:57,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:08:58,884 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 203.52878 ± 79.176
2025-05-13 15:08:58,884 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [208.5602, 196.94733, 239.7026, 164.96088, 140.3024, 425.85226, 160.47794, 176.7263, 151.18434, 170.57356]
2025-05-13 15:08:58,884 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [40.0, 38.0, 46.0, 32.0, 27.0, 90.0, 31.0, 34.0, 29.0, 33.0]
2025-05-13 15:08:58,892 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 2 hours, 6 minutes, 55 seconds)
2025-05-13 15:13:50,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:13:51,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 319.84317 ± 176.723
2025-05-13 15:13:51,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [531.97327, 175.59155, 610.3475, 441.8744, 462.22177, 144.08017, 140.64804, 125.697784, 174.51967, 391.4773]
2025-05-13 15:13:51,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 34.0, 119.0, 89.0, 91.0, 28.0, 27.0, 24.0, 34.0, 79.0]
2025-05-13 15:13:51,727 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 2 hours, 2 minutes)
2025-05-13 15:18:43,687 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:18:44,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 200.07993 ± 114.437
2025-05-13 15:18:44,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [130.00421, 176.86345, 210.21193, 155.80702, 161.6163, 535.8433, 177.35541, 140.33882, 130.65056, 182.10844]
2025-05-13 15:18:44,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 34.0, 41.0, 30.0, 31.0, 113.0, 34.0, 27.0, 25.0, 35.0]
2025-05-13 15:18:44,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 57 minutes, 6 seconds)
2025-05-13 15:23:36,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:23:37,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 237.23535 ± 114.413
2025-05-13 15:23:37,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [228.33218, 438.01178, 174.85483, 156.63992, 135.81583, 339.41068, 125.18002, 180.48386, 431.95135, 161.67317]
2025-05-13 15:23:37,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [44.0, 94.0, 34.0, 30.0, 26.0, 69.0, 24.0, 35.0, 84.0, 31.0]
2025-05-13 15:23:37,668 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 52 minutes, 12 seconds)
2025-05-13 15:28:29,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:28:30,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 169.35187 ± 15.880
2025-05-13 15:28:30,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [187.92975, 145.75885, 160.17824, 159.7091, 184.65811, 176.73674, 190.33846, 162.44183, 145.82483, 179.94286]
2025-05-13 15:28:30,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 28.0, 31.0, 31.0, 36.0, 34.0, 37.0, 31.0, 28.0, 35.0]
2025-05-13 15:28:30,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 47 minutes, 21 seconds)
2025-05-13 15:33:23,808 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:33:24,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 180.12833 ± 83.664
2025-05-13 15:33:24,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [146.02193, 174.11133, 150.72905, 129.9396, 171.70717, 139.85225, 426.75903, 175.83127, 151.61284, 134.71884]
2025-05-13 15:33:24,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 34.0, 29.0, 25.0, 33.0, 27.0, 80.0, 34.0, 29.0, 26.0]
2025-05-13 15:33:24,616 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 42 minutes, 36 seconds)
2025-05-13 15:38:17,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:38:18,273 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 221.78894 ± 111.674
2025-05-13 15:38:18,273 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [134.51166, 242.47105, 365.23038, 160.7409, 209.36667, 492.18463, 140.79631, 156.19255, 166.70274, 149.69264]
2025-05-13 15:38:18,273 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 47.0, 71.0, 31.0, 40.0, 93.0, 27.0, 30.0, 32.0, 29.0]
2025-05-13 15:38:18,284 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 37 minutes, 46 seconds)
2025-05-13 15:43:11,811 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:43:12,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 162.65067 ± 14.968
2025-05-13 15:43:12,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [159.88321, 165.68126, 176.56763, 161.84547, 192.39822, 166.19536, 154.49403, 134.76009, 168.0885, 146.59293]
2025-05-13 15:43:12,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 32.0, 34.0, 31.0, 37.0, 32.0, 30.0, 26.0, 32.0, 28.0]
2025-05-13 15:43:12,537 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 32 minutes, 58 seconds)
2025-05-13 15:48:05,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:48:06,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 236.62935 ± 165.901
2025-05-13 15:48:06,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [141.21144, 140.72014, 155.24707, 178.21883, 537.49603, 161.31432, 151.00891, 165.5493, 595.7174, 139.81006]
2025-05-13 15:48:06,922 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 27.0, 30.0, 34.0, 101.0, 31.0, 29.0, 32.0, 118.0, 27.0]
2025-05-13 15:48:06,937 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 28 minutes, 9 seconds)
2025-05-13 15:53:01,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:53:02,517 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 212.00056 ± 175.815
2025-05-13 15:53:02,517 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [737.85223, 181.37634, 155.08969, 145.12845, 162.23303, 140.15697, 145.98927, 130.5125, 155.0179, 166.64912]
2025-05-13 15:53:02,517 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 35.0, 30.0, 28.0, 31.0, 27.0, 28.0, 25.0, 30.0, 32.0]
2025-05-13 15:53:02,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 23 minutes, 26 seconds)
2025-05-13 15:57:53,735 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:57:55,068 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 287.68173 ± 211.760
2025-05-13 15:57:55,068 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [152.17027, 139.37619, 674.2793, 131.19037, 180.83417, 175.84737, 140.84929, 508.95605, 631.68677, 141.62778]
2025-05-13 15:57:55,068 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 27.0, 136.0, 25.0, 35.0, 34.0, 27.0, 96.0, 116.0, 27.0]
2025-05-13 15:57:55,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 18 minutes, 25 seconds)
2025-05-13 16:02:48,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:02:49,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 265.50769 ± 128.575
2025-05-13 16:02:49,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [458.04935, 151.8173, 140.83609, 186.60483, 192.53252, 208.40295, 161.30893, 435.4237, 243.0953, 477.00577]
2025-05-13 16:02:49,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 29.0, 27.0, 36.0, 37.0, 40.0, 31.0, 82.0, 47.0, 100.0]
2025-05-13 16:02:49,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 13 minutes, 34 seconds)
2025-05-13 16:07:43,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:07:43,930 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 190.63741 ± 58.746
2025-05-13 16:07:43,930 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [165.36804, 351.2606, 191.14807, 188.70682, 180.78186, 151.57251, 141.09113, 209.41956, 130.59987, 196.42552]
2025-05-13 16:07:43,930 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 69.0, 37.0, 36.0, 35.0, 29.0, 27.0, 40.0, 25.0, 38.0]
2025-05-13 16:07:43,939 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 1 hour, 8 minutes, 39 seconds)
2025-05-13 16:12:36,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:12:37,727 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 237.79900 ± 118.138
2025-05-13 16:12:37,727 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [166.77257, 458.9626, 175.173, 383.8415, 176.80385, 135.25967, 176.00903, 402.82938, 145.71324, 156.62518]
2025-05-13 16:12:37,727 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 86.0, 34.0, 73.0, 34.0, 26.0, 34.0, 78.0, 28.0, 30.0]
2025-05-13 16:12:37,739 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 1 hour, 3 minutes, 44 seconds)
2025-05-13 16:17:31,590 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:17:32,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 219.50369 ± 94.174
2025-05-13 16:17:32,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [418.66544, 167.27809, 144.91374, 179.26756, 207.80547, 175.75047, 140.50746, 388.4328, 180.44823, 191.96754]
2025-05-13 16:17:32,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 32.0, 28.0, 35.0, 40.0, 34.0, 27.0, 74.0, 35.0, 37.0]
2025-05-13 16:17:32,563 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 58 minutes, 48 seconds)
2025-05-13 16:22:24,962 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:22:25,736 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 173.34373 ± 32.958
2025-05-13 16:22:25,736 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [227.47511, 145.36989, 150.19553, 223.55913, 150.50739, 160.14038, 196.12589, 125.18592, 161.24225, 193.63591]
2025-05-13 16:22:25,736 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [44.0, 28.0, 29.0, 43.0, 29.0, 31.0, 38.0, 24.0, 31.0, 38.0]
2025-05-13 16:22:25,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 53 minutes, 55 seconds)
2025-05-13 16:27:18,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:27:19,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 196.78648 ± 62.156
2025-05-13 16:27:19,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [140.1744, 165.71863, 209.2196, 185.0958, 181.05948, 129.13914, 236.7682, 194.01704, 166.16533, 360.50732]
2025-05-13 16:27:19,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 32.0, 40.0, 36.0, 35.0, 25.0, 46.0, 38.0, 32.0, 72.0]
2025-05-13 16:27:19,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 48 minutes, 59 seconds)
2025-05-13 16:32:14,046 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:32:15,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 224.02478 ± 112.291
2025-05-13 16:32:15,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [172.43982, 150.03673, 183.66469, 498.13232, 160.00291, 185.54118, 156.56876, 183.00256, 385.19147, 165.66756]
2025-05-13 16:32:15,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 29.0, 36.0, 95.0, 31.0, 36.0, 30.0, 35.0, 75.0, 32.0]
2025-05-13 16:32:15,072 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 44 minutes, 8 seconds)
2025-05-13 16:37:17,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:37:19,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 263.34027 ± 194.907
2025-05-13 16:37:19,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [161.88045, 189.39084, 185.47108, 140.46542, 180.69179, 529.3439, 140.77751, 748.3692, 159.71436, 197.29845]
2025-05-13 16:37:19,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 37.0, 36.0, 27.0, 35.0, 114.0, 27.0, 146.0, 31.0, 38.0]
2025-05-13 16:37:19,349 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 39 minutes, 30 seconds)
2025-05-13 16:42:36,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:42:37,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 300.40033 ± 184.589
2025-05-13 16:42:37,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [716.7303, 353.7206, 141.03732, 150.23645, 170.75836, 386.47314, 179.938, 178.85564, 196.9282, 529.32513]
2025-05-13 16:42:37,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 74.0, 27.0, 29.0, 33.0, 74.0, 35.0, 35.0, 38.0, 100.0]
2025-05-13 16:42:37,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 35 minutes, 7 seconds)
2025-05-13 16:47:33,992 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:47:35,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 224.94492 ± 130.336
2025-05-13 16:47:35,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [171.60294, 191.22818, 134.59195, 150.61818, 188.41698, 157.01234, 508.81628, 162.09695, 454.68347, 130.38176]
2025-05-13 16:47:35,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 37.0, 26.0, 29.0, 36.0, 30.0, 96.0, 31.0, 88.0, 25.0]
2025-05-13 16:47:35,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 30 minutes, 11 seconds)
2025-05-13 16:52:51,524 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:52:53,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 280.54657 ± 188.071
2025-05-13 16:52:53,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [140.95753, 195.55524, 578.7896, 149.91925, 145.68103, 516.8271, 146.77937, 165.2256, 165.10506, 600.626]
2025-05-13 16:52:53,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 38.0, 107.0, 29.0, 28.0, 96.0, 28.0, 32.0, 32.0, 118.0]
2025-05-13 16:52:53,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 25 minutes, 33 seconds)
2025-05-13 16:57:58,338 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:57:59,253 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 165.41231 ± 16.469
2025-05-13 16:57:59,253 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [155.92195, 145.1445, 175.18611, 187.23305, 186.46938, 180.83635, 156.10355, 165.96593, 165.37047, 135.89177]
2025-05-13 16:57:59,253 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 28.0, 34.0, 36.0, 36.0, 35.0, 30.0, 32.0, 32.0, 26.0]
2025-05-13 16:57:59,264 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 20 minutes, 35 seconds)
2025-05-13 17:03:05,768 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 17:03:06,763 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 179.15274 ± 74.501
2025-05-13 17:03:06,763 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [150.83786, 151.0612, 135.08131, 146.1226, 177.1407, 155.55054, 171.38898, 170.32497, 398.81116, 135.20808]
2025-05-13 17:03:06,763 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 29.0, 26.0, 28.0, 34.0, 30.0, 33.0, 33.0, 85.0, 26.0]
2025-05-13 17:03:06,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 15 minutes, 28 seconds)
2025-05-13 17:08:15,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 17:08:16,208 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 216.25012 ± 93.460
2025-05-13 17:08:16,208 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [160.55423, 164.42938, 165.60605, 170.1547, 166.94289, 176.45311, 177.1852, 402.96173, 175.41321, 402.8007]
2025-05-13 17:08:16,208 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 32.0, 32.0, 33.0, 32.0, 34.0, 34.0, 80.0, 34.0, 79.0]
2025-05-13 17:08:16,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 10 minutes, 15 seconds)
2025-05-13 17:13:31,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 17:13:33,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 258.52609 ± 145.527
2025-05-13 17:13:33,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [141.31718, 187.32649, 181.53088, 160.77821, 562.593, 460.94476, 190.7604, 162.20857, 391.32947, 146.47186]
2025-05-13 17:13:33,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 36.0, 35.0, 31.0, 113.0, 87.0, 37.0, 31.0, 73.0, 28.0]
2025-05-13 17:13:33,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 5 minutes, 11 seconds)
2025-05-13 17:18:17,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 17:18:18,128 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1221 [DEBUG]: Total Reward: 159.96957 ± 14.346
2025-05-13 17:18:18,128 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1222 [DEBUG]: All rewards: [176.36618, 155.59262, 156.71114, 185.37009, 172.16113, 167.46582, 157.77028, 141.81174, 145.8709, 140.57587]
2025-05-13 17:18:18,128 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 30.0, 30.0, 36.0, 33.0, 32.0, 30.0, 27.0, 28.0, 27.0]
2025-05-13 17:18:18,141 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-humanoid):1251 [DEBUG]: Training session finished
