2025-09-11 19:30:54,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc20-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay
2025-09-11 19:30:54,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc20-humanoid/ExtremeClogL1U23-mbpac-highdim-memdelay
2025-09-11 19:30:54,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1539c61dec90>}
2025-09-11 19:30:54,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1111 [DEBUG]: using device: cuda
2025-09-11 19:30:54,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1133 [INFO]: Creating new trainer
2025-09-11 19:30:54,108 baseline-mbpac-noiseperc20-humanoid:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-11 19:30:54,108 baseline-mbpac-noiseperc20-humanoid:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 19:30:54,119 baseline-mbpac-noiseperc20-humanoid:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-09-11 19:30:55,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1194 [DEBUG]: Starting training session...
2025-09-11 19:30:55,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 1/100
2025-09-11 19:43:39,357 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:43:39,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:43:54,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 287.48602 ± 38.443
2025-09-11 19:43:54,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [242.94666, 368.66556, 274.53287, 329.36493, 234.40314, 313.14658, 259.92334, 283.1854, 282.206, 286.48593]
2025-09-11 19:43:54,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [45.0, 69.0, 50.0, 62.0, 44.0, 60.0, 49.0, 52.0, 53.0, 53.0]
2025-09-11 19:43:54,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (287.49) for latency ExtremeClogL1U23
2025-09-11 19:43:54,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 21 hours, 25 minutes, 37 seconds)
2025-09-11 19:58:25,364 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:58:25,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:58:45,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 378.22769 ± 59.702
2025-09-11 19:58:45,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [334.8945, 338.06595, 384.6502, 482.75763, 329.74063, 307.67502, 360.13602, 370.0726, 380.04294, 494.2412]
2025-09-11 19:58:45,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 64.0, 72.0, 90.0, 61.0, 57.0, 76.0, 68.0, 70.0, 96.0]
2025-09-11 19:58:45,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (378.23) for latency ExtremeClogL1U23
2025-09-11 19:58:45,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 22 hours, 44 minutes, 7 seconds)
2025-09-11 20:13:07,876 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:13:07,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:13:26,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 350.59024 ± 101.058
2025-09-11 20:13:26,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [354.28925, 393.4895, 334.12857, 364.82922, 392.7414, 293.63815, 568.3829, 328.345, 134.31891, 341.73953]
2025-09-11 20:13:26,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 73.0, 62.0, 67.0, 73.0, 54.0, 108.0, 60.0, 26.0, 63.0]
2025-09-11 20:13:26,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 22 hours, 54 minutes, 39 seconds)
2025-09-11 20:27:53,531 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:27:53,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:28:14,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 378.01788 ± 201.748
2025-09-11 20:28:14,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [368.89673, 882.31573, 330.54807, 300.52982, 114.30079, 356.83157, 343.06958, 311.69818, 571.2251, 200.76328]
2025-09-11 20:28:14,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 175.0, 63.0, 64.0, 22.0, 67.0, 64.0, 58.0, 110.0, 41.0]
2025-09-11 20:28:14,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 22 hours, 55 minutes, 31 seconds)
2025-09-11 20:42:31,081 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:42:31,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:42:53,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 398.46564 ± 82.972
2025-09-11 20:42:53,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [421.39926, 377.48474, 370.96445, 333.0849, 245.96452, 371.2933, 407.2576, 523.8852, 549.55524, 383.7674]
2025-09-11 20:42:53,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 79.0, 69.0, 72.0, 55.0, 69.0, 84.0, 102.0, 105.0, 82.0]
2025-09-11 20:42:53,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (398.47) for latency ExtremeClogL1U23
2025-09-11 20:42:53,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 22 hours, 47 minutes, 22 seconds)
2025-09-11 20:57:10,744 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:57:10,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:57:30,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 352.48657 ± 112.315
2025-09-11 20:57:30,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [326.86456, 266.22415, 95.92509, 386.42538, 409.85617, 449.63388, 472.5163, 288.00803, 337.1297, 492.28256]
2025-09-11 20:57:30,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 56.0, 19.0, 82.0, 75.0, 84.0, 91.0, 63.0, 67.0, 93.0]
2025-09-11 20:57:30,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 23 hours, 3 minutes, 35 seconds)
2025-09-11 21:11:44,472 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:11:44,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:12:04,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 364.73566 ± 86.590
2025-09-11 21:12:04,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [326.17734, 366.64972, 363.76697, 272.7065, 608.31335, 313.44666, 374.64764, 356.46207, 352.8832, 312.30276]
2025-09-11 21:12:04,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 67.0, 67.0, 58.0, 118.0, 66.0, 69.0, 66.0, 77.0, 63.0]
2025-09-11 21:12:04,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 22 hours, 43 minutes, 39 seconds)
2025-09-11 21:26:26,470 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:26:26,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:26:49,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 414.15054 ± 146.095
2025-09-11 21:26:49,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [432.92245, 471.14145, 414.2644, 419.122, 676.3132, 603.3953, 134.90834, 279.79108, 388.0081, 321.6387]
2025-09-11 21:26:49,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 88.0, 75.0, 78.0, 140.0, 127.0, 26.0, 52.0, 82.0, 69.0]
2025-09-11 21:26:49,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (414.15) for latency ExtremeClogL1U23
2025-09-11 21:26:49,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 22 hours, 30 minutes, 21 seconds)
2025-09-11 21:41:00,916 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:41:00,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:41:23,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 428.21420 ± 142.043
2025-09-11 21:41:23,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [414.02567, 546.97455, 614.28705, 107.62235, 619.07904, 428.90845, 326.57053, 444.6448, 382.7732, 397.25626]
2025-09-11 21:41:23,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 102.0, 114.0, 21.0, 117.0, 82.0, 61.0, 82.0, 70.0, 84.0]
2025-09-11 21:41:23,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (428.21) for latency ExtremeClogL1U23
2025-09-11 21:41:23,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 22 hours, 11 minutes, 24 seconds)
2025-09-11 21:55:41,460 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:55:41,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:56:00,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 344.69501 ± 122.014
2025-09-11 21:56:00,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [500.2241, 421.3966, 361.33264, 425.37305, 405.2405, 406.27554, 399.77368, 118.03276, 130.74945, 278.55145]
2025-09-11 21:56:00,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 93.0, 67.0, 79.0, 87.0, 78.0, 74.0, 23.0, 25.0, 52.0]
2025-09-11 21:56:00,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 21 hours, 56 minutes, 9 seconds)
2025-09-11 22:10:20,655 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:10:20,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:10:44,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 408.26157 ± 195.964
2025-09-11 22:10:44,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [604.303, 484.70587, 357.4709, 411.85983, 347.2321, 432.09613, 332.48965, 100.996544, 178.84785, 832.6141]
2025-09-11 22:10:44,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 91.0, 78.0, 92.0, 77.0, 90.0, 74.0, 20.0, 34.0, 175.0]
2025-09-11 22:10:44,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 21 hours, 43 minutes, 35 seconds)
2025-09-11 22:25:02,239 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:25:02,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:25:25,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 421.59750 ± 203.021
2025-09-11 22:25:25,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [577.4657, 118.20652, 367.6504, 393.74393, 138.79588, 310.57104, 480.27075, 538.1128, 848.67, 442.48825]
2025-09-11 22:25:25,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 23.0, 70.0, 74.0, 27.0, 61.0, 101.0, 114.0, 176.0, 86.0]
2025-09-11 22:25:25,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 21 hours, 30 minutes, 56 seconds)
2025-09-11 22:39:31,816 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:39:31,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:39:55,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 447.25830 ± 101.925
2025-09-11 22:39:55,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [446.50195, 423.1663, 392.1998, 351.1265, 597.3588, 479.20987, 362.59595, 340.69406, 665.97455, 413.7553]
2025-09-11 22:39:55,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 79.0, 86.0, 71.0, 112.0, 92.0, 67.0, 77.0, 127.0, 77.0]
2025-09-11 22:39:55,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (447.26) for latency ExtremeClogL1U23
2025-09-11 22:39:55,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 21 hours, 12 minutes, 1 second)
2025-09-11 22:53:58,083 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:53:58,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:54:26,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 522.41663 ± 95.840
2025-09-11 22:54:26,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [480.1103, 345.57425, 667.6461, 534.34595, 497.3016, 703.77234, 516.1923, 474.91614, 480.72604, 523.5816]
2025-09-11 22:54:26,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 77.0, 125.0, 100.0, 90.0, 150.0, 110.0, 85.0, 89.0, 97.0]
2025-09-11 22:54:26,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (522.42) for latency ExtremeClogL1U23
2025-09-11 22:54:26,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 20 hours, 56 minutes, 27 seconds)
2025-09-11 23:08:33,991 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:08:33,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:09:01,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 514.49664 ± 181.534
2025-09-11 23:09:01,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [457.6727, 407.6604, 498.96524, 507.88702, 469.98315, 489.20877, 523.75903, 397.33063, 1036.9229, 355.57675]
2025-09-11 23:09:01,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 77.0, 108.0, 93.0, 96.0, 91.0, 99.0, 74.0, 204.0, 65.0]
2025-09-11 23:09:01,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 20 hours, 41 minutes, 13 seconds)
2025-09-11 23:23:12,044 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:23:12,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:23:35,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 434.97559 ± 120.489
2025-09-11 23:23:35,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [406.3692, 525.3129, 566.40784, 474.4851, 502.9076, 453.4196, 340.57828, 525.40784, 428.56244, 126.30495]
2025-09-11 23:23:35,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 98.0, 121.0, 88.0, 95.0, 85.0, 62.0, 98.0, 80.0, 25.0]
2025-09-11 23:23:35,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 20 hours, 23 minutes, 50 seconds)
2025-09-11 23:37:48,304 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:37:48,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:38:11,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 449.07788 ± 190.948
2025-09-11 23:38:11,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [357.57208, 677.24335, 555.96783, 629.8453, 102.20397, 440.0837, 112.5871, 498.92075, 549.2592, 567.0959]
2025-09-11 23:38:11,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 129.0, 105.0, 121.0, 20.0, 80.0, 22.0, 106.0, 102.0, 109.0]
2025-09-11 23:38:11,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 20 hours, 7 minutes, 57 seconds)
2025-09-11 23:52:17,101 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:52:17,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:52:43,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 505.36017 ± 139.537
2025-09-11 23:52:43,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [396.3253, 559.7691, 410.88116, 619.27264, 822.4111, 363.49155, 488.3914, 608.0304, 370.05032, 414.9788]
2025-09-11 23:52:43,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 104.0, 76.0, 119.0, 176.0, 67.0, 92.0, 112.0, 69.0, 78.0]
2025-09-11 23:52:43,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 19 hours, 53 minutes, 48 seconds)
2025-09-12 00:06:44,719 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:06:44,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:07:11,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 505.45050 ± 60.226
2025-09-12 00:07:11,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [555.63544, 492.52435, 589.8979, 606.87665, 495.3672, 425.4308, 495.07162, 493.8348, 409.59055, 490.27573]
2025-09-12 00:07:11,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 90.0, 108.0, 111.0, 91.0, 80.0, 91.0, 107.0, 77.0, 92.0]
2025-09-12 00:07:11,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 19 hours, 38 minutes, 26 seconds)
2025-09-12 00:21:24,829 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:21:24,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:21:50,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 494.19330 ± 168.852
2025-09-12 00:21:50,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [513.7983, 380.53455, 670.45074, 636.6708, 490.13785, 119.465294, 751.9124, 442.88776, 537.2556, 398.8196]
2025-09-12 00:21:50,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 73.0, 127.0, 120.0, 90.0, 23.0, 142.0, 80.0, 103.0, 74.0]
2025-09-12 00:21:50,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 19 hours, 24 minutes, 57 seconds)
2025-09-12 00:35:55,469 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:35:55,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:36:22,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 491.15631 ± 225.615
2025-09-12 00:36:22,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [90.52923, 680.9043, 536.4857, 747.6638, 596.12634, 103.36462, 447.79575, 669.0918, 674.8523, 364.74966]
2025-09-12 00:36:22,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 126.0, 102.0, 156.0, 111.0, 20.0, 84.0, 135.0, 142.0, 83.0]
2025-09-12 00:36:22,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 19 hours, 10 minutes)
2025-09-12 00:50:48,671 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:50:48,673 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:51:23,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 606.13025 ± 161.888
2025-09-12 00:51:23,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [515.72375, 614.678, 545.7614, 1023.4601, 645.9027, 444.8226, 728.2604, 452.67557, 580.2515, 509.76694]
2025-09-12 00:51:23,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 116.0, 102.0, 204.0, 122.0, 81.0, 137.0, 83.0, 108.0, 95.0]
2025-09-12 00:51:23,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (606.13) for latency ExtremeClogL1U23
2025-09-12 00:51:23,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 19 hours, 1 minute, 45 seconds)
2025-09-12 01:06:20,357 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:06:20,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:06:47,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 486.08798 ± 194.004
2025-09-12 01:06:47,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [382.06302, 407.11002, 105.88171, 341.30795, 462.35522, 798.17145, 751.7584, 642.37714, 464.58307, 505.27213]
2025-09-12 01:06:47,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 77.0, 21.0, 62.0, 87.0, 151.0, 143.0, 124.0, 91.0, 95.0]
2025-09-12 01:06:47,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 19 hours, 42 seconds)
2025-09-12 01:21:47,058 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:21:47,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:22:19,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 571.12598 ± 174.178
2025-09-12 01:22:19,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [278.67612, 783.4407, 511.8851, 641.4987, 351.35568, 585.8171, 416.68622, 588.495, 717.0245, 836.38104]
2025-09-12 01:22:19,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [51.0, 147.0, 98.0, 121.0, 64.0, 110.0, 77.0, 113.0, 137.0, 166.0]
2025-09-12 01:22:19,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 19 hours, 2 minutes, 7 seconds)
2025-09-12 01:37:26,779 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:37:26,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:37:53,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 436.00214 ± 198.275
2025-09-12 01:37:53,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [130.83057, 297.4184, 430.33337, 153.40265, 609.78705, 401.12177, 479.48984, 830.3789, 511.66483, 515.5939]
2025-09-12 01:37:53,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 56.0, 80.0, 30.0, 114.0, 89.0, 105.0, 158.0, 112.0, 113.0]
2025-09-12 01:37:53,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 19 hours, 46 seconds)
2025-09-12 01:52:49,585 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:52:49,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:53:27,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 668.71143 ± 191.127
2025-09-12 01:53:27,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [619.37103, 665.30304, 394.4155, 621.38416, 925.34106, 398.0217, 1022.3487, 676.53644, 580.3229, 784.0697]
2025-09-12 01:53:27,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 126.0, 88.0, 112.0, 183.0, 83.0, 193.0, 129.0, 110.0, 150.0]
2025-09-12 01:53:27,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (668.71) for latency ExtremeClogL1U23
2025-09-12 01:53:27,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 19 hours, 57 seconds)
2025-09-12 02:08:39,339 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:08:39,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:09:13,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 598.24231 ± 170.926
2025-09-12 02:09:13,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [603.0744, 451.93402, 350.03134, 609.6527, 565.2734, 834.70715, 827.58746, 403.9759, 829.1643, 507.02258]
2025-09-12 02:09:13,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 84.0, 62.0, 117.0, 106.0, 163.0, 156.0, 88.0, 171.0, 96.0]
2025-09-12 02:09:13,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 18 hours, 56 minutes, 34 seconds)
2025-09-12 02:24:06,826 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:24:06,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:24:45,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 634.85052 ± 353.801
2025-09-12 02:24:45,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [771.167, 584.6676, 637.52136, 599.0119, 616.34875, 107.04332, 417.02014, 1524.3654, 764.5973, 326.76263]
2025-09-12 02:24:45,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [143.0, 129.0, 120.0, 130.0, 115.0, 21.0, 85.0, 316.0, 145.0, 61.0]
2025-09-12 02:24:45,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 18 hours, 42 minutes, 34 seconds)
2025-09-12 02:39:53,276 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:39:53,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:40:31,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 677.14392 ± 107.668
2025-09-12 02:40:31,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [876.0405, 631.62555, 478.10284, 699.5857, 591.19696, 676.0978, 761.0232, 799.8732, 633.3136, 624.5801]
2025-09-12 02:40:31,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [164.0, 118.0, 90.0, 129.0, 112.0, 126.0, 141.0, 151.0, 120.0, 120.0]
2025-09-12 02:40:31,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (677.14) for latency ExtremeClogL1U23
2025-09-12 02:40:31,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 18 hours, 30 minutes, 21 seconds)
2025-09-12 02:55:32,263 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:55:32,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:56:21,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 864.29456 ± 144.824
2025-09-12 02:56:21,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [788.99677, 652.6611, 730.6836, 863.108, 844.09845, 1056.522, 956.608, 781.49536, 811.40137, 1157.371]
2025-09-12 02:56:21,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [151.0, 122.0, 143.0, 164.0, 160.0, 198.0, 181.0, 145.0, 154.0, 215.0]
2025-09-12 02:56:21,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (864.29) for latency ExtremeClogL1U23
2025-09-12 02:56:21,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 18 hours, 18 minutes, 37 seconds)
2025-09-12 03:11:31,924 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:11:31,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:12:15,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 763.22473 ± 208.692
2025-09-12 03:12:15,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1079.5865, 1118.4644, 744.80383, 652.3241, 571.9432, 795.0628, 490.11218, 705.02325, 937.0859, 537.8413]
2025-09-12 03:12:15,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [208.0, 218.0, 138.0, 123.0, 108.0, 152.0, 89.0, 130.0, 181.0, 97.0]
2025-09-12 03:12:15,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 18 hours, 7 minutes, 23 seconds)
2025-09-12 03:27:07,112 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:27:07,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:27:40,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 561.38782 ± 161.817
2025-09-12 03:27:40,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [467.59573, 279.95392, 622.662, 672.9456, 494.6329, 753.0166, 859.8852, 472.75507, 568.5427, 421.8877]
2025-09-12 03:27:40,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 51.0, 120.0, 143.0, 108.0, 145.0, 168.0, 105.0, 108.0, 92.0]
2025-09-12 03:27:40,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 17 hours, 46 minutes, 51 seconds)
2025-09-12 03:43:00,045 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:43:00,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:43:28,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 485.55801 ± 315.057
2025-09-12 03:43:28,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [94.58612, 96.86147, 742.66614, 991.00323, 854.3838, 101.70456, 375.85193, 653.3996, 325.6509, 619.4724]
2025-09-12 03:43:28,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 19.0, 140.0, 195.0, 173.0, 20.0, 70.0, 133.0, 71.0, 122.0]
2025-09-12 03:43:28,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 17 hours, 34 minutes, 54 seconds)
2025-09-12 03:58:20,350 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:58:20,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:59:07,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 805.02771 ± 179.456
2025-09-12 03:59:07,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [519.6757, 1074.3162, 637.1603, 639.9318, 1018.2764, 815.30945, 1051.7515, 761.53143, 791.661, 740.6631]
2025-09-12 03:59:07,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 205.0, 116.0, 138.0, 197.0, 153.0, 219.0, 164.0, 149.0, 143.0]
2025-09-12 03:59:07,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 17 hours, 17 minutes, 40 seconds)
2025-09-12 04:14:06,012 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:14:06,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:14:53,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 844.48975 ± 353.681
2025-09-12 04:14:53,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [123.34282, 863.1388, 1190.7562, 1187.4104, 1177.3995, 1161.8633, 637.3425, 510.60898, 1043.9646, 549.0704]
2025-09-12 04:14:53,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 161.0, 229.0, 225.0, 230.0, 217.0, 113.0, 95.0, 200.0, 103.0]
2025-09-12 04:14:53,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 17 hours, 1 minute)
2025-09-12 04:30:01,939 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:30:01,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:30:47,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 797.36151 ± 189.864
2025-09-12 04:30:47,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1239.791, 709.48096, 798.9276, 550.3635, 597.50397, 686.02405, 772.6442, 1001.716, 785.7472, 831.41705]
2025-09-12 04:30:47,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [240.0, 132.0, 159.0, 108.0, 108.0, 130.0, 146.0, 191.0, 165.0, 159.0]
2025-09-12 04:30:47,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 16 hours, 45 minutes, 14 seconds)
2025-09-12 04:46:02,676 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:46:02,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:46:45,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 754.58423 ± 216.608
2025-09-12 04:46:45,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [758.59985, 634.643, 792.474, 1095.7163, 633.6211, 557.4523, 602.4441, 1144.5381, 449.20724, 877.1466]
2025-09-12 04:46:45,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 121.0, 151.0, 209.0, 131.0, 103.0, 114.0, 223.0, 85.0, 164.0]
2025-09-12 04:46:45,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 16 hours, 36 minutes, 27 seconds)
2025-09-12 05:01:31,086 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:01:31,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:02:15,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 788.51953 ± 407.734
2025-09-12 05:02:15,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [946.93427, 623.38165, 398.346, 1330.0409, 1397.569, 95.2466, 437.56024, 713.34924, 729.15454, 1213.6134]
2025-09-12 05:02:15,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [178.0, 119.0, 74.0, 257.0, 274.0, 19.0, 80.0, 135.0, 135.0, 233.0]
2025-09-12 05:02:15,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 16 hours, 16 minutes, 52 seconds)
2025-09-12 05:17:25,160 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:17:25,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:18:24,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1011.26398 ± 560.857
2025-09-12 05:18:24,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [708.21063, 1449.7941, 844.65985, 728.1545, 452.31058, 953.8108, 889.3667, 762.5058, 791.743, 2532.0842]
2025-09-12 05:18:24,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 279.0, 162.0, 156.0, 104.0, 175.0, 175.0, 167.0, 151.0, 490.0]
2025-09-12 05:18:24,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (1011.26) for latency ExtremeClogL1U23
2025-09-12 05:18:24,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 16 hours, 7 minutes, 10 seconds)
2025-09-12 05:33:37,749 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:33:37,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:34:34,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 972.35938 ± 298.974
2025-09-12 05:34:34,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [684.8755, 1034.6433, 896.42957, 1101.0741, 1692.5156, 642.67755, 1239.6875, 833.67755, 740.3471, 857.66614]
2025-09-12 05:34:34,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 198.0, 173.0, 218.0, 344.0, 134.0, 255.0, 161.0, 140.0, 159.0]
2025-09-12 05:34:34,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 56 minutes, 12 seconds)
2025-09-12 05:49:46,907 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:49:46,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:50:40,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 926.33710 ± 524.184
2025-09-12 05:50:40,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1136.9211, 1676.6414, 1149.8219, 219.70305, 894.83594, 1847.6223, 473.75677, 1028.7373, 483.02786, 352.3036]
2025-09-12 05:50:40,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [217.0, 334.0, 219.0, 46.0, 170.0, 351.0, 91.0, 197.0, 89.0, 73.0]
2025-09-12 05:50:40,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 15 hours, 42 minutes, 39 seconds)
2025-09-12 06:05:28,689 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:05:28,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:06:29,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1060.85059 ± 466.823
2025-09-12 06:06:29,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [139.12082, 1407.7737, 1018.3136, 1248.0138, 1585.1456, 774.834, 1271.4376, 654.5256, 739.31525, 1770.0259]
2025-09-12 06:06:29,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 269.0, 205.0, 234.0, 300.0, 147.0, 247.0, 124.0, 138.0, 363.0]
2025-09-12 06:06:29,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (1060.85) for latency ExtremeClogL1U23
2025-09-12 06:06:29,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 15 hours, 24 minutes, 52 seconds)
2025-09-12 06:21:34,766 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:21:34,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:22:27,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 894.79626 ± 680.527
2025-09-12 06:22:27,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [119.929306, 89.59871, 1031.3634, 1413.6449, 253.95917, 1802.563, 908.2112, 145.04251, 1188.1952, 1995.4553]
2025-09-12 06:22:27,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 18.0, 187.0, 267.0, 51.0, 343.0, 196.0, 28.0, 224.0, 390.0]
2025-09-12 06:22:27,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 15 hours, 14 minutes, 11 seconds)
2025-09-12 06:37:28,926 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:37:28,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:38:42,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1269.96667 ± 580.267
2025-09-12 06:38:42,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [2550.8777, 1631.1675, 1814.4128, 1319.4232, 1430.5315, 602.8801, 970.3599, 601.93085, 887.23004, 890.8539]
2025-09-12 06:38:42,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [500.0, 326.0, 348.0, 259.0, 276.0, 119.0, 189.0, 113.0, 168.0, 167.0]
2025-09-12 06:38:42,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (1269.97) for latency ExtremeClogL1U23
2025-09-12 06:38:42,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 14 hours, 59 minutes, 21 seconds)
2025-09-12 06:54:04,609 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:54:04,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:55:01,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 984.81329 ± 407.523
2025-09-12 06:55:01,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1522.8511, 102.27445, 1400.622, 793.683, 851.63904, 1433.3733, 1270.3425, 856.32056, 852.2307, 764.79645]
2025-09-12 06:55:01,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [284.0, 20.0, 266.0, 151.0, 161.0, 272.0, 242.0, 165.0, 180.0, 163.0]
2025-09-12 06:55:01,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 14 hours, 44 minutes, 56 seconds)
2025-09-12 07:09:54,752 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:09:54,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:11:06,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1247.12256 ± 838.604
2025-09-12 07:11:06,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [2810.1877, 661.5161, 2268.0225, 1199.7184, 1535.3136, 549.4578, 1772.5848, 1344.6898, 209.6016, 120.13397]
2025-09-12 07:11:06,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [529.0, 127.0, 444.0, 224.0, 286.0, 124.0, 341.0, 253.0, 41.0, 23.0]
2025-09-12 07:11:06,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 14 hours, 28 minutes, 32 seconds)
2025-09-12 07:26:13,666 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:26:13,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:27:36,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1421.09973 ± 934.368
2025-09-12 07:27:36,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [2275.8003, 2941.3362, 267.2078, 2235.8271, 2250.395, 1462.2415, 1166.329, 1159.2897, 314.34732, 138.22224]
2025-09-12 07:27:36,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [439.0, 545.0, 50.0, 442.0, 436.0, 295.0, 220.0, 220.0, 58.0, 27.0]
2025-09-12 07:27:36,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (1421.10) for latency ExtremeClogL1U23
2025-09-12 07:27:36,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 14 hours, 19 minutes, 44 seconds)
2025-09-12 07:42:57,417 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:42:57,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:44:20,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1445.25256 ± 945.933
2025-09-12 07:44:20,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1779.5454, 122.25131, 299.08267, 1126.6261, 1134.3903, 468.53018, 2212.265, 1724.206, 2391.5693, 3194.06]
2025-09-12 07:44:20,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [336.0, 24.0, 57.0, 215.0, 228.0, 102.0, 420.0, 332.0, 465.0, 605.0]
2025-09-12 07:44:20,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (1445.25) for latency ExtremeClogL1U23
2025-09-12 07:44:20,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 14 hours, 11 minutes, 36 seconds)
2025-09-12 07:59:05,026 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:59:05,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:00:26,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1402.12280 ± 738.376
2025-09-12 08:00:26,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1552.1427, 2242.9985, 1684.9878, 709.55365, 2319.022, 894.3987, 95.491264, 2446.3606, 1066.1866, 1010.086]
2025-09-12 08:00:26,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [300.0, 443.0, 321.0, 140.0, 457.0, 186.0, 19.0, 475.0, 206.0, 196.0]
2025-09-12 08:00:26,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 13 hours, 53 minutes, 43 seconds)
2025-09-12 08:15:49,833 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:15:49,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:16:52,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1061.41260 ± 436.232
2025-09-12 08:16:52,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [288.87558, 954.7391, 503.31528, 1181.1045, 1762.7792, 1543.333, 818.63873, 923.6967, 1449.9893, 1187.654]
2025-09-12 08:16:52,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 198.0, 100.0, 245.0, 340.0, 291.0, 170.0, 175.0, 275.0, 245.0]
2025-09-12 08:16:52,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 13 hours, 38 minutes, 31 seconds)
2025-09-12 08:31:49,721 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:31:49,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:33:34,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1827.99243 ± 1331.574
2025-09-12 08:33:34,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [4029.1511, 2285.049, 1685.684, 2458.3494, 628.10095, 4120.6074, 1551.8867, 120.50489, 819.29865, 581.29065]
2025-09-12 08:33:34,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [765.0, 445.0, 321.0, 481.0, 119.0, 796.0, 295.0, 23.0, 155.0, 112.0]
2025-09-12 08:33:34,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (1827.99) for latency ExtremeClogL1U23
2025-09-12 08:33:34,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 13 hours, 28 minutes, 14 seconds)
2025-09-12 08:48:49,225 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:48:49,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:50:01,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1276.51929 ± 1085.171
2025-09-12 08:50:01,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1310.2074, 116.35699, 1666.8291, 1137.4391, 3954.122, 1459.9536, 897.5112, 1909.832, 166.40538, 146.5347]
2025-09-12 08:50:01,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [256.0, 23.0, 313.0, 212.0, 751.0, 283.0, 171.0, 364.0, 32.0, 28.0]
2025-09-12 08:50:01,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 13 hours, 11 minutes, 16 seconds)
2025-09-12 09:05:07,829 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:05:07,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:06:19,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1235.50391 ± 753.070
2025-09-12 09:06:19,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [940.13806, 2572.804, 913.172, 2142.7915, 645.89514, 1438.3743, 674.96814, 809.3185, 120.931145, 2096.648]
2025-09-12 09:06:19,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [180.0, 497.0, 188.0, 415.0, 120.0, 283.0, 125.0, 153.0, 24.0, 431.0]
2025-09-12 09:06:19,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 50 minutes, 42 seconds)
2025-09-12 09:21:14,919 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:21:14,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:22:46,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1584.07593 ± 957.332
2025-09-12 09:22:46,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1552.6464, 1948.6523, 2537.7668, 156.98247, 2807.969, 1476.7279, 509.47607, 1281.7646, 3058.9426, 509.83124]
2025-09-12 09:22:46,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [296.0, 374.0, 501.0, 30.0, 539.0, 281.0, 111.0, 243.0, 579.0, 98.0]
2025-09-12 09:22:46,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 12 hours, 37 minutes, 26 seconds)
2025-09-12 09:37:50,738 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:37:50,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:38:54,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1084.55933 ± 390.358
2025-09-12 09:38:54,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1137.5764, 721.46136, 894.7377, 896.05084, 2062.0657, 1233.7565, 674.46075, 830.96155, 1399.6228, 994.8995]
2025-09-12 09:38:54,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [219.0, 136.0, 176.0, 175.0, 420.0, 237.0, 131.0, 158.0, 269.0, 194.0]
2025-09-12 09:38:54,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 12 hours, 18 minutes, 13 seconds)
2025-09-12 09:54:06,197 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:54:06,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:55:23,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1345.61694 ± 695.177
2025-09-12 09:55:23,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [495.8078, 880.63007, 2802.818, 2095.1238, 774.6161, 884.1518, 906.42316, 1325.9435, 2030.819, 1259.8359]
2025-09-12 09:55:23,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 164.0, 535.0, 401.0, 157.0, 165.0, 172.0, 259.0, 383.0, 245.0]
2025-09-12 09:55:23,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 11 hours, 59 minutes, 57 seconds)
2025-09-12 10:10:52,753 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:10:52,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:12:36,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1789.39685 ± 1459.915
2025-09-12 10:12:36,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [2909.8333, 944.8357, 440.88342, 96.26828, 5188.378, 1597.4429, 672.797, 1132.6339, 2957.6316, 1953.2642]
2025-09-12 10:12:36,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [556.0, 189.0, 82.0, 19.0, 994.0, 303.0, 124.0, 216.0, 561.0, 371.0]
2025-09-12 10:12:36,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 11 hours, 50 minutes, 10 seconds)
2025-09-12 10:27:37,181 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:27:37,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:29:17,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1762.04138 ± 761.357
2025-09-12 10:29:17,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [2410.3804, 926.9094, 1207.6465, 2089.377, 95.3064, 2296.9146, 1953.9805, 2130.0698, 2775.5535, 1734.2751]
2025-09-12 10:29:17,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [460.0, 184.0, 228.0, 399.0, 19.0, 437.0, 373.0, 410.0, 528.0, 332.0]
2025-09-12 10:29:17,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 11 hours, 36 minutes, 53 seconds)
2025-09-12 10:44:29,169 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:44:29,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:45:40,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1204.46667 ± 984.511
2025-09-12 10:45:40,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [332.63776, 119.38006, 95.4969, 1175.7439, 1807.0168, 1630.2737, 566.8327, 2760.6726, 661.8031, 2894.8093]
2025-09-12 10:45:40,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 23.0, 19.0, 237.0, 369.0, 320.0, 114.0, 529.0, 140.0, 559.0]
2025-09-12 10:45:40,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 11 hours, 19 minutes, 43 seconds)
2025-09-12 11:00:28,511 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:00:28,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:01:56,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1504.38025 ± 1183.327
2025-09-12 11:01:56,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [289.43185, 603.09973, 1116.6226, 1943.5635, 4583.2886, 1973.111, 582.4687, 630.824, 1591.0145, 1730.3783]
2025-09-12 11:01:56,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [57.0, 116.0, 217.0, 371.0, 898.0, 378.0, 108.0, 134.0, 300.0, 329.0]
2025-09-12 11:01:56,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 11 hours, 4 minutes, 15 seconds)
2025-09-12 11:17:33,491 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:17:33,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:19:14,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1699.46252 ± 748.387
2025-09-12 11:19:14,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [710.00507, 1364.8365, 1252.3896, 1476.3032, 2142.5461, 1501.6853, 3100.316, 1221.0703, 1249.0585, 2976.414]
2025-09-12 11:19:14,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 279.0, 240.0, 299.0, 447.0, 310.0, 597.0, 238.0, 260.0, 564.0]
2025-09-12 11:19:14,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 10 hours, 54 minutes, 5 seconds)
2025-09-12 11:34:06,170 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:34:06,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:35:38,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1551.17090 ± 1308.248
2025-09-12 11:35:38,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [290.89047, 989.43805, 1055.2565, 990.335, 4588.6533, 2096.1025, 119.96155, 3014.0784, 605.2536, 1761.7397]
2025-09-12 11:35:38,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 188.0, 209.0, 187.0, 908.0, 403.0, 23.0, 598.0, 111.0, 358.0]
2025-09-12 11:35:38,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 10 hours, 31 minutes, 8 seconds)
2025-09-12 11:50:42,283 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:50:42,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:53:28,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 2903.25024 ± 1752.912
2025-09-12 11:53:28,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [2081.869, 2368.9849, 113.79332, 1195.276, 4441.4365, 4594.715, 1150.8284, 5212.079, 5221.989, 2651.533]
2025-09-12 11:53:28,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [386.0, 454.0, 22.0, 238.0, 850.0, 871.0, 217.0, 1000.0, 1000.0, 509.0]
2025-09-12 11:53:28,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (2903.25) for latency ExtremeClogL1U23
2025-09-12 11:53:28,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 10 hours, 22 minutes, 54 seconds)
2025-09-12 12:08:25,625 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:08:25,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:11:01,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 2704.56250 ± 1599.533
2025-09-12 12:11:01,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [2542.6084, 5062.1006, 4400.6113, 3945.3481, 4211.403, 3186.799, 825.8816, 1191.5563, 1231.5894, 447.72693]
2025-09-12 12:11:01,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [483.0, 973.0, 829.0, 744.0, 794.0, 610.0, 157.0, 254.0, 231.0, 99.0]
2025-09-12 12:11:01,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 10 hours, 14 minutes, 34 seconds)
2025-09-12 12:26:38,189 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:26:38,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:28:14,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1629.21802 ± 1285.670
2025-09-12 12:28:14,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1683.3544, 1092.7828, 1085.6241, 1209.0116, 596.22394, 737.7034, 646.8123, 2155.1174, 1908.8021, 5176.748]
2025-09-12 12:28:14,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [352.0, 210.0, 219.0, 230.0, 111.0, 157.0, 132.0, 428.0, 369.0, 1000.0]
2025-09-12 12:28:14,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 10 hours, 4 minutes, 7 seconds)
2025-09-12 12:43:45,951 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:43:45,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:45:47,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 2052.11353 ± 1250.845
2025-09-12 12:45:47,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [4831.458, 2305.1902, 1644.7925, 627.4322, 336.0155, 3121.802, 2625.5596, 2134.9194, 1949.1879, 944.7777]
2025-09-12 12:45:47,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [922.0, 458.0, 314.0, 120.0, 60.0, 605.0, 538.0, 440.0, 395.0, 195.0]
2025-09-12 12:45:47,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 9 hours, 48 minutes, 30 seconds)
2025-09-12 13:00:19,228 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:00:19,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:02:02,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1800.87476 ± 1303.131
2025-09-12 13:02:02,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1667.4624, 1709.001, 2744.1523, 114.7093, 126.58351, 2947.9858, 1310.7954, 1493.6002, 4699.425, 1195.0325]
2025-09-12 13:02:02,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [318.0, 323.0, 513.0, 22.0, 25.0, 551.0, 251.0, 281.0, 896.0, 228.0]
2025-09-12 13:02:02,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 9 hours, 30 minutes, 14 seconds)
2025-09-12 13:17:05,619 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:17:05,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:18:33,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1500.87109 ± 1057.787
2025-09-12 13:18:33,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [2201.1296, 1076.9563, 2842.1868, 3787.16, 438.83545, 300.7808, 677.6268, 1148.107, 1351.695, 1184.2322]
2025-09-12 13:18:33,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [433.0, 206.0, 544.0, 751.0, 79.0, 65.0, 137.0, 217.0, 276.0, 228.0]
2025-09-12 13:18:33,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 9 hours, 4 minutes, 35 seconds)
2025-09-12 13:33:53,231 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:33:53,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:36:17,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 2504.06299 ± 1850.938
2025-09-12 13:36:17,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1536.8948, 2446.8037, 1340.5808, 4474.1, 5208.6763, 473.08057, 5237.2646, 825.8325, 133.38681, 3364.0122]
2025-09-12 13:36:17,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [293.0, 455.0, 253.0, 856.0, 1000.0, 101.0, 1000.0, 178.0, 26.0, 653.0]
2025-09-12 13:36:17,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 48 minutes, 40 seconds)
2025-09-12 13:51:12,101 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:51:12,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:53:38,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 2580.37280 ± 1576.380
2025-09-12 13:53:38,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [2319.7336, 1336.7462, 5241.3115, 1661.2402, 1279.397, 2692.3901, 786.9876, 3948.8267, 5213.884, 1323.2123]
2025-09-12 13:53:38,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [437.0, 259.0, 1000.0, 313.0, 245.0, 511.0, 162.0, 741.0, 1000.0, 259.0]
2025-09-12 13:53:38,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 8 hours, 32 minutes, 25 seconds)
2025-09-12 14:08:48,006 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:08:48,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:11:03,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 2331.45850 ± 1997.876
2025-09-12 14:11:03,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [5125.4785, 5182.493, 1158.4114, 625.9776, 1998.2893, 141.7469, 543.0469, 557.88776, 2778.2048, 5203.05]
2025-09-12 14:11:03,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 211.0, 118.0, 382.0, 27.0, 103.0, 117.0, 552.0, 1000.0]
2025-09-12 14:11:03,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 8 hours, 14 minutes, 34 seconds)
2025-09-12 14:26:10,182 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:26:10,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:28:14,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 2119.02832 ± 1583.189
2025-09-12 14:28:14,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1217.7343, 2595.1853, 1270.5016, 4881.542, 1628.6556, 140.21335, 1245.5614, 1663.8529, 1288.2815, 5258.7554]
2025-09-12 14:28:14,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [251.0, 514.0, 254.0, 939.0, 311.0, 27.0, 237.0, 313.0, 257.0, 1000.0]
2025-09-12 14:28:14,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 8 hours, 2 minutes, 41 seconds)
2025-09-12 14:43:31,380 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:43:31,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:45:55,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 2431.49658 ± 1407.876
2025-09-12 14:45:55,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [5006.2485, 1150.3665, 2203.269, 2035.9773, 763.46375, 438.35205, 1985.0071, 3520.2114, 4076.8257, 3135.244]
2025-09-12 14:45:55,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 226.0, 453.0, 401.0, 149.0, 80.0, 395.0, 699.0, 794.0, 607.0]
2025-09-12 14:45:55,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 51 minutes, 44 seconds)
2025-09-12 15:01:30,638 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:01:30,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:03:16,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1853.05664 ± 1841.767
2025-09-12 15:03:16,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [589.6662, 1263.6538, 2064.1262, 101.07063, 264.67303, 123.575165, 2677.5764, 5181.711, 5171.4907, 1093.0234]
2025-09-12 15:03:16,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 242.0, 403.0, 20.0, 49.0, 24.0, 510.0, 1000.0, 1000.0, 211.0]
2025-09-12 15:03:16,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 7 hours, 32 minutes, 18 seconds)
2025-09-12 15:18:12,992 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:18:12,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:20:27,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 2318.46851 ± 1877.675
2025-09-12 15:20:27,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [5177.2827, 3818.9854, 1220.3173, 2575.2156, 3766.3286, 4882.7773, 1138.9059, 95.48228, 118.564766, 390.82635]
2025-09-12 15:20:27,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 727.0, 234.0, 497.0, 723.0, 934.0, 218.0, 19.0, 23.0, 71.0]
2025-09-12 15:20:27,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 7 hours, 14 minutes, 1 second)
2025-09-12 15:36:18,506 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:36:18,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:38:09,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1872.76208 ± 1923.894
2025-09-12 15:38:09,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [102.602516, 5134.0186, 4386.4233, 110.488144, 4393.808, 768.79297, 1747.9827, 1867.9396, 106.70041, 108.86248]
2025-09-12 15:38:09,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 1000.0, 854.0, 22.0, 853.0, 159.0, 330.0, 388.0, 21.0, 21.0]
2025-09-12 15:38:09,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 58 minutes, 3 seconds)
2025-09-12 15:52:18,594 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:52:18,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:55:00,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 2834.89453 ± 1856.320
2025-09-12 15:55:01,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [5267.4233, 1761.715, 5018.3086, 3928.3013, 1154.1191, 1580.2946, 3259.4468, 5287.5703, 966.26416, 125.50172]
2025-09-12 15:55:01,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 335.0, 957.0, 750.0, 213.0, 299.0, 618.0, 1000.0, 186.0, 25.0]
2025-09-12 15:55:01,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 39 minutes, 9 seconds)
2025-09-12 16:10:12,339 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:10:12,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:13:41,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 3616.88525 ± 1581.375
2025-09-12 16:13:41,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [574.25305, 5183.447, 2522.2642, 2213.372, 2392.406, 4759.177, 5218.5293, 5185.6675, 3083.979, 5035.754]
2025-09-12 16:13:41,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 1000.0, 478.0, 440.0, 459.0, 913.0, 1000.0, 1000.0, 591.0, 1000.0]
2025-09-12 16:13:41,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (3616.89) for latency ExtremeClogL1U23
2025-09-12 16:13:41,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 6 hours, 26 minutes, 11 seconds)
2025-09-12 16:29:43,627 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:29:43,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:32:37,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 2975.38281 ± 1881.630
2025-09-12 16:32:37,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [583.00287, 967.62, 5145.3438, 2894.2065, 3301.7854, 5257.4004, 619.02045, 5198.4346, 1294.5345, 4492.481]
2025-09-12 16:32:37,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 197.0, 1000.0, 546.0, 627.0, 1000.0, 120.0, 1000.0, 270.0, 849.0]
2025-09-12 16:32:37,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 6 hours, 15 minutes, 15 seconds)
2025-09-12 16:47:14,546 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:47:14,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:50:12,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 3054.70117 ± 1743.660
2025-09-12 16:50:12,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [2940.5322, 3072.351, 1875.1891, 652.8017, 5131.341, 5172.269, 1180.1294, 910.03735, 5109.75, 4502.6113]
2025-09-12 16:50:12,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [594.0, 601.0, 362.0, 134.0, 1000.0, 1000.0, 219.0, 196.0, 1000.0, 872.0]
2025-09-12 16:50:12,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 59 minutes, 2 seconds)
2025-09-12 17:05:14,248 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:05:14,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:08:30,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 3393.72900 ± 1673.781
2025-09-12 17:08:30,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [5205.1436, 979.4385, 1383.8474, 5207.724, 5181.661, 1193.3538, 3390.8032, 3300.0222, 5203.685, 2891.6113]
2025-09-12 17:08:30,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 187.0, 270.0, 1000.0, 1000.0, 227.0, 651.0, 635.0, 1000.0, 554.0]
2025-09-12 17:08:30,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 43 minutes, 20 seconds)
2025-09-12 17:24:49,261 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:24:49,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:27:00,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 2210.50928 ± 1472.180
2025-09-12 17:27:00,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [2808.6062, 1508.969, 3212.992, 124.43102, 5143.6323, 1035.6289, 678.4364, 3169.478, 1158.8131, 3264.1052]
2025-09-12 17:27:00,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [553.0, 297.0, 633.0, 24.0, 1000.0, 200.0, 140.0, 614.0, 221.0, 633.0]
2025-09-12 17:27:00,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 5 hours, 31 minutes, 10 seconds)
2025-09-12 17:41:36,156 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:41:36,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:43:34,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 2010.85876 ± 1325.114
2025-09-12 17:43:34,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [840.944, 1824.2444, 1987.9625, 445.09174, 1763.7551, 5132.2314, 1960.6029, 1215.336, 1276.7323, 3661.6868]
2025-09-12 17:43:34,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [164.0, 374.0, 390.0, 82.0, 344.0, 1000.0, 381.0, 236.0, 244.0, 726.0]
2025-09-12 17:43:34,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 5 hours, 5 minutes, 37 seconds)
2025-09-12 17:58:32,655 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:58:32,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:01:00,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 2540.80786 ± 1987.711
2025-09-12 18:01:00,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [446.77036, 5082.277, 3635.6062, 5214.9326, 577.70483, 5129.532, 969.2014, 2841.6028, 113.33446, 1397.1172]
2025-09-12 18:01:00,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 1000.0, 698.0, 1000.0, 118.0, 1000.0, 195.0, 549.0, 22.0, 290.0]
2025-09-12 18:01:00,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 42 minutes, 49 seconds)
2025-09-12 18:16:01,583 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:16:01,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:18:04,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 2065.09424 ± 1600.073
2025-09-12 18:18:04,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1525.2269, 1992.2379, 936.8678, 5109.5474, 1879.1974, 5183.124, 1481.9503, 1145.1226, 489.5576, 908.1134]
2025-09-12 18:18:04,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [302.0, 389.0, 200.0, 1000.0, 360.0, 1000.0, 298.0, 230.0, 103.0, 207.0]
2025-09-12 18:18:04,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 23 minutes, 34 seconds)
2025-09-12 18:33:22,021 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:33:22,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:35:22,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 2076.44971 ± 1443.982
2025-09-12 18:35:22,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [452.7184, 3232.56, 3694.545, 934.35065, 2137.583, 3247.9556, 89.77807, 3018.1626, 3812.2786, 144.56718]
2025-09-12 18:35:22,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 619.0, 713.0, 176.0, 403.0, 630.0, 18.0, 561.0, 724.0, 28.0]
2025-09-12 18:35:22,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 4 hours, 3 minutes, 12 seconds)
2025-09-12 18:50:23,265 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:50:23,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:54:03,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 3852.49609 ± 1237.909
2025-09-12 18:54:03,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [3041.8723, 5305.0435, 3638.6704, 5227.244, 5259.2812, 2910.7598, 1961.4939, 2301.4302, 5175.9824, 3703.1848]
2025-09-12 18:54:03,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [575.0, 1000.0, 687.0, 1000.0, 1000.0, 550.0, 368.0, 477.0, 978.0, 685.0]
2025-09-12 18:54:03,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (3852.50) for latency ExtremeClogL1U23
2025-09-12 18:54:03,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 46 minutes, 19 seconds)
2025-09-12 19:10:12,811 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:10:12,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:13:14,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 3172.21729 ± 1599.879
2025-09-12 19:13:14,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [2798.9749, 4715.5225, 1766.7014, 4308.181, 118.79185, 5250.044, 1864.0763, 3176.5173, 5185.767, 2537.596]
2025-09-12 19:13:14,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [529.0, 893.0, 339.0, 815.0, 23.0, 1000.0, 348.0, 602.0, 1000.0, 482.0]
2025-09-12 19:13:14,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 35 minutes, 11 seconds)
2025-09-12 19:27:24,795 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:27:24,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:30:30,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 3239.29053 ± 1712.638
2025-09-12 19:30:30,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [3323.7593, 3135.5183, 2241.2078, 113.39338, 5119.5493, 5233.7896, 4641.3394, 2081.1646, 1286.23, 5216.9536]
2025-09-12 19:30:30,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [647.0, 590.0, 427.0, 22.0, 1000.0, 1000.0, 879.0, 395.0, 246.0, 997.0]
2025-09-12 19:30:30,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 16 minutes, 54 seconds)
2025-09-12 19:45:43,054 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:45:43,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:47:18,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1609.52148 ± 1796.095
2025-09-12 19:47:18,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1128.938, 5173.6177, 1069.5221, 488.87997, 5175.6797, 813.6913, 435.75015, 671.145, 613.09924, 524.89075]
2025-09-12 19:47:18,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [242.0, 1000.0, 199.0, 92.0, 1000.0, 156.0, 81.0, 135.0, 129.0, 109.0]
2025-09-12 19:47:18,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 58 minutes, 28 seconds)
2025-09-12 20:02:29,746 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:02:29,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:05:11,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 2809.70508 ± 1511.188
2025-09-12 20:05:11,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [2190.5962, 579.086, 1997.4587, 2293.0745, 5118.257, 2696.4373, 5214.3247, 3508.259, 3718.083, 781.47565]
2025-09-12 20:05:11,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [414.0, 122.0, 376.0, 434.0, 1000.0, 531.0, 1000.0, 674.0, 708.0, 146.0]
2025-09-12 20:05:11,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 41 minutes, 40 seconds)
2025-09-12 20:20:50,714 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:20:50,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:23:26,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 2722.01270 ± 1902.373
2025-09-12 20:23:26,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1820.1115, 1475.7162, 5149.6157, 4146.828, 2242.8452, 88.89838, 5170.824, 1459.0435, 5191.1777, 475.06628]
2025-09-12 20:23:26,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [350.0, 283.0, 1000.0, 803.0, 429.0, 18.0, 1000.0, 282.0, 1000.0, 88.0]
2025-09-12 20:23:26,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 23 minutes, 1 second)
2025-09-12 20:37:56,647 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:37:56,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:40:04,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 2147.74487 ± 1675.579
2025-09-12 20:40:04,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [5105.697, 1149.455, 1722.8959, 1099.0884, 1262.604, 5168.3506, 265.18658, 3087.9072, 500.64957, 2115.6133]
2025-09-12 20:40:04,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 220.0, 334.0, 224.0, 266.0, 1000.0, 57.0, 602.0, 102.0, 408.0]
2025-09-12 20:40:04,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 2 hours, 1 minute, 34 seconds)
2025-09-12 20:55:04,681 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:55:04,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:57:15,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 2149.10474 ± 1695.706
2025-09-12 20:57:15,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [367.6935, 2274.2468, 4997.0767, 1212.5107, 920.65265, 663.3343, 4673.8096, 800.3776, 1404.3241, 4177.0215]
2025-09-12 20:57:15,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 464.0, 1000.0, 242.0, 193.0, 135.0, 937.0, 169.0, 289.0, 801.0]
2025-09-12 20:57:15,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 44 minutes, 5 seconds)
2025-09-12 21:13:14,931 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:13:14,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:16:06,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 2952.71948 ± 1836.185
2025-09-12 21:16:06,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [2427.824, 1262.9672, 430.0959, 736.97095, 5261.672, 5153.6787, 3969.0032, 5205.4243, 3699.043, 1380.5171]
2025-09-12 21:16:06,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [466.0, 243.0, 86.0, 147.0, 1000.0, 1000.0, 772.0, 1000.0, 696.0, 285.0]
2025-09-12 21:16:06,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 28 minutes, 48 seconds)
2025-09-12 21:30:29,640 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:30:29,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:32:34,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 2200.76196 ± 1940.864
2025-09-12 21:32:34,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [4113.434, 799.96027, 370.05795, 150.0146, 118.46607, 4114.9326, 560.52203, 5324.463, 2244.5078, 4211.2603]
2025-09-12 21:32:34,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [782.0, 149.0, 79.0, 29.0, 23.0, 770.0, 105.0, 1000.0, 420.0, 814.0]
2025-09-12 21:32:34,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 9 minutes, 54 seconds)
2025-09-12 21:47:39,980 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:47:39,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:50:47,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 3225.10352 ± 1845.547
2025-09-12 21:50:47,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [419.8276, 4894.133, 1848.498, 5232.433, 3228.7507, 4097.5083, 89.24306, 2456.6492, 4831.1606, 5152.8315]
2025-09-12 21:50:47,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 937.0, 356.0, 1000.0, 615.0, 787.0, 18.0, 472.0, 934.0, 1000.0]
2025-09-12 21:50:47,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 52 minutes, 24 seconds)
2025-09-12 22:06:12,234 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:06:12,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 22:08:52,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 2737.23486 ± 1985.997
2025-09-12 22:08:52,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [5134.2124, 4835.2915, 5112.197, 1318.5863, 1964.9889, 2464.4995, 5110.13, 411.54495, 594.545, 426.35516]
2025-09-12 22:08:52,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 924.0, 1000.0, 256.0, 379.0, 483.0, 1000.0, 75.0, 126.0, 80.0]
2025-09-12 22:08:52,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 35 minutes, 31 seconds)
2025-09-12 22:24:37,021 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:24:37,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 22:27:46,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 3233.68140 ± 1861.104
2025-09-12 22:27:46,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [3208.1257, 5062.46, 5138.6895, 326.60324, 1540.513, 3381.8916, 5110.913, 161.22377, 3343.2114, 5063.1836]
2025-09-12 22:27:46,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [620.0, 1000.0, 1000.0, 64.0, 285.0, 655.0, 1000.0, 31.0, 639.0, 1000.0]
2025-09-12 22:27:46,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 18 minutes, 6 seconds)
2025-09-12 22:42:23,070 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:42:23,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 22:45:12,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 2893.91992 ± 1419.373
2025-09-12 22:45:12,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [2180.3203, 2824.524, 5191.663, 2132.7024, 5100.2334, 2956.7524, 3640.0156, 571.7613, 3146.8005, 1194.4255]
2025-09-12 22:45:12,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [426.0, 551.0, 1000.0, 424.0, 994.0, 567.0, 713.0, 115.0, 616.0, 241.0]
2025-09-12 22:45:12,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1251 [DEBUG]: Training session finished
