2025-09-12 01:01:20,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc5-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-09-12 01:01:20,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc5-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-09-12 01:01:20,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x150f6b84f110>}
2025-09-12 01:01:20,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1111 [DEBUG]: using device: cuda
2025-09-12 01:01:20,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1133 [INFO]: Creating new trainer
2025-09-12 01:01:20,927 baseline-mbpac-noiseperc5-humanoid:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-12 01:01:20,927 baseline-mbpac-noiseperc5-humanoid:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 01:01:20,938 baseline-mbpac-noiseperc5-humanoid:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-09-12 01:01:22,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1194 [DEBUG]: Starting training session...
2025-09-12 01:01:22,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 1/100
2025-09-12 01:13:24,674 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:13:24,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:13:49,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 423.37262 ± 113.992
2025-09-12 01:13:49,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [277.41525, 605.29346, 556.79535, 312.6944, 410.23425, 551.57477, 455.18698, 283.83298, 334.58478, 446.11383]
2025-09-12 01:13:49,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 130.0, 108.0, 68.0, 91.0, 120.0, 89.0, 60.0, 74.0, 90.0]
2025-09-12 01:13:49,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (423.37) for latency MM1Queue_a033_s075
2025-09-12 01:13:49,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 20 hours, 32 minutes, 36 seconds)
2025-09-12 01:27:11,802 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:27:11,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:27:32,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 386.25040 ± 112.485
2025-09-12 01:27:32,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [282.2388, 240.72707, 499.12216, 346.45203, 332.72952, 448.23624, 313.93777, 340.20865, 417.27637, 641.5753]
2025-09-12 01:27:32,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [56.0, 46.0, 103.0, 68.0, 68.0, 86.0, 63.0, 70.0, 77.0, 125.0]
2025-09-12 01:27:32,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 21 hours, 22 minutes, 36 seconds)
2025-09-12 01:41:00,682 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:41:00,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:41:24,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 442.93100 ± 121.821
2025-09-12 01:41:24,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [327.55585, 372.80344, 345.86288, 735.9122, 315.78784, 498.5642, 561.1595, 434.97842, 404.15674, 432.52887]
2025-09-12 01:41:24,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 69.0, 75.0, 137.0, 69.0, 94.0, 106.0, 93.0, 86.0, 80.0]
2025-09-12 01:41:24,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (442.93) for latency MM1Queue_a033_s075
2025-09-12 01:41:24,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 21 hours, 34 minutes, 46 seconds)
2025-09-12 01:54:52,530 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:54:52,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:55:12,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 361.57669 ± 44.644
2025-09-12 01:55:12,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [310.02295, 393.89523, 399.72958, 313.1786, 314.2936, 331.6176, 380.8586, 438.3136, 405.52817, 328.32895]
2025-09-12 01:55:12,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [57.0, 76.0, 81.0, 71.0, 64.0, 67.0, 80.0, 80.0, 86.0, 60.0]
2025-09-12 01:55:12,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 21 hours, 32 minutes, 4 seconds)
2025-09-12 02:08:35,985 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:08:35,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:08:53,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 346.04987 ± 41.009
2025-09-12 02:08:53,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [317.80713, 325.34317, 338.5707, 317.24313, 314.83035, 400.1208, 359.59854, 437.4878, 298.7213, 350.77585]
2025-09-12 02:08:53,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 60.0, 63.0, 59.0, 59.0, 78.0, 67.0, 81.0, 57.0, 67.0]
2025-09-12 02:08:53,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 21 hours, 23 minutes, 3 seconds)
2025-09-12 02:22:19,951 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:22:19,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:22:39,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 368.42667 ± 47.955
2025-09-12 02:22:39,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [314.49173, 327.93414, 391.25385, 300.1817, 362.21405, 435.0092, 444.58005, 393.38818, 324.88565, 390.32797]
2025-09-12 02:22:39,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 61.0, 72.0, 56.0, 70.0, 86.0, 82.0, 78.0, 59.0, 80.0]
2025-09-12 02:22:39,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 21 hours, 34 minutes, 5 seconds)
2025-09-12 02:36:01,895 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:36:01,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:36:23,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 409.61523 ± 87.997
2025-09-12 02:36:23,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [361.30527, 419.35486, 386.0246, 454.07678, 296.93942, 608.569, 352.76578, 372.577, 509.64835, 334.8909]
2025-09-12 02:36:23,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 87.0, 75.0, 85.0, 60.0, 114.0, 70.0, 69.0, 100.0, 63.0]
2025-09-12 02:36:23,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 21 hours, 20 minutes, 25 seconds)
2025-09-12 02:49:36,555 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:49:36,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:50:00,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 452.19476 ± 70.444
2025-09-12 02:50:00,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [506.33914, 324.15244, 423.34213, 539.2484, 412.4101, 487.13837, 473.07114, 449.09058, 357.05652, 550.09845]
2025-09-12 02:50:00,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 66.0, 93.0, 103.0, 76.0, 104.0, 88.0, 85.0, 68.0, 103.0]
2025-09-12 02:50:00,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (452.19) for latency MM1Queue_a033_s075
2025-09-12 02:50:00,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 21 hours, 2 minutes, 3 seconds)
2025-09-12 03:03:17,391 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:03:17,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:03:45,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 532.05676 ± 102.981
2025-09-12 03:03:45,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [682.5638, 655.0645, 417.28122, 370.31494, 493.5274, 515.0018, 450.58865, 589.5073, 656.45416, 490.2642]
2025-09-12 03:03:45,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 122.0, 84.0, 75.0, 92.0, 97.0, 82.0, 130.0, 125.0, 89.0]
2025-09-12 03:03:45,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (532.06) for latency MM1Queue_a033_s075
2025-09-12 03:03:45,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 20 hours, 47 minutes, 36 seconds)
2025-09-12 03:17:06,893 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:17:06,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:17:30,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 455.14178 ± 74.872
2025-09-12 03:17:30,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [449.6727, 454.7182, 475.29843, 542.1555, 552.77075, 323.51685, 421.97745, 359.0226, 552.97546, 419.31027]
2025-09-12 03:17:30,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 84.0, 84.0, 113.0, 103.0, 67.0, 77.0, 65.0, 100.0, 79.0]
2025-09-12 03:17:30,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 20 hours, 34 minutes, 50 seconds)
2025-09-12 03:30:46,668 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:30:46,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:31:14,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 506.44995 ± 113.152
2025-09-12 03:31:14,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [478.767, 455.5235, 555.59985, 377.88483, 749.90186, 521.4178, 359.1312, 625.7728, 408.30774, 532.1929]
2025-09-12 03:31:14,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 86.0, 108.0, 79.0, 153.0, 98.0, 68.0, 133.0, 90.0, 105.0]
2025-09-12 03:31:14,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 20 hours, 20 minutes, 48 seconds)
2025-09-12 03:44:35,071 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:44:35,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:45:00,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 478.85220 ± 77.979
2025-09-12 03:45:00,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [406.45654, 536.0287, 439.12836, 368.392, 497.9743, 434.29523, 568.0604, 574.9146, 384.25745, 579.01416]
2025-09-12 03:45:00,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 118.0, 88.0, 80.0, 96.0, 81.0, 106.0, 115.0, 70.0, 108.0]
2025-09-12 03:45:00,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 20 hours, 7 minutes, 45 seconds)
2025-09-12 03:58:12,858 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:58:12,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:58:40,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 528.59326 ± 136.627
2025-09-12 03:58:40,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [516.8434, 711.24207, 549.51337, 382.1598, 688.7756, 389.69937, 439.3472, 438.86295, 763.6559, 405.83325]
2025-09-12 03:58:40,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 140.0, 101.0, 81.0, 144.0, 71.0, 87.0, 81.0, 147.0, 75.0]
2025-09-12 03:58:40,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 19 hours, 54 minutes, 51 seconds)
2025-09-12 04:11:59,325 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:11:59,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:12:29,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 599.88800 ± 119.370
2025-09-12 04:12:29,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [542.28436, 602.12213, 626.16565, 436.87454, 618.9223, 903.755, 516.9817, 547.45337, 532.96936, 671.3518]
2025-09-12 04:12:29,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 109.0, 119.0, 82.0, 122.0, 159.0, 112.0, 102.0, 102.0, 123.0]
2025-09-12 04:12:29,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (599.89) for latency MM1Queue_a033_s075
2025-09-12 04:12:29,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 19 hours, 42 minutes, 19 seconds)
2025-09-12 04:25:49,067 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:25:49,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:26:16,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 528.66180 ± 92.366
2025-09-12 04:26:16,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [451.76224, 404.41144, 653.9378, 469.83057, 651.1471, 615.6172, 414.59143, 603.35675, 472.07718, 549.88654]
2025-09-12 04:26:16,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 73.0, 124.0, 88.0, 120.0, 112.0, 76.0, 113.0, 91.0, 102.0]
2025-09-12 04:26:16,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 19 hours, 29 minutes)
2025-09-12 04:39:32,486 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:39:32,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:40:02,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 575.94189 ± 91.169
2025-09-12 04:40:02,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [498.1163, 483.568, 548.19366, 681.19055, 433.27435, 700.8588, 652.306, 652.82806, 618.0395, 491.04346]
2025-09-12 04:40:02,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 106.0, 102.0, 130.0, 79.0, 127.0, 125.0, 120.0, 119.0, 94.0]
2025-09-12 04:40:02,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 19 hours, 15 minutes, 54 seconds)
2025-09-12 04:53:19,850 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:53:19,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:53:53,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 646.05103 ± 137.052
2025-09-12 04:53:53,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [530.1868, 519.7237, 921.2242, 774.64557, 548.5296, 684.47754, 562.5287, 818.81824, 550.6309, 549.7452]
2025-09-12 04:53:53,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 97.0, 177.0, 150.0, 100.0, 132.0, 112.0, 161.0, 117.0, 105.0]
2025-09-12 04:53:53,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (646.05) for latency MM1Queue_a033_s075
2025-09-12 04:53:53,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 19 hours, 3 minutes, 34 seconds)
2025-09-12 05:07:08,329 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:07:08,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:07:44,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 728.56586 ± 214.877
2025-09-12 05:07:44,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [577.08765, 501.23047, 938.3648, 736.07623, 795.9056, 452.2916, 486.78278, 1027.558, 704.5581, 1065.8033]
2025-09-12 05:07:44,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 96.0, 179.0, 135.0, 150.0, 82.0, 89.0, 187.0, 126.0, 199.0]
2025-09-12 05:07:44,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (728.57) for latency MM1Queue_a033_s075
2025-09-12 05:07:44,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 18 hours, 52 minutes, 48 seconds)
2025-09-12 05:21:00,617 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:21:00,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:21:38,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 743.55975 ± 214.190
2025-09-12 05:21:38,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [614.0142, 462.36392, 1050.7894, 764.84717, 658.83594, 693.33435, 473.6035, 917.97906, 669.49603, 1130.3337]
2025-09-12 05:21:38,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 87.0, 194.0, 147.0, 124.0, 125.0, 84.0, 178.0, 132.0, 210.0]
2025-09-12 05:21:38,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (743.56) for latency MM1Queue_a033_s075
2025-09-12 05:21:38,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 18 hours, 40 minutes, 8 seconds)
2025-09-12 05:34:55,885 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:34:55,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:35:35,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 780.29401 ± 200.218
2025-09-12 05:35:35,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [568.78925, 798.8505, 1173.2224, 754.8147, 656.18835, 846.891, 603.9289, 813.1008, 1065.9558, 521.1979]
2025-09-12 05:35:35,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 150.0, 227.0, 143.0, 120.0, 154.0, 113.0, 152.0, 201.0, 97.0]
2025-09-12 05:35:35,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (780.29) for latency MM1Queue_a033_s075
2025-09-12 05:35:35,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 18 hours, 29 minutes, 17 seconds)
2025-09-12 05:48:58,356 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:48:58,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:49:48,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 951.53601 ± 246.164
2025-09-12 05:49:48,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [857.66113, 708.8104, 1444.3523, 690.41266, 900.3344, 1043.2881, 661.36, 919.99304, 976.61786, 1312.5305]
2025-09-12 05:49:48,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [160.0, 142.0, 278.0, 129.0, 170.0, 198.0, 121.0, 170.0, 202.0, 247.0]
2025-09-12 05:49:48,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (951.54) for latency MM1Queue_a033_s075
2025-09-12 05:49:48,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 18 hours, 22 minutes, 12 seconds)
2025-09-12 06:03:16,989 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:03:16,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:04:14,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1106.54419 ± 386.153
2025-09-12 06:04:14,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1474.0275, 1013.4664, 700.93195, 1739.3313, 1085.0122, 949.40137, 607.1245, 1577.7032, 1307.443, 611.00134]
2025-09-12 06:04:14,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [289.0, 190.0, 148.0, 357.0, 204.0, 174.0, 120.0, 298.0, 238.0, 114.0]
2025-09-12 06:04:14,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (1106.54) for latency MM1Queue_a033_s075
2025-09-12 06:04:14,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 18 hours, 17 minutes, 23 seconds)
2025-09-12 06:17:47,029 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:17:47,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:18:39,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1018.70978 ± 218.065
2025-09-12 06:18:39,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1517.3181, 1181.023, 1211.1643, 837.0248, 889.80493, 916.6821, 767.03973, 922.282, 857.13446, 1087.6244]
2025-09-12 06:18:39,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [285.0, 213.0, 228.0, 155.0, 187.0, 183.0, 141.0, 175.0, 164.0, 204.0]
2025-09-12 06:18:39,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 18 hours, 11 minutes, 58 seconds)
2025-09-12 06:31:52,450 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:31:52,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:33:04,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1359.46228 ± 516.102
2025-09-12 06:33:04,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1520.6699, 1367.5507, 935.4902, 526.012, 1071.4756, 1706.1064, 773.71814, 1737.0885, 2371.04, 1585.472]
2025-09-12 06:33:04,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [289.0, 261.0, 194.0, 112.0, 222.0, 318.0, 147.0, 333.0, 455.0, 296.0]
2025-09-12 06:33:04,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (1359.46) for latency MM1Queue_a033_s075
2025-09-12 06:33:04,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 18 hours, 5 minutes, 51 seconds)
2025-09-12 06:46:34,900 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:46:34,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:47:36,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1186.35425 ± 656.099
2025-09-12 06:47:36,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [872.8484, 944.10675, 2850.955, 1102.2134, 461.9599, 1259.6967, 1193.3248, 955.48425, 488.97134, 1733.9827]
2025-09-12 06:47:36,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [166.0, 170.0, 546.0, 214.0, 102.0, 228.0, 226.0, 182.0, 90.0, 319.0]
2025-09-12 06:47:36,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 18 hours, 6 seconds)
2025-09-12 07:00:48,687 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:00:48,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:01:48,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1160.91235 ± 183.597
2025-09-12 07:01:48,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1195.1815, 1223.9603, 912.21954, 1478.0862, 954.7472, 1065.7217, 1252.9061, 1149.0105, 956.5093, 1420.7812]
2025-09-12 07:01:48,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [225.0, 228.0, 180.0, 276.0, 186.0, 197.0, 231.0, 226.0, 196.0, 274.0]
2025-09-12 07:01:48,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 17 hours, 45 minutes, 39 seconds)
2025-09-12 07:15:36,461 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:15:36,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:16:41,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1295.97278 ± 507.925
2025-09-12 07:16:41,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1027.04, 2598.158, 793.79364, 1166.34, 1201.6466, 969.8417, 952.4512, 1379.6814, 1072.2505, 1798.5244]
2025-09-12 07:16:41,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [195.0, 486.0, 148.0, 212.0, 222.0, 177.0, 189.0, 255.0, 193.0, 332.0]
2025-09-12 07:16:41,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 17 hours, 37 minutes, 48 seconds)
2025-09-12 07:29:53,494 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:29:53,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:31:15,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1645.46265 ± 706.403
2025-09-12 07:31:15,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2664.1587, 2090.525, 1162.5525, 1515.1362, 985.244, 2336.008, 1789.7467, 663.2416, 2519.7773, 728.23755]
2025-09-12 07:31:15,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [486.0, 385.0, 213.0, 276.0, 186.0, 425.0, 330.0, 127.0, 461.0, 139.0]
2025-09-12 07:31:15,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (1645.46) for latency MM1Queue_a033_s075
2025-09-12 07:31:15,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 17 hours, 25 minutes, 30 seconds)
2025-09-12 07:44:38,488 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:44:38,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:45:59,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1609.82642 ± 498.732
2025-09-12 07:45:59,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1189.7914, 2075.2598, 1525.488, 977.9751, 1769.675, 2494.1094, 2273.706, 1479.5228, 1080.6787, 1232.0573]
2025-09-12 07:45:59,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [211.0, 378.0, 286.0, 189.0, 332.0, 459.0, 423.0, 286.0, 216.0, 234.0]
2025-09-12 07:45:59,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 17 hours, 15 minutes, 24 seconds)
2025-09-12 07:59:27,313 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:59:27,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:01:14,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2088.96338 ± 751.409
2025-09-12 08:01:14,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2824.3772, 2166.7805, 1127.0381, 2228.866, 1522.1965, 1766.3623, 1418.8641, 1881.3048, 3887.9644, 2065.8809]
2025-09-12 08:01:14,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [525.0, 399.0, 239.0, 411.0, 292.0, 335.0, 267.0, 343.0, 753.0, 389.0]
2025-09-12 08:01:14,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (2088.96) for latency MM1Queue_a033_s075
2025-09-12 08:01:14,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 17 hours, 10 minutes, 52 seconds)
2025-09-12 08:14:58,323 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:14:58,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:16:38,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1938.17773 ± 702.297
2025-09-12 08:16:38,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1516.9685, 1093.7144, 1975.7272, 1225.9341, 2511.2346, 2397.7563, 3538.3284, 1752.2644, 2052.384, 1317.4648]
2025-09-12 08:16:38,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [286.0, 239.0, 369.0, 237.0, 474.0, 441.0, 639.0, 328.0, 383.0, 255.0]
2025-09-12 08:16:38,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 17 hours, 12 minutes, 36 seconds)
2025-09-12 08:29:51,688 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:29:51,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:31:44,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2256.23706 ± 1359.679
2025-09-12 08:31:44,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1232.4155, 2865.1174, 3952.5906, 1953.378, 4990.387, 1137.3846, 902.0324, 1098.2356, 3262.6526, 1168.1761]
2025-09-12 08:31:44,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [230.0, 523.0, 726.0, 354.0, 904.0, 218.0, 167.0, 209.0, 600.0, 228.0]
2025-09-12 08:31:44,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (2256.24) for latency MM1Queue_a033_s075
2025-09-12 08:31:44,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 17 hours, 33 seconds)
2025-09-12 08:45:07,037 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:45:07,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:46:56,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2136.98682 ± 582.725
2025-09-12 08:46:56,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2348.5405, 1913.4259, 3288.5225, 2077.0981, 1605.2134, 2815.9297, 1331.9536, 2086.9854, 1447.1683, 2455.0327]
2025-09-12 08:46:56,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [450.0, 345.0, 609.0, 390.0, 305.0, 509.0, 250.0, 406.0, 257.0, 463.0]
2025-09-12 08:46:56,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 16 hours, 54 minutes, 7 seconds)
2025-09-12 09:00:16,196 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:00:16,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:01:40,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 1686.39648 ± 769.318
2025-09-12 09:01:40,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1179.3219, 1603.1177, 1321.9696, 3778.2075, 1058.0898, 1186.3108, 1833.3029, 1612.5337, 2130.6995, 1160.4124]
2025-09-12 09:01:40,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [219.0, 291.0, 246.0, 704.0, 193.0, 228.0, 338.0, 296.0, 386.0, 218.0]
2025-09-12 09:01:40,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 16 hours, 39 minutes)
2025-09-12 09:15:31,503 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:15:31,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:17:38,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2566.93457 ± 1668.912
2025-09-12 09:17:38,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [4893.691, 653.9674, 995.31116, 2013.3594, 5533.3286, 3648.0667, 1296.4529, 3833.4722, 1620.944, 1180.7534]
2025-09-12 09:17:38,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [902.0, 145.0, 181.0, 362.0, 1000.0, 655.0, 253.0, 700.0, 308.0, 210.0]
2025-09-12 09:17:38,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (2566.93) for latency MM1Queue_a033_s075
2025-09-12 09:17:38,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 16 hours, 33 minutes, 16 seconds)
2025-09-12 09:30:47,156 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:30:47,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:32:54,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2522.58936 ± 1100.563
2025-09-12 09:32:54,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [3454.1685, 2058.4128, 1872.5712, 3210.6533, 1460.6123, 5273.2275, 1968.7081, 2360.1099, 1868.1749, 1699.2543]
2025-09-12 09:32:54,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [627.0, 393.0, 353.0, 613.0, 275.0, 1000.0, 375.0, 442.0, 345.0, 324.0]
2025-09-12 09:32:54,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 16 hours, 16 minutes, 21 seconds)
2025-09-12 09:46:51,288 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:46:51,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:49:01,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2655.07471 ± 1363.729
2025-09-12 09:49:01,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5451.0454, 1772.749, 1166.1971, 4251.788, 2303.7864, 1553.6431, 2116.217, 4036.0156, 2543.8767, 1355.4288]
2025-09-12 09:49:01,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [995.0, 315.0, 227.0, 777.0, 423.0, 282.0, 382.0, 735.0, 465.0, 246.0]
2025-09-12 09:49:01,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (2655.07) for latency MM1Queue_a033_s075
2025-09-12 09:49:01,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 16 hours, 13 minutes, 50 seconds)
2025-09-12 10:02:27,238 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:02:27,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:04:34,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2528.56763 ± 1323.627
2025-09-12 10:04:34,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5173.497, 1218.2152, 1289.4479, 2654.2764, 1202.1368, 2149.1975, 1395.8516, 2673.568, 3060.3577, 4469.128]
2025-09-12 10:04:34,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [975.0, 223.0, 257.0, 488.0, 216.0, 418.0, 253.0, 480.0, 570.0, 827.0]
2025-09-12 10:04:34,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 16 hours, 2 minutes, 40 seconds)
2025-09-12 10:17:25,914 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:17:25,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:19:40,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 2670.68188 ± 1667.358
2025-09-12 10:19:40,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [3028.798, 1248.4418, 769.25336, 5609.557, 4121.6807, 5140.547, 1333.746, 2590.3652, 1025.8293, 1838.5979]
2025-09-12 10:19:40,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [551.0, 221.0, 137.0, 1000.0, 753.0, 1000.0, 242.0, 466.0, 193.0, 367.0]
2025-09-12 10:19:40,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (2670.68) for latency MM1Queue_a033_s075
2025-09-12 10:19:40,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 15 hours, 51 minutes, 32 seconds)
2025-09-12 10:32:48,989 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:32:48,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:35:36,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3433.77612 ± 1472.057
2025-09-12 10:35:36,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1804.7694, 2578.1633, 5490.965, 2992.1772, 1203.2411, 4776.435, 3996.2651, 5438.7637, 4127.0938, 1929.8878]
2025-09-12 10:35:36,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [327.0, 469.0, 1000.0, 550.0, 221.0, 848.0, 714.0, 1000.0, 744.0, 357.0]
2025-09-12 10:35:36,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (3433.78) for latency MM1Queue_a033_s075
2025-09-12 10:35:36,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 35 minutes, 38 seconds)
2025-09-12 10:49:02,545 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:49:02,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:51:34,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3069.17383 ± 1444.803
2025-09-12 10:51:34,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2189.3516, 2796.9172, 3387.8396, 5554.576, 1083.7063, 5286.395, 4073.6084, 2357.1704, 1271.966, 2690.2048]
2025-09-12 10:51:34,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [415.0, 528.0, 635.0, 1000.0, 214.0, 1000.0, 737.0, 441.0, 238.0, 501.0]
2025-09-12 10:51:34,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 15 hours, 28 minutes, 17 seconds)
2025-09-12 11:05:37,321 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:05:37,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:08:40,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3577.56006 ± 1827.919
2025-09-12 11:08:40,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5242.361, 548.4785, 4692.555, 682.15045, 5390.978, 2322.3296, 2589.0994, 5467.817, 5129.153, 3710.6812]
2025-09-12 11:08:40,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 119.0, 860.0, 153.0, 1000.0, 438.0, 473.0, 1000.0, 977.0, 694.0]
2025-09-12 11:08:40,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (3577.56) for latency MM1Queue_a033_s075
2025-09-12 11:08:40,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 15 hours, 23 minutes, 52 seconds)
2025-09-12 11:21:36,536 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:21:36,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:24:31,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3328.91479 ± 1978.057
2025-09-12 11:24:31,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5217.5303, 4847.963, 5371.7983, 723.95905, 5283.2046, 5074.049, 3155.7078, 1015.18585, 2251.5789, 348.17313]
2025-09-12 11:24:31,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 959.0, 1000.0, 154.0, 1000.0, 1000.0, 626.0, 212.0, 457.0, 75.0]
2025-09-12 11:24:31,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 15 hours, 11 minutes, 29 seconds)
2025-09-12 11:38:06,665 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:38:06,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:41:41,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4157.74512 ± 1777.460
2025-09-12 11:41:41,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5280.28, 5315.969, 5297.088, 3027.6726, 5260.057, 5266.308, 815.5179, 5178.8896, 5233.923, 901.74524]
2025-09-12 11:41:41,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 574.0, 1000.0, 1000.0, 160.0, 1000.0, 1000.0, 193.0]
2025-09-12 11:41:41,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (4157.75) for latency MM1Queue_a033_s075
2025-09-12 11:41:41,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 15 hours, 18 minutes, 34 seconds)
2025-09-12 11:54:36,041 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:54:36,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:57:41,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3675.18359 ± 1848.559
2025-09-12 11:57:41,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5386.9404, 5364.989, 2562.5703, 5424.429, 5207.839, 561.374, 3109.722, 920.252, 2735.4656, 5478.254]
2025-09-12 11:57:41,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 489.0, 1000.0, 1000.0, 123.0, 576.0, 169.0, 521.0, 1000.0]
2025-09-12 11:57:41,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 15 hours, 2 minutes, 55 seconds)
2025-09-12 12:10:57,562 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:10:57,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:14:15,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3661.92505 ± 1671.712
2025-09-12 12:14:15,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2797.8901, 5065.154, 5000.1577, 761.06537, 5206.2705, 3539.0002, 2719.5647, 1058.0692, 5215.4985, 5256.581]
2025-09-12 12:14:15,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [581.0, 1000.0, 1000.0, 166.0, 1000.0, 709.0, 547.0, 230.0, 1000.0, 1000.0]
2025-09-12 12:14:15,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 14 hours, 52 minutes, 53 seconds)
2025-09-12 12:27:35,727 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:27:35,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:31:21,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4518.66309 ± 1098.277
2025-09-12 12:31:21,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5390.709, 5314.9136, 5405.4424, 3721.2388, 5408.058, 3179.4644, 5406.216, 2390.2715, 3666.3608, 5303.956]
2025-09-12 12:31:21,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 686.0, 1000.0, 572.0, 1000.0, 449.0, 684.0, 1000.0]
2025-09-12 12:31:21,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (4518.66) for latency MM1Queue_a033_s075
2025-09-12 12:31:21,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 14 hours, 36 minutes, 29 seconds)
2025-09-12 12:44:53,931 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:44:53,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:48:13,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4004.80469 ± 1799.266
2025-09-12 12:48:13,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5487.093, 5602.1494, 715.7002, 1096.7053, 3386.3435, 5469.756, 5536.2554, 4542.7217, 5363.841, 2847.4863]
2025-09-12 12:48:13,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 151.0, 238.0, 623.0, 1000.0, 1000.0, 857.0, 1000.0, 525.0]
2025-09-12 12:48:13,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 14 hours, 30 minutes, 21 seconds)
2025-09-12 13:01:48,914 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:01:48,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:06:06,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4918.53369 ± 941.143
2025-09-12 13:06:06,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5190.2056, 5258.4624, 2101.5347, 5230.4087, 5241.906, 5168.2666, 5129.0337, 5223.4907, 5380.889, 5261.141]
2025-09-12 13:06:06,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 429.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:06:06,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (4918.53) for latency MM1Queue_a033_s075
2025-09-12 13:06:06,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 14 hours, 21 minutes, 5 seconds)
2025-09-12 13:19:49,915 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:19:49,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:23:46,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4641.04736 ± 1166.634
2025-09-12 13:23:46,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2591.6958, 5448.686, 5337.157, 5333.9966, 5425.3604, 5375.508, 2442.049, 5325.157, 5379.0264, 3751.8362]
2025-09-12 13:23:46,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [482.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 464.0, 1000.0, 1000.0, 696.0]
2025-09-12 13:23:46,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 14 hours, 20 minutes, 45 seconds)
2025-09-12 13:37:32,570 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:37:32,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:40:12,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3009.85645 ± 1802.526
2025-09-12 13:40:12,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5171.642, 5159.778, 2186.538, 5184.6436, 1311.9675, 2211.9321, 515.0095, 1515.5782, 5075.0806, 1766.3944]
2025-09-12 13:40:12,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 469.0, 1000.0, 269.0, 445.0, 114.0, 310.0, 1000.0, 367.0]
2025-09-12 13:40:12,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 14 hours, 2 minutes, 23 seconds)
2025-09-12 13:52:25,829 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:52:25,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:56:01,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4298.72656 ± 1720.022
2025-09-12 13:56:01,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5445.3047, 1998.8401, 5326.2886, 5388.366, 5488.187, 5465.6924, 5361.156, 1830.0125, 1231.5256, 5451.895]
2025-09-12 13:56:01,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 377.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 352.0, 238.0, 1000.0]
2025-09-12 13:56:01,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 13 hours, 32 minutes, 42 seconds)
2025-09-12 14:10:16,021 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:10:16,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:14:22,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4992.15625 ± 1083.035
2025-09-12 14:14:22,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1936.7197, 5505.492, 4246.3643, 5363.058, 5525.0073, 5506.939, 5367.0117, 5532.2583, 5475.288, 5463.424]
2025-09-12 14:14:22,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [364.0, 1000.0, 767.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:14:22,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (4992.16) for latency MM1Queue_a033_s075
2025-09-12 14:14:22,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 13 hours, 29 minutes, 48 seconds)
2025-09-12 14:27:28,265 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:27:28,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:31:21,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4673.64844 ± 1117.312
2025-09-12 14:31:21,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5379.729, 2869.9265, 5373.692, 5352.6484, 2813.5413, 5405.4365, 5424.7554, 5430.023, 3240.9053, 5445.829]
2025-09-12 14:31:21,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 547.0, 1000.0, 1000.0, 533.0, 1000.0, 1000.0, 1000.0, 588.0, 1000.0]
2025-09-12 14:31:21,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 13 hours, 4 minutes, 18 seconds)
2025-09-12 14:44:25,112 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:44:25,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:47:42,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3876.06006 ± 1920.638
2025-09-12 14:47:42,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5331.9653, 5434.901, 5353.4595, 5196.5063, 597.93695, 724.0201, 2102.7224, 5337.9985, 5344.9487, 3336.146]
2025-09-12 14:47:42,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 127.0, 159.0, 397.0, 1000.0, 1000.0, 658.0]
2025-09-12 14:47:42,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 12 hours, 35 minutes, 23 seconds)
2025-09-12 15:01:43,308 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:01:43,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:05:27,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4400.28418 ± 1568.369
2025-09-12 15:05:27,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5352.943, 3357.2803, 5310.8936, 840.07916, 2301.4504, 5343.299, 5298.834, 5435.1567, 5380.54, 5382.368]
2025-09-12 15:05:27,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 619.0, 1000.0, 178.0, 413.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:05:27,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 12 hours, 30 minutes, 7 seconds)
2025-09-12 15:18:53,921 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:18:53,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:23:24,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 5253.05713 ± 97.183
2025-09-12 15:23:24,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5247.243, 5192.733, 5041.172, 5321.9937, 5359.5527, 5282.858, 5326.105, 5359.7, 5142.714, 5256.4956]
2025-09-12 15:23:24,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:23:24,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (5253.06) for latency MM1Queue_a033_s075
2025-09-12 15:23:24,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 12 hours, 31 minutes, 36 seconds)
2025-09-12 15:35:52,066 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:35:52,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:40:05,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 5102.43262 ± 964.782
2025-09-12 15:40:05,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5445.6353, 5386.6, 5407.8325, 5336.8887, 2211.0598, 5485.1685, 5421.7354, 5442.16, 5491.33, 5395.9106]
2025-09-12 15:40:05,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 408.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:40:05,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 12 hours, 1 second)
2025-09-12 15:53:53,677 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:53:53,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:57:26,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4104.07373 ± 1786.204
2025-09-12 15:57:26,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5135.469, 5167.23, 5181.4844, 572.9536, 944.47656, 5390.7993, 5185.666, 5155.8765, 5193.069, 3113.7163]
2025-09-12 15:57:26,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 125.0, 202.0, 1000.0, 993.0, 1000.0, 1000.0, 617.0]
2025-09-12 15:57:26,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 11 hours, 45 minutes, 50 seconds)
2025-09-12 16:10:56,008 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:10:56,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:14:42,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4590.16553 ± 1424.812
2025-09-12 16:14:42,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5428.0073, 5372.658, 1227.0282, 5502.4634, 3858.6926, 5402.247, 5396.886, 5349.7485, 5611.6606, 2752.2622]
2025-09-12 16:14:42,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 234.0, 1000.0, 719.0, 1000.0, 1000.0, 1000.0, 1000.0, 517.0]
2025-09-12 16:14:42,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 11 hours, 36 minutes)
2025-09-12 16:27:57,774 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:27:57,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:31:47,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4580.86377 ± 1689.125
2025-09-12 16:31:47,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [916.29663, 5440.4326, 5367.077, 5370.6367, 1512.0167, 5382.694, 5414.729, 5514.3594, 5445.019, 5445.374]
2025-09-12 16:31:47,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [195.0, 1000.0, 1000.0, 1000.0, 282.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:31:47,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 11 hours, 13 minutes, 27 seconds)
2025-09-12 16:44:44,814 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:44:44,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:48:39,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4772.36426 ± 1004.453
2025-09-12 16:48:39,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5562.162, 5317.0283, 3650.1763, 5388.789, 2285.8772, 4792.0103, 5460.7856, 5360.889, 5433.0684, 4472.8525]
2025-09-12 16:48:39,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 670.0, 1000.0, 419.0, 881.0, 1000.0, 1000.0, 1000.0, 811.0]
2025-09-12 16:48:39,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 10 hours, 47 minutes, 54 seconds)
2025-09-12 17:02:16,798 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:02:16,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:05:14,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3473.48364 ± 2258.455
2025-09-12 17:05:14,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5330.785, 724.7147, 5343.3774, 774.7285, 659.6173, 5353.6167, 5161.1357, 5333.388, 5378.519, 674.9522]
2025-09-12 17:05:14,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 155.0, 1000.0, 173.0, 143.0, 1000.0, 990.0, 1000.0, 1000.0, 147.0]
2025-09-12 17:05:14,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 10 hours, 30 minutes, 12 seconds)
2025-09-12 17:19:28,503 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:19:28,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:22:55,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4171.09668 ± 1295.037
2025-09-12 17:22:55,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5165.045, 1619.852, 3372.1487, 2477.5098, 3964.4858, 5449.0034, 5121.205, 5418.4355, 3746.6177, 5376.67]
2025-09-12 17:22:55,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 288.0, 626.0, 441.0, 722.0, 1000.0, 952.0, 1000.0, 693.0, 1000.0]
2025-09-12 17:22:55,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 10 hours, 15 minutes, 32 seconds)
2025-09-12 17:36:20,066 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:36:20,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:40:30,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4987.25098 ± 832.553
2025-09-12 17:40:30,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5335.2285, 5251.15, 5394.2793, 5296.302, 5443.859, 5330.727, 5393.0176, 2645.5706, 5384.3867, 4397.9897]
2025-09-12 17:40:30,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 486.0, 1000.0, 868.0]
2025-09-12 17:40:30,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 10 hours, 36 seconds)
2025-09-12 17:53:09,495 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:53:09,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:57:35,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 5337.73193 ± 200.563
2025-09-12 17:57:35,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5430.676, 5356.0923, 4754.153, 5523.1187, 5390.8457, 5358.721, 5439.9204, 5386.054, 5355.2847, 5382.459]
2025-09-12 17:57:35,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 889.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:57:35,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (5337.73) for latency MM1Queue_a033_s075
2025-09-12 17:57:35,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 9 hours, 43 minutes, 26 seconds)
2025-09-12 18:11:00,349 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:11:00,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:14:52,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4619.01758 ± 1279.569
2025-09-12 18:14:52,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [3605.265, 3516.6882, 5405.655, 5248.201, 5383.1104, 5492.2603, 5423.003, 1447.9169, 5482.8706, 5185.2036]
2025-09-12 18:14:52,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [662.0, 652.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 263.0, 1000.0, 960.0]
2025-09-12 18:14:52,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 9 hours, 29 minutes, 1 second)
2025-09-12 18:28:27,640 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:28:27,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:33:04,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 5387.32422 ± 42.677
2025-09-12 18:33:04,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5440.5996, 5324.9634, 5411.868, 5420.012, 5438.125, 5378.2773, 5383.6606, 5387.941, 5384.928, 5302.8677]
2025-09-12 18:33:04,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:33:04,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (5387.32) for latency MM1Queue_a033_s075
2025-09-12 18:33:04,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 9 hours, 22 minutes, 7 seconds)
2025-09-12 18:47:06,711 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:47:06,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:51:11,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4707.72607 ± 1261.207
2025-09-12 18:51:11,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5344.4854, 5384.097, 5310.9897, 5315.1426, 2230.5544, 5375.1753, 5406.8706, 5300.3745, 2143.2302, 5266.345]
2025-09-12 18:51:11,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 423.0, 1000.0, 1000.0, 1000.0, 424.0, 1000.0]
2025-09-12 18:51:11,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 9 hours, 7 minutes, 12 seconds)
2025-09-12 19:03:55,628 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:03:55,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:08:11,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 5056.15039 ± 1126.640
2025-09-12 19:08:11,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5522.2075, 5322.839, 5417.398, 5466.1455, 5400.7095, 5488.772, 5440.2686, 5403.8877, 1679.781, 5419.495]
2025-09-12 19:08:11,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 311.0, 1000.0]
2025-09-12 19:08:11,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 8 hours, 46 minutes, 8 seconds)
2025-09-12 19:21:44,400 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:21:44,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:25:27,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4402.52246 ± 1807.033
2025-09-12 19:25:27,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5670.2656, 4127.901, 5349.799, 5506.976, 5401.369, 5414.7485, 1152.9592, 618.7616, 5397.9126, 5384.5327]
2025-09-12 19:25:27,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 736.0, 1000.0, 1000.0, 1000.0, 1000.0, 216.0, 137.0, 1000.0, 1000.0]
2025-09-12 19:25:27,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 8 hours, 29 minutes, 37 seconds)
2025-09-12 19:39:47,682 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:39:47,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:43:42,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4574.99512 ± 1438.624
2025-09-12 19:43:42,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5313.9136, 1985.8019, 5325.868, 1432.7194, 5204.393, 5293.389, 5308.531, 5336.8726, 5266.5566, 5281.908]
2025-09-12 19:43:42,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 393.0, 1000.0, 294.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:43:42,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 8 hours, 17 minutes, 24 seconds)
2025-09-12 19:57:37,084 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:57:37,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:01:55,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 5064.95117 ± 762.287
2025-09-12 20:01:55,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5393.481, 5312.0024, 5362.439, 2818.1306, 5229.7783, 5380.585, 4924.6353, 5445.292, 5373.7593, 5409.409]
2025-09-12 20:01:55,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 505.0, 1000.0, 997.0, 897.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:01:55,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 59 minutes, 47 seconds)
2025-09-12 20:15:25,981 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:15:26,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:18:09,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3277.65942 ± 1977.758
2025-09-12 20:18:09,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5605.4536, 1142.3865, 3625.7224, 5523.4463, 5529.9375, 988.2299, 2128.0884, 1877.214, 5453.545, 902.56964]
2025-09-12 20:18:09,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 233.0, 656.0, 1000.0, 1000.0, 195.0, 382.0, 347.0, 1000.0, 181.0]
2025-09-12 20:18:09,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 7 hours, 32 minutes, 12 seconds)
2025-09-12 20:31:18,154 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:31:18,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:34:07,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 3427.63477 ± 1850.724
2025-09-12 20:34:07,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [465.82306, 3108.1243, 4843.64, 458.50787, 3271.7148, 5561.56, 3575.8108, 5427.708, 2092.7004, 5470.758]
2025-09-12 20:34:07,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 559.0, 886.0, 85.0, 621.0, 1000.0, 657.0, 1000.0, 389.0, 1000.0]
2025-09-12 20:34:07,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 7 hours, 9 minutes, 37 seconds)
2025-09-12 20:46:52,926 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:46:52,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:51:22,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 5399.35742 ± 57.863
2025-09-12 20:51:22,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5406.225, 5406.9067, 5394.0977, 5450.96, 5323.497, 5454.1255, 5458.63, 5363.8325, 5454.6753, 5280.6245]
2025-09-12 20:51:22,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:51:22,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (5399.36) for latency MM1Queue_a033_s075
2025-09-12 20:51:22,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 52 minutes, 25 seconds)
2025-09-12 21:05:21,270 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:05:21,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:09:16,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4770.30225 ± 1230.031
2025-09-12 21:09:16,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5526.5293, 5353.819, 5542.844, 5499.52, 5444.9873, 1617.0178, 3609.7722, 5497.54, 5442.772, 4168.2217]
2025-09-12 21:09:16,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 301.0, 666.0, 1000.0, 1000.0, 739.0]
2025-09-12 21:09:16,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 33 minutes, 38 seconds)
2025-09-12 21:23:00,439 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:23:00,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:26:55,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4643.05029 ± 892.629
2025-09-12 21:26:55,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5362.0127, 5331.695, 5396.0454, 3619.8547, 5293.5776, 2841.1396, 4136.2227, 3870.637, 5250.792, 5328.5283]
2025-09-12 21:26:55,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 686.0, 1000.0, 534.0, 769.0, 718.0, 1000.0, 1000.0]
2025-09-12 21:26:55,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 6 hours, 13 minutes, 56 seconds)
2025-09-12 21:40:14,796 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:40:14,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:44:46,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 5302.80420 ± 40.305
2025-09-12 21:44:46,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5230.367, 5265.369, 5329.1895, 5307.604, 5262.893, 5354.6235, 5274.1924, 5356.9155, 5322.925, 5323.961]
2025-09-12 21:44:46,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:44:46,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 6 hours, 3 minutes, 49 seconds)
2025-09-12 21:58:31,353 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:58:31,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:02:45,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4923.70996 ± 1257.769
2025-09-12 22:02:45,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5337.6367, 1155.1335, 5367.6465, 5325.7656, 5419.2046, 5315.257, 5264.521, 5244.432, 5337.165, 5470.3374]
2025-09-12 22:02:45,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 242.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:02:45,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 54 minutes, 32 seconds)
2025-09-12 22:16:32,036 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:16:32,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:21:02,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 5243.60547 ± 40.401
2025-09-12 22:21:02,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5221.253, 5236.749, 5225.749, 5261.1567, 5310.6016, 5296.1323, 5206.6704, 5285.949, 5205.687, 5186.1074]
2025-09-12 22:21:02,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:21:02,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 40 minutes, 42 seconds)
2025-09-12 22:34:17,514 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:34:17,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:38:29,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 5061.81934 ± 1166.931
2025-09-12 22:38:29,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5363.732, 5491.6245, 5461.1963, 1567.0913, 5412.063, 5432.0947, 5367.9126, 5401.7075, 5596.7373, 5524.0356]
2025-09-12 22:38:29,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 294.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:38:29,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 5 hours, 21 minutes, 8 seconds)
2025-09-12 22:51:16,611 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:51:16,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:55:40,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 5386.35840 ± 262.395
2025-09-12 22:55:40,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5513.99, 5498.3555, 5512.2305, 5573.359, 5451.455, 4617.16, 5408.1313, 5368.033, 5436.154, 5484.715]
2025-09-12 22:55:40,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 818.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:55:40,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 5 hours, 1 minute, 47 seconds)
2025-09-12 23:09:38,343 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:09:38,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:14:06,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 5265.47363 ± 246.668
2025-09-12 23:14:06,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5354.7017, 5262.418, 5478.2188, 5414.5054, 5374.141, 4553.783, 5289.019, 5256.009, 5290.6294, 5381.313]
2025-09-12 23:14:06,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 840.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:14:06,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 45 minutes, 51 seconds)
2025-09-12 23:26:49,496 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:26:49,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:30:55,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4935.36572 ± 1278.375
2025-09-12 23:30:55,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5475.7935, 4517.468, 5479.0635, 5464.4375, 5528.1826, 5469.928, 5483.505, 5392.4897, 5346.9634, 1195.824]
2025-09-12 23:30:55,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 837.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 237.0]
2025-09-12 23:30:55,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 24 minutes, 31 seconds)
2025-09-12 23:45:10,548 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:45:10,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:49:11,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4863.12402 ± 1288.064
2025-09-12 23:49:11,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1474.2146, 5354.0728, 5487.2715, 5432.0225, 5436.515, 5553.2056, 5360.2246, 5600.2715, 5525.5815, 3407.8584]
2025-09-12 23:49:11,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [265.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 629.0]
2025-09-12 23:49:11,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 4 hours, 6 minutes, 49 seconds)
2025-09-13 00:02:13,543 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:02:13,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:06:03,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4612.75439 ± 1238.953
2025-09-13 00:06:03,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5254.3926, 5566.1406, 5278.342, 5573.0063, 5444.8154, 2762.113, 5325.721, 5311.295, 2091.3396, 3520.375]
2025-09-13 00:06:03,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [922.0, 1000.0, 1000.0, 1000.0, 1000.0, 511.0, 1000.0, 1000.0, 396.0, 638.0]
2025-09-13 00:06:03,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 47 minutes, 40 seconds)
2025-09-13 00:19:32,414 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:19:32,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:23:40,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 5124.77295 ± 1331.927
2025-09-13 00:23:40,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5559.456, 5643.1924, 5527.5454, 5589.2407, 5589.422, 5298.235, 5616.4365, 5665.6426, 5618.7974, 1139.7638]
2025-09-13 00:23:40,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 929.0, 1000.0, 1000.0, 1000.0, 218.0]
2025-09-13 00:23:41,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 31 minutes, 12 seconds)
2025-09-13 00:37:55,316 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:37:55,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:41:48,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4607.48389 ± 1249.835
2025-09-13 00:41:48,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1426.7847, 3633.998, 5335.3804, 5352.649, 5366.7793, 4856.102, 5447.0967, 3766.9187, 5344.5527, 5544.5747]
2025-09-13 00:41:48,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [269.0, 675.0, 1000.0, 1000.0, 1000.0, 894.0, 1000.0, 697.0, 1000.0, 1000.0]
2025-09-13 00:41:48,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 12 minutes, 55 seconds)
2025-09-13 00:55:18,369 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:55:18,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:59:31,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 5081.14014 ± 1051.277
2025-09-13 00:59:31,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5402.792, 5414.9126, 1928.7383, 5459.974, 5424.4136, 5467.3467, 5458.756, 5368.612, 5473.1045, 5412.7485]
2025-09-13 00:59:31,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 357.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:59:31,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 57 minutes, 10 seconds)
2025-09-13 01:12:06,748 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:12:06,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:16:03,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4563.39404 ± 1536.585
2025-09-13 01:16:03,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [1998.1056, 5190.674, 5334.8315, 5386.987, 5363.306, 1045.8789, 5272.0024, 5297.617, 5352.304, 5392.236]
2025-09-13 01:16:03,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [375.0, 1000.0, 1000.0, 1000.0, 1000.0, 227.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:16:03,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 36 minutes, 22 seconds)
2025-09-13 01:29:40,015 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:29:40,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:34:11,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 5511.52100 ± 76.316
2025-09-13 01:34:11,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5420.7734, 5481.108, 5546.705, 5686.4316, 5473.9976, 5520.8804, 5510.3467, 5491.9814, 5406.144, 5576.839]
2025-09-13 01:34:11,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:34:11,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1226 [INFO]: New best (5511.52) for latency MM1Queue_a033_s075
2025-09-13 01:34:11,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 21 minutes, 1 second)
2025-09-13 01:47:17,969 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:47:17,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:51:13,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4548.25684 ± 1637.417
2025-09-13 01:51:13,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5439.159, 5430.647, 1395.68, 1157.2697, 5265.918, 5370.4814, 5329.7617, 5344.282, 5376.981, 5372.3916]
2025-09-13 01:51:13,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 293.0, 215.0, 1000.0, 1000.0, 992.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:51:13,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 2 hours, 2 minutes, 33 seconds)
2025-09-13 02:04:43,283 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:04:43,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:08:41,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4936.77441 ± 900.798
2025-09-13 02:08:41,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5341.2456, 5670.3022, 5205.268, 5423.8843, 3324.158, 5628.784, 4668.3267, 3116.2869, 5408.0283, 5581.461]
2025-09-13 02:08:41,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [925.0, 1000.0, 938.0, 1000.0, 579.0, 1000.0, 841.0, 558.0, 1000.0, 1000.0]
2025-09-13 02:08:41,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 44 minutes, 16 seconds)
2025-09-13 02:22:03,523 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:22:03,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:26:17,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4978.76465 ± 571.227
2025-09-13 02:26:17,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5250.685, 5322.3145, 3765.6982, 5203.2397, 5151.52, 3925.6348, 5336.699, 5303.76, 5346.745, 5181.3506]
2025-09-13 02:26:17,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 724.0, 1000.0, 950.0, 759.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:26:17,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 26 minutes, 46 seconds)
2025-09-13 02:39:52,241 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:39:52,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:44:06,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 5185.26855 ± 792.832
2025-09-13 02:44:06,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5548.618, 5539.24, 5534.6426, 5464.4473, 5605.652, 5632.3237, 5600.1714, 5579.0957, 3145.8994, 4202.5947]
2025-09-13 02:44:06,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 558.0, 743.0]
2025-09-13 02:44:06,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 10 minutes, 25 seconds)
2025-09-13 02:58:58,759 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:58:58,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:02:45,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4373.63965 ± 1607.066
2025-09-13 03:02:45,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [2381.9062, 5385.8677, 5457.3936, 5337.1636, 5415.4414, 2405.8503, 5361.0845, 5514.5737, 1126.5724, 5350.541]
2025-09-13 03:02:45,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [429.0, 1000.0, 1000.0, 1000.0, 1000.0, 458.0, 1000.0, 1000.0, 214.0, 1000.0]
2025-09-13 03:02:45,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 53 minutes, 8 seconds)
2025-09-13 03:15:58,295 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:15:58,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:19:51,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4575.36133 ± 1738.079
2025-09-13 03:19:51,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5547.5474, 5461.2817, 5601.8154, 1197.7936, 5517.4385, 5512.699, 5516.765, 4806.999, 1055.5249, 5535.751]
2025-09-13 03:19:51,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 234.0, 1000.0, 1000.0, 1000.0, 877.0, 201.0, 1000.0]
2025-09-13 03:19:51,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 35 minutes, 27 seconds)
2025-09-13 03:33:34,674 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:33:34,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:37:36,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 4748.01318 ± 1484.047
2025-09-13 03:37:36,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [5404.78, 5519.962, 5470.276, 5463.7695, 5661.998, 5494.191, 1669.2118, 5450.94, 1899.9291, 5445.073]
2025-09-13 03:37:36,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 303.0, 1000.0, 348.0, 1000.0]
2025-09-13 03:37:36,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 17 minutes, 46 seconds)
2025-09-13 03:51:20,930 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:51:20,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:55:28,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 5080.51416 ± 1011.964
2025-09-13 03:55:28,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [4546.695, 5749.7188, 3164.7932, 5638.449, 5634.3877, 5718.04, 3188.8452, 5752.1787, 5732.2266, 5679.8027]
2025-09-13 03:55:28,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [805.0, 1000.0, 552.0, 1000.0, 1000.0, 1000.0, 549.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:55:28,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-humanoid):1251 [DEBUG]: Training session finished
