2025-09-12 02:09:16,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc15-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-09-12 02:09:16,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc15-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-09-12 02:09:16,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x146a660be2d0>}
2025-09-12 02:09:16,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1111 [DEBUG]: using device: cuda
2025-09-12 02:09:16,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1133 [INFO]: Creating new trainer
2025-09-12 02:09:16,591 baseline-mbpac-noiseperc15-humanoid:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-12 02:09:16,591 baseline-mbpac-noiseperc15-humanoid:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 02:09:16,602 baseline-mbpac-noiseperc15-humanoid:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-09-12 02:09:17,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1194 [DEBUG]: Starting training session...
2025-09-12 02:09:17,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 1/100
2025-09-12 02:22:19,612 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:22:19,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:22:39,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 336.58624 ± 72.161
2025-09-12 02:22:39,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [332.16104, 478.83636, 398.91843, 340.54147, 259.314, 276.06113, 422.02594, 241.28154, 320.72018, 296.00232]
2025-09-12 02:22:39,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 95.0, 79.0, 69.0, 48.0, 51.0, 83.0, 45.0, 59.0, 57.0]
2025-09-12 02:22:39,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (336.59) for latency MM1Queue_a033_s075
2025-09-12 02:22:39,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 22 hours, 2 minutes, 13 seconds)
2025-09-12 02:37:06,154 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:37:06,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:37:24,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 302.29370 ± 44.513
2025-09-12 02:37:24,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [264.70303, 320.9734, 304.42075, 290.81522, 267.91345, 227.8233, 313.57404, 402.19803, 295.9737, 334.54224]
2025-09-12 02:37:24,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [54.0, 67.0, 58.0, 57.0, 52.0, 44.0, 58.0, 79.0, 59.0, 76.0]
2025-09-12 02:37:24,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 22 hours, 57 minutes, 38 seconds)
2025-09-12 02:51:48,233 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:51:48,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:52:10,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 377.82794 ± 71.297
2025-09-12 02:52:10,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [447.15375, 546.93665, 333.8134, 321.8403, 373.96255, 302.04178, 411.41022, 323.32245, 388.75595, 329.0421]
2025-09-12 02:52:10,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 112.0, 62.0, 58.0, 74.0, 55.0, 81.0, 59.0, 83.0, 60.0]
2025-09-12 02:52:10,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (377.83) for latency MM1Queue_a033_s075
2025-09-12 02:52:10,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 23 hours, 6 minutes, 33 seconds)
2025-09-12 03:06:38,909 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:06:38,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:06:59,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 353.54266 ± 35.664
2025-09-12 03:06:59,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [357.56232, 272.60352, 382.4939, 364.56668, 316.72214, 371.93796, 379.10208, 339.04956, 404.4443, 346.94412]
2025-09-12 03:06:59,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 51.0, 69.0, 68.0, 58.0, 68.0, 70.0, 62.0, 84.0, 64.0]
2025-09-12 03:06:59,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 23 hours, 4 minutes, 39 seconds)
2025-09-12 03:21:25,019 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:21:25,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:21:50,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 421.60016 ± 83.817
2025-09-12 03:21:50,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [370.28296, 419.0588, 340.95972, 494.26434, 439.13873, 419.82944, 335.44888, 394.9433, 369.19305, 632.88226]
2025-09-12 03:21:50,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 82.0, 62.0, 91.0, 84.0, 78.0, 74.0, 73.0, 67.0, 127.0]
2025-09-12 03:21:50,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (421.60) for latency MM1Queue_a033_s075
2025-09-12 03:21:50,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 22 hours, 58 minutes, 11 seconds)
2025-09-12 03:36:12,199 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:36:12,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:36:37,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 447.21289 ± 84.943
2025-09-12 03:36:37,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [439.36667, 340.61032, 436.92044, 357.13428, 429.58508, 375.6665, 561.76666, 405.09955, 614.823, 511.15665]
2025-09-12 03:36:37,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 63.0, 81.0, 65.0, 79.0, 68.0, 107.0, 73.0, 124.0, 102.0]
2025-09-12 03:36:37,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (447.21) for latency MM1Queue_a033_s075
2025-09-12 03:36:37,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 23 hours, 10 minutes, 40 seconds)
2025-09-12 03:51:08,545 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:51:08,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:51:35,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 452.32074 ± 150.436
2025-09-12 03:51:35,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [463.8106, 463.0431, 334.26605, 436.86777, 370.4383, 400.28583, 358.63174, 423.6601, 886.42645, 385.778]
2025-09-12 03:51:35,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 85.0, 64.0, 82.0, 68.0, 75.0, 70.0, 77.0, 185.0, 71.0]
2025-09-12 03:51:35,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (452.32) for latency MM1Queue_a033_s075
2025-09-12 03:51:35,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 22 hours, 59 minutes, 38 seconds)
2025-09-12 04:06:01,003 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:06:01,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:06:24,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 398.06360 ± 60.642
2025-09-12 04:06:24,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [432.65512, 447.801, 455.60065, 246.86603, 444.46875, 422.8973, 426.48163, 351.11835, 366.10522, 386.6418]
2025-09-12 04:06:24,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 83.0, 85.0, 52.0, 93.0, 78.0, 79.0, 65.0, 69.0, 72.0]
2025-09-12 04:06:24,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 22 hours, 45 minutes, 41 seconds)
2025-09-12 04:20:48,354 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:20:48,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:21:18,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 503.35187 ± 83.449
2025-09-12 04:21:18,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [470.55804, 438.21857, 533.1843, 703.01166, 479.78888, 585.13416, 439.37808, 488.80994, 501.69598, 393.73895]
2025-09-12 04:21:18,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 82.0, 99.0, 134.0, 102.0, 112.0, 84.0, 92.0, 93.0, 73.0]
2025-09-12 04:21:18,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (503.35) for latency MM1Queue_a033_s075
2025-09-12 04:21:18,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 22 hours, 32 minutes, 26 seconds)
2025-09-12 04:35:54,169 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:35:54,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:36:22,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 463.71231 ± 97.177
2025-09-12 04:36:22,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [491.04465, 616.05554, 388.74893, 601.44794, 365.2594, 280.82492, 451.65222, 446.90268, 494.1359, 501.0506]
2025-09-12 04:36:22,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 117.0, 82.0, 126.0, 82.0, 62.0, 85.0, 98.0, 92.0, 93.0]
2025-09-12 04:36:22,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 22 hours, 21 minutes, 48 seconds)
2025-09-12 04:50:48,859 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:50:48,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:51:19,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 496.90567 ± 77.780
2025-09-12 04:51:19,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [415.01404, 494.37027, 498.231, 611.1087, 638.4034, 550.08606, 396.82965, 496.09534, 440.6197, 428.29858]
2025-09-12 04:51:19,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 93.0, 109.0, 115.0, 134.0, 119.0, 75.0, 94.0, 80.0, 92.0]
2025-09-12 04:51:19,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 22 hours, 9 minutes, 39 seconds)
2025-09-12 05:05:48,321 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:05:48,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:06:21,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 550.33148 ± 197.131
2025-09-12 05:06:21,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [452.76974, 1008.5933, 783.1121, 458.72653, 397.65097, 612.4483, 351.68942, 430.63602, 400.81555, 606.873]
2025-09-12 05:06:21,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 198.0, 151.0, 85.0, 88.0, 113.0, 65.0, 82.0, 88.0, 113.0]
2025-09-12 05:06:21,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (550.33) for latency MM1Queue_a033_s075
2025-09-12 05:06:21,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 21 hours, 55 minutes, 58 seconds)
2025-09-12 05:20:56,814 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:20:56,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:21:29,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 549.47314 ± 84.240
2025-09-12 05:21:29,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [664.05005, 493.77417, 517.6071, 487.0657, 534.53876, 667.8837, 524.8968, 422.29745, 678.7759, 503.8419]
2025-09-12 05:21:29,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 91.0, 99.0, 93.0, 99.0, 131.0, 113.0, 93.0, 125.0, 93.0]
2025-09-12 05:21:29,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 21 hours, 46 minutes, 25 seconds)
2025-09-12 05:36:03,332 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:36:03,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:36:33,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 474.94440 ± 118.101
2025-09-12 05:36:33,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [437.61874, 402.4206, 702.604, 369.5742, 306.31256, 431.73682, 391.39005, 506.70773, 605.62164, 595.4575]
2025-09-12 05:36:33,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 87.0, 143.0, 81.0, 57.0, 92.0, 73.0, 111.0, 125.0, 128.0]
2025-09-12 05:36:33,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 21 hours, 34 minutes, 18 seconds)
2025-09-12 05:50:54,303 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:50:54,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:51:21,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 439.36572 ± 93.315
2025-09-12 05:51:21,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [445.50156, 297.0887, 474.1128, 403.15472, 524.0492, 629.2403, 328.1165, 391.68927, 394.83954, 505.86426]
2025-09-12 05:51:21,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 62.0, 99.0, 73.0, 107.0, 117.0, 70.0, 72.0, 73.0, 106.0]
2025-09-12 05:51:21,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 21 hours, 14 minutes, 34 seconds)
2025-09-12 06:05:52,542 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:05:52,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:06:22,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 512.17114 ± 129.852
2025-09-12 06:06:22,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [814.1519, 480.13672, 505.81808, 431.3917, 469.02896, 685.8895, 491.06754, 353.29517, 485.49713, 405.43484]
2025-09-12 06:06:22,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [162.0, 89.0, 111.0, 80.0, 86.0, 135.0, 89.0, 65.0, 89.0, 86.0]
2025-09-12 06:06:22,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 21 hours, 55 seconds)
2025-09-12 06:20:55,524 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:20:55,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:21:30,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 582.31622 ± 203.634
2025-09-12 06:21:30,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [507.37048, 561.59607, 531.2767, 512.6786, 494.18008, 396.48917, 661.0622, 1143.2916, 623.38995, 391.827]
2025-09-12 06:21:30,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 117.0, 115.0, 97.0, 92.0, 85.0, 122.0, 218.0, 120.0, 71.0]
2025-09-12 06:21:30,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (582.32) for latency MM1Queue_a033_s075
2025-09-12 06:21:30,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 20 hours, 47 minutes, 36 seconds)
2025-09-12 06:36:00,668 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:36:00,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:36:34,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 575.38135 ± 131.344
2025-09-12 06:36:34,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [568.01733, 486.6335, 520.70624, 624.04614, 481.0189, 928.9471, 496.4507, 581.03644, 621.6178, 445.33914]
2025-09-12 06:36:34,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 91.0, 98.0, 116.0, 89.0, 177.0, 92.0, 110.0, 114.0, 81.0]
2025-09-12 06:36:34,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 20 hours, 31 minutes, 25 seconds)
2025-09-12 06:51:07,927 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:51:07,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:51:38,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 487.99298 ± 165.946
2025-09-12 06:51:38,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [493.25305, 384.6635, 369.9678, 634.35956, 366.78372, 487.49835, 446.28314, 435.51346, 921.79224, 339.81458]
2025-09-12 06:51:38,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 84.0, 81.0, 118.0, 82.0, 108.0, 82.0, 82.0, 189.0, 65.0]
2025-09-12 06:51:38,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 20 hours, 16 minutes, 22 seconds)
2025-09-12 07:06:15,346 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:06:15,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:06:51,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 620.42316 ± 166.618
2025-09-12 07:06:51,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [758.204, 652.5296, 528.60834, 538.9058, 803.6343, 627.0385, 506.02615, 954.4012, 386.7503, 448.13324]
2025-09-12 07:06:51,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 120.0, 119.0, 112.0, 153.0, 113.0, 93.0, 175.0, 72.0, 83.0]
2025-09-12 07:06:51,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (620.42) for latency MM1Queue_a033_s075
2025-09-12 07:06:51,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 20 hours, 8 minutes, 2 seconds)
2025-09-12 07:21:20,895 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:21:20,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:21:51,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 505.04019 ± 144.724
2025-09-12 07:21:51,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [413.2396, 337.4475, 525.0831, 430.14554, 812.7648, 568.28436, 326.00644, 689.9555, 470.25574, 477.21924]
2025-09-12 07:21:51,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 75.0, 104.0, 95.0, 160.0, 112.0, 68.0, 128.0, 92.0, 87.0]
2025-09-12 07:21:51,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 19 hours, 52 minutes, 39 seconds)
2025-09-12 07:36:32,833 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:36:32,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:37:15,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 704.17224 ± 265.539
2025-09-12 07:37:15,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [452.1445, 576.35156, 657.3649, 1397.2274, 613.80133, 952.44934, 543.521, 740.26544, 533.1937, 575.4035]
2025-09-12 07:37:15,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 125.0, 128.0, 277.0, 131.0, 178.0, 104.0, 144.0, 98.0, 111.0]
2025-09-12 07:37:15,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (704.17) for latency MM1Queue_a033_s075
2025-09-12 07:37:15,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 19 hours, 41 minutes, 36 seconds)
2025-09-12 07:51:45,852 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:51:45,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:52:30,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 755.93689 ± 183.740
2025-09-12 07:52:30,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [527.89154, 1029.7288, 499.62738, 961.0956, 882.9411, 519.6836, 801.60474, 722.0235, 701.6109, 913.16223]
2025-09-12 07:52:30,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 198.0, 111.0, 180.0, 163.0, 100.0, 156.0, 136.0, 130.0, 176.0]
2025-09-12 07:52:30,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (755.94) for latency MM1Queue_a033_s075
2025-09-12 07:52:30,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 19 hours, 29 minutes, 29 seconds)
2025-09-12 08:07:00,349 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:07:00,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:07:42,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 659.46326 ± 267.400
2025-09-12 08:07:42,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [399.91263, 1256.4038, 477.93433, 745.3611, 618.64716, 730.90594, 547.2816, 989.7325, 447.9879, 380.46588]
2025-09-12 08:07:42,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 262.0, 106.0, 158.0, 140.0, 141.0, 105.0, 189.0, 84.0, 85.0]
2025-09-12 08:07:42,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 19 hours, 16 minutes, 21 seconds)
2025-09-12 08:22:21,979 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:22:21,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:23:09,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 797.21912 ± 287.229
2025-09-12 08:23:09,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [553.57855, 1584.6124, 965.9399, 652.8687, 699.2267, 569.74603, 755.7475, 802.72186, 771.0002, 616.75006]
2025-09-12 08:23:09,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 310.0, 181.0, 119.0, 129.0, 106.0, 139.0, 150.0, 148.0, 133.0]
2025-09-12 08:23:09,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (797.22) for latency MM1Queue_a033_s075
2025-09-12 08:23:09,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 19 hours, 4 minutes, 24 seconds)
2025-09-12 08:37:43,038 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:37:43,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:38:17,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 600.84753 ± 214.476
2025-09-12 08:38:17,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [556.12506, 543.47003, 604.4817, 1202.0988, 327.50775, 597.8928, 601.7111, 497.92654, 546.7485, 530.5128]
2025-09-12 08:38:17,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 101.0, 109.0, 211.0, 71.0, 114.0, 111.0, 95.0, 105.0, 112.0]
2025-09-12 08:38:17,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 18 hours, 51 minutes, 7 seconds)
2025-09-12 08:52:51,635 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:52:51,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:53:36,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 770.76520 ± 302.438
2025-09-12 08:53:36,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [1184.547, 579.878, 1147.3236, 1261.1254, 617.5684, 819.437, 568.6582, 692.3599, 375.95505, 460.7993]
2025-09-12 08:53:36,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [213.0, 111.0, 214.0, 249.0, 110.0, 155.0, 106.0, 131.0, 69.0, 91.0]
2025-09-12 08:53:36,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 18 hours, 34 minutes, 44 seconds)
2025-09-12 09:08:28,447 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:08:28,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:09:21,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 861.37927 ± 374.456
2025-09-12 09:09:21,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [1396.0715, 993.0681, 1250.8707, 513.0129, 1524.0366, 551.5554, 559.1955, 591.9477, 677.9851, 556.049]
2025-09-12 09:09:21,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [263.0, 200.0, 242.0, 112.0, 300.0, 117.0, 104.0, 119.0, 146.0, 109.0]
2025-09-12 09:09:21,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (861.38) for latency MM1Queue_a033_s075
2025-09-12 09:09:21,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 18 hours, 26 minutes, 33 seconds)
2025-09-12 09:23:38,276 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:23:38,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:24:24,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 744.45154 ± 378.321
2025-09-12 09:24:24,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [535.2749, 300.4977, 695.32623, 1250.1948, 523.9677, 704.19556, 427.9539, 1586.9847, 896.60065, 523.51886]
2025-09-12 09:24:24,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 68.0, 156.0, 242.0, 96.0, 132.0, 86.0, 322.0, 183.0, 99.0]
2025-09-12 09:24:24,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 18 hours, 9 minutes, 7 seconds)
2025-09-12 09:39:19,237 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:39:19,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:40:18,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 997.33948 ± 303.780
2025-09-12 09:40:18,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [962.3172, 703.4898, 1248.1772, 1502.4108, 612.2277, 1392.1014, 789.6048, 931.0349, 1191.9761, 640.05493]
2025-09-12 09:40:18,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [196.0, 130.0, 233.0, 305.0, 135.0, 257.0, 170.0, 179.0, 220.0, 116.0]
2025-09-12 09:40:18,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (997.34) for latency MM1Queue_a033_s075
2025-09-12 09:40:18,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 18 hours, 8 seconds)
2025-09-12 09:54:29,491 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:54:29,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:55:14,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 769.78296 ± 155.919
2025-09-12 09:55:14,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [961.996, 964.6465, 985.9051, 747.4169, 550.18866, 672.3644, 670.11676, 856.3108, 729.37634, 559.5085]
2025-09-12 09:55:14,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [180.0, 183.0, 200.0, 143.0, 99.0, 136.0, 138.0, 160.0, 135.0, 106.0]
2025-09-12 09:55:14,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 17 hours, 41 minutes, 56 seconds)
2025-09-12 10:10:05,536 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:10:05,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:10:58,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 900.83398 ± 313.558
2025-09-12 10:10:58,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [1384.7686, 1373.639, 360.00037, 593.8603, 574.73376, 962.97955, 826.5383, 923.54724, 980.3159, 1027.9569]
2025-09-12 10:10:58,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [265.0, 266.0, 72.0, 116.0, 103.0, 186.0, 155.0, 176.0, 179.0, 196.0]
2025-09-12 10:10:58,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 17 hours, 32 minutes, 10 seconds)
2025-09-12 10:25:24,305 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:25:24,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:26:13,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 828.80548 ± 255.916
2025-09-12 10:26:13,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [1079.5016, 1254.239, 1080.6277, 514.85114, 1047.1865, 660.27893, 620.1131, 498.87695, 686.29266, 846.0872]
2025-09-12 10:26:13,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [204.0, 240.0, 197.0, 115.0, 203.0, 126.0, 115.0, 95.0, 125.0, 166.0]
2025-09-12 10:26:13,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 17 hours, 9 minutes, 55 seconds)
2025-09-12 10:40:50,383 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:40:50,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:41:48,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 965.71191 ± 394.867
2025-09-12 10:41:48,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [881.0686, 687.4147, 1131.7985, 1240.1893, 418.68893, 919.995, 952.7973, 1762.931, 384.80835, 1277.4276]
2025-09-12 10:41:48,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [167.0, 130.0, 221.0, 248.0, 95.0, 173.0, 181.0, 340.0, 72.0, 234.0]
2025-09-12 10:41:48,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 17 hours, 1 minute, 34 seconds)
2025-09-12 10:56:23,001 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:56:23,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:57:27,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 1114.65491 ± 185.450
2025-09-12 10:57:27,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [1264.0903, 971.7091, 1465.592, 1054.4236, 1094.5906, 982.68475, 777.4295, 1094.1202, 1320.4738, 1121.4342]
2025-09-12 10:57:27,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [240.0, 178.0, 273.0, 194.0, 211.0, 179.0, 146.0, 208.0, 251.0, 220.0]
2025-09-12 10:57:27,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (1114.65) for latency MM1Queue_a033_s075
2025-09-12 10:57:27,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 16 hours, 43 minutes, 1 second)
2025-09-12 11:12:12,693 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:12:12,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:13:14,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 1073.80896 ± 297.792
2025-09-12 11:13:14,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [641.7935, 1159.3386, 798.78827, 743.5632, 1394.6284, 1037.8654, 1199.3079, 941.8451, 1142.0847, 1678.8754]
2025-09-12 11:13:14,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 221.0, 148.0, 137.0, 253.0, 213.0, 229.0, 176.0, 217.0, 334.0]
2025-09-12 11:13:14,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 16 hours, 38 minutes, 22 seconds)
2025-09-12 11:27:52,052 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:27:52,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:28:54,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 1058.52051 ± 234.361
2025-09-12 11:28:54,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [714.6949, 1368.1643, 1200.259, 1476.692, 853.36505, 1142.0172, 1102.1973, 930.9958, 1002.5615, 794.25793]
2025-09-12 11:28:54,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [162.0, 269.0, 222.0, 284.0, 157.0, 217.0, 203.0, 174.0, 198.0, 149.0]
2025-09-12 11:28:54,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 16 hours, 21 minutes, 58 seconds)
2025-09-12 11:43:23,547 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:43:23,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:44:24,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 1031.55371 ± 299.468
2025-09-12 11:44:24,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [1017.51685, 492.10022, 848.6854, 1336.3657, 993.0704, 1008.87946, 1519.7163, 1015.1178, 1387.3413, 696.7438]
2025-09-12 11:44:24,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [193.0, 100.0, 160.0, 252.0, 186.0, 192.0, 294.0, 194.0, 266.0, 140.0]
2025-09-12 11:44:24,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 16 hours, 9 minutes, 33 seconds)
2025-09-12 11:59:22,027 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:59:22,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:00:38,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 1303.66479 ± 412.115
2025-09-12 12:00:38,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [1065.635, 1657.4846, 900.1811, 868.87164, 809.7697, 1437.4363, 2092.5723, 1450.7815, 1729.1711, 1024.7456]
2025-09-12 12:00:38,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [206.0, 315.0, 175.0, 162.0, 147.0, 271.0, 385.0, 274.0, 330.0, 190.0]
2025-09-12 12:00:38,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (1303.66) for latency MM1Queue_a033_s075
2025-09-12 12:00:38,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 16 hours, 1 minute, 45 seconds)
2025-09-12 12:15:11,529 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:15:11,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:16:23,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 1215.79932 ± 332.335
2025-09-12 12:16:23,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [1448.3345, 1203.8726, 1357.4756, 1366.718, 605.0259, 904.08655, 851.32733, 1218.8577, 1832.8314, 1369.465]
2025-09-12 12:16:23,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [277.0, 228.0, 248.0, 249.0, 116.0, 178.0, 184.0, 230.0, 357.0, 266.0]
2025-09-12 12:16:23,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 47 minutes, 9 seconds)
2025-09-12 12:31:11,151 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:31:11,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:32:29,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 1340.96387 ± 352.097
2025-09-12 12:32:29,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [1038.9, 1233.9543, 1368.1775, 903.99524, 1387.4043, 1090.3018, 2227.146, 1151.0719, 1557.2255, 1451.4618]
2025-09-12 12:32:29,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [196.0, 233.0, 256.0, 174.0, 273.0, 202.0, 420.0, 220.0, 293.0, 290.0]
2025-09-12 12:32:29,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (1340.96) for latency MM1Queue_a033_s075
2025-09-12 12:32:29,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 15 hours, 35 minutes, 7 seconds)
2025-09-12 12:47:02,417 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:47:02,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:48:12,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 1213.60046 ± 590.924
2025-09-12 12:48:12,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [597.9744, 1260.5425, 437.2454, 1146.5433, 2673.3877, 1469.7559, 869.5414, 873.9766, 1380.8063, 1426.2311]
2025-09-12 12:48:12,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 234.0, 96.0, 210.0, 506.0, 277.0, 161.0, 177.0, 264.0, 282.0]
2025-09-12 12:48:12,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 15 hours, 19 minutes, 51 seconds)
2025-09-12 13:02:38,834 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:02:38,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:03:51,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 1237.70142 ± 351.001
2025-09-12 13:03:51,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [1252.7764, 1804.6246, 992.50793, 1171.2618, 1266.2164, 590.46155, 913.0751, 1144.2279, 1525.3429, 1716.5182]
2025-09-12 13:03:51,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [247.0, 344.0, 187.0, 230.0, 239.0, 114.0, 165.0, 214.0, 292.0, 335.0]
2025-09-12 13:03:51,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 15 hours, 5 minutes, 48 seconds)
2025-09-12 13:18:52,467 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:18:52,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:20:07,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 1277.90381 ± 570.634
2025-09-12 13:20:07,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [743.1832, 1216.5938, 782.2217, 1297.5126, 985.4767, 1262.5795, 2025.441, 2602.1062, 728.20233, 1135.7203]
2025-09-12 13:20:07,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 229.0, 146.0, 238.0, 198.0, 238.0, 389.0, 491.0, 136.0, 217.0]
2025-09-12 13:20:07,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 14 hours, 50 minutes, 18 seconds)
2025-09-12 13:34:27,816 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:34:27,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:36:04,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 1634.98523 ± 419.933
2025-09-12 13:36:04,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [2096.723, 2152.9907, 2093.7256, 1498.5685, 1565.8146, 1690.0563, 1582.0316, 1578.3618, 630.6248, 1460.9551]
2025-09-12 13:36:04,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [407.0, 400.0, 415.0, 287.0, 293.0, 314.0, 298.0, 297.0, 123.0, 282.0]
2025-09-12 13:36:04,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (1634.99) for latency MM1Queue_a033_s075
2025-09-12 13:36:04,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 14 hours, 36 minutes, 29 seconds)
2025-09-12 13:51:14,698 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:51:14,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:52:35,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 1368.97754 ± 403.849
2025-09-12 13:52:35,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [1795.864, 1416.6218, 1029.4739, 1284.6792, 862.1418, 1152.9447, 763.7103, 1588.051, 2050.0317, 1746.2583]
2025-09-12 13:52:35,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [351.0, 262.0, 191.0, 232.0, 166.0, 213.0, 143.0, 298.0, 423.0, 344.0]
2025-09-12 13:52:35,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 14 hours, 25 minutes, 2 seconds)
2025-09-12 14:06:43,757 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:06:43,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:08:00,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 1328.45862 ± 261.133
2025-09-12 14:08:00,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [1175.2068, 1258.798, 923.7014, 1248.6498, 1481.2849, 1418.7114, 1915.8031, 1233.8412, 1093.4558, 1535.134]
2025-09-12 14:08:00,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [221.0, 239.0, 169.0, 229.0, 272.0, 280.0, 356.0, 240.0, 205.0, 289.0]
2025-09-12 14:08:00,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 14 hours, 5 minutes, 50 seconds)
2025-09-12 14:22:41,108 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:22:41,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:24:47,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 2198.46924 ± 1148.550
2025-09-12 14:24:47,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [1811.5072, 2337.417, 1570.5903, 5381.8022, 1563.4033, 1334.091, 2028.0435, 2014.362, 1204.0892, 2739.386]
2025-09-12 14:24:47,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [332.0, 443.0, 307.0, 1000.0, 292.0, 261.0, 375.0, 375.0, 229.0, 516.0]
2025-09-12 14:24:47,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (2198.47) for latency MM1Queue_a033_s075
2025-09-12 14:24:47,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 14 hours, 1 minute, 40 seconds)
2025-09-12 14:39:36,407 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:39:36,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:41:20,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 1750.64331 ± 315.625
2025-09-12 14:41:20,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [1410.6553, 1552.7426, 1923.3641, 1362.3326, 1948.9181, 2113.4282, 2083.998, 2201.698, 1448.0452, 1461.25]
2025-09-12 14:41:20,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [264.0, 284.0, 371.0, 281.0, 367.0, 410.0, 407.0, 418.0, 275.0, 274.0]
2025-09-12 14:41:20,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 13 hours, 48 minutes, 22 seconds)
2025-09-12 14:55:50,563 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:55:50,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:56:59,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 1122.19666 ± 770.408
2025-09-12 14:56:59,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [686.18585, 815.24664, 2992.4197, 2126.2524, 749.5364, 426.19592, 852.7518, 542.2727, 803.25006, 1227.8556]
2025-09-12 14:56:59,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 160.0, 574.0, 425.0, 152.0, 95.0, 165.0, 112.0, 161.0, 221.0]
2025-09-12 14:56:59,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 13 hours, 29 minutes, 10 seconds)
2025-09-12 15:12:01,778 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:12:01,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:13:24,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 1412.80273 ± 539.009
2025-09-12 15:13:24,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [1039.8461, 1816.1196, 651.45135, 1116.5347, 483.36612, 1433.3169, 1943.665, 1867.5312, 1631.7388, 2144.457]
2025-09-12 15:13:24,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [195.0, 354.0, 126.0, 207.0, 90.0, 266.0, 377.0, 359.0, 319.0, 397.0]
2025-09-12 15:13:24,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 13 hours, 12 minutes, 5 seconds)
2025-09-12 15:28:38,207 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:28:38,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:30:43,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 2106.71191 ± 773.933
2025-09-12 15:30:43,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [2416.2786, 1290.4823, 2267.5183, 1213.6971, 2121.2227, 1080.976, 2059.179, 1850.6677, 3492.055, 3275.0425]
2025-09-12 15:30:43,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [470.0, 262.0, 430.0, 232.0, 407.0, 218.0, 396.0, 347.0, 654.0, 633.0]
2025-09-12 15:30:43,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 13 hours, 14 minutes, 9 seconds)
2025-09-12 15:44:43,052 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:44:43,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:46:22,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 1670.89526 ± 789.824
2025-09-12 15:46:22,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [387.5725, 3453.6677, 2367.3975, 1636.9757, 1679.6768, 1319.7156, 1099.6741, 1282.079, 1343.5912, 2138.604]
2025-09-12 15:46:22,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 679.0, 468.0, 323.0, 322.0, 252.0, 223.0, 257.0, 248.0, 402.0]
2025-09-12 15:46:22,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 46 minutes, 54 seconds)
2025-09-12 16:00:55,363 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:00:55,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:02:47,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 1908.71448 ± 504.002
2025-09-12 16:02:47,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [1447.4175, 1397.3091, 1884.851, 2736.122, 1159.3103, 2791.616, 1775.7367, 1916.8406, 2055.7798, 1922.1626]
2025-09-12 16:02:47,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [272.0, 266.0, 344.0, 533.0, 215.0, 553.0, 327.0, 370.0, 385.0, 354.0]
2025-09-12 16:02:47,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 12 hours, 29 minutes, 15 seconds)
2025-09-12 16:17:35,174 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:17:35,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:20:15,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 2697.31982 ± 1198.679
2025-09-12 16:20:15,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [1718.7874, 2098.676, 1752.5173, 4363.005, 3258.33, 1462.8339, 2789.2236, 1743.7751, 2520.1704, 5265.8784]
2025-09-12 16:20:15,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [327.0, 404.0, 337.0, 850.0, 619.0, 277.0, 542.0, 336.0, 489.0, 1000.0]
2025-09-12 16:20:15,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (2697.32) for latency MM1Queue_a033_s075
2025-09-12 16:20:15,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 12 hours, 29 minutes, 29 seconds)
2025-09-12 16:34:54,042 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:34:54,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:37:01,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 2157.57080 ± 1100.767
2025-09-12 16:37:01,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [4492.468, 751.17773, 3021.8735, 1517.4432, 3079.076, 1372.7809, 1787.393, 2431.7368, 805.5399, 2316.2185]
2025-09-12 16:37:01,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [861.0, 142.0, 566.0, 285.0, 616.0, 249.0, 332.0, 459.0, 167.0, 437.0]
2025-09-12 16:37:01,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 12 hours, 15 minutes, 45 seconds)
2025-09-12 16:52:21,325 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:52:21,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:53:56,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 1631.15125 ± 428.655
2025-09-12 16:53:56,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [2437.929, 1870.428, 1765.5426, 1397.0974, 1713.3306, 1289.6554, 1704.8416, 942.4973, 2069.8694, 1120.3209]
2025-09-12 16:53:56,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [463.0, 354.0, 340.0, 269.0, 309.0, 237.0, 324.0, 183.0, 390.0, 224.0]
2025-09-12 16:53:56,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 11 hours, 55 minutes, 31 seconds)
2025-09-12 17:08:03,425 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:08:03,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:10:29,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 2489.14087 ± 658.650
2025-09-12 17:10:29,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [2881.6104, 2145.6062, 3064.0955, 2153.2769, 2005.5549, 2958.4517, 1441.8724, 2249.2292, 3862.1174, 2129.594]
2025-09-12 17:10:29,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [534.0, 421.0, 574.0, 410.0, 375.0, 562.0, 274.0, 423.0, 729.0, 391.0]
2025-09-12 17:10:29,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 11 hours, 46 minutes, 30 seconds)
2025-09-12 17:25:19,747 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:25:19,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:27:44,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 2489.50659 ± 1196.488
2025-09-12 17:27:44,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [2057.2546, 1575.8099, 860.1187, 1851.6323, 2091.007, 2664.6084, 3756.4878, 5384.5845, 2286.573, 2366.991]
2025-09-12 17:27:44,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [397.0, 303.0, 170.0, 346.0, 398.0, 515.0, 705.0, 1000.0, 442.0, 458.0]
2025-09-12 17:27:44,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 11 hours, 36 minutes, 39 seconds)
2025-09-12 17:42:29,351 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:42:29,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:44:32,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 2004.27759 ± 1313.337
2025-09-12 17:44:32,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [1096.1074, 4835.0215, 1130.2018, 1322.6249, 3017.1555, 978.7568, 1193.5714, 2756.3013, 3241.783, 471.25156]
2025-09-12 17:44:32,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [221.0, 931.0, 235.0, 249.0, 587.0, 204.0, 243.0, 547.0, 640.0, 102.0]
2025-09-12 17:44:32,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 11 hours, 14 minutes, 13 seconds)
2025-09-12 17:59:50,005 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:59:50,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:02:39,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 2899.81226 ± 1838.833
2025-09-12 18:02:39,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [5239.6323, 5291.4434, 1534.672, 4456.244, 2338.6272, 3651.0586, 1069.7529, 4441.2676, 701.4191, 274.00336]
2025-09-12 18:02:39,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [984.0, 1000.0, 296.0, 845.0, 457.0, 694.0, 223.0, 843.0, 138.0, 60.0]
2025-09-12 18:02:39,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (2899.81) for latency MM1Queue_a033_s075
2025-09-12 18:02:39,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 11 hours, 7 minutes, 56 seconds)
2025-09-12 18:17:04,692 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:17:04,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:19:44,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 2741.01855 ± 1477.639
2025-09-12 18:19:44,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [2177.4646, 2624.6177, 4683.406, 5273.501, 2523.4688, 2090.6992, 660.27167, 4330.9795, 2237.4802, 808.29834]
2025-09-12 18:19:44,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [412.0, 509.0, 869.0, 1000.0, 487.0, 403.0, 138.0, 800.0, 413.0, 174.0]
2025-09-12 18:19:44,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 10 hours, 52 minutes, 8 seconds)
2025-09-12 18:34:08,175 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:34:08,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:36:36,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 2514.96802 ± 1120.520
2025-09-12 18:36:36,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [3176.234, 5267.713, 1893.6636, 1402.9089, 1378.7726, 3218.1472, 1698.2969, 2189.6438, 2041.2496, 2883.051]
2025-09-12 18:36:36,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [598.0, 1000.0, 352.0, 260.0, 265.0, 607.0, 324.0, 407.0, 391.0, 547.0]
2025-09-12 18:36:36,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 10 hours, 37 minutes, 17 seconds)
2025-09-12 18:51:12,913 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:51:12,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:53:35,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 2505.22607 ± 851.115
2025-09-12 18:53:35,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [2733.4082, 3070.2593, 3538.482, 3108.049, 1601.0828, 2004.6754, 1260.9636, 2340.5955, 3864.52, 1530.225]
2025-09-12 18:53:35,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [496.0, 570.0, 649.0, 572.0, 297.0, 392.0, 232.0, 425.0, 711.0, 281.0]
2025-09-12 18:53:35,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 10 hours, 18 minutes, 2 seconds)
2025-09-12 19:08:17,276 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:08:17,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:10:38,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 2388.53198 ± 1172.131
2025-09-12 19:10:38,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [1716.8011, 565.6101, 2083.1228, 2428.6296, 3384.926, 2208.0388, 1346.8314, 5028.2964, 3166.1108, 1956.953]
2025-09-12 19:10:38,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [333.0, 106.0, 411.0, 465.0, 632.0, 433.0, 255.0, 929.0, 622.0, 372.0]
2025-09-12 19:10:38,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 10 hours, 2 minutes, 42 seconds)
2025-09-12 19:25:26,369 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:25:26,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:28:08,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 2795.64282 ± 1463.738
2025-09-12 19:28:08,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [863.56714, 5312.419, 2209.788, 2649.6165, 721.931, 2896.509, 3057.8306, 5321.9336, 2528.166, 2394.6687]
2025-09-12 19:28:08,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [165.0, 1000.0, 416.0, 499.0, 154.0, 555.0, 576.0, 1000.0, 466.0, 460.0]
2025-09-12 19:28:08,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 9 hours, 41 minutes, 20 seconds)
2025-09-12 19:43:42,184 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:43:42,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:47:28,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 3910.40430 ± 1773.461
2025-09-12 19:47:28,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [4425.939, 5423.7944, 5408.385, 505.35822, 3970.9636, 5275.7183, 1663.0764, 5257.3486, 1838.131, 5335.329]
2025-09-12 19:47:28,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [865.0, 1000.0, 1000.0, 99.0, 748.0, 1000.0, 313.0, 1000.0, 346.0, 1000.0]
2025-09-12 19:47:28,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (3910.40) for latency MM1Queue_a033_s075
2025-09-12 19:47:28,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 9 hours, 39 minutes)
2025-09-12 20:01:29,774 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:01:29,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:04:41,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 3323.14307 ± 932.925
2025-09-12 20:04:41,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [2389.188, 3053.063, 4341.2935, 2945.6807, 3489.8896, 5380.3438, 3683.7966, 2029.4551, 2605.1606, 3313.5579]
2025-09-12 20:04:41,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [448.0, 555.0, 824.0, 544.0, 650.0, 1000.0, 668.0, 393.0, 502.0, 618.0]
2025-09-12 20:04:41,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 9 hours, 23 minutes, 43 seconds)
2025-09-12 20:19:59,461 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:19:59,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:23:18,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 3434.79541 ± 1138.335
2025-09-12 20:23:18,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [2726.3801, 3641.036, 4227.1177, 1887.2045, 3587.4944, 3154.9644, 2360.575, 2237.4402, 5357.5522, 5168.187]
2025-09-12 20:23:18,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [508.0, 695.0, 780.0, 382.0, 676.0, 614.0, 456.0, 420.0, 1000.0, 949.0]
2025-09-12 20:23:18,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 9 hours, 16 minutes, 20 seconds)
2025-09-12 20:38:17,122 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:38:17,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:41:36,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 3381.10205 ± 1821.262
2025-09-12 20:41:36,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [800.5126, 5373.1064, 5272.3213, 4135.001, 5462.6416, 1400.1014, 1555.7063, 5351.5625, 1858.9073, 2601.1597]
2025-09-12 20:41:36,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [160.0, 1000.0, 1000.0, 775.0, 1000.0, 259.0, 319.0, 1000.0, 378.0, 482.0]
2025-09-12 20:41:36,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 9 hours, 5 minutes, 48 seconds)
2025-09-12 20:56:45,076 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:56:45,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:00:40,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 3994.34619 ± 1575.360
2025-09-12 21:00:40,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [5267.755, 5293.7935, 5172.993, 3895.1921, 4348.874, 2873.7969, 875.1598, 5287.574, 5303.0645, 1625.2576]
2025-09-12 21:00:40,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 742.0, 844.0, 552.0, 177.0, 1000.0, 1000.0, 345.0]
2025-09-12 21:00:40,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (3994.35) for latency MM1Queue_a033_s075
2025-09-12 21:00:40,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 8 hours, 56 minutes, 40 seconds)
2025-09-12 21:14:31,829 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:14:31,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:17:59,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 3490.26123 ± 1498.782
2025-09-12 21:17:59,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [2200.8281, 5306.2695, 2612.2378, 5267.578, 3654.5771, 2454.961, 1140.9865, 2111.2075, 4981.954, 5172.0117]
2025-09-12 21:17:59,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [439.0, 1000.0, 541.0, 1000.0, 697.0, 481.0, 226.0, 413.0, 951.0, 1000.0]
2025-09-12 21:17:59,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 8 hours, 26 minutes, 54 seconds)
2025-09-12 21:33:41,090 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:33:41,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:36:46,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 3214.15210 ± 1411.273
2025-09-12 21:36:46,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [2737.595, 1878.3926, 1645.8176, 5445.666, 3437.4075, 5258.115, 836.4227, 3760.909, 3697.0112, 3444.1863]
2025-09-12 21:36:46,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [510.0, 361.0, 316.0, 1000.0, 642.0, 1000.0, 174.0, 724.0, 690.0, 644.0]
2025-09-12 21:36:46,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 8 hours, 17 minutes, 15 seconds)
2025-09-12 21:50:32,443 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:50:32,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:54:37,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 4239.61230 ± 1680.980
2025-09-12 21:54:37,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [5496.009, 5321.8574, 4045.348, 3688.6714, 5381.062, 626.0906, 1639.4404, 5458.5347, 5420.6665, 5318.443]
2025-09-12 21:54:37,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 762.0, 675.0, 1000.0, 113.0, 301.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:54:37,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (4239.61) for latency MM1Queue_a033_s075
2025-09-12 21:54:37,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 7 hours, 54 minutes, 51 seconds)
2025-09-12 22:10:23,612 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:10:23,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:14:31,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 4226.80957 ± 1358.950
2025-09-12 22:14:31,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [5421.1445, 4361.975, 5312.268, 3811.8613, 3466.5652, 5458.9985, 2330.1897, 5423.3403, 5208.694, 1473.0593]
2025-09-12 22:14:31,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 810.0, 1000.0, 700.0, 637.0, 1000.0, 477.0, 1000.0, 978.0, 279.0]
2025-09-12 22:14:31,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 7 hours, 44 minutes, 32 seconds)
2025-09-12 22:28:58,828 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:28:58,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:33:09,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 4406.57910 ± 1561.370
2025-09-12 22:33:09,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [4816.2764, 5486.7593, 5469.103, 3333.5898, 1099.2902, 5509.4795, 5403.745, 5424.937, 5456.176, 2066.435]
2025-09-12 22:33:09,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [903.0, 1000.0, 1000.0, 611.0, 208.0, 1000.0, 1000.0, 1000.0, 1000.0, 405.0]
2025-09-12 22:33:09,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (4406.58) for latency MM1Queue_a033_s075
2025-09-12 22:33:09,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 7 hours, 23 minutes, 53 seconds)
2025-09-12 22:47:56,710 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:47:56,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:52:00,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 4303.00977 ± 1486.922
2025-09-12 22:52:00,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [5386.967, 2582.8696, 5514.3804, 3026.5977, 5470.0845, 5459.9404, 3306.6152, 5378.755, 5523.1377, 1380.7513]
2025-09-12 22:52:00,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 484.0, 1000.0, 562.0, 1000.0, 1000.0, 609.0, 1000.0, 1000.0, 287.0]
2025-09-12 22:52:00,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 7 hours, 12 minutes, 31 seconds)
2025-09-12 23:06:55,977 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:06:55,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:10:52,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 4021.18018 ± 1730.232
2025-09-12 23:10:52,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [5385.745, 2438.7673, 5362.0337, 1099.762, 2845.3062, 5294.9355, 5386.7456, 1508.5928, 5421.432, 5468.482]
2025-09-12 23:10:52,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 501.0, 1000.0, 208.0, 523.0, 1000.0, 1000.0, 302.0, 1000.0, 1000.0]
2025-09-12 23:10:52,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 6 hours, 54 minutes, 1 second)
2025-09-12 23:25:37,978 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:25:37,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:30:09,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 4628.41699 ± 1130.042
2025-09-12 23:30:09,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [4126.122, 1719.2126, 5389.6772, 5424.8877, 3822.9187, 4403.4224, 5459.617, 5139.7314, 5397.6304, 5400.954]
2025-09-12 23:30:09,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [820.0, 338.0, 1000.0, 1000.0, 728.0, 817.0, 1000.0, 974.0, 1000.0, 1000.0]
2025-09-12 23:30:09,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (4628.42) for latency MM1Queue_a033_s075
2025-09-12 23:30:09,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 6 hours, 41 minutes, 13 seconds)
2025-09-12 23:43:43,075 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:43:43,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:48:09,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 4639.28809 ± 1212.410
2025-09-12 23:48:09,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [5343.0254, 3910.3403, 5489.8916, 2507.5398, 5484.1514, 2275.5193, 5446.136, 5414.4346, 5062.1934, 5459.6455]
2025-09-12 23:48:09,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 722.0, 1000.0, 470.0, 1000.0, 411.0, 1000.0, 1000.0, 931.0, 1000.0]
2025-09-12 23:48:09,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (4639.29) for latency MM1Queue_a033_s075
2025-09-12 23:48:09,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 6 hours, 14 minutes, 32 seconds)
2025-09-13 00:03:25,555 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:03:25,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:07:29,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 4216.59912 ± 1280.812
2025-09-13 00:07:29,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [2953.8223, 2538.1948, 4054.1143, 5399.208, 5335.549, 1921.0493, 3879.6887, 5380.286, 5367.294, 5336.787]
2025-09-13 00:07:29,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [565.0, 487.0, 766.0, 1000.0, 1000.0, 361.0, 747.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:07:29,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 58 minutes, 28 seconds)
2025-09-13 00:22:38,479 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:22:38,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:26:51,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 4377.34277 ± 1461.567
2025-09-13 00:26:51,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [2605.4988, 4575.771, 5399.991, 1807.8071, 2173.6453, 5463.497, 5503.4526, 5386.0938, 5400.1504, 5457.522]
2025-09-13 00:26:51,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [490.0, 876.0, 1000.0, 339.0, 399.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:26:51,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 5 hours, 41 minutes, 27 seconds)
2025-09-13 00:40:50,870 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:40:50,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:45:12,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 4586.34082 ± 1834.894
2025-09-13 00:45:12,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [5513.4136, 5423.8677, 5510.3984, 5528.415, 1399.9421, 5483.0674, 480.4788, 5480.1797, 5549.117, 5494.529]
2025-09-13 00:45:12,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 256.0, 1000.0, 97.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:45:12,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 5 hours, 20 minutes, 44 seconds)
2025-09-13 01:00:09,121 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:00:09,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:04:31,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 4616.43701 ± 1437.279
2025-09-13 01:04:31,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [5563.239, 3256.2222, 5495.68, 5482.848, 5540.997, 1207.2003, 5395.0605, 3267.5928, 5466.7217, 5488.8096]
2025-09-13 01:04:31,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 624.0, 1000.0, 1000.0, 1000.0, 237.0, 1000.0, 618.0, 1000.0, 1000.0]
2025-09-13 01:04:31,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 5 hours, 1 minute, 58 seconds)
2025-09-13 01:19:21,260 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:19:21,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:23:20,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 4136.32422 ± 1661.572
2025-09-13 01:23:20,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [5398.353, 1940.4817, 4035.8013, 2118.023, 1045.1819, 5395.649, 5356.819, 5400.445, 5366.792, 5305.694]
2025-09-13 01:23:20,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 374.0, 756.0, 400.0, 217.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:23:20,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 45 minutes, 34 seconds)
2025-09-13 01:38:13,440 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:38:13,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:41:56,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 3773.54150 ± 1637.236
2025-09-13 01:41:56,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [5250.9907, 5393.5293, 3862.2766, 2816.689, 5370.548, 4909.494, 2329.4348, 5311.235, 1284.9799, 1206.2369]
2025-09-13 01:41:56,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 748.0, 540.0, 1000.0, 920.0, 422.0, 1000.0, 261.0, 251.0]
2025-09-13 01:41:56,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 4 hours, 24 minutes, 27 seconds)
2025-09-13 01:56:21,217 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:56:21,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:00:57,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 4844.43164 ± 1070.641
2025-09-13 02:00:57,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [5403.7173, 2760.7834, 5461.7363, 2738.766, 5380.0293, 4729.1577, 5492.5073, 5460.4707, 5432.4165, 5584.73]
2025-09-13 02:00:57,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 520.0, 1000.0, 506.0, 1000.0, 895.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:00:57,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (4844.43) for latency MM1Queue_a033_s075
2025-09-13 02:00:57,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 4 hours, 4 minutes, 37 seconds)
2025-09-13 02:16:00,574 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:16:00,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:20:35,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 4734.64404 ± 1344.885
2025-09-13 02:20:35,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [1914.6696, 5432.282, 5540.502, 5297.7085, 5341.07, 2187.2888, 5363.1665, 5488.5757, 5412.936, 5368.237]
2025-09-13 02:20:35,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [373.0, 1000.0, 1000.0, 1000.0, 1000.0, 433.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:20:35,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 48 minutes, 55 seconds)
2025-09-13 02:35:16,740 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:35:16,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:39:36,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 4539.85938 ± 1196.450
2025-09-13 02:39:36,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [5301.619, 5370.3037, 5446.031, 1769.0792, 3266.2856, 3972.5193, 5495.4097, 5378.9653, 5382.8696, 4015.5098]
2025-09-13 02:39:36,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 328.0, 591.0, 747.0, 1000.0, 1000.0, 1000.0, 745.0]
2025-09-13 02:39:36,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 29 minutes, 11 seconds)
2025-09-13 02:55:32,815 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:55:32,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:00:42,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 5321.43066 ± 67.877
2025-09-13 03:00:42,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [5262.8213, 5332.967, 5190.7173, 5331.517, 5401.603, 5380.8804, 5285.876, 5367.2646, 5253.416, 5407.246]
2025-09-13 03:00:42,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 999.0, 1000.0, 1000.0]
2025-09-13 03:00:42,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (5321.43) for latency MM1Queue_a033_s075
2025-09-13 03:00:42,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 3 hours, 14 minutes, 43 seconds)
2025-09-13 03:14:33,981 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:14:33,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:19:24,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 5046.84814 ± 892.749
2025-09-13 03:19:24,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [5441.5957, 5286.1377, 5415.439, 5408.6587, 5412.8765, 2380.904, 5327.2256, 5131.6206, 5347.5864, 5316.439]
2025-09-13 03:19:24,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 444.0, 1000.0, 951.0, 1000.0, 1000.0]
2025-09-13 03:19:24,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 55 minutes, 26 seconds)
2025-09-13 03:34:35,293 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:34:35,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:39:01,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 4509.21484 ± 1618.108
2025-09-13 03:39:01,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [5297.0923, 5416.795, 5292.4556, 1068.9081, 5261.1973, 5245.418, 1491.9656, 5357.079, 5399.762, 5261.4746]
2025-09-13 03:39:01,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 225.0, 1000.0, 1000.0, 297.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:39:01,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 36 minutes, 55 seconds)
2025-09-13 03:53:26,861 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:53:26,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:58:06,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 4712.66748 ± 1069.612
2025-09-13 03:58:06,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [5302.7773, 5250.7383, 5088.3765, 2844.3215, 5240.403, 5350.18, 2336.23, 5296.2104, 5263.6646, 5153.7725]
2025-09-13 03:58:06,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 561.0, 1000.0, 1000.0, 467.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:58:06,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 2 hours, 16 minutes, 30 seconds)
2025-09-13 04:12:50,134 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:12:50,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 04:17:25,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 4843.63770 ± 1291.857
2025-09-13 04:17:25,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [5462.1377, 5486.064, 5459.481, 2987.7385, 5497.66, 5529.744, 5461.4575, 1669.616, 5416.595, 5465.8833]
2025-09-13 04:17:25,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 559.0, 1000.0, 1000.0, 1000.0, 323.0, 1000.0, 1000.0]
2025-09-13 04:17:25,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 57 minutes, 22 seconds)
2025-09-13 04:33:22,846 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:33:22,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 04:38:09,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 4890.69141 ± 937.439
2025-09-13 04:38:09,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [5394.5654, 5404.032, 2484.0515, 5242.5225, 5266.077, 5423.923, 5226.301, 3725.5828, 5362.7495, 5377.1094]
2025-09-13 04:38:09,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 506.0, 1000.0, 1000.0, 1000.0, 1000.0, 747.0, 1000.0, 1000.0]
2025-09-13 04:38:09,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 37 minutes, 27 seconds)
2025-09-13 04:53:07,818 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:53:07,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 04:57:57,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 4971.41016 ± 919.356
2025-09-13 04:57:57,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [5475.5454, 3200.8708, 5475.5903, 5555.8516, 5333.6484, 3081.3838, 5464.212, 5390.642, 5244.6045, 5491.7534]
2025-09-13 04:57:57,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 596.0, 1000.0, 1000.0, 1000.0, 589.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:57:57,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 18 minutes, 50 seconds)
2025-09-13 05:11:58,935 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:11:58,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 05:15:43,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 3855.26294 ± 1776.533
2025-09-13 05:15:43,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [5399.8506, 1172.2789, 870.19006, 5412.6416, 5318.0596, 2263.0613, 2965.9988, 5447.285, 4279.4565, 5423.803]
2025-09-13 05:15:43,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 231.0, 172.0, 1000.0, 1000.0, 452.0, 550.0, 1000.0, 792.0, 1000.0]
2025-09-13 05:15:43,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 58 minutes)
2025-09-13 05:31:14,728 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:31:14,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 05:36:10,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 5156.11621 ± 747.345
2025-09-13 05:36:10,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [2935.5984, 5441.76, 5439.6562, 5142.2476, 5430.5835, 5422.701, 5444.771, 5296.725, 5530.3643, 5476.7534]
2025-09-13 05:36:10,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [576.0, 1000.0, 1000.0, 931.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:36:10,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 39 minutes, 13 seconds)
2025-09-13 05:50:06,550 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:50:06,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 05:55:11,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 5343.60059 ± 227.822
2025-09-13 05:55:11,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [5405.193, 5428.1323, 5437.198, 5313.3135, 5464.219, 5429.697, 5421.3823, 5432.7544, 4669.659, 5434.4585]
2025-09-13 05:55:11,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 856.0, 1000.0]
2025-09-13 05:55:11,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1226 [INFO]: New best (5343.60) for latency MM1Queue_a033_s075
2025-09-13 05:55:11,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 19 minutes, 33 seconds)
2025-09-13 06:11:16,352 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:11:16,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 06:16:16,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 5216.25098 ± 532.637
2025-09-13 06:16:16,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [5378.0835, 5358.326, 5421.0503, 5421.675, 5370.905, 5397.454, 5362.3057, 5400.177, 5432.4805, 3620.0554]
2025-09-13 06:16:16,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 691.0]
2025-09-13 06:16:16,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-humanoid):1251 [DEBUG]: Training session finished
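The per-evaluation summary lines above ("Total Reward: mean ± std" over 10 rollouts, plus the "New best" tracking) follow a simple aggregation: the ± term matches the population standard deviation of the listed rewards (e.g. iteration 99's rewards give 5343.60 ± 227.82, as logged). A minimal sketch of that aggregation is below; `summarize_eval` is an illustrative name, not a function from the actual training code.

```python
import math

def summarize_eval(rewards, best_so_far=None):
    """Aggregate one evaluation round the way the log reports it.

    Returns (mean, std, is_new_best), where std is the population
    standard deviation (ddof=0), which matches the logged +/- values.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    is_new_best = best_so_far is None or mean > best_so_far
    return mean, std, is_new_best

# Example using the iteration-99 rewards from the log:
rewards = [5405.193, 5428.1323, 5437.198, 5313.3135, 5464.219,
           5429.697, 5421.3823, 5432.7544, 4669.659, 5434.4585]
mean, std, is_new_best = summarize_eval(rewards, best_so_far=5321.43)
```

With these inputs the sketch reproduces the logged summary (5343.60 ± 227.82) and the "New best" transition from the previous best of 5321.43.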
