2025-09-12 00:42:49,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc0-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-09-12 00:42:49,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc0-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-09-12 00:42:49,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14e1cf6365d0>}
2025-09-12 00:42:49,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1111 [DEBUG]: using device: cuda
2025-09-12 00:42:49,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1133 [INFO]: Creating new trainer
2025-09-12 00:42:49,490 baseline-mbpac-noiseperc0-humanoid:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
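The pi-network dump above can be mirrored as a plain `torch.nn.Module` sketch. `NNGaussianPolicy` and `NNTanhRefit` are project-specific classes whose code is not in this log, so the stand-ins below are hypothetical: they only reproduce the printed shapes (512-dim recurrent embedding in, 17-dim Humanoid action out, shared 256-unit trunk, tanh output rescaled by the printed scale 0.8 and shift -0.4).

```python
import torch
import torch.nn as nn

class TanhRefit(nn.Module):
    """Hypothetical stand-in for NNTanhRefit: affine rescale of a tanh squash."""
    def __init__(self, dim: int, scale: float = 0.8, shift: float = -0.4):
        super().__init__()
        self.register_buffer("scale", torch.full((1, dim), scale))
        self.register_buffer("shift", torch.full((1, dim), shift))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(x) * self.scale + self.shift

class GaussianPolicy(nn.Module):
    """Stand-in mirroring the printed pi network: shared trunk, mu/log_std heads."""
    def __init__(self, embed_dim: int = 512, act_dim: int = 17, hidden: int = 256):
        super().__init__()
        self.common_head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu_head = nn.Linear(hidden, act_dim)
        self.log_std_head = nn.Linear(hidden, act_dim)
        self.tanh_refit = TanhRefit(act_dim)

    def forward(self, emb: torch.Tensor):
        h = self.common_head(emb)
        mu, log_std = self.mu_head(h), self.log_std_head(h)
        sample = mu + log_std.exp() * torch.randn_like(mu)  # reparameterized draw
        return self.tanh_refit(sample), mu, log_std
```

With scale 0.8 and shift -0.4, squashed actions land in (-1.2, 0.4), matching the printed buffers.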
2025-09-12 00:42:49,490 baseline-mbpac-noiseperc0-humanoid:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
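The q-network dump is a two-input concat module: its first linear layer takes 393 features, which is consistent with a 376-dim Humanoid observation concatenated with the 17-dim action. `NNLayerConcat2` and `NNLayerSqueeze` are project classes, so this is a hypothetical sketch of the same wiring, not their actual implementation.

```python
import torch
import torch.nn as nn

class ConcatQ(nn.Module):
    """Hypothetical stand-in for the printed NNLayerConcat2 critic: flatten both
    inputs, concatenate on the last dim, regress a scalar Q-value per sample."""
    def __init__(self, state_dim: int = 376, act_dim: int = 17, hidden: int = 256):
        super().__init__()
        self.init_left = nn.Flatten()   # state branch
        self.init_right = nn.Flatten()  # action branch
        self.head = nn.Sequential(
            nn.Linear(state_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        x = torch.cat([self.init_left(s), self.init_right(a)], dim=-1)
        return self.head(x).squeeze(-1)  # mirrors NNLayerSqueeze(dim: -1)
```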
2025-09-12 00:42:49,501 baseline-mbpac-noiseperc0-humanoid:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
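The model dump suggests a GRU-based predictive dynamics model: a 376-dim state is embedded to the 512-dim recurrent state, 17-dim actions are embedded to 256-dim GRU inputs, and the emitter maps the 512-dim hidden state to a 376-dim Gaussian over the next observation. The sketch below is an assumption from those printed shapes only: `ClipSiLU` is one plausible reading of `NNLayerClipSiLU(lower=-20.0)`, the emitter trunk is simplified to direct mu/log_std heads, and the state-seeds-hidden wiring is a guess.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClipSiLU(nn.Module):
    """Hypothetical reading of NNLayerClipSiLU: SiLU on an input clamped from below."""
    def __init__(self, lower: float = -20.0):
        super().__init__()
        self.lower = lower

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.silu(x.clamp(min=self.lower))

class PredictiveRecurrent(nn.Module):
    """Stand-in mirroring the printed shapes: the state seeds the GRU hidden state,
    embedded actions drive it forward, a Gaussian head emits next-state stats."""
    def __init__(self, state_dim=376, act_dim=17, hidden=256, rec=512):
        super().__init__()
        self.net_embed_state = nn.Sequential(
            nn.Linear(state_dim, hidden), ClipSiLU(),
            nn.Linear(hidden, hidden), ClipSiLU(),
            nn.Linear(hidden, rec),
        )
        self.net_embed_action = nn.Sequential(
            nn.Linear(act_dim, hidden), ClipSiLU(),
            nn.Linear(hidden, hidden),
        )
        self.net_rec = nn.GRU(hidden, rec, batch_first=True)  # GRU(256, 512)
        self.mu_head = nn.Linear(rec, state_dim)       # simplified emitter heads
        self.log_std_head = nn.Linear(rec, state_dim)

    def forward(self, state: torch.Tensor, actions: torch.Tensor):
        # state: (B, 376); actions: (B, T, 17) -> per-step next-state Gaussian
        h0 = self.net_embed_state(state).unsqueeze(0)              # (1, B, 512)
        out, _ = self.net_rec(self.net_embed_action(actions), h0)  # (B, T, 512)
        return self.mu_head(out), self.log_std_head(out)
```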
2025-09-12 00:42:50,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1194 [DEBUG]: Starting training session...
2025-09-12 00:42:50,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 1/100
2025-09-12 00:55:20,607 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:55:20,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:55:29,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 168.00549 ± 38.969
2025-09-12 00:55:29,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [169.00542, 194.67201, 169.06958, 134.8444, 270.34564, 167.87534, 159.60101, 148.7221, 132.44954, 133.46982]
2025-09-12 00:55:29,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 38.0, 31.0, 29.0, 53.0, 31.0, 29.0, 28.0, 25.0, 25.0]
2025-09-12 00:55:29,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (168.01) for latency MM1Queue_a033_s075
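Each "Total Reward" summary can be reproduced from the "All rewards" line it accompanies; the ± figure appears to be the population standard deviation (divide by N, not N-1). A check against the iteration-1 numbers:

```python
import math

# Rewards copied from the iteration-1 "All rewards" log line above.
rewards = [169.00542, 194.67201, 169.06958, 134.8444, 270.34564,
           167.87534, 159.60101, 148.7221, 132.44954, 133.46982]

mean = sum(rewards) / len(rewards)
# Population standard deviation (ddof=0) matches the logged +/- value.
std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))

print(f"Total Reward: {mean:.5f} \u00b1 {std:.3f}")
# -> Total Reward: 168.00549 ± 38.969
```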
2025-09-12 00:55:29,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 20 hours, 52 minutes, 7 seconds)
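The "estimated time remaining" printed at each iteration start is consistent with the simplest estimator: elapsed time divided by completed iterations, scaled by the iterations left. Reproducing the iteration-2 estimate from the two timestamps in the log (the loop's internal bookkeeping is an assumption):

```python
from datetime import datetime

# Timestamps copied from the log: start of iteration 1 and start of iteration 2.
start = datetime.strptime("2025-09-12 00:42:50,884", "%Y-%m-%d %H:%M:%S,%f")
now = datetime.strptime("2025-09-12 00:55:29,745", "%Y-%m-%d %H:%M:%S,%f")
done, total = 1, 100

# Assumed estimator: mean time per completed iteration times iterations left.
remaining = (now - start) / done * (total - done)
secs = int(remaining.total_seconds())
h, m, s = secs // 3600, secs % 3600 // 60, secs % 60
print(f"{h} hours, {m} minutes, {s} seconds")
# -> 20 hours, 52 minutes, 7 seconds
```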
2025-09-12 01:09:40,694 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:09:40,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:10:05,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 358.26059 ± 89.685
2025-09-12 01:10:05,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [323.57214, 291.27808, 275.42453, 400.7896, 159.71614, 423.13318, 439.37973, 436.18417, 455.25223, 377.87622]
2025-09-12 01:10:05,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 53.0, 62.0, 75.0, 34.0, 78.0, 86.0, 89.0, 85.0, 73.0]
2025-09-12 01:10:05,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (358.26) for latency MM1Queue_a033_s075
2025-09-12 01:10:05,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 22 hours, 14 minutes, 36 seconds)
2025-09-12 01:24:09,484 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:24:09,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:24:30,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 361.91602 ± 54.393
2025-09-12 01:24:30,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [305.16168, 367.4696, 305.53403, 445.69296, 400.65802, 367.58817, 330.21323, 339.07077, 457.945, 299.8264]
2025-09-12 01:24:30,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 79.0, 68.0, 83.0, 74.0, 68.0, 62.0, 65.0, 88.0, 68.0]
2025-09-12 01:24:30,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (361.92) for latency MM1Queue_a033_s075
2025-09-12 01:24:30,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 22 hours, 26 minutes, 45 seconds)
2025-09-12 01:38:39,469 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:38:39,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:39:01,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 403.50613 ± 67.676
2025-09-12 01:39:01,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [397.54648, 408.2273, 351.67184, 443.0455, 383.9237, 402.7181, 363.09982, 349.12338, 586.6734, 349.0318]
2025-09-12 01:39:01,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 78.0, 64.0, 82.0, 73.0, 84.0, 67.0, 66.0, 110.0, 65.0]
2025-09-12 01:39:01,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (403.51) for latency MM1Queue_a033_s075
2025-09-12 01:39:01,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 22 hours, 28 minutes, 5 seconds)
2025-09-12 01:53:12,982 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:53:12,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:53:35,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 394.10931 ± 64.133
2025-09-12 01:53:35,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [400.1955, 435.2727, 424.23325, 257.59882, 445.8919, 373.22186, 363.84494, 506.6202, 337.89923, 396.31467]
2025-09-12 01:53:35,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 94.0, 84.0, 59.0, 83.0, 70.0, 73.0, 108.0, 64.0, 75.0]
2025-09-12 01:53:35,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 22 hours, 24 minutes, 6 seconds)
2025-09-12 02:07:42,180 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:07:42,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:08:08,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 459.76056 ± 60.460
2025-09-12 02:08:08,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [498.2838, 446.03085, 384.66623, 493.95288, 586.11395, 480.79904, 360.37073, 418.80872, 458.40994, 470.16898]
2025-09-12 02:08:08,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 84.0, 71.0, 107.0, 128.0, 90.0, 78.0, 78.0, 98.0, 89.0]
2025-09-12 02:08:08,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (459.76) for latency MM1Queue_a033_s075
2025-09-12 02:08:08,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 22 hours, 45 minutes, 36 seconds)
2025-09-12 02:22:20,882 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:22:20,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:22:49,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 492.60025 ± 110.409
2025-09-12 02:22:49,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [453.46066, 393.01678, 527.0893, 342.21432, 450.46967, 497.2599, 533.3508, 419.3244, 767.0062, 542.81036]
2025-09-12 02:22:49,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 85.0, 107.0, 66.0, 84.0, 96.0, 117.0, 92.0, 152.0, 100.0]
2025-09-12 02:22:49,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (492.60) for latency MM1Queue_a033_s075
2025-09-12 02:22:49,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 22 hours, 32 minutes, 53 seconds)
2025-09-12 02:37:05,367 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:37:05,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:37:29,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 427.49570 ± 59.386
2025-09-12 02:37:29,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [391.70535, 384.40283, 580.9759, 481.94455, 392.59818, 450.05377, 416.67822, 390.4454, 396.45493, 389.69818]
2025-09-12 02:37:29,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 71.0, 119.0, 105.0, 74.0, 92.0, 78.0, 74.0, 73.0, 87.0]
2025-09-12 02:37:29,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 22 hours, 23 minutes, 1 second)
2025-09-12 02:51:27,032 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:51:27,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:51:52,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 429.93661 ± 81.837
2025-09-12 02:51:52,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [490.6237, 456.8991, 607.9428, 479.9102, 374.75845, 408.16263, 333.14117, 308.93274, 437.2616, 401.73386]
2025-09-12 02:51:52,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 101.0, 117.0, 92.0, 83.0, 76.0, 76.0, 68.0, 83.0, 88.0]
2025-09-12 02:51:52,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 22 hours, 5 minutes, 59 seconds)
2025-09-12 03:06:14,637 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:06:14,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:06:42,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 475.33218 ± 92.339
2025-09-12 03:06:42,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [537.5002, 341.0749, 367.10785, 456.23898, 608.452, 441.1871, 636.2417, 479.95877, 489.57147, 395.9889]
2025-09-12 03:06:42,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 74.0, 79.0, 85.0, 116.0, 81.0, 135.0, 105.0, 91.0, 87.0]
2025-09-12 03:06:42,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 21 hours, 56 minutes, 3 seconds)
2025-09-12 03:20:46,536 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:20:46,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:21:16,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 546.79572 ± 124.421
2025-09-12 03:21:16,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [421.4139, 494.73932, 359.0532, 726.5098, 572.28046, 401.30447, 610.38324, 553.2785, 751.7356, 577.259]
2025-09-12 03:21:16,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 94.0, 81.0, 138.0, 106.0, 76.0, 114.0, 114.0, 143.0, 109.0]
2025-09-12 03:21:16,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (546.80) for latency MM1Queue_a033_s075
2025-09-12 03:21:16,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 21 hours, 41 minutes, 50 seconds)
2025-09-12 03:35:21,338 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:35:21,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:35:53,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 543.89771 ± 85.772
2025-09-12 03:35:53,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [485.29932, 417.62634, 465.05627, 576.63745, 654.7876, 575.85675, 439.64407, 588.40546, 547.82556, 687.8379]
2025-09-12 03:35:53,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 91.0, 94.0, 107.0, 127.0, 112.0, 98.0, 128.0, 121.0, 128.0]
2025-09-12 03:35:53,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 21 hours, 25 minutes, 56 seconds)
2025-09-12 03:49:57,284 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:49:57,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:50:28,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 549.41510 ± 121.528
2025-09-12 03:50:28,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [514.9071, 640.5368, 423.7897, 636.9734, 498.82217, 781.0085, 546.4694, 365.0993, 432.59814, 653.9465]
2025-09-12 03:50:28,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 133.0, 91.0, 119.0, 101.0, 149.0, 119.0, 78.0, 81.0, 129.0]
2025-09-12 03:50:28,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (549.42) for latency MM1Queue_a033_s075
2025-09-12 03:50:28,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 21 hours, 9 minutes, 59 seconds)
2025-09-12 04:04:47,301 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:04:47,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:05:25,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 561.35242 ± 99.671
2025-09-12 04:05:25,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [490.26047, 601.091, 429.5499, 676.49506, 539.3619, 528.2444, 773.81824, 586.5929, 440.83557, 547.275]
2025-09-12 04:05:25,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 116.0, 80.0, 126.0, 106.0, 113.0, 152.0, 108.0, 82.0, 104.0]
2025-09-12 04:05:25,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (561.35) for latency MM1Queue_a033_s075
2025-09-12 04:05:25,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 21 hours, 4 minutes, 57 seconds)
2025-09-12 04:19:30,333 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:19:30,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:20:01,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 581.92767 ± 77.217
2025-09-12 04:20:01,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [552.92566, 695.0393, 537.20306, 590.07605, 478.15515, 651.29407, 596.2591, 693.6095, 567.05524, 457.6596]
2025-09-12 04:20:01,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 133.0, 101.0, 114.0, 94.0, 119.0, 120.0, 131.0, 109.0, 86.0]
2025-09-12 04:20:01,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (581.93) for latency MM1Queue_a033_s075
2025-09-12 04:20:01,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 20 hours, 46 minutes, 29 seconds)
2025-09-12 04:34:18,575 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:34:18,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:34:54,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 667.01318 ± 218.814
2025-09-12 04:34:54,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1065.847, 1122.3417, 497.6029, 650.9452, 568.52026, 626.1142, 503.85327, 553.7827, 522.5652, 558.55963]
2025-09-12 04:34:54,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [203.0, 215.0, 92.0, 118.0, 113.0, 119.0, 94.0, 102.0, 102.0, 104.0]
2025-09-12 04:34:54,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (667.01) for latency MM1Queue_a033_s075
2025-09-12 04:34:54,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 20 hours, 37 minutes, 3 seconds)
2025-09-12 04:49:03,849 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:49:03,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:49:38,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 610.23669 ± 104.564
2025-09-12 04:49:38,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [715.7783, 499.09375, 699.8264, 846.0339, 502.99854, 584.3593, 575.086, 587.5252, 545.3577, 546.30743]
2025-09-12 04:49:38,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 91.0, 150.0, 176.0, 96.0, 109.0, 116.0, 107.0, 102.0, 114.0]
2025-09-12 04:49:38,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 20 hours, 24 minutes, 16 seconds)
2025-09-12 05:04:07,553 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:04:07,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:04:36,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 534.21259 ± 70.833
2025-09-12 05:04:36,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [524.6604, 538.67017, 614.49115, 558.32367, 496.34543, 626.3885, 602.26605, 445.04816, 391.3291, 544.60376]
2025-09-12 05:04:36,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 98.0, 114.0, 105.0, 92.0, 118.0, 118.0, 82.0, 73.0, 110.0]
2025-09-12 05:04:36,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 20 hours, 15 minutes, 39 seconds)
2025-09-12 05:18:51,112 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:18:51,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:19:20,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 542.02405 ± 134.283
2025-09-12 05:19:20,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [558.31537, 509.83823, 325.74268, 420.7125, 380.96048, 655.2935, 666.3311, 635.2633, 771.2355, 496.54712]
2025-09-12 05:19:20,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 94.0, 62.0, 86.0, 84.0, 123.0, 123.0, 136.0, 140.0, 97.0]
2025-09-12 05:19:20,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 19 hours, 57 minutes, 33 seconds)
2025-09-12 05:33:38,377 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:33:38,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:34:13,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 617.61603 ± 130.233
2025-09-12 05:34:13,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [499.3656, 689.9397, 572.7433, 768.5433, 673.1043, 526.07996, 582.08826, 894.09766, 510.75037, 459.44778]
2025-09-12 05:34:13,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 130.0, 109.0, 150.0, 137.0, 98.0, 108.0, 187.0, 107.0, 86.0]
2025-09-12 05:34:13,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 19 hours, 47 minutes, 4 seconds)
2025-09-12 05:48:23,991 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:48:23,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:48:57,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 588.83563 ± 123.691
2025-09-12 05:48:57,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [483.02954, 625.53467, 895.7933, 631.274, 499.0303, 550.50616, 411.9933, 644.7105, 570.7275, 575.75757]
2025-09-12 05:48:57,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 120.0, 180.0, 124.0, 92.0, 105.0, 88.0, 125.0, 113.0, 111.0]
2025-09-12 05:48:57,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 19 hours, 29 minutes, 56 seconds)
2025-09-12 06:03:15,334 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:03:15,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:04:04,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 720.14453 ± 141.473
2025-09-12 06:04:04,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [860.36646, 688.758, 552.02185, 840.55774, 725.53143, 882.8851, 725.16144, 882.31525, 470.38303, 573.4656]
2025-09-12 06:04:04,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [171.0, 133.0, 107.0, 161.0, 135.0, 173.0, 152.0, 167.0, 92.0, 118.0]
2025-09-12 06:04:04,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (720.14) for latency MM1Queue_a033_s075
2025-09-12 06:04:04,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 19 hours, 21 minutes, 9 seconds)
2025-09-12 06:18:23,579 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:18:23,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:19:02,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 586.18958 ± 147.692
2025-09-12 06:19:02,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [640.15564, 370.3269, 535.4252, 944.2401, 472.3402, 452.23102, 611.33276, 654.5628, 562.4562, 618.8251]
2025-09-12 06:19:02,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 81.0, 100.0, 186.0, 90.0, 101.0, 120.0, 119.0, 107.0, 113.0]
2025-09-12 06:19:02,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 19 hours, 6 minutes, 25 seconds)
2025-09-12 06:33:17,629 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:33:17,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:33:56,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 723.53241 ± 119.672
2025-09-12 06:33:56,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [649.2681, 783.0601, 532.6933, 698.2967, 599.3149, 846.48566, 816.30066, 613.2114, 933.841, 762.8523]
2025-09-12 06:33:56,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 150.0, 113.0, 128.0, 111.0, 158.0, 156.0, 115.0, 169.0, 149.0]
2025-09-12 06:33:56,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (723.53) for latency MM1Queue_a033_s075
2025-09-12 06:33:56,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 18 hours, 54 minutes, 1 second)
2025-09-12 06:48:23,997 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:48:23,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:49:06,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 757.96082 ± 307.500
2025-09-12 06:49:06,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [727.6538, 319.57895, 1236.2279, 492.54446, 590.95685, 707.54407, 507.8046, 1299.7052, 690.23425, 1007.35785]
2025-09-12 06:49:06,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 68.0, 230.0, 105.0, 109.0, 135.0, 104.0, 255.0, 128.0, 214.0]
2025-09-12 06:49:06,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (757.96) for latency MM1Queue_a033_s075
2025-09-12 06:49:06,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 18 hours, 43 minutes, 25 seconds)
2025-09-12 07:03:02,944 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:03:02,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:03:52,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 854.60889 ± 491.032
2025-09-12 07:03:52,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [421.34744, 546.0491, 2033.0054, 537.9114, 947.33374, 706.0989, 1499.0458, 574.24396, 747.91547, 533.1376]
2025-09-12 07:03:52,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 119.0, 409.0, 112.0, 184.0, 146.0, 309.0, 124.0, 155.0, 104.0]
2025-09-12 07:03:52,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (854.61) for latency MM1Queue_a033_s075
2025-09-12 07:03:52,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 18 hours, 28 minutes, 47 seconds)
2025-09-12 07:18:04,604 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:18:04,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:18:50,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 840.43835 ± 177.603
2025-09-12 07:18:50,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [926.8319, 668.65125, 1034.3468, 916.2364, 700.4819, 466.11942, 1054.9352, 940.1246, 946.6532, 750.00336]
2025-09-12 07:18:50,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [177.0, 126.0, 216.0, 178.0, 130.0, 93.0, 203.0, 181.0, 178.0, 153.0]
2025-09-12 07:18:50,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 18 hours, 11 minutes, 43 seconds)
2025-09-12 07:32:58,903 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:32:58,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:33:50,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 906.03992 ± 229.722
2025-09-12 07:33:50,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [537.9621, 1020.65015, 832.6177, 1188.424, 994.3324, 1225.3625, 732.3549, 632.7031, 1142.6897, 753.3031]
2025-09-12 07:33:50,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 206.0, 159.0, 225.0, 194.0, 252.0, 138.0, 137.0, 216.0, 156.0]
2025-09-12 07:33:50,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (906.04) for latency MM1Queue_a033_s075
2025-09-12 07:33:50,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 17 hours, 56 minutes, 55 seconds)
2025-09-12 07:48:05,606 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:48:05,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:48:57,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 937.90265 ± 208.910
2025-09-12 07:48:57,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1242.1417, 827.6185, 1171.858, 896.202, 1161.2935, 740.46356, 767.423, 1074.8319, 565.7517, 931.4426]
2025-09-12 07:48:57,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [248.0, 176.0, 220.0, 166.0, 228.0, 136.0, 140.0, 227.0, 105.0, 170.0]
2025-09-12 07:48:57,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (937.90) for latency MM1Queue_a033_s075
2025-09-12 07:48:57,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 17 hours, 45 minutes, 6 seconds)
2025-09-12 08:03:10,849 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:03:10,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:03:59,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 875.99249 ± 300.603
2025-09-12 08:03:59,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [729.8785, 589.8528, 552.4141, 961.71606, 761.65857, 1446.8055, 1035.8186, 578.1105, 1346.2216, 757.44855]
2025-09-12 08:03:59,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 121.0, 117.0, 188.0, 160.0, 277.0, 195.0, 121.0, 274.0, 142.0]
2025-09-12 08:03:59,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 17 hours, 28 minutes, 19 seconds)
2025-09-12 08:18:21,144 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:18:21,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:19:09,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 864.20520 ± 234.206
2025-09-12 08:19:09,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [697.36066, 763.5082, 1295.1912, 988.3183, 622.2851, 804.69617, 1154.2596, 1037.8226, 765.5821, 513.02747]
2025-09-12 08:19:09,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 141.0, 257.0, 187.0, 135.0, 157.0, 241.0, 195.0, 141.0, 110.0]
2025-09-12 08:19:09,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 17 hours, 18 minutes, 57 seconds)
2025-09-12 08:33:30,650 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:33:30,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:34:24,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 991.69794 ± 251.736
2025-09-12 08:34:24,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1126.0554, 952.7627, 889.80286, 948.6883, 882.68524, 1161.9135, 881.6586, 511.44385, 1006.2794, 1555.6898]
2025-09-12 08:34:24,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [223.0, 182.0, 172.0, 193.0, 161.0, 222.0, 168.0, 94.0, 191.0, 302.0]
2025-09-12 08:34:24,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (991.70) for latency MM1Queue_a033_s075
2025-09-12 08:34:24,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 17 hours, 7 minutes, 38 seconds)
2025-09-12 08:48:40,629 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:48:40,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:49:41,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1070.59497 ± 387.436
2025-09-12 08:49:41,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [722.12256, 1211.2081, 1720.2933, 1113.4913, 822.01886, 995.0522, 1591.0094, 866.3115, 1302.4384, 362.00397]
2025-09-12 08:49:41,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [149.0, 245.0, 324.0, 221.0, 153.0, 190.0, 321.0, 167.0, 268.0, 73.0]
2025-09-12 08:49:41,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (1070.59) for latency MM1Queue_a033_s075
2025-09-12 08:49:41,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 16 hours, 56 minutes, 24 seconds)
2025-09-12 09:03:38,792 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:03:38,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:04:59,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1196.73938 ± 430.381
2025-09-12 09:04:59,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2164.3442, 1431.8286, 1394.8612, 1541.4045, 601.186, 1141.4183, 1004.85254, 988.57324, 740.83105, 958.0937]
2025-09-12 09:04:59,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [414.0, 269.0, 266.0, 294.0, 122.0, 217.0, 191.0, 201.0, 151.0, 185.0]
2025-09-12 09:04:59,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (1196.74) for latency MM1Queue_a033_s075
2025-09-12 09:04:59,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 16 hours, 43 minutes, 34 seconds)
2025-09-12 09:19:26,619 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:19:26,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:20:48,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1477.61218 ± 450.187
2025-09-12 09:20:48,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1492.3557, 1152.4336, 1940.249, 2499.7075, 1093.3387, 1307.9576, 1545.2294, 892.2121, 1684.2296, 1168.4084]
2025-09-12 09:20:48,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [286.0, 224.0, 366.0, 483.0, 218.0, 272.0, 306.0, 175.0, 322.0, 219.0]
2025-09-12 09:20:48,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (1477.61) for latency MM1Queue_a033_s075
2025-09-12 09:20:48,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 16 hours, 38 minutes, 31 seconds)
2025-09-12 09:34:54,361 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:34:54,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:35:55,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1134.67004 ± 524.158
2025-09-12 09:35:55,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [562.5066, 575.0893, 1468.5492, 1852.1426, 487.36356, 892.8878, 879.32324, 1014.97485, 1906.4445, 1707.4182]
2025-09-12 09:35:55,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 105.0, 280.0, 351.0, 92.0, 180.0, 167.0, 193.0, 373.0, 328.0]
2025-09-12 09:35:55,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 16 hours, 22 minutes, 42 seconds)
2025-09-12 09:51:11,346 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:51:11,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:52:40,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1673.89526 ± 754.296
2025-09-12 09:52:40,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1543.9059, 1256.3396, 1213.4895, 1532.622, 1798.4099, 2589.9512, 674.98206, 1323.335, 1347.4988, 3458.4192]
2025-09-12 09:52:40,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [293.0, 232.0, 227.0, 296.0, 353.0, 496.0, 137.0, 245.0, 246.0, 655.0]
2025-09-12 09:52:40,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (1673.90) for latency MM1Queue_a033_s075
2025-09-12 09:52:40,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 16 hours, 26 minutes, 10 seconds)
2025-09-12 10:05:43,747 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:05:43,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:06:45,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1183.07385 ± 303.635
2025-09-12 10:06:45,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [996.5292, 1293.4012, 1361.4653, 1989.9094, 1182.3242, 941.6157, 960.30206, 953.2012, 1010.944, 1141.0461]
2025-09-12 10:06:45,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [187.0, 238.0, 253.0, 372.0, 214.0, 181.0, 181.0, 176.0, 191.0, 214.0]
2025-09-12 10:06:45,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 15 hours, 55 minutes, 40 seconds)
2025-09-12 10:20:49,174 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:20:49,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:21:56,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1223.32422 ± 656.938
2025-09-12 10:21:56,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2412.5334, 746.44147, 604.68823, 642.9876, 1074.4039, 864.7943, 871.7559, 898.42993, 1759.2069, 2358.001]
2025-09-12 10:21:56,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [467.0, 138.0, 112.0, 120.0, 207.0, 161.0, 161.0, 167.0, 330.0, 468.0]
2025-09-12 10:21:56,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 15 hours, 38 minutes, 46 seconds)
2025-09-12 10:36:28,435 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:36:28,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:37:46,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1445.45129 ± 603.071
2025-09-12 10:37:46,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1401.7527, 2709.9858, 922.89874, 1302.0929, 1711.2974, 678.43195, 1442.2123, 1683.603, 599.3395, 2002.8976]
2025-09-12 10:37:46,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [266.0, 513.0, 175.0, 248.0, 321.0, 142.0, 286.0, 320.0, 110.0, 380.0]
2025-09-12 10:37:46,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 23 minutes, 40 seconds)
2025-09-12 10:51:52,081 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:51:52,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:52:48,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1009.50330 ± 156.483
2025-09-12 10:52:48,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1212.8754, 1163.8256, 831.30743, 1055.4973, 777.08276, 1180.379, 1023.03076, 1059.3557, 765.26434, 1026.4146]
2025-09-12 10:52:48,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [232.0, 227.0, 165.0, 209.0, 161.0, 229.0, 195.0, 203.0, 162.0, 187.0]
2025-09-12 10:52:48,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 15 hours, 7 minutes, 4 seconds)
2025-09-12 11:06:52,315 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:06:52,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:07:52,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1090.31079 ± 370.887
2025-09-12 11:07:52,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [733.18024, 1262.234, 760.81476, 791.6076, 1276.204, 1212.2258, 995.81525, 1036.2845, 2027.894, 806.8474]
2025-09-12 11:07:52,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [160.0, 261.0, 158.0, 167.0, 261.0, 224.0, 190.0, 203.0, 374.0, 162.0]
2025-09-12 11:07:52,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 14 hours, 32 minutes, 20 seconds)
2025-09-12 11:21:56,183 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:21:56,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:22:49,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 960.71399 ± 327.664
2025-09-12 11:22:49,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1492.1855, 1082.6812, 933.95514, 732.09937, 1324.0134, 674.7499, 530.48694, 1244.958, 488.26224, 1103.7477]
2025-09-12 11:22:49,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [311.0, 207.0, 176.0, 142.0, 280.0, 135.0, 119.0, 243.0, 96.0, 213.0]
2025-09-12 11:22:49,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 14 hours, 27 minutes, 12 seconds)
2025-09-12 11:36:54,626 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:36:54,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:37:58,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1227.32275 ± 427.416
2025-09-12 11:37:58,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1166.6946, 604.05865, 969.79675, 1230.0197, 1300.7484, 809.9581, 1420.1521, 1360.2605, 1124.3405, 2287.198]
2025-09-12 11:37:58,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [209.0, 114.0, 181.0, 230.0, 237.0, 149.0, 265.0, 261.0, 206.0, 428.0]
2025-09-12 11:37:58,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 14 hours, 11 minutes, 44 seconds)
2025-09-12 11:52:08,808 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:52:08,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:53:19,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1061.17737 ± 472.550
2025-09-12 11:53:19,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [875.51807, 1264.9474, 425.99487, 879.5228, 688.71356, 1165.4413, 605.7173, 2195.1873, 1288.127, 1222.6034]
2025-09-12 11:53:19,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 239.0, 96.0, 167.0, 132.0, 238.0, 121.0, 421.0, 236.0, 230.0]
2025-09-12 11:53:19,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 13 hours, 50 minutes, 57 seconds)
2025-09-12 12:07:50,124 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:07:50,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:09:02,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1099.65295 ± 493.012
2025-09-12 12:09:02,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [961.9796, 2317.4392, 683.4451, 586.73395, 1506.484, 1029.5121, 1041.3097, 545.13556, 1202.1119, 1122.3777]
2025-09-12 12:09:02,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [187.0, 446.0, 123.0, 112.0, 280.0, 208.0, 194.0, 102.0, 227.0, 210.0]
2025-09-12 12:09:02,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 13 hours, 43 minutes, 23 seconds)
2025-09-12 12:23:10,276 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:23:10,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:24:21,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1052.15942 ± 396.641
2025-09-12 12:24:21,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [618.2575, 1518.2471, 539.6107, 934.21735, 1293.7792, 1461.552, 818.1043, 507.14435, 1258.0029, 1572.679]
2025-09-12 12:24:21,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 286.0, 119.0, 173.0, 258.0, 278.0, 152.0, 110.0, 232.0, 312.0]
2025-09-12 12:24:21,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 13 hours, 30 minutes, 36 seconds)
2025-09-12 12:38:30,165 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:38:30,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:39:48,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1161.10181 ± 549.858
2025-09-12 12:39:48,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [832.39197, 773.45734, 1704.2015, 2525.6414, 589.17474, 1173.9135, 1295.789, 1116.4388, 797.04333, 802.96735]
2025-09-12 12:39:48,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [161.0, 153.0, 345.0, 484.0, 110.0, 233.0, 259.0, 207.0, 149.0, 166.0]
2025-09-12 12:39:48,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 13 hours, 20 minutes, 38 seconds)
2025-09-12 12:54:03,859 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:54:03,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:55:11,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1300.20544 ± 526.870
2025-09-12 12:55:11,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [851.3236, 1529.5457, 736.5705, 702.1985, 1246.6547, 1631.8612, 1651.3552, 588.99493, 1901.1062, 2162.4443]
2025-09-12 12:55:11,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [165.0, 279.0, 137.0, 132.0, 231.0, 302.0, 310.0, 107.0, 350.0, 393.0]
2025-09-12 12:55:11,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 13 hours, 7 minutes, 34 seconds)
2025-09-12 13:09:33,621 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:09:33,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:10:51,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1159.25574 ± 525.777
2025-09-12 13:10:51,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [421.44882, 1261.247, 1039.1295, 2365.1853, 1358.4236, 614.3008, 1244.799, 926.6381, 1594.1927, 767.1932]
2025-09-12 13:10:51,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 271.0, 201.0, 439.0, 259.0, 134.0, 227.0, 171.0, 317.0, 161.0]
2025-09-12 13:10:51,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 12 hours, 55 minutes, 24 seconds)
2025-09-12 13:24:57,055 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:24:57,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:26:52,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2169.24561 ± 1118.191
2025-09-12 13:26:52,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1877.4658, 2076.6555, 3335.6658, 1614.5933, 1195.0272, 2247.9126, 707.131, 4856.0654, 1520.3702, 2261.5698]
2025-09-12 13:26:52,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [334.0, 401.0, 631.0, 316.0, 239.0, 410.0, 129.0, 930.0, 286.0, 414.0]
2025-09-12 13:26:52,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (2169.25) for latency MM1Queue_a033_s075
2025-09-12 13:26:52,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 12 hours, 42 minutes, 42 seconds)
2025-09-12 13:41:23,558 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:41:23,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:42:53,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1394.32507 ± 687.806
2025-09-12 13:42:53,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1195.8224, 1005.3889, 1387.5555, 508.2946, 1500.305, 1262.5516, 634.97784, 1879.5656, 3099.8364, 1468.9524]
2025-09-12 13:42:53,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [216.0, 192.0, 273.0, 92.0, 292.0, 236.0, 117.0, 338.0, 582.0, 276.0]
2025-09-12 13:42:53,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 34 minutes, 3 seconds)
2025-09-12 13:56:44,690 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:56:44,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:58:55,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1997.65820 ± 1032.875
2025-09-12 13:58:55,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2717.6763, 2090.488, 1345.6558, 1280.6998, 4130.65, 557.14136, 1719.9119, 756.87726, 2741.2446, 2636.2378]
2025-09-12 13:58:55,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [514.0, 399.0, 257.0, 234.0, 767.0, 123.0, 314.0, 153.0, 536.0, 488.0]
2025-09-12 13:58:55,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 23 minutes, 35 seconds)
2025-09-12 14:13:23,277 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:13:23,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:15:14,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2114.94727 ± 1472.156
2025-09-12 14:15:14,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5395.188, 1023.4219, 3822.073, 820.20386, 2028.571, 2833.2766, 1620.2007, 719.59143, 573.5928, 2313.3513]
2025-09-12 14:15:14,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 189.0, 697.0, 146.0, 376.0, 549.0, 319.0, 146.0, 115.0, 429.0]
2025-09-12 14:15:14,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 12 hours, 16 minutes, 29 seconds)
2025-09-12 14:29:07,930 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:29:07,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:31:16,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2353.21875 ± 1209.905
2025-09-12 14:31:16,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1980.5605, 2705.661, 1677.9856, 1393.2053, 3008.7808, 5324.8643, 2998.6528, 1997.4941, 1787.0321, 657.9534]
2025-09-12 14:31:16,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [375.0, 535.0, 332.0, 306.0, 587.0, 1000.0, 579.0, 382.0, 353.0, 142.0]
2025-09-12 14:31:16,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (2353.22) for latency MM1Queue_a033_s075
2025-09-12 14:31:16,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 12 hours, 3 minutes, 47 seconds)
2025-09-12 14:45:55,734 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:45:55,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:47:40,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1626.64490 ± 1166.716
2025-09-12 14:47:40,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [3240.2358, 657.3465, 1182.6831, 482.45197, 808.9893, 4142.05, 2116.4773, 1342.2833, 1773.8513, 520.0796]
2025-09-12 14:47:40,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [616.0, 121.0, 238.0, 107.0, 151.0, 747.0, 402.0, 251.0, 324.0, 99.0]
2025-09-12 14:47:40,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 11 hours, 51 minutes, 5 seconds)
2025-09-12 15:01:49,147 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:01:49,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:04:48,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2817.04932 ± 1202.659
2025-09-12 15:04:48,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1130.4003, 2631.3147, 2290.248, 5290.1396, 2541.3118, 3290.8438, 3912.5962, 952.47833, 2853.4324, 3277.7305]
2025-09-12 15:04:48,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [203.0, 483.0, 421.0, 1000.0, 466.0, 616.0, 712.0, 184.0, 529.0, 590.0]
2025-09-12 15:04:48,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (2817.05) for latency MM1Queue_a033_s075
2025-09-12 15:04:48,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 11 hours, 44 minutes, 24 seconds)
2025-09-12 15:19:05,198 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:19:05,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:21:12,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2368.17725 ± 1235.867
2025-09-12 15:21:12,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2027.1005, 3556.1448, 3375.1174, 1107.7255, 1234.2369, 511.29965, 1936.3633, 4850.579, 2559.5447, 2523.6602]
2025-09-12 15:21:12,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [381.0, 690.0, 633.0, 210.0, 240.0, 93.0, 377.0, 919.0, 485.0, 478.0]
2025-09-12 15:21:12,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 11 hours, 31 minutes, 13 seconds)
2025-09-12 15:35:45,015 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:35:45,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:39:12,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3198.18164 ± 1732.156
2025-09-12 15:39:12,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1150.8369, 4551.619, 2579.7776, 1855.727, 5171.976, 4675.8774, 933.49243, 5415.6094, 4465.0215, 1181.8779]
2025-09-12 15:39:12,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [207.0, 861.0, 495.0, 343.0, 1000.0, 899.0, 174.0, 1000.0, 830.0, 223.0]
2025-09-12 15:39:12,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (3198.18) for latency MM1Queue_a033_s075
2025-09-12 15:39:12,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 11 hours, 28 minutes, 28 seconds)
2025-09-12 15:52:53,866 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:52:53,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:56:49,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3666.73096 ± 1632.081
2025-09-12 15:56:49,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5397.211, 5100.5166, 3131.2302, 2518.8564, 1097.159, 1849.7589, 5465.034, 5332.2485, 4786.666, 1988.6292]
2025-09-12 15:56:49,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 955.0, 577.0, 463.0, 211.0, 343.0, 1000.0, 1000.0, 887.0, 378.0]
2025-09-12 15:56:49,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (3666.73) for latency MM1Queue_a033_s075
2025-09-12 15:56:49,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 11 hours, 24 minutes, 22 seconds)
2025-09-12 16:11:33,245 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:11:33,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:13:50,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2560.94458 ± 1531.773
2025-09-12 16:13:50,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [761.7677, 1673.0498, 5299.354, 2570.9153, 3784.06, 2862.746, 774.08636, 1098.8645, 2043.4213, 4741.1797]
2025-09-12 16:13:50,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [148.0, 314.0, 1000.0, 484.0, 719.0, 560.0, 162.0, 223.0, 397.0, 895.0]
2025-09-12 16:13:50,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 11 hours, 12 minutes, 7 seconds)
2025-09-12 16:28:07,007 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:28:07,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:31:46,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3994.56714 ± 1504.925
2025-09-12 16:31:46,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5122.637, 3311.584, 1403.1007, 5199.0, 3853.9868, 5096.8384, 4742.6255, 5002.6577, 1062.7113, 5150.5317]
2025-09-12 16:31:46,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 663.0, 272.0, 1000.0, 739.0, 1000.0, 913.0, 1000.0, 208.0, 1000.0]
2025-09-12 16:31:46,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (3994.57) for latency MM1Queue_a033_s075
2025-09-12 16:31:46,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 11 hours, 58 seconds)
2025-09-12 16:46:38,219 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:46:38,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:48:14,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1787.55115 ± 573.902
2025-09-12 16:48:14,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2251.8918, 2729.8384, 1360.2534, 1529.6057, 1234.5105, 1700.9957, 809.7338, 2599.7185, 1894.2689, 1764.6942]
2025-09-12 16:48:14,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [430.0, 508.0, 256.0, 273.0, 229.0, 345.0, 141.0, 497.0, 366.0, 350.0]
2025-09-12 16:48:14,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 10 hours, 44 minutes)
2025-09-12 17:01:36,775 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:01:36,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:03:58,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2711.74902 ± 1014.419
2025-09-12 17:03:58,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1971.4023, 2981.5872, 3545.3403, 1627.2938, 4156.234, 2059.675, 2346.773, 1274.9418, 2685.0718, 4469.1724]
2025-09-12 17:03:58,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [376.0, 542.0, 689.0, 304.0, 774.0, 382.0, 450.0, 242.0, 490.0, 831.0]
2025-09-12 17:03:58,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 10 hours, 10 minutes, 17 seconds)
2025-09-12 17:18:06,149 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:18:06,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:20:53,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3079.34497 ± 1802.582
2025-09-12 17:20:53,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5236.1406, 5270.884, 784.948, 2174.735, 3348.5574, 942.23804, 1506.0742, 1527.8822, 4845.3945, 5156.5947]
2025-09-12 17:20:53,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 169.0, 406.0, 628.0, 204.0, 301.0, 278.0, 915.0, 1000.0]
2025-09-12 17:20:53,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 9 hours, 48 minutes, 26 seconds)
2025-09-12 17:34:54,120 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:34:54,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:38:18,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3218.09619 ± 1327.892
2025-09-12 17:38:18,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5006.011, 2924.028, 5400.2393, 1810.1979, 1487.434, 1903.498, 3519.9973, 4645.1523, 2353.6167, 3130.7847]
2025-09-12 17:38:18,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [912.0, 547.0, 1000.0, 329.0, 297.0, 341.0, 640.0, 845.0, 444.0, 582.0]
2025-09-12 17:38:18,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 9 hours, 34 minutes, 19 seconds)
2025-09-12 17:53:21,296 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:53:21,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:56:23,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3509.81299 ± 1620.942
2025-09-12 17:56:23,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5410.325, 1077.7203, 2679.8425, 4555.9375, 4255.4297, 5338.637, 5577.071, 1767.2677, 2673.9631, 1761.9348]
2025-09-12 17:56:23,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 218.0, 497.0, 848.0, 768.0, 1000.0, 1000.0, 320.0, 505.0, 328.0]
2025-09-12 17:56:23,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 9 hours, 18 minutes, 26 seconds)
2025-09-12 18:09:56,031 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:09:56,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:12:29,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2911.00562 ± 1420.166
2025-09-12 18:12:29,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2401.3042, 2032.7776, 1894.4141, 2967.7014, 3058.9785, 974.6746, 5399.064, 3339.4407, 1621.9199, 5419.7817]
2025-09-12 18:12:29,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [429.0, 377.0, 365.0, 540.0, 554.0, 179.0, 1000.0, 623.0, 303.0, 1000.0]
2025-09-12 18:12:29,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 59 minutes, 14 seconds)
2025-09-12 18:26:48,067 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:26:48,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:30:38,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3641.35229 ± 1448.098
2025-09-12 18:30:38,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2555.316, 5478.5625, 2969.021, 5464.193, 2172.4143, 1776.379, 5433.562, 3847.9734, 4697.671, 2018.4318]
2025-09-12 18:30:38,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [473.0, 1000.0, 528.0, 1000.0, 396.0, 321.0, 1000.0, 704.0, 859.0, 364.0]
2025-09-12 18:30:38,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 57 minutes, 19 seconds)
2025-09-12 18:46:09,219 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:46:09,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:48:59,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2542.08643 ± 1758.919
2025-09-12 18:48:59,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [689.3082, 2974.969, 1039.4712, 1304.5453, 5122.1274, 5312.253, 1505.8749, 661.0394, 2161.7588, 4649.5166]
2025-09-12 18:48:59,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [151.0, 570.0, 192.0, 254.0, 1000.0, 1000.0, 302.0, 133.0, 403.0, 912.0]
2025-09-12 18:48:59,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 8 hours, 48 minutes, 38 seconds)
2025-09-12 19:03:14,409 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:03:14,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:06:02,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3138.71460 ± 1645.313
2025-09-12 19:06:02,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2360.4526, 2964.6074, 2733.6692, 592.3912, 5280.291, 3532.743, 1249.0636, 5299.238, 1926.0277, 5448.662]
2025-09-12 19:06:02,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [430.0, 564.0, 503.0, 119.0, 1000.0, 647.0, 230.0, 1000.0, 377.0, 978.0]
2025-09-12 19:06:02,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 8 hours, 28 minutes, 49 seconds)
2025-09-12 19:19:55,202 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:19:55,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:23:14,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3706.98193 ± 1688.954
2025-09-12 19:23:14,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5281.9795, 5497.5146, 2830.8918, 3682.3362, 3282.448, 4088.2195, 5370.201, 646.1538, 5326.49, 1063.5868]
2025-09-12 19:23:14,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [976.0, 1000.0, 535.0, 698.0, 628.0, 761.0, 1000.0, 128.0, 1000.0, 192.0]
2025-09-12 19:23:14,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 8 hours, 6 minutes, 21 seconds)
2025-09-12 19:38:36,535 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:38:36,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:41:45,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3520.00342 ± 1169.852
2025-09-12 19:41:45,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2435.0334, 4785.9404, 3541.1572, 3222.7144, 5232.2188, 2523.6892, 5348.7627, 3647.4946, 2556.774, 1906.2489]
2025-09-12 19:41:45,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [465.0, 877.0, 670.0, 610.0, 1000.0, 474.0, 1000.0, 684.0, 469.0, 358.0]
2025-09-12 19:41:45,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 8 hours, 1 minute, 59 seconds)
2025-09-12 19:54:43,511 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:54:43,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:57:04,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2745.11035 ± 1154.239
2025-09-12 19:57:04,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2800.8645, 3904.3298, 2785.8843, 1669.3873, 1536.5015, 2287.5823, 5410.165, 1333.9296, 2747.3274, 2975.1313]
2025-09-12 19:57:04,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [500.0, 705.0, 503.0, 298.0, 273.0, 400.0, 1000.0, 246.0, 510.0, 540.0]
2025-09-12 19:57:04,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 7 hours, 29 minutes, 29 seconds)
2025-09-12 20:12:01,095 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:12:01,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:15:36,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3887.61450 ± 1624.524
2025-09-12 20:15:36,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5225.601, 5154.39, 5226.798, 2256.0435, 5199.7266, 1895.6917, 1544.0363, 5184.3584, 5253.9043, 1935.5962]
2025-09-12 20:15:36,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 449.0, 1000.0, 378.0, 303.0, 1000.0, 1000.0, 380.0]
2025-09-12 20:15:36,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 7 hours, 13 minutes)
2025-09-12 20:29:31,717 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:29:31,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:33:10,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4102.70850 ± 1617.910
2025-09-12 20:33:10,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5205.2744, 5352.2266, 3997.6704, 5356.973, 5303.3315, 1790.378, 5069.2334, 2862.5413, 5315.8457, 773.60913]
2025-09-12 20:33:10,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 762.0, 1000.0, 1000.0, 346.0, 947.0, 550.0, 1000.0, 146.0]
2025-09-12 20:33:10,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (4102.71) for latency MM1Queue_a033_s075
2025-09-12 20:33:10,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 58 minutes, 17 seconds)
2025-09-12 20:48:48,762 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:48:48,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:52:25,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4205.26318 ± 1350.859
2025-09-12 20:52:25,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [3811.1868, 5402.283, 5461.2646, 3979.9534, 5418.331, 2040.231, 5404.382, 2799.6697, 2250.364, 5484.9688]
2025-09-12 20:52:25,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [692.0, 1000.0, 995.0, 725.0, 980.0, 387.0, 1000.0, 506.0, 421.0, 1000.0]
2025-09-12 20:52:25,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (4205.26) for latency MM1Queue_a033_s075
2025-09-12 20:52:25,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 50 minutes, 16 seconds)
2025-09-12 21:05:50,479 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:05:50,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:09:44,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4197.93066 ± 1418.307
2025-09-12 21:09:44,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5102.9263, 5144.332, 5142.655, 5140.9346, 2559.9993, 1880.0974, 5014.6636, 5070.4185, 5201.1514, 1722.1302]
2025-09-12 21:09:44,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 528.0, 360.0, 1000.0, 1000.0, 1000.0, 343.0]
2025-09-12 21:09:44,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 6 hours, 27 minutes, 8 seconds)
2025-09-12 21:25:02,175 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:25:02,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:29:17,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4712.17090 ± 960.790
2025-09-12 21:29:17,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [3736.8777, 5238.9844, 4335.4966, 5304.236, 5257.6396, 2251.008, 5232.2764, 5246.175, 5272.5127, 5246.505]
2025-09-12 21:29:17,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [717.0, 1000.0, 827.0, 1000.0, 1000.0, 425.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:29:17,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (4712.17) for latency MM1Queue_a033_s075
2025-09-12 21:29:17,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 6 hours, 27 minutes, 17 seconds)
2025-09-12 21:43:16,491 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:43:16,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:47:34,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4891.75928 ± 1188.992
2025-09-12 21:47:34,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5406.1934, 5409.1045, 5358.1426, 5380.43, 1389.9218, 5439.159, 5229.1455, 5280.536, 4630.5205, 5394.4434]
2025-09-12 21:47:34,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 257.0, 1000.0, 960.0, 1000.0, 842.0, 1000.0]
2025-09-12 21:47:34,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (4891.76) for latency MM1Queue_a033_s075
2025-09-12 21:47:34,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 6 hours, 7 minutes, 53 seconds)
2025-09-12 22:01:39,317 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:01:39,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:05:47,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4564.61084 ± 1201.198
2025-09-12 22:05:47,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5224.367, 5315.815, 1774.7092, 5250.5054, 2950.2507, 5341.9287, 5228.0894, 5295.3403, 5333.047, 3932.056]
2025-09-12 22:05:47,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 359.0, 1000.0, 583.0, 1000.0, 1000.0, 997.0, 1000.0, 745.0]
2025-09-12 22:05:47,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 51 minutes, 55 seconds)
2025-09-12 22:19:43,736 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:19:43,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:23:27,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3420.16870 ± 1637.500
2025-09-12 22:23:27,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1428.1976, 3403.669, 1332.5968, 5246.7017, 672.8457, 5252.799, 4097.711, 5302.354, 3576.3716, 3888.44]
2025-09-12 22:23:27,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [279.0, 658.0, 255.0, 1000.0, 128.0, 1000.0, 745.0, 1000.0, 683.0, 719.0]
2025-09-12 22:23:27,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 5 hours, 27 minutes, 42 seconds)
2025-09-12 22:38:15,867 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:38:15,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:42:45,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4164.81787 ± 1623.034
2025-09-12 22:42:45,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5408.0005, 5359.3784, 5397.6787, 1644.3698, 3866.9553, 5387.146, 5449.9653, 1858.6016, 1850.6864, 5425.3975]
2025-09-12 22:42:45,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 314.0, 715.0, 1000.0, 1000.0, 337.0, 361.0, 1000.0]
2025-09-12 22:42:45,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 5 hours, 16 minutes, 14 seconds)
2025-09-12 22:56:41,530 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:56:41,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:59:13,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2661.15552 ± 1789.464
2025-09-12 22:59:13,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1383.6227, 5026.2705, 2518.6604, 700.0034, 5211.5464, 2124.6926, 2965.965, 668.2578, 788.7835, 5223.7524]
2025-09-12 22:59:13,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [285.0, 1000.0, 515.0, 142.0, 1000.0, 458.0, 600.0, 144.0, 170.0, 1000.0]
2025-09-12 22:59:13,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 47 minutes, 47 seconds)
2025-09-12 23:14:15,099 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:14:15,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:17:56,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4073.42383 ± 1581.857
2025-09-12 23:17:56,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [4735.0312, 5370.931, 2852.9963, 5342.2676, 5317.3965, 5282.2837, 635.2449, 5339.8413, 2316.956, 3541.2908]
2025-09-12 23:17:56,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [885.0, 1000.0, 554.0, 1000.0, 1000.0, 1000.0, 133.0, 1000.0, 429.0, 674.0]
2025-09-12 23:17:56,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 31 minutes, 4 seconds)
2025-09-12 23:31:41,651 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:31:41,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:35:33,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4371.08203 ± 1345.641
2025-09-12 23:35:33,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1347.6691, 3018.644, 5332.285, 5377.605, 5399.23, 3587.0146, 5352.6626, 3582.3494, 5364.7764, 5348.5825]
2025-09-12 23:35:33,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [243.0, 555.0, 1000.0, 1000.0, 1000.0, 678.0, 1000.0, 675.0, 1000.0, 1000.0]
2025-09-12 23:35:33,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 4 hours, 11 minutes, 19 seconds)
2025-09-12 23:49:55,822 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:49:55,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:54:13,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4728.91504 ± 646.772
2025-09-12 23:54:13,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5175.5615, 5244.328, 3887.3354, 4065.2173, 3469.4646, 5146.5767, 5245.7817, 5285.895, 5186.819, 4582.1685]
2025-09-12 23:54:13,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 773.0, 786.0, 678.0, 1000.0, 1000.0, 1000.0, 1000.0, 883.0]
2025-09-12 23:54:14,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 56 minutes, 1 second)
2025-09-13 00:09:07,072 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:09:07,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:14:10,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4755.15283 ± 1212.273
2025-09-13 00:14:10,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5401.67, 5371.7705, 5472.7, 5450.4297, 2092.804, 5416.351, 5453.676, 2654.111, 4797.136, 5440.88]
2025-09-13 00:14:10,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 378.0, 1000.0, 1000.0, 482.0, 879.0, 1000.0]
2025-09-13 00:14:10,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 39 minutes, 25 seconds)
2025-09-13 00:28:17,056 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:28:17,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:33:19,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4581.57422 ± 1391.876
2025-09-13 00:33:19,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5310.797, 5344.764, 5250.0166, 5164.4463, 5307.11, 1256.1201, 5118.491, 5250.479, 5362.935, 2450.582]
2025-09-13 00:33:19,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 250.0, 1000.0, 1000.0, 1000.0, 465.0]
2025-09-13 00:33:19,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 27 minutes, 1 second)
2025-09-13 00:47:05,044 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:47:05,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:51:15,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4823.75732 ± 1322.924
2025-09-13 00:51:15,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2275.841, 5388.454, 2085.6577, 5486.501, 5455.9688, 5549.4473, 5547.595, 5454.4927, 5498.6494, 5494.969]
2025-09-13 00:51:15,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [417.0, 984.0, 369.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:51:15,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 3 hours, 6 minutes, 38 seconds)
2025-09-13 01:06:04,738 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:06:04,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:09:29,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3989.03174 ± 1492.940
2025-09-13 01:09:29,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2652.0532, 5506.666, 5569.5874, 5609.9976, 3675.7783, 4648.8867, 1794.7223, 5626.2915, 2321.9294, 2484.404]
2025-09-13 01:09:29,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [476.0, 1000.0, 1000.0, 1000.0, 678.0, 850.0, 334.0, 1000.0, 424.0, 450.0]
2025-09-13 01:09:29,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 49 minutes, 5 seconds)
2025-09-13 01:22:56,303 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:22:56,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:27:55,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4735.65186 ± 1575.823
2025-09-13 01:27:55,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5511.313, 5579.4062, 5563.055, 5379.7476, 2698.4346, 5510.1367, 5416.4966, 5530.76, 726.18274, 5440.986]
2025-09-13 01:27:55,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 483.0, 1000.0, 1000.0, 1000.0, 122.0, 1000.0]
2025-09-13 01:27:55,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 29 minutes, 54 seconds)
2025-09-13 01:43:46,336 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:43:46,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:48:20,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4205.37793 ± 1535.215
2025-09-13 01:48:20,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5285.482, 5345.212, 5201.6377, 1090.3402, 3825.9893, 5420.655, 3409.312, 5318.5635, 5328.554, 1828.0359]
2025-09-13 01:48:20,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 199.0, 721.0, 1000.0, 632.0, 1000.0, 1000.0, 344.0]
2025-09-13 01:48:20,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 2 hours, 11 minutes, 50 seconds)
2025-09-13 02:02:45,289 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:02:45,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:07:24,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 5238.96533 ± 332.605
2025-09-13 02:07:24,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5294.9707, 5225.314, 5425.0093, 5373.091, 5353.8013, 4257.319, 5414.275, 5415.4673, 5317.0083, 5313.396]
2025-09-13 02:07:24,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 778.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:07:24,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (5238.97) for latency MM1Queue_a033_s075
2025-09-13 02:07:24,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 52 minutes, 54 seconds)
2025-09-13 02:21:04,587 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:21:04,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:25:23,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4832.06885 ± 954.387
2025-09-13 02:25:23,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [3737.7507, 5265.8774, 5234.516, 5296.9854, 2326.613, 5275.45, 5308.7817, 5217.881, 5359.9424, 5296.89]
2025-09-13 02:25:23,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [704.0, 1000.0, 1000.0, 1000.0, 456.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:25:23,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 34 minutes, 7 seconds)
2025-09-13 02:40:36,569 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:40:36,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:45:27,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4480.95703 ± 1701.828
2025-09-13 02:45:27,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5247.085, 5286.564, 5392.0425, 5375.029, 851.34546, 5359.216, 5260.427, 5399.1646, 1319.0956, 5319.605]
2025-09-13 02:45:27,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 156.0, 1000.0, 1000.0, 1000.0, 253.0, 1000.0]
2025-09-13 02:45:27,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 16 minutes, 45 seconds)
2025-09-13 02:59:55,773 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:59:55,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:05:02,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4610.32422 ± 1527.912
2025-09-13 03:05:02,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [690.0627, 2687.5093, 5309.1143, 5280.0415, 5324.0103, 5382.1543, 5377.294, 5315.728, 5340.3896, 5396.938]
2025-09-13 03:05:02,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 515.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:05:02,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 58 minutes, 15 seconds)
2025-09-13 03:19:41,348 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:19:41,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:24:04,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4093.12256 ± 1737.856
2025-09-13 03:24:04,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5487.4585, 5461.0625, 5423.508, 1444.9136, 5445.535, 1296.2019, 2937.8794, 5502.5303, 5482.31, 2449.8267]
2025-09-13 03:24:04,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 254.0, 1000.0, 244.0, 527.0, 1000.0, 1000.0, 440.0]
2025-09-13 03:24:04,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 38 minutes, 17 seconds)
2025-09-13 03:38:44,031 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:38:44,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:43:21,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 5004.18359 ± 910.087
2025-09-13 03:43:21,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5219.6807, 5300.269, 5333.324, 5354.8594, 5268.215, 5282.982, 5366.4707, 5353.9995, 2277.051, 5284.9824]
2025-09-13 03:43:21,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 434.0, 1000.0]
2025-09-13 03:43:21,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 19 minutes, 11 seconds)
2025-09-13 03:57:47,968 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:57:47,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 04:01:54,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 4620.68262 ± 1538.478
2025-09-13 04:01:54,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5478.7207, 5427.337, 5434.138, 1077.8276, 5017.1304, 2098.0437, 5406.6377, 5458.2134, 5393.5, 5415.2827]
2025-09-13 04:01:54,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 197.0, 922.0, 377.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:01:54,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1251 [DEBUG]: Training session finished
