2025-09-12 02:35:16,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc25-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-09-12 02:35:16,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc25-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-09-12 02:35:16,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14cc1a73d590>}
2025-09-12 02:35:16,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1111 [DEBUG]: using device: cuda
2025-09-12 02:35:16,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1133 [INFO]: Creating new trainer
2025-09-12 02:35:16,171 baseline-mbpac-noiseperc25-humanoid:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-12 02:35:16,172 baseline-mbpac-noiseperc25-humanoid:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 02:35:16,183 baseline-mbpac-noiseperc25-humanoid:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-09-12 02:35:17,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1194 [DEBUG]: Starting training session...
2025-09-12 02:35:17,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 1/100
2025-09-12 02:47:20,148 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:47:20,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:47:38,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 314.43985 ± 40.216
2025-09-12 02:47:38,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [247.01215, 295.111, 326.32495, 406.0288, 300.36035, 320.79858, 323.23987, 289.1504, 347.68213, 288.69028]
2025-09-12 02:47:38,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [46.0, 61.0, 67.0, 85.0, 55.0, 73.0, 62.0, 61.0, 71.0, 55.0]
2025-09-12 02:47:38,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (314.44) for latency MM1Queue_a033_s075
2025-09-12 02:47:38,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 20 hours, 22 minutes, 22 seconds)
2025-09-12 03:01:10,249 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:01:10,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:01:31,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 360.09100 ± 93.722
2025-09-12 03:01:31,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [311.55374, 193.69287, 256.14334, 493.3665, 443.90933, 408.68155, 359.59113, 489.96317, 328.02075, 315.98734]
2025-09-12 03:01:31,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 41.0, 56.0, 104.0, 92.0, 92.0, 70.0, 111.0, 61.0, 61.0]
2025-09-12 03:01:31,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (360.09) for latency MM1Queue_a033_s075
2025-09-12 03:01:31,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 21 hours, 25 minutes, 30 seconds)
2025-09-12 03:15:07,117 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:15:07,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:15:26,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 361.76675 ± 51.768
2025-09-12 03:15:26,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [346.2867, 378.9973, 431.4489, 464.55295, 392.98993, 336.07874, 297.81985, 312.1762, 313.3858, 343.93106]
2025-09-12 03:15:26,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 69.0, 81.0, 103.0, 73.0, 62.0, 55.0, 58.0, 57.0, 63.0]
2025-09-12 03:15:26,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (361.77) for latency MM1Queue_a033_s075
2025-09-12 03:15:26,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 21 hours, 38 minutes, 19 seconds)
2025-09-12 03:29:01,304 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:29:01,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:29:17,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 298.13959 ± 33.406
2025-09-12 03:29:17,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [224.6193, 336.71234, 315.32993, 316.67572, 271.12027, 294.05005, 270.99518, 336.81912, 322.88382, 292.19028]
2025-09-12 03:29:17,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [42.0, 62.0, 57.0, 59.0, 50.0, 59.0, 51.0, 62.0, 60.0, 54.0]
2025-09-12 03:29:17,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 21 hours, 35 minutes, 53 seconds)
2025-09-12 03:42:53,877 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:42:53,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:43:17,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 417.23578 ± 96.717
2025-09-12 03:43:17,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [525.894, 316.47668, 342.7727, 399.50806, 433.23062, 268.85287, 614.6838, 462.9426, 435.82257, 372.17392]
2025-09-12 03:43:17,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 59.0, 66.0, 76.0, 81.0, 57.0, 119.0, 100.0, 84.0, 69.0]
2025-09-12 03:43:17,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (417.24) for latency MM1Queue_a033_s075
2025-09-12 03:43:17,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 21 hours, 31 minutes, 50 seconds)
2025-09-12 03:56:44,412 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:56:44,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:57:10,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 456.20004 ± 91.940
2025-09-12 03:57:10,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [361.76575, 552.0389, 413.94077, 347.43997, 405.587, 552.8148, 638.93866, 376.1224, 433.11722, 480.2352]
2025-09-12 03:57:10,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 105.0, 77.0, 66.0, 88.0, 118.0, 134.0, 82.0, 82.0, 96.0]
2025-09-12 03:57:10,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (456.20) for latency MM1Queue_a033_s075
2025-09-12 03:57:10,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 21 hours, 47 minutes, 12 seconds)
2025-09-12 04:10:39,663 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:10:39,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:11:00,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 376.04324 ± 73.893
2025-09-12 04:11:00,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [435.05164, 497.97412, 357.98734, 357.49478, 365.13614, 289.295, 324.45868, 485.6057, 386.15164, 261.27728]
2025-09-12 04:11:00,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 94.0, 80.0, 67.0, 68.0, 53.0, 59.0, 95.0, 72.0, 48.0]
2025-09-12 04:11:00,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 21 hours, 32 minutes, 14 seconds)
2025-09-12 04:24:30,894 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:24:30,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:24:54,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 442.77179 ± 146.411
2025-09-12 04:24:54,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [471.94955, 266.72736, 754.8453, 240.04063, 583.77954, 388.3697, 380.72626, 510.70248, 340.41318, 490.16385]
2025-09-12 04:24:54,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 57.0, 147.0, 44.0, 109.0, 71.0, 81.0, 96.0, 61.0, 89.0]
2025-09-12 04:24:54,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 21 hours, 18 minutes, 11 seconds)
2025-09-12 04:38:28,948 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:38:28,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:38:50,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 407.56729 ± 47.963
2025-09-12 04:38:50,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [430.76334, 377.5413, 387.885, 424.98364, 382.93323, 381.36456, 411.7918, 537.1939, 374.3666, 366.84927]
2025-09-12 04:38:50,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 74.0, 71.0, 78.0, 71.0, 70.0, 76.0, 98.0, 69.0, 68.0]
2025-09-12 04:38:50,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 21 hours, 5 minutes, 47 seconds)
2025-09-12 04:52:21,060 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:52:21,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:52:41,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 391.22296 ± 45.859
2025-09-12 04:52:41,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [386.9075, 307.16605, 380.72116, 363.6282, 419.29327, 417.53314, 471.68176, 331.10498, 407.95297, 426.2407]
2025-09-12 04:52:41,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 57.0, 69.0, 68.0, 78.0, 77.0, 86.0, 60.0, 78.0, 81.0]
2025-09-12 04:52:41,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 20 hours, 49 minutes, 27 seconds)
2025-09-12 05:06:11,261 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:06:11,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:06:32,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 392.63898 ± 59.150
2025-09-12 05:06:32,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [290.30566, 384.32532, 516.85925, 390.52878, 371.4545, 357.63165, 346.7226, 416.31815, 460.9446, 391.29904]
2025-09-12 05:06:32,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 71.0, 110.0, 72.0, 69.0, 66.0, 73.0, 78.0, 86.0, 72.0]
2025-09-12 05:06:32,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 20 hours, 34 minutes, 50 seconds)
2025-09-12 05:20:06,262 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:20:06,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:20:31,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 465.88663 ± 119.595
2025-09-12 05:20:31,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [648.25287, 341.1887, 499.7977, 462.5305, 473.1551, 609.73663, 376.02884, 327.331, 310.44986, 610.39453]
2025-09-12 05:20:31,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 62.0, 93.0, 94.0, 87.0, 117.0, 68.0, 71.0, 61.0, 113.0]
2025-09-12 05:20:31,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (465.89) for latency MM1Queue_a033_s075
2025-09-12 05:20:31,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 20 hours, 23 minutes, 34 seconds)
2025-09-12 05:33:56,421 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:33:56,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:34:20,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 429.61588 ± 57.594
2025-09-12 05:34:20,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [447.45065, 461.24567, 520.014, 382.93112, 411.04266, 411.76608, 400.60657, 383.53143, 344.70193, 532.86847]
2025-09-12 05:34:20,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 86.0, 113.0, 71.0, 79.0, 76.0, 74.0, 72.0, 78.0, 113.0]
2025-09-12 05:34:20,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 20 hours, 8 minutes, 4 seconds)
2025-09-12 05:47:48,114 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:47:48,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:48:13,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 486.76050 ± 116.857
2025-09-12 05:48:13,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [512.08594, 534.52094, 347.2351, 404.74728, 378.19202, 515.1087, 463.89883, 627.7302, 726.8973, 357.1885]
2025-09-12 05:48:13,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 100.0, 63.0, 74.0, 70.0, 96.0, 93.0, 118.0, 132.0, 65.0]
2025-09-12 05:48:13,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (486.76) for latency MM1Queue_a033_s075
2025-09-12 05:48:13,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 19 hours, 53 minutes, 27 seconds)
2025-09-12 06:01:42,501 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:01:42,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:02:06,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 427.47992 ± 114.986
2025-09-12 06:02:06,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [321.33322, 354.7454, 446.50967, 308.1016, 432.92535, 633.6338, 638.8735, 397.38364, 313.79248, 427.50067]
2025-09-12 06:02:06,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 77.0, 97.0, 56.0, 80.0, 126.0, 120.0, 73.0, 69.0, 79.0]
2025-09-12 06:02:06,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 19 hours, 39 minutes, 53 seconds)
2025-09-12 06:15:33,981 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:15:33,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:15:58,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 447.78574 ± 65.505
2025-09-12 06:15:58,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [374.61243, 540.8992, 435.67, 441.88107, 552.44257, 525.9788, 415.39688, 362.3521, 391.43045, 437.1938]
2025-09-12 06:15:58,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 117.0, 81.0, 89.0, 103.0, 109.0, 77.0, 70.0, 72.0, 81.0]
2025-09-12 06:15:58,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 19 hours, 26 minutes, 22 seconds)
2025-09-12 06:29:39,484 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:29:39,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:30:05,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 485.81006 ± 106.752
2025-09-12 06:30:05,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [551.1232, 459.73788, 605.4167, 292.46762, 464.4241, 686.8627, 440.13428, 439.0838, 532.8598, 385.99054]
2025-09-12 06:30:05,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 82.0, 116.0, 54.0, 84.0, 127.0, 82.0, 80.0, 98.0, 86.0]
2025-09-12 06:30:05,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 19 hours, 14 minutes, 42 seconds)
2025-09-12 06:43:29,117 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:43:29,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:43:51,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 422.36591 ± 63.538
2025-09-12 06:43:51,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [334.73035, 405.47986, 376.95013, 428.8253, 393.74582, 359.4478, 462.3616, 456.19116, 433.0068, 572.9203]
2025-09-12 06:43:51,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 74.0, 82.0, 89.0, 72.0, 66.0, 86.0, 85.0, 82.0, 105.0]
2025-09-12 06:43:51,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 19 hours, 4 seconds)
2025-09-12 06:57:23,866 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:57:23,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:57:52,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 530.83850 ± 136.028
2025-09-12 06:57:52,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [596.73474, 544.33295, 667.8536, 332.224, 405.75027, 641.68677, 435.33173, 702.69006, 652.9228, 328.858]
2025-09-12 06:57:52,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 101.0, 122.0, 69.0, 78.0, 116.0, 78.0, 150.0, 136.0, 60.0]
2025-09-12 06:57:52,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (530.84) for latency MM1Queue_a033_s075
2025-09-12 06:57:52,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 18 hours, 48 minutes, 21 seconds)
2025-09-12 07:11:23,144 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:11:23,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:11:47,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 475.64438 ± 80.746
2025-09-12 07:11:47,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [530.4938, 358.98083, 423.29355, 591.9155, 458.52078, 407.5597, 487.79926, 368.80936, 560.80273, 568.26794]
2025-09-12 07:11:47,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 67.0, 78.0, 110.0, 85.0, 78.0, 87.0, 69.0, 106.0, 106.0]
2025-09-12 07:11:47,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 18 hours, 35 minutes, 4 seconds)
2025-09-12 07:25:18,868 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:25:18,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:25:53,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 611.77307 ± 97.665
2025-09-12 07:25:53,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [581.6051, 430.24167, 589.4366, 793.93146, 554.3656, 602.1825, 663.77374, 607.9422, 745.1948, 549.0566]
2025-09-12 07:25:53,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 81.0, 112.0, 171.0, 119.0, 118.0, 140.0, 114.0, 142.0, 101.0]
2025-09-12 07:25:53,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (611.77) for latency MM1Queue_a033_s075
2025-09-12 07:25:53,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 18 hours, 24 minutes, 45 seconds)
2025-09-12 07:39:30,348 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:39:30,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:39:53,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 434.41376 ± 60.206
2025-09-12 07:39:53,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [363.28174, 523.2208, 446.06982, 534.09125, 343.37396, 395.7058, 443.30942, 395.72717, 423.20825, 476.14963]
2025-09-12 07:39:53,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 97.0, 80.0, 96.0, 68.0, 73.0, 79.0, 72.0, 81.0, 88.0]
2025-09-12 07:39:53,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 18 hours, 8 minutes, 54 seconds)
2025-09-12 07:53:29,933 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:53:29,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:53:59,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 523.31311 ± 117.818
2025-09-12 07:53:59,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [682.4826, 545.87476, 652.14484, 574.1479, 460.4325, 336.1909, 630.40594, 536.73224, 317.14725, 497.57202]
2025-09-12 07:53:59,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 111.0, 140.0, 106.0, 98.0, 63.0, 117.0, 107.0, 68.0, 92.0]
2025-09-12 07:53:59,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 17 hours, 59 minutes, 55 seconds)
2025-09-12 08:07:29,063 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:07:29,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:07:58,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 551.09485 ± 102.411
2025-09-12 08:07:58,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [730.5171, 570.93604, 624.1017, 672.1868, 561.57074, 497.07214, 433.3045, 444.36343, 398.56488, 578.3311]
2025-09-12 08:07:58,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [143.0, 116.0, 115.0, 135.0, 106.0, 90.0, 78.0, 84.0, 73.0, 106.0]
2025-09-12 08:07:58,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 17 hours, 45 minutes, 27 seconds)
2025-09-12 08:21:35,388 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:21:35,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:22:01,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 496.98761 ± 120.836
2025-09-12 08:22:01,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [295.7116, 681.2824, 585.7863, 573.21356, 670.70087, 464.16995, 455.9333, 430.19348, 451.4166, 361.46848]
2025-09-12 08:22:01,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [54.0, 129.0, 109.0, 118.0, 122.0, 82.0, 89.0, 82.0, 86.0, 67.0]
2025-09-12 08:22:02,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 17 hours, 33 minutes, 33 seconds)
2025-09-12 08:35:34,149 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:35:34,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:36:01,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 490.08862 ± 99.293
2025-09-12 08:36:01,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [616.78314, 563.87915, 372.04944, 594.5405, 466.41586, 450.75214, 594.6015, 406.39398, 313.462, 522.00836]
2025-09-12 08:36:01,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 108.0, 68.0, 113.0, 91.0, 98.0, 110.0, 89.0, 58.0, 99.0]
2025-09-12 08:36:01,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 17 hours, 17 minutes, 51 seconds)
2025-09-12 08:49:40,849 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:49:40,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:50:06,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 500.37451 ± 173.019
2025-09-12 08:50:06,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [476.14532, 193.4097, 376.12708, 330.71725, 758.78864, 505.16214, 647.7718, 460.00668, 765.0765, 490.54022]
2025-09-12 08:50:06,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 38.0, 70.0, 60.0, 140.0, 92.0, 119.0, 84.0, 146.0, 90.0]
2025-09-12 08:50:06,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 17 hours, 5 minutes, 20 seconds)
2025-09-12 09:03:39,758 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:03:39,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:04:08,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 541.97180 ± 133.727
2025-09-12 09:04:08,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [470.6007, 717.73944, 622.255, 539.4768, 259.90973, 673.5404, 376.83508, 521.6438, 621.5324, 616.1847]
2025-09-12 09:04:08,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 139.0, 114.0, 109.0, 50.0, 121.0, 70.0, 94.0, 113.0, 119.0]
2025-09-12 09:04:08,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 16 hours, 50 minutes, 12 seconds)
2025-09-12 09:17:24,380 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:17:24,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:17:59,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 663.03625 ± 165.772
2025-09-12 09:17:59,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [450.4591, 875.2702, 875.0652, 482.9442, 686.01306, 789.38513, 373.4648, 693.3992, 739.9444, 664.417]
2025-09-12 09:17:59,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 168.0, 164.0, 89.0, 126.0, 148.0, 80.0, 132.0, 144.0, 124.0]
2025-09-12 09:17:59,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (663.04) for latency MM1Queue_a033_s075
2025-09-12 09:17:59,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 16 hours, 34 minutes, 18 seconds)
2025-09-12 09:31:21,228 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:31:21,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:31:51,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 580.75977 ± 154.873
2025-09-12 09:31:51,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [669.79364, 932.3098, 438.76443, 741.12286, 434.35846, 600.9938, 529.2995, 461.92627, 426.54227, 572.4862]
2025-09-12 09:31:51,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 176.0, 89.0, 136.0, 80.0, 112.0, 97.0, 84.0, 78.0, 108.0]
2025-09-12 09:31:51,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 16 hours, 17 minutes, 30 seconds)
2025-09-12 09:45:17,673 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:45:17,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:45:49,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 582.65002 ± 186.566
2025-09-12 09:45:49,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [845.12036, 458.03168, 435.56458, 425.73776, 454.86673, 824.2731, 902.90125, 408.09512, 587.3977, 484.5123]
2025-09-12 09:45:49,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [160.0, 102.0, 82.0, 80.0, 87.0, 153.0, 174.0, 88.0, 124.0, 99.0]
2025-09-12 09:45:49,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 16 hours, 3 minutes, 25 seconds)
2025-09-12 09:59:06,439 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:59:06,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:59:40,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 628.50854 ± 223.074
2025-09-12 09:59:40,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1186.4089, 706.3632, 803.18933, 693.79834, 455.08853, 407.58588, 497.7796, 444.47638, 575.28894, 515.1061]
2025-09-12 09:59:40,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [244.0, 152.0, 156.0, 131.0, 83.0, 73.0, 90.0, 99.0, 106.0, 91.0]
2025-09-12 09:59:40,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 15 hours, 45 minutes, 58 seconds)
2025-09-12 10:13:11,601 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:13:11,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:13:38,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 523.06543 ± 133.064
2025-09-12 10:13:38,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [465.99182, 538.42303, 574.9615, 460.45425, 525.46173, 384.4263, 435.95636, 890.43097, 455.05014, 499.49823]
2025-09-12 10:13:38,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 115.0, 106.0, 85.0, 98.0, 76.0, 80.0, 167.0, 84.0, 93.0]
2025-09-12 10:13:39,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 15 hours, 31 minutes, 28 seconds)
2025-09-12 10:26:59,184 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:26:59,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:27:30,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 582.60828 ± 162.045
2025-09-12 10:27:30,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [510.32156, 725.69214, 450.83063, 450.71445, 583.6956, 413.93, 563.4723, 603.12024, 529.33014, 994.9759]
2025-09-12 10:27:30,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 136.0, 87.0, 95.0, 119.0, 77.0, 105.0, 108.0, 96.0, 184.0]
2025-09-12 10:27:30,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 15 hours, 17 minutes, 37 seconds)
2025-09-12 10:41:00,969 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:41:00,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:41:35,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 617.14886 ± 147.926
2025-09-12 10:41:35,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [982.7238, 458.48947, 521.01086, 461.70053, 575.0429, 538.1741, 622.4945, 676.4191, 738.1776, 597.25543]
2025-09-12 10:41:35,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [190.0, 89.0, 105.0, 91.0, 119.0, 104.0, 117.0, 126.0, 141.0, 124.0]
2025-09-12 10:41:35,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 15 hours, 6 minutes, 36 seconds)
2025-09-12 10:55:11,740 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:55:11,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:55:48,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 681.48547 ± 154.354
2025-09-12 10:55:48,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [713.0536, 649.1966, 512.0546, 773.7328, 703.4926, 613.49243, 483.08896, 516.7313, 858.7551, 991.25714]
2025-09-12 10:55:48,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 137.0, 94.0, 141.0, 149.0, 117.0, 93.0, 95.0, 172.0, 181.0]
2025-09-12 10:55:48,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (681.49) for latency MM1Queue_a033_s075
2025-09-12 10:55:48,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 14 hours, 55 minutes, 46 seconds)
2025-09-12 11:09:20,681 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:09:20,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:10:03,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 782.56897 ± 234.047
2025-09-12 11:10:03,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1070.7341, 819.2885, 673.6263, 892.68304, 516.9445, 484.44122, 916.29156, 1198.2943, 484.10928, 769.27625]
2025-09-12 11:10:03,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [198.0, 154.0, 128.0, 167.0, 97.0, 103.0, 171.0, 237.0, 89.0, 159.0]
2025-09-12 11:10:03,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (782.57) for latency MM1Queue_a033_s075
2025-09-12 11:10:03,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 14 hours, 46 minutes, 46 seconds)
2025-09-12 11:23:39,559 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:23:39,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:24:23,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 826.74670 ± 190.148
2025-09-12 11:24:23,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [760.05194, 773.1851, 894.3364, 1032.983, 806.41644, 932.3069, 1200.8406, 761.4283, 579.9409, 525.9769]
2025-09-12 11:24:23,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 144.0, 164.0, 198.0, 152.0, 175.0, 225.0, 143.0, 114.0, 93.0]
2025-09-12 11:24:23,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (826.75) for latency MM1Queue_a033_s075
2025-09-12 11:24:23,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 14 hours, 37 minutes, 10 seconds)
2025-09-12 11:38:00,009 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:38:00,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:38:42,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 770.10468 ± 179.984
2025-09-12 11:38:42,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [698.763, 1024.4222, 808.0802, 742.7332, 1055.9226, 704.582, 384.54263, 734.09265, 863.4347, 684.4734]
2025-09-12 11:38:42,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 203.0, 161.0, 136.0, 208.0, 135.0, 72.0, 135.0, 190.0, 126.0]
2025-09-12 11:38:42,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 14 hours, 28 minutes, 32 seconds)
2025-09-12 11:52:26,615 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:52:26,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:53:09,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 800.61047 ± 226.476
2025-09-12 11:53:09,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [563.3047, 1215.0712, 760.7735, 795.5834, 526.1964, 964.71954, 787.76105, 503.2243, 1104.1522, 785.3179]
2025-09-12 11:53:09,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 222.0, 151.0, 147.0, 96.0, 190.0, 144.0, 105.0, 209.0, 145.0]
2025-09-12 11:53:09,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 14 hours, 18 minutes, 42 seconds)
2025-09-12 12:06:41,813 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:06:41,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:07:18,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 687.40466 ± 190.366
2025-09-12 12:07:18,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [638.40625, 292.88852, 494.7511, 932.921, 748.87494, 913.0748, 806.0356, 765.78253, 523.0488, 758.2639]
2025-09-12 12:07:18,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 55.0, 92.0, 173.0, 153.0, 179.0, 155.0, 138.0, 105.0, 141.0]
2025-09-12 12:07:18,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 14 hours, 3 minutes, 42 seconds)
2025-09-12 12:20:52,447 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:20:52,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:21:40,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 869.45880 ± 305.878
2025-09-12 12:21:40,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [900.669, 738.88794, 1183.1891, 1554.8162, 691.0719, 1033.3718, 638.13464, 442.84048, 877.0753, 634.5316]
2025-09-12 12:21:40,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [175.0, 155.0, 226.0, 299.0, 147.0, 196.0, 116.0, 95.0, 169.0, 115.0]
2025-09-12 12:21:40,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (869.46) for latency MM1Queue_a033_s075
2025-09-12 12:21:40,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 13 hours, 50 minutes, 47 seconds)
2025-09-12 12:35:22,139 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:35:22,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:36:04,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 765.12183 ± 291.534
2025-09-12 12:36:04,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [348.4033, 1037.7014, 429.212, 881.85443, 1402.2382, 735.0897, 609.15796, 898.2881, 687.9077, 621.3654]
2025-09-12 12:36:04,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 204.0, 81.0, 163.0, 268.0, 142.0, 133.0, 183.0, 126.0, 121.0]
2025-09-12 12:36:04,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 13 hours, 37 minutes, 9 seconds)
2025-09-12 12:49:33,633 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:49:33,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:50:14,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 753.96014 ± 231.021
2025-09-12 12:50:14,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [749.6242, 822.8598, 1277.8679, 374.5675, 692.9081, 718.6748, 563.48114, 627.10925, 983.2358, 729.2731]
2025-09-12 12:50:14,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 163.0, 256.0, 69.0, 140.0, 152.0, 107.0, 113.0, 190.0, 135.0]
2025-09-12 12:50:14,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 13 hours, 21 minutes, 18 seconds)
2025-09-12 13:03:56,855 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:03:56,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:04:42,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 857.38818 ± 410.484
2025-09-12 13:04:42,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1840.5228, 1244.0092, 997.3667, 541.8745, 395.39923, 869.2434, 668.51886, 615.5361, 916.2985, 485.11267]
2025-09-12 13:04:42,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [349.0, 233.0, 190.0, 99.0, 71.0, 161.0, 139.0, 128.0, 168.0, 89.0]
2025-09-12 13:04:42,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 13 hours, 7 minutes, 7 seconds)
2025-09-12 13:18:13,613 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:18:13,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:18:56,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 803.73279 ± 239.789
2025-09-12 13:18:56,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1225.4149, 523.2605, 720.8076, 922.059, 456.1378, 539.1609, 1077.4834, 980.94666, 782.8736, 809.18353]
2025-09-12 13:18:56,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [224.0, 94.0, 135.0, 183.0, 90.0, 98.0, 197.0, 187.0, 147.0, 152.0]
2025-09-12 13:18:56,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 12 hours, 53 minutes, 33 seconds)
2025-09-12 13:32:43,021 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:32:43,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:33:36,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 997.96106 ± 274.034
2025-09-12 13:33:36,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1032.9679, 988.98145, 912.4376, 1672.418, 986.8142, 972.7867, 544.892, 959.73944, 759.189, 1149.3835]
2025-09-12 13:33:36,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [198.0, 188.0, 173.0, 315.0, 186.0, 183.0, 100.0, 188.0, 143.0, 220.0]
2025-09-12 13:33:36,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (997.96) for latency MM1Queue_a033_s075
2025-09-12 13:33:36,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 12 hours, 42 minutes, 27 seconds)
2025-09-12 13:47:04,026 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:47:04,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:47:47,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 827.23303 ± 286.848
2025-09-12 13:47:47,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [390.83737, 634.07275, 497.40085, 864.77716, 1165.7205, 1013.54083, 896.2436, 588.2348, 1348.3801, 873.12213]
2025-09-12 13:47:47,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 120.0, 92.0, 163.0, 225.0, 189.0, 158.0, 128.0, 257.0, 172.0]
2025-09-12 13:47:48,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 12 hours, 26 minutes)
2025-09-12 14:01:27,902 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:01:27,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:02:16,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 886.21191 ± 253.495
2025-09-12 14:02:16,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1057.8375, 618.7162, 1088.926, 794.8673, 834.11896, 432.36783, 924.9632, 1355.3231, 1053.201, 701.7977]
2025-09-12 14:02:16,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [199.0, 115.0, 220.0, 151.0, 163.0, 87.0, 174.0, 268.0, 195.0, 130.0]
2025-09-12 14:02:16,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 12 hours, 14 minutes, 41 seconds)
2025-09-12 14:15:48,348 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:15:48,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:16:30,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 793.20093 ± 202.618
2025-09-12 14:16:30,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [653.77057, 1093.9344, 826.5101, 346.79926, 1057.0646, 678.70215, 787.40485, 751.53284, 915.7785, 820.51215]
2025-09-12 14:16:30,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 207.0, 147.0, 63.0, 198.0, 119.0, 145.0, 159.0, 193.0, 155.0]
2025-09-12 14:16:30,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 11 hours, 58 minutes, 3 seconds)
2025-09-12 14:30:06,427 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:30:06,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:30:50,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 812.91077 ± 214.761
2025-09-12 14:30:50,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [519.78516, 884.5114, 819.4608, 738.3648, 1270.3187, 876.5467, 812.87286, 456.27548, 785.2591, 965.7124]
2025-09-12 14:30:50,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 158.0, 157.0, 139.0, 244.0, 162.0, 165.0, 99.0, 146.0, 188.0]
2025-09-12 14:30:50,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 11 hours, 44 minutes, 39 seconds)
2025-09-12 14:44:38,428 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:44:38,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:45:19,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 735.10779 ± 109.584
2025-09-12 14:45:19,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [627.1204, 970.59717, 731.62, 689.7674, 721.4563, 550.3372, 825.0343, 769.0035, 793.6168, 672.52515]
2025-09-12 14:45:19,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 191.0, 138.0, 125.0, 146.0, 108.0, 173.0, 141.0, 168.0, 141.0]
2025-09-12 14:45:19,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 11 hours, 28 minutes, 32 seconds)
2025-09-12 14:58:57,453 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:58:57,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:59:58,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1156.75146 ± 347.470
2025-09-12 14:59:58,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [500.59598, 1277.1819, 746.08435, 1064.6533, 1467.736, 940.8027, 1644.2728, 1079.2576, 1590.6138, 1256.3171]
2025-09-12 14:59:58,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 238.0, 135.0, 209.0, 278.0, 178.0, 308.0, 205.0, 314.0, 243.0]
2025-09-12 14:59:58,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (1156.75) for latency MM1Queue_a033_s075
2025-09-12 14:59:59,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 11 hours, 18 minutes, 31 seconds)
2025-09-12 15:13:32,749 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:13:32,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:14:22,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 916.67834 ± 226.161
2025-09-12 15:14:22,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [790.43976, 653.37537, 1380.747, 1056.5544, 992.2378, 575.18134, 920.3734, 719.42365, 1041.6805, 1036.7693]
2025-09-12 15:14:22,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [167.0, 121.0, 267.0, 202.0, 189.0, 106.0, 187.0, 137.0, 202.0, 206.0]
2025-09-12 15:14:22,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 11 hours, 3 minutes, 21 seconds)
2025-09-12 15:27:57,516 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:27:57,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:28:41,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 811.99854 ± 239.105
2025-09-12 15:28:41,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [843.55756, 412.36334, 870.02185, 614.8947, 964.24915, 544.7861, 735.405, 1052.0884, 811.59937, 1271.0193]
2025-09-12 15:28:41,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [162.0, 94.0, 162.0, 124.0, 180.0, 115.0, 155.0, 197.0, 147.0, 236.0]
2025-09-12 15:28:41,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 10 hours, 49 minutes, 34 seconds)
2025-09-12 15:42:25,555 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:42:25,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:43:06,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 755.44159 ± 189.744
2025-09-12 15:43:06,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [610.3898, 482.78265, 827.5481, 851.05505, 711.9011, 396.88037, 983.9632, 884.3089, 964.3922, 841.195]
2025-09-12 15:43:06,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 101.0, 156.0, 162.0, 158.0, 73.0, 185.0, 167.0, 190.0, 161.0]
2025-09-12 15:43:06,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 10 hours, 35 minutes, 55 seconds)
2025-09-12 15:56:41,021 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:56:41,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:57:31,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 908.90393 ± 167.027
2025-09-12 15:57:31,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [918.79596, 757.3446, 937.8611, 634.85114, 947.60724, 1054.1392, 1078.4097, 753.70465, 794.8482, 1211.4775]
2025-09-12 15:57:31,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [196.0, 146.0, 180.0, 128.0, 195.0, 210.0, 207.0, 145.0, 156.0, 234.0]
2025-09-12 15:57:31,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 10 hours, 20 minutes, 56 seconds)
2025-09-12 16:11:18,331 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:11:18,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:12:10,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 977.91394 ± 268.568
2025-09-12 16:12:10,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [524.978, 903.68256, 818.78625, 730.4486, 1408.962, 1209.3007, 1073.7195, 1220.898, 1183.1686, 705.1952]
2025-09-12 16:12:10,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 172.0, 162.0, 139.0, 267.0, 225.0, 204.0, 232.0, 220.0, 135.0]
2025-09-12 16:12:10,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 10 hours, 6 minutes, 22 seconds)
2025-09-12 16:25:36,183 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:25:36,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:26:31,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1035.41724 ± 380.506
2025-09-12 16:26:31,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [926.54205, 510.4727, 615.7389, 1781.892, 939.6432, 797.2824, 831.6132, 1252.5936, 1543.5906, 1154.805]
2025-09-12 16:26:31,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [184.0, 109.0, 125.0, 341.0, 189.0, 152.0, 152.0, 232.0, 296.0, 213.0]
2025-09-12 16:26:32,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 9 hours, 51 minutes, 38 seconds)
2025-09-12 16:40:27,393 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:40:27,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:41:25,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1076.09546 ± 361.400
2025-09-12 16:41:25,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [2030.6123, 956.6106, 1218.7008, 945.2443, 760.72894, 774.17834, 1230.7157, 1055.1676, 717.92847, 1071.0669]
2025-09-12 16:41:25,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [386.0, 194.0, 229.0, 181.0, 151.0, 150.0, 227.0, 206.0, 137.0, 204.0]
2025-09-12 16:41:25,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 9 hours, 41 minutes, 49 seconds)
2025-09-12 16:54:50,939 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:54:50,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:55:36,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 839.32098 ± 220.730
2025-09-12 16:55:36,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [705.6818, 891.648, 958.63885, 689.8614, 618.3398, 618.3978, 604.3489, 1147.3921, 896.4728, 1262.4286]
2025-09-12 16:55:36,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 170.0, 184.0, 146.0, 115.0, 130.0, 128.0, 219.0, 167.0, 230.0]
2025-09-12 16:55:36,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 9 hours, 25 minutes, 28 seconds)
2025-09-12 17:09:26,052 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:09:26,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:10:19,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 982.48065 ± 338.466
2025-09-12 17:10:19,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [890.3764, 1211.2142, 876.5532, 731.2138, 1303.7198, 724.2733, 687.89026, 1703.4197, 533.5084, 1162.6367]
2025-09-12 17:10:19,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 235.0, 171.0, 137.0, 246.0, 142.0, 137.0, 327.0, 117.0, 227.0]
2025-09-12 17:10:19,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 9 hours, 13 minutes, 14 seconds)
2025-09-12 17:23:35,340 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:23:35,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:24:28,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 983.60840 ± 288.375
2025-09-12 17:24:28,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1003.78894, 1322.5134, 1170.1049, 1443.2505, 1054.1741, 869.5714, 482.49573, 884.22327, 1054.0383, 551.924]
2025-09-12 17:24:28,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [191.0, 263.0, 226.0, 273.0, 196.0, 169.0, 96.0, 165.0, 205.0, 108.0]
2025-09-12 17:24:28,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 8 hours, 55 minutes, 2 seconds)
2025-09-12 17:37:59,550 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:37:59,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:38:46,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 866.10773 ± 365.879
2025-09-12 17:38:46,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1496.3796, 444.75964, 574.3295, 1444.509, 595.14355, 1041.1879, 895.69385, 400.8736, 778.8083, 989.39166]
2025-09-12 17:38:46,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [296.0, 80.0, 103.0, 272.0, 110.0, 200.0, 178.0, 75.0, 140.0, 203.0]
2025-09-12 17:38:46,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 8 hours, 40 minutes, 9 seconds)
2025-09-12 17:52:34,081 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:52:34,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:53:34,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1144.35034 ± 353.363
2025-09-12 17:53:34,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [811.41956, 1541.6202, 1113.6263, 1774.4236, 1000.58014, 862.87665, 572.56824, 991.2143, 1469.5685, 1305.6051]
2025-09-12 17:53:34,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 289.0, 204.0, 335.0, 191.0, 160.0, 110.0, 183.0, 274.0, 243.0]
2025-09-12 17:53:35,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 8 hours, 25 minutes, 8 seconds)
2025-09-12 18:07:16,492 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:07:16,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:08:11,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 947.03955 ± 426.362
2025-09-12 18:08:11,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [766.9935, 472.32117, 837.5709, 1655.9092, 738.5138, 798.9043, 1545.8428, 692.42554, 454.42657, 1507.4875]
2025-09-12 18:08:11,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [151.0, 88.0, 169.0, 318.0, 135.0, 157.0, 311.0, 148.0, 99.0, 311.0]
2025-09-12 18:08:11,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 8 hours, 13 minutes, 34 seconds)
2025-09-12 18:21:56,037 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:21:56,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:22:55,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1066.88000 ± 398.925
2025-09-12 18:22:55,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1199.726, 1091.9415, 1223.0286, 906.5917, 1081.238, 1277.8127, 648.9271, 1965.4424, 385.30426, 888.78705]
2025-09-12 18:22:55,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [222.0, 203.0, 223.0, 198.0, 220.0, 250.0, 141.0, 362.0, 86.0, 175.0]
2025-09-12 18:22:55,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 7 hours, 59 minutes, 11 seconds)
2025-09-12 18:36:37,643 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:36:37,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:37:26,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 894.60876 ± 367.967
2025-09-12 18:37:26,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [682.2376, 607.4851, 629.9286, 1521.9789, 931.9105, 958.544, 519.321, 1336.6324, 420.4307, 1337.619]
2025-09-12 18:37:26,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 120.0, 115.0, 296.0, 178.0, 178.0, 111.0, 251.0, 78.0, 255.0]
2025-09-12 18:37:26,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 7 hours, 46 minutes, 57 seconds)
2025-09-12 18:51:31,457 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:51:31,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:52:28,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1030.26794 ± 324.729
2025-09-12 18:52:28,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1246.2533, 1419.7864, 885.2392, 1483.1432, 881.3375, 942.58405, 1441.9253, 574.93805, 820.8408, 606.6319]
2025-09-12 18:52:28,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [248.0, 272.0, 179.0, 289.0, 163.0, 180.0, 269.0, 126.0, 164.0, 114.0]
2025-09-12 18:52:28,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 36 minutes, 54 seconds)
2025-09-12 19:06:09,731 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:06:09,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:07:18,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1256.40063 ± 377.332
2025-09-12 19:07:18,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1745.177, 1091.2269, 1040.9604, 1696.9021, 1479.7189, 798.92316, 729.6907, 906.33575, 1761.0916, 1313.9792]
2025-09-12 19:07:18,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [335.0, 209.0, 196.0, 325.0, 272.0, 160.0, 155.0, 170.0, 330.0, 247.0]
2025-09-12 19:07:18,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (1256.40) for latency MM1Queue_a033_s075
2025-09-12 19:07:18,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 22 minutes, 18 seconds)
2025-09-12 19:20:49,980 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:20:49,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:22:01,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1288.63892 ± 389.517
2025-09-12 19:22:01,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1089.5814, 1871.4009, 1569.845, 617.8149, 1141.2004, 1371.6008, 1844.8429, 1382.5865, 781.9077, 1215.6075]
2025-09-12 19:22:01,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [196.0, 356.0, 304.0, 126.0, 218.0, 257.0, 350.0, 258.0, 155.0, 230.0]
2025-09-12 19:22:01,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (1288.64) for latency MM1Queue_a033_s075
2025-09-12 19:22:01,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 8 minutes, 12 seconds)
2025-09-12 19:35:59,076 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:35:59,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:37:06,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1231.87683 ± 262.590
2025-09-12 19:37:06,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1321.8215, 1353.1232, 905.7488, 1191.0853, 1028.238, 1422.6279, 1357.4711, 1622.7242, 1412.8157, 703.11316]
2025-09-12 19:37:06,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [260.0, 249.0, 169.0, 216.0, 194.0, 284.0, 250.0, 307.0, 264.0, 127.0]
2025-09-12 19:37:06,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 55 minutes, 24 seconds)
2025-09-12 19:51:01,593 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:51:01,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:52:02,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1105.28247 ± 250.052
2025-09-12 19:52:02,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [992.5095, 1447.5433, 1040.5137, 604.5543, 1158.966, 1308.1661, 1461.0018, 1038.3656, 1148.0852, 853.1191]
2025-09-12 19:52:02,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [188.0, 272.0, 206.0, 125.0, 219.0, 257.0, 264.0, 210.0, 221.0, 153.0]
2025-09-12 19:52:02,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 42 minutes, 51 seconds)
2025-09-12 20:05:37,543 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:05:37,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:06:25,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 891.01141 ± 273.053
2025-09-12 20:06:25,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [967.1591, 411.42227, 1163.9678, 965.2315, 1104.4037, 1192.1522, 641.5321, 722.7626, 1188.679, 552.8033]
2025-09-12 20:06:25,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [188.0, 91.0, 223.0, 181.0, 200.0, 218.0, 117.0, 133.0, 239.0, 100.0]
2025-09-12 20:06:25,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 24 minutes, 32 seconds)
2025-09-12 20:20:23,557 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:20:23,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:21:38,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1350.96021 ± 550.341
2025-09-12 20:21:38,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1489.289, 889.60974, 970.5267, 1624.836, 871.5449, 1177.5497, 2791.4827, 1095.2423, 1007.0287, 1592.491]
2025-09-12 20:21:38,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [285.0, 164.0, 198.0, 305.0, 166.0, 229.0, 544.0, 210.0, 191.0, 305.0]
2025-09-12 20:21:38,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (1350.96) for latency MM1Queue_a033_s075
2025-09-12 20:21:38,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 11 minutes, 43 seconds)
2025-09-12 20:35:34,300 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:35:34,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:36:30,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1018.05487 ± 371.066
2025-09-12 20:36:30,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [949.11676, 1063.5483, 1116.2991, 556.1775, 1107.2844, 596.79, 1263.4856, 1849.8383, 569.3275, 1108.6815]
2025-09-12 20:36:30,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [191.0, 201.0, 208.0, 121.0, 218.0, 115.0, 246.0, 348.0, 122.0, 211.0]
2025-09-12 20:36:30,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 57 minutes, 32 seconds)
2025-09-12 20:50:23,966 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:50:23,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:51:23,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1080.06750 ± 336.683
2025-09-12 20:51:23,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1045.2987, 1377.4121, 987.1865, 1359.7053, 1005.25214, 1495.3796, 624.8257, 1461.0441, 430.20636, 1014.36566]
2025-09-12 20:51:23,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [191.0, 265.0, 182.0, 258.0, 179.0, 281.0, 128.0, 290.0, 80.0, 188.0]
2025-09-12 20:51:23,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 41 minutes, 41 seconds)
2025-09-12 21:04:37,726 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:04:37,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:05:28,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 958.71875 ± 334.776
2025-09-12 21:05:28,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [944.6346, 1678.8961, 596.4264, 1184.6467, 781.43115, 1188.9583, 639.7582, 979.3364, 496.2171, 1096.8823]
2025-09-12 21:05:28,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [178.0, 318.0, 114.0, 222.0, 160.0, 219.0, 129.0, 187.0, 90.0, 210.0]
2025-09-12 21:05:28,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 23 minutes, 8 seconds)
2025-09-12 21:19:10,713 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:19:10,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:19:58,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 888.27893 ± 297.740
2025-09-12 21:19:58,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [781.61145, 536.09174, 1078.7864, 498.85986, 665.6735, 1168.6512, 1257.4071, 1200.1345, 1162.576, 532.9971]
2025-09-12 21:19:58,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [143.0, 92.0, 209.0, 109.0, 128.0, 220.0, 241.0, 248.0, 217.0, 105.0]
2025-09-12 21:19:58,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 8 minutes, 55 seconds)
2025-09-12 21:33:23,962 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:33:23,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:34:34,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1292.52869 ± 603.096
2025-09-12 21:34:34,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1623.0427, 773.9987, 1191.0758, 2800.0923, 1283.424, 1660.2832, 942.6853, 1259.9685, 650.1202, 740.59686]
2025-09-12 21:34:34,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [327.0, 144.0, 220.0, 544.0, 246.0, 307.0, 181.0, 236.0, 134.0, 138.0]
2025-09-12 21:34:34,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 51 minutes, 43 seconds)
2025-09-12 21:48:21,240 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:48:21,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:49:26,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1241.18677 ± 314.916
2025-09-12 21:49:26,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1163.8304, 948.3389, 1310.6476, 1218.8763, 1246.302, 2061.4204, 1192.9882, 778.1644, 1195.2001, 1296.0981]
2025-09-12 21:49:26,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [211.0, 194.0, 257.0, 225.0, 238.0, 383.0, 234.0, 141.0, 215.0, 238.0]
2025-09-12 21:49:26,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 37 minutes, 8 seconds)
2025-09-12 22:02:57,087 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:02:57,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:04:03,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1247.41736 ± 490.200
2025-09-12 22:04:03,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1088.4896, 2247.48, 1024.0634, 1038.9082, 732.7167, 1647.3323, 1842.3593, 1342.1403, 677.37476, 833.3088]
2025-09-12 22:04:03,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [207.0, 416.0, 197.0, 200.0, 136.0, 305.0, 343.0, 242.0, 126.0, 156.0]
2025-09-12 22:04:03,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 21 minutes, 36 seconds)
2025-09-12 22:17:31,701 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:17:31,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:18:42,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1326.74329 ± 254.559
2025-09-12 22:18:42,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1110.6387, 1373.6228, 1586.9462, 1057.3387, 1838.6594, 1394.0875, 983.5801, 1330.4921, 1105.5537, 1486.5137]
2025-09-12 22:18:42,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [215.0, 269.0, 298.0, 204.0, 355.0, 262.0, 188.0, 262.0, 207.0, 274.0]
2025-09-12 22:18:42,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 8 minutes, 59 seconds)
2025-09-12 22:32:21,378 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:32:21,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:33:17,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1064.62170 ± 286.996
2025-09-12 22:33:17,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [948.2803, 615.8573, 732.0718, 1485.0201, 1374.0037, 1120.8282, 882.16895, 913.34576, 1104.0498, 1470.591]
2025-09-12 22:33:17,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [195.0, 112.0, 136.0, 275.0, 269.0, 213.0, 164.0, 172.0, 213.0, 278.0]
2025-09-12 22:33:17,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 54 minutes, 37 seconds)
2025-09-12 22:46:45,764 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:46:45,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:48:10,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1585.00500 ± 538.016
2025-09-12 22:48:10,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [2208.569, 1263.1062, 1269.5934, 1538.9038, 1303.8046, 1917.5272, 735.2784, 1312.0372, 2731.3586, 1569.8713]
2025-09-12 22:48:10,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [424.0, 243.0, 234.0, 289.0, 242.0, 355.0, 145.0, 252.0, 503.0, 308.0]
2025-09-12 22:48:10,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1226 [INFO]: New best (1585.01) for latency MM1Queue_a033_s075
2025-09-12 22:48:10,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 40 minutes, 46 seconds)
2025-09-12 23:01:40,462 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:01:40,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:02:36,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1099.60620 ± 248.872
2025-09-12 23:02:36,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [799.25726, 1075.8868, 1293.6803, 1361.5111, 1562.4806, 786.038, 801.7256, 1166.2788, 987.02985, 1162.1726]
2025-09-12 23:02:36,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [149.0, 199.0, 241.0, 244.0, 285.0, 158.0, 144.0, 214.0, 179.0, 217.0]
2025-09-12 23:02:36,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 24 minutes, 53 seconds)
2025-09-12 23:16:41,362 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:16:41,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:17:38,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1057.01831 ± 278.480
2025-09-12 23:17:38,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1058.145, 1272.8295, 1449.8566, 841.55865, 1365.7915, 778.0681, 951.26807, 902.3065, 583.6942, 1366.6637]
2025-09-12 23:17:38,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [208.0, 238.0, 288.0, 159.0, 261.0, 154.0, 180.0, 172.0, 105.0, 257.0]
2025-09-12 23:17:38,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 11 minutes, 17 seconds)
2025-09-12 23:30:57,570 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:30:57,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:31:45,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 929.70813 ± 258.110
2025-09-12 23:31:45,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [996.4703, 967.22314, 1149.5593, 696.5316, 1124.5966, 1354.4442, 413.11337, 688.1645, 873.1441, 1033.8341]
2025-09-12 23:31:45,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [181.0, 174.0, 212.0, 130.0, 206.0, 252.0, 85.0, 122.0, 164.0, 208.0]
2025-09-12 23:31:45,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 55 minutes, 19 seconds)
2025-09-12 23:45:16,412 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:45:16,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:46:24,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1276.13843 ± 451.415
2025-09-12 23:46:24,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1227.9274, 2202.7393, 1598.8156, 817.2693, 1026.2223, 573.50684, 1669.9191, 1399.0032, 1333.8715, 912.1101]
2025-09-12 23:46:24,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [221.0, 407.0, 298.0, 148.0, 188.0, 108.0, 335.0, 272.0, 243.0, 170.0]
2025-09-12 23:46:24,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 40 minutes, 50 seconds)
2025-09-13 00:00:01,818 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:00:01,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:01:00,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1082.64209 ± 313.338
2025-09-13 00:01:00,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [705.7702, 841.05, 1218.1321, 1035.6152, 1200.0747, 961.092, 1882.1138, 855.5312, 920.6744, 1206.367]
2025-09-13 00:01:00,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [151.0, 162.0, 235.0, 195.0, 229.0, 188.0, 351.0, 158.0, 179.0, 230.0]
2025-09-13 00:01:00,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 25 minutes, 40 seconds)
2025-09-13 00:14:50,626 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:14:50,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:16:03,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1329.73914 ± 448.159
2025-09-13 00:16:03,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1770.4103, 2125.4438, 1318.0638, 1870.5017, 1029.1644, 1052.7806, 938.68475, 757.58734, 888.758, 1545.9974]
2025-09-13 00:16:03,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [336.0, 435.0, 240.0, 359.0, 193.0, 194.0, 193.0, 150.0, 170.0, 289.0]
2025-09-13 00:16:03,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 12 minutes, 11 seconds)
2025-09-13 00:29:15,497 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:29:15,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:30:14,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1126.35754 ± 407.938
2025-09-13 00:30:14,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [950.97314, 1860.5026, 825.7157, 945.8906, 1160.8019, 716.5407, 638.3588, 1190.6045, 1874.2513, 1099.9369]
2025-09-13 00:30:14,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [186.0, 351.0, 158.0, 173.0, 216.0, 132.0, 118.0, 220.0, 355.0, 206.0]
2025-09-13 00:30:14,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 56 minutes, 11 seconds)
2025-09-13 00:44:01,567 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:44:01,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:45:01,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1126.37817 ± 249.140
2025-09-13 00:45:01,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1366.5378, 1546.7285, 733.06683, 1177.1405, 1286.782, 877.4389, 1010.16266, 877.4946, 1025.8212, 1362.6079]
2025-09-13 00:45:01,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [259.0, 306.0, 140.0, 226.0, 242.0, 168.0, 191.0, 164.0, 198.0, 253.0]
2025-09-13 00:45:01,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 42 minutes, 34 seconds)
2025-09-13 00:58:31,779 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:58:31,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:59:44,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1318.79285 ± 494.780
2025-09-13 00:59:44,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1364.2853, 1493.8291, 1937.5529, 399.54474, 2110.537, 1746.1823, 1023.1839, 1172.5388, 1084.2993, 855.9753]
2025-09-13 00:59:44,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [262.0, 289.0, 363.0, 83.0, 415.0, 349.0, 200.0, 230.0, 206.0, 160.0]
2025-09-13 00:59:45,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 28 minutes)
2025-09-13 01:13:14,131 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:13:14,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:14:20,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1252.68982 ± 257.267
2025-09-13 01:14:20,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1362.1914, 899.36444, 938.9237, 873.6063, 1404.5642, 1284.2115, 1172.7395, 1632.7799, 1427.8226, 1530.6954]
2025-09-13 01:14:20,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [262.0, 166.0, 170.0, 172.0, 259.0, 258.0, 230.0, 302.0, 271.0, 289.0]
2025-09-13 01:14:20,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 13 minutes, 20 seconds)
2025-09-13 01:27:57,411 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:27:57,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:28:55,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1091.85059 ± 254.207
2025-09-13 01:28:55,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1363.463, 1181.8425, 1003.2845, 768.8771, 1518.215, 791.0027, 972.8384, 1345.9247, 1185.2587, 787.7986]
2025-09-13 01:28:55,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [266.0, 229.0, 187.0, 144.0, 286.0, 159.0, 176.0, 252.0, 219.0, 149.0]
2025-09-13 01:28:55,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 58 minutes, 17 seconds)
2025-09-13 01:42:28,074 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:42:28,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:43:31,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1180.09973 ± 266.985
2025-09-13 01:43:31,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1180.2788, 1518.0729, 963.2678, 863.59595, 808.55896, 1350.3359, 966.3742, 1453.8501, 1116.8713, 1579.792]
2025-09-13 01:43:31,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [230.0, 286.0, 177.0, 165.0, 159.0, 250.0, 185.0, 293.0, 213.0, 289.0]
2025-09-13 01:43:31,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 43 minutes, 57 seconds)
2025-09-13 01:57:16,633 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:57:16,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:58:14,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1091.91809 ± 489.770
2025-09-13 01:58:14,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1170.3817, 496.3253, 2250.0457, 1490.1715, 924.043, 851.43646, 1179.869, 478.70895, 1216.0776, 862.12]
2025-09-13 01:58:14,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [221.0, 89.0, 411.0, 282.0, 174.0, 168.0, 228.0, 94.0, 228.0, 172.0]
2025-09-13 01:58:14,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 29 minutes, 17 seconds)
2025-09-13 02:11:44,404 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:11:44,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:12:51,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1261.89038 ± 254.668
2025-09-13 02:12:51,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [1728.2059, 1255.7539, 1249.9851, 1064.6188, 1267.9614, 1292.9835, 732.2697, 1222.9446, 1588.7476, 1215.4329]
2025-09-13 02:12:51,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [321.0, 240.0, 227.0, 208.0, 247.0, 248.0, 134.0, 230.0, 300.0, 234.0]
2025-09-13 02:12:51,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 14 minutes, 37 seconds)
2025-09-13 02:26:26,454 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:26:26,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:27:30,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 1137.52441 ± 473.249
2025-09-13 02:27:30,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [632.73956, 2098.0974, 601.431, 1433.6414, 1253.2395, 1150.1627, 772.13965, 567.67865, 1290.5875, 1575.5265]
2025-09-13 02:27:30,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 397.0, 115.0, 272.0, 241.0, 220.0, 148.0, 100.0, 244.0, 306.0]
2025-09-13 02:27:30,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-humanoid):1251 [DEBUG]: Training session finished
