2025-09-12 02:18:25,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc20-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-09-12 02:18:25,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc20-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-09-12 02:18:25,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x149eafeca590>}
2025-09-12 02:18:25,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1111 [DEBUG]: using device: cuda
2025-09-12 02:18:25,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1133 [INFO]: Creating new trainer
2025-09-12 02:18:25,200 baseline-mbpac-noiseperc20-humanoid:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-12 02:18:25,200 baseline-mbpac-noiseperc20-humanoid:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 02:18:25,211 baseline-mbpac-noiseperc20-humanoid:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-09-12 02:18:26,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1194 [DEBUG]: Starting training session...
2025-09-12 02:18:26,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 1/100
2025-09-12 02:30:23,773 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:30:23,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:30:34,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 186.39897 ± 55.407
2025-09-12 02:30:34,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [153.14677, 220.11703, 278.771, 81.367874, 250.16037, 176.46582, 176.01628, 183.93062, 126.18163, 217.83224]
2025-09-12 02:30:34,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 43.0, 56.0, 17.0, 52.0, 37.0, 33.0, 36.0, 26.0, 44.0]
2025-09-12 02:30:34,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (186.40) for latency MM1Queue_a033_s075
2025-09-12 02:30:34,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 20 hours, 49 seconds)
2025-09-12 02:44:00,098 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:44:00,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:44:23,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 410.82919 ± 82.862
2025-09-12 02:44:23,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [431.85803, 533.1096, 387.60406, 340.36826, 407.1084, 371.765, 495.37924, 404.8908, 501.12195, 235.08638]
2025-09-12 02:44:23,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 101.0, 73.0, 72.0, 75.0, 80.0, 91.0, 88.0, 96.0, 52.0]
2025-09-12 02:44:23,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (410.83) for latency MM1Queue_a033_s075
2025-09-12 02:44:23,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 21 hours, 11 minutes, 21 seconds)
2025-09-12 02:57:55,396 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:57:55,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:58:12,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 336.41077 ± 69.388
2025-09-12 02:58:12,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [281.35068, 279.84003, 368.87897, 381.81003, 339.78683, 249.1301, 359.44714, 279.6257, 322.81104, 501.42734]
2025-09-12 02:58:12,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [51.0, 51.0, 68.0, 70.0, 63.0, 46.0, 66.0, 51.0, 59.0, 99.0]
2025-09-12 02:58:12,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 21 hours, 26 minutes)
2025-09-12 03:11:37,357 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:11:37,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:12:01,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 431.74176 ± 125.986
2025-09-12 03:12:01,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [552.62756, 325.56497, 699.92004, 372.35153, 267.1547, 292.7428, 460.4828, 372.7196, 503.96045, 469.89288]
2025-09-12 03:12:01,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 60.0, 148.0, 71.0, 55.0, 63.0, 88.0, 74.0, 96.0, 102.0]
2025-09-12 03:12:01,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (431.74) for latency MM1Queue_a033_s075
2025-09-12 03:12:01,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 21 hours, 26 minutes, 10 seconds)
2025-09-12 03:25:37,185 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:25:37,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:25:58,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 380.99625 ± 62.850
2025-09-12 03:25:58,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [365.12735, 276.95718, 364.9454, 472.2708, 452.61575, 376.3638, 345.12247, 316.85547, 475.82352, 363.88074]
2025-09-12 03:25:58,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 59.0, 75.0, 88.0, 84.0, 74.0, 73.0, 58.0, 100.0, 67.0]
2025-09-12 03:25:58,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 21 hours, 22 minutes, 58 seconds)
2025-09-12 03:39:32,950 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:39:32,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:39:55,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 412.19702 ± 79.415
2025-09-12 03:39:55,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [344.14233, 318.86346, 456.12424, 367.12497, 480.86826, 320.5319, 499.2934, 324.41956, 502.2665, 508.33542]
2025-09-12 03:39:55,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 72.0, 89.0, 69.0, 91.0, 59.0, 95.0, 63.0, 99.0, 109.0]
2025-09-12 03:39:55,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 21 hours, 43 minutes, 56 seconds)
2025-09-12 03:53:27,662 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:53:27,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:53:52,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 466.86093 ± 107.598
2025-09-12 03:53:52,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [482.13623, 522.66296, 544.77673, 368.04932, 524.3464, 386.02258, 333.36258, 453.794, 351.24353, 702.21497]
2025-09-12 03:53:52,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 101.0, 101.0, 69.0, 99.0, 84.0, 72.0, 84.0, 66.0, 138.0]
2025-09-12 03:53:52,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (466.86) for latency MM1Queue_a033_s075
2025-09-12 03:53:52,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 21 hours, 32 minutes, 35 seconds)
2025-09-12 04:07:27,950 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:07:27,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:07:53,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 462.89569 ± 154.970
2025-09-12 04:07:53,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [441.01505, 722.58215, 324.24487, 527.56116, 283.1671, 377.45822, 762.4288, 342.6856, 455.71237, 392.10147]
2025-09-12 04:07:53,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 149.0, 59.0, 102.0, 62.0, 75.0, 156.0, 64.0, 83.0, 72.0]
2025-09-12 04:07:53,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 21 hours, 22 minutes, 7 seconds)
2025-09-12 04:21:27,886 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:21:27,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:21:52,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 450.43903 ± 127.761
2025-09-12 04:21:52,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [298.51498, 389.17578, 436.93283, 404.11203, 444.75638, 700.9293, 283.8649, 658.56744, 451.97015, 435.566]
2025-09-12 04:21:52,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 75.0, 80.0, 76.0, 97.0, 140.0, 59.0, 123.0, 83.0, 81.0]
2025-09-12 04:21:52,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 21 hours, 11 minutes, 11 seconds)
2025-09-12 04:35:26,607 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:35:26,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:35:51,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 445.39459 ± 130.401
2025-09-12 04:35:51,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [508.95535, 356.953, 370.66724, 405.42407, 371.92236, 328.35364, 446.06525, 368.55966, 501.0803, 795.9651]
2025-09-12 04:35:51,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 80.0, 80.0, 77.0, 67.0, 60.0, 82.0, 67.0, 108.0, 159.0]
2025-09-12 04:35:51,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 20 hours, 58 minutes, 6 seconds)
2025-09-12 04:49:33,311 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:49:33,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:49:58,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 473.56503 ± 84.197
2025-09-12 04:49:58,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [441.95044, 416.56512, 464.977, 523.67773, 419.7271, 405.90506, 518.8699, 645.9801, 340.38116, 557.6169]
2025-09-12 04:49:58,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 80.0, 88.0, 95.0, 79.0, 76.0, 100.0, 119.0, 64.0, 99.0]
2025-09-12 04:49:58,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (473.57) for latency MM1Queue_a033_s075
2025-09-12 04:49:58,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 20 hours, 46 minutes, 43 seconds)
2025-09-12 05:03:36,985 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:03:37,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:04:02,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 478.31659 ± 101.535
2025-09-12 05:04:02,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [541.6398, 366.2457, 376.84845, 391.3263, 375.49045, 599.5407, 604.0721, 433.74884, 457.34464, 636.909]
2025-09-12 05:04:02,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 69.0, 76.0, 73.0, 70.0, 114.0, 111.0, 79.0, 84.0, 121.0]
2025-09-12 05:04:02,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (478.32) for latency MM1Queue_a033_s075
2025-09-12 05:04:02,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 20 hours, 34 minutes, 49 seconds)
2025-09-12 05:17:40,794 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:17:40,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:18:07,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 485.85098 ± 67.281
2025-09-12 05:18:07,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [559.9143, 562.5839, 438.2251, 437.09033, 454.3413, 557.4103, 552.5854, 350.95737, 481.04443, 464.35727]
2025-09-12 05:18:07,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 120.0, 83.0, 88.0, 85.0, 111.0, 109.0, 64.0, 90.0, 92.0]
2025-09-12 05:18:07,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (485.85) for latency MM1Queue_a033_s075
2025-09-12 05:18:07,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 20 hours, 22 minutes, 4 seconds)
2025-09-12 05:31:42,928 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:31:42,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:32:10,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 503.27972 ± 72.690
2025-09-12 05:32:10,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [434.65924, 429.51022, 560.0938, 470.49164, 587.21576, 409.31512, 549.1439, 553.3133, 612.7346, 426.3196]
2025-09-12 05:32:10,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 83.0, 102.0, 100.0, 108.0, 76.0, 117.0, 99.0, 118.0, 80.0]
2025-09-12 05:32:10,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (503.28) for latency MM1Queue_a033_s075
2025-09-12 05:32:10,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 20 hours, 9 minutes, 6 seconds)
2025-09-12 05:45:47,377 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:45:47,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:46:19,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 567.62146 ± 163.231
2025-09-12 05:46:19,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [538.398, 352.81555, 537.72455, 392.64038, 506.43506, 525.65735, 558.26306, 923.79596, 539.05023, 801.4342]
2025-09-12 05:46:19,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 78.0, 101.0, 75.0, 105.0, 105.0, 121.0, 178.0, 98.0, 157.0]
2025-09-12 05:46:19,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (567.62) for latency MM1Queue_a033_s075
2025-09-12 05:46:19,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 19 hours, 57 minutes, 46 seconds)
2025-09-12 05:59:58,492 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:59:58,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:00:27,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 502.46704 ± 91.500
2025-09-12 06:00:27,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [673.9542, 456.53973, 477.91974, 579.8178, 439.5354, 606.7395, 424.80405, 349.65598, 481.90393, 533.8004]
2025-09-12 06:00:27,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [143.0, 100.0, 103.0, 108.0, 82.0, 118.0, 92.0, 78.0, 100.0, 98.0]
2025-09-12 06:00:27,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 19 hours, 44 minutes, 14 seconds)
2025-09-12 06:14:11,115 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:14:11,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:14:39,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 512.32806 ± 122.497
2025-09-12 06:14:39,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [598.991, 428.76932, 385.95126, 506.92816, 478.6308, 423.21365, 511.2722, 388.50122, 590.2254, 810.79736]
2025-09-12 06:14:39,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 79.0, 76.0, 111.0, 97.0, 91.0, 97.0, 72.0, 113.0, 150.0]
2025-09-12 06:14:39,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 19 hours, 32 minutes, 9 seconds)
2025-09-12 06:28:14,915 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:28:14,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:28:45,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 558.42407 ± 133.298
2025-09-12 06:28:45,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [541.26355, 474.1975, 891.8383, 716.9049, 509.0779, 542.3949, 485.84317, 417.71912, 502.73547, 502.26624]
2025-09-12 06:28:45,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 85.0, 167.0, 138.0, 96.0, 104.0, 94.0, 86.0, 108.0, 92.0]
2025-09-12 06:28:45,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 19 hours, 18 minutes, 10 seconds)
2025-09-12 06:42:24,567 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:42:24,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:42:53,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 501.60968 ± 96.458
2025-09-12 06:42:53,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [391.58145, 532.3283, 562.04675, 583.19904, 584.1478, 586.87225, 412.67657, 430.74423, 607.96924, 324.53064]
2025-09-12 06:42:53,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 115.0, 123.0, 126.0, 113.0, 114.0, 75.0, 81.0, 124.0, 59.0]
2025-09-12 06:42:53,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 19 hours, 5 minutes, 28 seconds)
2025-09-12 06:56:33,083 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:56:33,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:57:00,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 513.84583 ± 79.142
2025-09-12 06:57:00,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [602.3156, 520.67535, 434.52744, 591.52295, 486.545, 657.6179, 387.4804, 443.43286, 508.73206, 505.60907]
2025-09-12 06:57:00,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 97.0, 94.0, 108.0, 90.0, 127.0, 70.0, 82.0, 95.0, 96.0]
2025-09-12 06:57:00,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 18 hours, 51 minutes, 2 seconds)
2025-09-12 07:10:40,213 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:10:40,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:11:08,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 537.52783 ± 81.063
2025-09-12 07:11:08,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [624.36194, 378.0414, 508.28952, 696.6269, 480.9777, 557.6629, 562.63416, 549.1343, 525.29944, 492.25]
2025-09-12 07:11:08,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 69.0, 94.0, 132.0, 88.0, 103.0, 106.0, 101.0, 106.0, 95.0]
2025-09-12 07:11:08,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 18 hours, 36 minutes, 49 seconds)
2025-09-12 07:24:44,757 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:24:44,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:25:14,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 523.39642 ± 92.267
2025-09-12 07:25:14,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [445.11295, 526.0791, 581.5421, 537.7729, 418.32806, 358.5526, 690.50824, 598.0516, 498.95157, 579.0655]
2025-09-12 07:25:14,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 115.0, 108.0, 101.0, 93.0, 71.0, 129.0, 106.0, 101.0, 123.0]
2025-09-12 07:25:14,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 18 hours, 21 minutes, 6 seconds)
2025-09-12 07:39:00,241 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:39:00,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:39:33,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 601.48486 ± 131.213
2025-09-12 07:39:33,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [586.70966, 446.81445, 667.34204, 774.34076, 573.756, 424.26938, 787.33386, 499.26807, 760.3517, 494.66248]
2025-09-12 07:39:33,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 83.0, 124.0, 146.0, 120.0, 94.0, 155.0, 91.0, 141.0, 104.0]
2025-09-12 07:39:33,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (601.48) for latency MM1Queue_a033_s075
2025-09-12 07:39:33,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 18 hours, 10 minutes, 25 seconds)
2025-09-12 07:53:10,966 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:53:10,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:53:41,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 575.64612 ± 141.271
2025-09-12 07:53:41,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [480.69366, 543.38385, 439.64847, 527.4446, 465.02423, 527.7624, 624.2516, 952.36884, 531.2642, 664.6189]
2025-09-12 07:53:41,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 103.0, 83.0, 108.0, 86.0, 103.0, 115.0, 187.0, 98.0, 123.0]
2025-09-12 07:53:41,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 17 hours, 56 minutes, 22 seconds)
2025-09-12 08:07:21,151 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:07:21,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:07:54,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 612.56104 ± 92.024
2025-09-12 08:07:54,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [662.174, 577.0689, 495.63913, 797.1938, 596.8851, 683.4116, 525.5818, 703.00653, 576.4174, 508.2319]
2025-09-12 08:07:54,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 109.0, 92.0, 149.0, 116.0, 129.0, 109.0, 125.0, 113.0, 95.0]
2025-09-12 08:07:54,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (612.56) for latency MM1Queue_a033_s075
2025-09-12 08:07:54,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 17 hours, 43 minutes, 23 seconds)
2025-09-12 08:21:33,428 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:21:33,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:22:01,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 531.82129 ± 136.475
2025-09-12 08:22:01,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [416.81476, 702.78143, 830.88727, 589.7221, 506.52643, 454.96698, 495.0959, 550.2493, 384.5621, 386.6071]
2025-09-12 08:22:01,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 135.0, 161.0, 111.0, 92.0, 84.0, 94.0, 105.0, 70.0, 75.0]
2025-09-12 08:22:01,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 17 hours, 29 minutes, 3 seconds)
2025-09-12 08:35:40,017 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:35:40,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:36:08,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 527.56604 ± 65.173
2025-09-12 08:36:08,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [638.5675, 556.76135, 519.155, 444.16003, 471.09543, 419.6191, 546.1325, 512.6146, 600.2777, 567.2773]
2025-09-12 08:36:08,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 112.0, 100.0, 82.0, 89.0, 80.0, 106.0, 104.0, 110.0, 114.0]
2025-09-12 08:36:08,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 17 hours, 15 minutes, 17 seconds)
2025-09-12 08:49:50,700 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:49:50,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:50:21,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 566.27612 ± 128.009
2025-09-12 08:50:21,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [411.91745, 607.3195, 707.8218, 535.3631, 490.2151, 568.84717, 869.76855, 473.52463, 468.962, 529.02185]
2025-09-12 08:50:21,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 113.0, 137.0, 103.0, 94.0, 111.0, 178.0, 88.0, 86.0, 97.0]
2025-09-12 08:50:21,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 16 hours, 59 minutes, 36 seconds)
2025-09-12 09:03:58,688 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:03:58,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:04:33,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 633.40686 ± 126.975
2025-09-12 09:04:33,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [524.1254, 435.17197, 534.0978, 764.13165, 659.32043, 741.87305, 627.241, 859.9254, 683.97833, 504.2034]
2025-09-12 09:04:33,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 89.0, 103.0, 143.0, 124.0, 136.0, 116.0, 167.0, 128.0, 102.0]
2025-09-12 09:04:33,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (633.41) for latency MM1Queue_a033_s075
2025-09-12 09:04:33,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 16 hours, 46 minutes, 12 seconds)
2025-09-12 09:18:12,937 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:18:12,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:18:48,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 644.29370 ± 110.472
2025-09-12 09:18:48,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [584.31573, 726.71436, 784.8924, 577.0829, 515.09454, 607.8336, 827.59503, 655.9638, 463.87234, 699.5722]
2025-09-12 09:18:48,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 140.0, 144.0, 109.0, 106.0, 114.0, 156.0, 127.0, 86.0, 142.0]
2025-09-12 09:18:48,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (644.29) for latency MM1Queue_a033_s075
2025-09-12 09:18:48,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 16 hours, 32 minutes, 38 seconds)
2025-09-12 09:32:29,714 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:32:29,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:33:01,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 607.03601 ± 145.281
2025-09-12 09:33:01,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [429.6672, 753.69244, 552.2382, 562.1627, 451.1541, 916.93115, 540.96533, 512.70874, 746.59406, 604.24634]
2025-09-12 09:33:01,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 138.0, 101.0, 104.0, 84.0, 168.0, 104.0, 93.0, 129.0, 114.0]
2025-09-12 09:33:01,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 16 hours, 19 minutes, 39 seconds)
2025-09-12 09:46:45,666 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:46:45,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:47:19,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 634.77802 ± 100.050
2025-09-12 09:47:19,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [704.565, 791.742, 637.4112, 522.9989, 522.71735, 527.1067, 752.12036, 745.3047, 576.83075, 566.9836]
2025-09-12 09:47:19,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 149.0, 119.0, 99.0, 97.0, 98.0, 146.0, 141.0, 111.0, 106.0]
2025-09-12 09:47:19,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 16 hours, 8 minutes)
2025-09-12 10:00:57,479 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:00:57,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:01:32,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 664.58923 ± 160.953
2025-09-12 10:01:32,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [719.27795, 693.2886, 494.6453, 967.79706, 666.87976, 559.6962, 414.42932, 543.80286, 711.6354, 874.4403]
2025-09-12 10:01:32,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 126.0, 90.0, 185.0, 121.0, 102.0, 93.0, 98.0, 145.0, 168.0]
2025-09-12 10:01:32,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (664.59) for latency MM1Queue_a033_s075
2025-09-12 10:01:32,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 15 hours, 53 minutes, 51 seconds)
2025-09-12 10:14:56,343 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:14:56,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:15:35,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 739.69342 ± 109.798
2025-09-12 10:15:35,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [624.4461, 812.2889, 796.4607, 824.2356, 664.4631, 894.1523, 542.176, 628.52814, 838.2064, 771.9769]
2025-09-12 10:15:35,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 148.0, 157.0, 156.0, 122.0, 175.0, 102.0, 110.0, 147.0, 153.0]
2025-09-12 10:15:35,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (739.69) for latency MM1Queue_a033_s075
2025-09-12 10:15:35,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 15 hours, 37 minutes, 40 seconds)
2025-09-12 10:29:03,032 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:29:03,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:29:45,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 793.91742 ± 243.526
2025-09-12 10:29:45,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [686.65826, 736.786, 625.25024, 577.12036, 1159.2299, 845.7773, 829.9686, 1316.3993, 610.36346, 551.6212]
2025-09-12 10:29:45,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 139.0, 120.0, 109.0, 213.0, 160.0, 181.0, 248.0, 111.0, 106.0]
2025-09-12 10:29:45,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (793.92) for latency MM1Queue_a033_s075
2025-09-12 10:29:45,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 15 hours, 22 minutes, 21 seconds)
2025-09-12 10:43:22,342 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:43:22,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:43:59,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 723.73840 ± 106.475
2025-09-12 10:43:59,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [747.46136, 621.3926, 651.6636, 828.4687, 843.31726, 842.52496, 711.77844, 839.9757, 568.4583, 582.3424]
2025-09-12 10:43:59,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 113.0, 119.0, 163.0, 159.0, 151.0, 125.0, 157.0, 105.0, 106.0]
2025-09-12 10:43:59,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 15 hours, 8 minutes, 30 seconds)
2025-09-12 10:57:32,104 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:57:32,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:58:13,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 812.85388 ± 181.453
2025-09-12 10:58:13,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [392.2187, 581.75714, 981.87244, 905.4561, 730.449, 914.3465, 929.67944, 900.3318, 827.3702, 965.057]
2025-09-12 10:58:13,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 106.0, 185.0, 162.0, 132.0, 165.0, 175.0, 169.0, 154.0, 188.0]
2025-09-12 10:58:13,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (812.85) for latency MM1Queue_a033_s075
2025-09-12 10:58:13,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 14 hours, 53 minutes, 24 seconds)
2025-09-12 11:11:41,605 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:11:41,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:12:25,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 821.03503 ± 227.668
2025-09-12 11:12:25,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [620.18585, 1103.2745, 668.428, 713.3537, 781.7536, 1367.2716, 662.2442, 844.3658, 626.7062, 822.7667]
2025-09-12 11:12:25,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 195.0, 136.0, 146.0, 143.0, 248.0, 123.0, 160.0, 120.0, 167.0]
2025-09-12 11:12:25,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (821.04) for latency MM1Queue_a033_s075
2025-09-12 11:12:25,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 14 hours, 38 minutes, 47 seconds)
2025-09-12 11:26:01,142 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:26:01,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:26:41,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 755.58545 ± 205.987
2025-09-12 11:26:41,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [643.99384, 822.1371, 499.39716, 576.0349, 843.31647, 765.3498, 412.648, 933.4352, 1062.7053, 996.837]
2025-09-12 11:26:41,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 149.0, 111.0, 105.0, 152.0, 150.0, 79.0, 189.0, 214.0, 181.0]
2025-09-12 11:26:41,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 14 hours, 27 minutes, 24 seconds)
2025-09-12 11:40:00,623 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:40:00,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:40:50,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 936.06580 ± 320.483
2025-09-12 11:40:50,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [652.1734, 931.15564, 1025.3992, 801.2323, 998.0688, 719.58923, 1416.5377, 570.2817, 656.749, 1589.4722]
2025-09-12 11:40:50,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 170.0, 202.0, 150.0, 216.0, 133.0, 270.0, 107.0, 142.0, 304.0]
2025-09-12 11:40:50,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (936.07) for latency MM1Queue_a033_s075
2025-09-12 11:40:50,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 14 hours, 13 minutes, 7 seconds)
2025-09-12 11:54:24,996 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:54:24,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:55:06,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 791.56665 ± 171.524
2025-09-12 11:55:06,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [972.0004, 828.9074, 1107.9061, 634.7368, 628.69904, 811.9556, 611.48956, 627.437, 993.4021, 699.13214]
2025-09-12 11:55:06,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [181.0, 158.0, 223.0, 120.0, 118.0, 157.0, 110.0, 117.0, 183.0, 135.0]
2025-09-12 11:55:06,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 13 hours, 59 minutes, 5 seconds)
2025-09-12 12:08:31,679 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:08:31,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:09:11,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 729.07288 ± 230.087
2025-09-12 12:09:11,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [591.3384, 481.4282, 1168.546, 631.3296, 1044.0935, 568.4824, 686.8824, 695.10254, 954.5087, 469.0165]
2025-09-12 12:09:11,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 89.0, 222.0, 116.0, 205.0, 116.0, 129.0, 143.0, 181.0, 102.0]
2025-09-12 12:09:11,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 13 hours, 43 minutes, 1 second)
2025-09-12 12:22:44,941 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:22:44,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:23:28,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 849.38379 ± 216.916
2025-09-12 12:23:28,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1006.08875, 1075.4989, 747.1277, 712.5407, 937.3226, 958.2616, 889.73505, 1142.7855, 387.69296, 636.78455]
2025-09-12 12:23:28,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [182.0, 198.0, 141.0, 132.0, 171.0, 172.0, 161.0, 206.0, 78.0, 127.0]
2025-09-12 12:23:28,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 13 hours, 30 minutes, 4 seconds)
2025-09-12 12:36:51,982 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:36:51,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:37:34,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 814.63843 ± 162.328
2025-09-12 12:37:34,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [863.9265, 756.2094, 926.5342, 533.8448, 921.4178, 567.3107, 752.5031, 1060.7521, 982.5157, 781.37024]
2025-09-12 12:37:34,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [157.0, 137.0, 163.0, 101.0, 180.0, 103.0, 142.0, 189.0, 187.0, 143.0]
2025-09-12 12:37:34,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 13 hours, 13 minutes, 49 seconds)
2025-09-12 12:51:07,554 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:51:07,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:51:53,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 883.30194 ± 157.352
2025-09-12 12:51:53,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [802.9806, 890.2032, 1290.6378, 808.8, 751.00824, 806.68774, 861.9981, 805.55774, 768.6899, 1046.4559]
2025-09-12 12:51:53,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 164.0, 233.0, 148.0, 139.0, 152.0, 168.0, 146.0, 139.0, 195.0]
2025-09-12 12:51:53,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 13 hours, 1 minute, 27 seconds)
2025-09-12 13:05:26,826 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:05:26,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:06:22,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1017.41687 ± 619.026
2025-09-12 13:06:22,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [435.06516, 782.86395, 2558.6855, 1738.6985, 1036.2428, 489.70483, 884.89795, 888.7894, 658.8346, 700.3859]
2025-09-12 13:06:22,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 151.0, 497.0, 328.0, 198.0, 100.0, 166.0, 162.0, 140.0, 146.0]
2025-09-12 13:06:22,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (1017.42) for latency MM1Queue_a033_s075
2025-09-12 13:06:22,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 12 hours, 49 minutes, 46 seconds)
2025-09-12 13:19:46,046 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:19:46,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:20:36,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 941.07062 ± 271.561
2025-09-12 13:20:36,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [930.0424, 837.1093, 732.79846, 620.04126, 1472.0167, 699.73944, 697.5318, 1340.8242, 1016.4833, 1064.1191]
2025-09-12 13:20:36,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [168.0, 153.0, 151.0, 115.0, 276.0, 146.0, 135.0, 245.0, 190.0, 215.0]
2025-09-12 13:20:36,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 12 hours, 37 minutes, 5 seconds)
2025-09-12 13:34:08,053 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:34:08,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:34:45,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 667.17426 ± 130.073
2025-09-12 13:34:45,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [571.61774, 615.2926, 661.7615, 756.04877, 881.6576, 682.57495, 739.4353, 790.97906, 578.7562, 393.61853]
2025-09-12 13:34:45,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 128.0, 130.0, 141.0, 165.0, 125.0, 155.0, 162.0, 125.0, 88.0]
2025-09-12 13:34:45,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 12 hours, 21 minutes, 13 seconds)
2025-09-12 13:48:16,946 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:48:16,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:49:09,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1002.79980 ± 412.336
2025-09-12 13:49:09,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [433.15033, 856.09735, 1522.1824, 1354.9482, 991.9805, 765.3555, 1832.9961, 815.1313, 608.6345, 847.521]
2025-09-12 13:49:09,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 159.0, 278.0, 261.0, 174.0, 145.0, 355.0, 149.0, 117.0, 178.0]
2025-09-12 13:49:09,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 12 hours, 10 minutes, 13 seconds)
2025-09-12 14:02:32,579 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:02:32,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:03:22,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 942.90381 ± 246.896
2025-09-12 14:03:22,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1422.173, 673.23315, 656.931, 1296.6753, 858.8808, 926.1472, 1032.1271, 685.0053, 843.0795, 1034.786]
2025-09-12 14:03:22,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [264.0, 122.0, 141.0, 240.0, 155.0, 161.0, 217.0, 141.0, 151.0, 197.0]
2025-09-12 14:03:22,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 11 hours, 54 minutes, 52 seconds)
2025-09-12 14:16:55,149 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:16:55,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:17:42,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 876.92090 ± 214.643
2025-09-12 14:17:42,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1055.5157, 739.7349, 958.61566, 1120.7899, 766.65216, 784.02136, 616.24963, 1101.3422, 499.60184, 1126.6852]
2025-09-12 14:17:42,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [199.0, 152.0, 177.0, 211.0, 137.0, 165.0, 130.0, 211.0, 98.0, 204.0]
2025-09-12 14:17:42,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 11 hours, 38 minutes, 59 seconds)
2025-09-12 14:31:08,055 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:31:08,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:31:55,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 918.14618 ± 219.950
2025-09-12 14:31:55,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [993.09625, 1162.2805, 866.76996, 649.9953, 571.3296, 700.02716, 1219.1598, 803.4577, 1091.0176, 1124.328]
2025-09-12 14:31:55,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [185.0, 207.0, 161.0, 120.0, 118.0, 130.0, 231.0, 170.0, 199.0, 196.0]
2025-09-12 14:31:55,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 11 hours, 24 minutes, 42 seconds)
2025-09-12 14:45:34,317 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:45:34,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:46:18,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 853.79022 ± 265.660
2025-09-12 14:46:18,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [465.99667, 567.6658, 936.6627, 847.64746, 901.9493, 783.84296, 1190.6862, 500.2332, 1268.7266, 1074.4912]
2025-09-12 14:46:18,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 107.0, 178.0, 156.0, 173.0, 147.0, 211.0, 103.0, 236.0, 197.0]
2025-09-12 14:46:18,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 11 hours, 12 minutes, 38 seconds)
2025-09-12 14:59:50,944 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:59:50,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:00:34,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 839.65302 ± 298.291
2025-09-12 15:00:34,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [946.7487, 945.8895, 867.3675, 1263.5671, 782.2046, 201.29092, 701.8606, 494.97372, 1152.0022, 1040.6261]
2025-09-12 15:00:34,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [166.0, 166.0, 175.0, 256.0, 142.0, 39.0, 128.0, 105.0, 213.0, 189.0]
2025-09-12 15:00:34,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 10 hours, 57 minutes, 2 seconds)
2025-09-12 15:14:00,256 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:14:00,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:14:57,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1099.37646 ± 295.838
2025-09-12 15:14:57,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1627.4896, 1042.8256, 753.4066, 766.2981, 1199.71, 878.444, 1627.4481, 946.7747, 1063.0824, 1088.2861]
2025-09-12 15:14:57,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [315.0, 192.0, 138.0, 155.0, 212.0, 169.0, 298.0, 179.0, 194.0, 196.0]
2025-09-12 15:14:57,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (1099.38) for latency MM1Queue_a033_s075
2025-09-12 15:14:57,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 10 hours, 44 minutes, 10 seconds)
2025-09-12 15:28:29,520 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:28:29,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:29:07,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 692.71600 ± 145.021
2025-09-12 15:29:07,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [790.0112, 638.9959, 759.2163, 632.41394, 992.9174, 654.13446, 698.3078, 464.2346, 793.6196, 503.30884]
2025-09-12 15:29:07,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 134.0, 147.0, 116.0, 187.0, 133.0, 145.0, 85.0, 152.0, 90.0]
2025-09-12 15:29:07,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 10 hours, 28 minutes, 26 seconds)
2025-09-12 15:42:31,588 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:42:31,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:43:13,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 756.95947 ± 305.192
2025-09-12 15:43:13,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [734.22974, 846.45044, 1213.9666, 1156.7723, 582.22626, 988.66425, 594.0394, 733.773, 107.42767, 612.04517]
2025-09-12 15:43:13,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 159.0, 223.0, 234.0, 125.0, 200.0, 120.0, 154.0, 21.0, 114.0]
2025-09-12 15:43:13,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 10 hours, 13 minutes, 4 seconds)
2025-09-12 15:56:50,173 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:56:50,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:57:46,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1075.38196 ± 394.923
2025-09-12 15:57:46,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1037.9993, 642.2734, 1456.8403, 932.18896, 1047.4274, 1377.0057, 1390.8271, 1689.4237, 302.70642, 877.1268]
2025-09-12 15:57:46,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [186.0, 129.0, 264.0, 175.0, 199.0, 258.0, 265.0, 324.0, 62.0, 161.0]
2025-09-12 15:57:46,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 10 hours, 16 seconds)
2025-09-12 16:11:13,054 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:11:13,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:12:01,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 941.23633 ± 279.315
2025-09-12 16:12:01,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [869.309, 780.7405, 925.26807, 598.7481, 863.0673, 926.63104, 1310.4762, 1096.3903, 545.3327, 1496.3998]
2025-09-12 16:12:01,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [177.0, 148.0, 168.0, 112.0, 157.0, 168.0, 231.0, 199.0, 97.0, 294.0]
2025-09-12 16:12:01,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 9 hours, 45 minutes, 52 seconds)
2025-09-12 16:25:30,031 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:25:30,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:26:21,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 994.15106 ± 358.790
2025-09-12 16:26:21,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1206.1359, 857.6443, 750.403, 820.5258, 1612.578, 693.6599, 1693.4503, 666.16766, 867.93, 773.0162]
2025-09-12 16:26:21,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [219.0, 167.0, 141.0, 154.0, 326.0, 123.0, 295.0, 133.0, 154.0, 138.0]
2025-09-12 16:26:21,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 9 hours, 31 minutes, 15 seconds)
2025-09-12 16:39:46,796 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:39:46,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:40:33,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 876.30920 ± 265.461
2025-09-12 16:40:33,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1172.479, 1066.4534, 863.9447, 354.76233, 1037.9988, 1046.1499, 719.27386, 1140.7485, 883.7529, 477.52832]
2025-09-12 16:40:33,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [219.0, 202.0, 162.0, 69.0, 193.0, 189.0, 147.0, 226.0, 158.0, 92.0]
2025-09-12 16:40:33,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 9 hours, 17 minutes, 12 seconds)
2025-09-12 16:54:06,048 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:54:06,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:55:01,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1082.49048 ± 506.188
2025-09-12 16:55:01,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [733.25256, 1180.071, 640.1639, 585.93176, 2029.0414, 1899.1799, 666.0929, 650.7679, 1102.0778, 1338.3251]
2025-09-12 16:55:01,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 219.0, 121.0, 112.0, 367.0, 353.0, 123.0, 122.0, 198.0, 250.0]
2025-09-12 16:55:01,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 9 hours, 5 minutes, 40 seconds)
2025-09-12 17:08:31,480 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:08:31,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:09:23,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 981.86072 ± 249.485
2025-09-12 17:09:23,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1151.4979, 1376.5481, 1326.193, 692.79694, 1037.7623, 840.48895, 1089.8632, 569.5383, 914.71643, 819.202]
2025-09-12 17:09:23,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [220.0, 280.0, 243.0, 132.0, 190.0, 178.0, 203.0, 102.0, 172.0, 152.0]
2025-09-12 17:09:23,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 8 hours, 50 minutes, 1 second)
2025-09-12 17:22:47,046 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:22:47,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:23:44,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1137.63562 ± 340.510
2025-09-12 17:23:44,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [621.425, 1054.2859, 2027.812, 1234.1124, 1104.7977, 871.12317, 1123.6238, 1031.2332, 1163.1879, 1144.7557]
2025-09-12 17:23:44,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 205.0, 367.0, 241.0, 200.0, 157.0, 208.0, 191.0, 221.0, 201.0]
2025-09-12 17:23:44,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (1137.64) for latency MM1Queue_a033_s075
2025-09-12 17:23:45,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 8 hours, 36 minutes, 24 seconds)
2025-09-12 17:37:15,207 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:37:15,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:38:01,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 914.99060 ± 288.949
2025-09-12 17:38:01,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [749.6906, 1060.971, 1629.892, 589.0938, 907.3998, 755.2079, 740.0696, 632.2992, 1030.4128, 1054.8691]
2025-09-12 17:38:01,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 196.0, 289.0, 111.0, 163.0, 138.0, 139.0, 133.0, 183.0, 184.0]
2025-09-12 17:38:01,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 8 hours, 21 minutes, 38 seconds)
2025-09-12 17:51:46,167 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:51:46,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:52:44,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1112.80164 ± 287.699
2025-09-12 17:52:44,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1135.1063, 894.8568, 1650.6055, 1088.2798, 1171.5734, 1343.053, 822.0908, 877.82556, 691.2614, 1453.3629]
2025-09-12 17:52:44,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [203.0, 178.0, 316.0, 198.0, 214.0, 248.0, 153.0, 176.0, 139.0, 279.0]
2025-09-12 17:52:44,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 8 hours, 10 minutes, 53 seconds)
2025-09-12 18:05:52,755 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:05:52,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:06:40,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 876.21710 ± 322.076
2025-09-12 18:06:40,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [896.76196, 624.45105, 915.4119, 1500.2865, 412.28638, 1083.4991, 767.7814, 1128.4001, 401.96957, 1031.3221]
2025-09-12 18:06:40,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [176.0, 119.0, 180.0, 295.0, 88.0, 228.0, 150.0, 232.0, 78.0, 200.0]
2025-09-12 18:06:40,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 7 hours, 52 minutes, 58 seconds)
2025-09-12 18:20:06,644 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:20:06,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:20:51,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 892.14325 ± 213.739
2025-09-12 18:20:51,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [769.2391, 586.302, 741.86127, 1318.428, 1147.9093, 923.55206, 830.69617, 649.27057, 989.3418, 964.832]
2025-09-12 18:20:51,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 109.0, 142.0, 230.0, 212.0, 160.0, 167.0, 118.0, 176.0, 174.0]
2025-09-12 18:20:51,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 7 hours, 37 minutes, 23 seconds)
2025-09-12 18:34:30,807 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:34:30,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:35:22,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 987.16290 ± 308.484
2025-09-12 18:35:22,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1496.9523, 744.5464, 1219.2611, 810.6666, 1049.5928, 1173.9941, 429.31442, 756.5091, 1341.594, 849.19836]
2025-09-12 18:35:22,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [289.0, 150.0, 219.0, 163.0, 189.0, 230.0, 96.0, 139.0, 240.0, 154.0]
2025-09-12 18:35:22,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 24 minutes, 3 seconds)
2025-09-12 18:48:53,059 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:48:53,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:49:37,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 876.71057 ± 261.598
2025-09-12 18:49:37,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1294.2512, 814.9679, 463.9381, 1101.2301, 571.0366, 899.73065, 579.61005, 981.81915, 1180.1859, 880.33594]
2025-09-12 18:49:37,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [232.0, 146.0, 97.0, 198.0, 116.0, 164.0, 110.0, 186.0, 211.0, 162.0]
2025-09-12 18:49:37,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 9 minutes, 39 seconds)
2025-09-12 19:03:00,378 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:03:00,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:03:57,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1091.39795 ± 340.414
2025-09-12 19:03:57,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1076.2644, 1578.8241, 940.5842, 1156.3752, 1391.1888, 1477.02, 844.5855, 1304.948, 502.8756, 641.3131]
2025-09-12 19:03:57,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [202.0, 286.0, 170.0, 234.0, 271.0, 274.0, 170.0, 233.0, 94.0, 118.0]
2025-09-12 19:03:57,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 52 minutes, 59 seconds)
2025-09-12 19:17:24,209 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:17:24,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:18:14,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 960.41003 ± 280.987
2025-09-12 19:18:14,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [906.32306, 822.57184, 769.74445, 779.75714, 839.47516, 1094.0608, 1366.8829, 891.5445, 1553.4434, 580.2983]
2025-09-12 19:18:14,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [181.0, 155.0, 145.0, 169.0, 152.0, 203.0, 247.0, 172.0, 290.0, 111.0]
2025-09-12 19:18:14,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 40 minutes, 42 seconds)
2025-09-12 19:31:34,628 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:31:34,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:32:27,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1004.92566 ± 252.670
2025-09-12 19:32:27,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1347.5754, 796.24927, 1347.4768, 579.0639, 1015.0555, 1320.0093, 994.05273, 1011.2817, 747.5755, 890.9173]
2025-09-12 19:32:27,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [236.0, 165.0, 239.0, 105.0, 186.0, 255.0, 185.0, 207.0, 134.0, 163.0]
2025-09-12 19:32:27,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 26 minutes, 37 seconds)
2025-09-12 19:46:00,950 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:46:00,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:46:45,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 849.69922 ± 218.074
2025-09-12 19:46:45,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [828.7501, 848.1866, 851.3532, 1165.6099, 1213.8441, 985.7013, 785.05695, 638.0902, 732.6496, 447.74982]
2025-09-12 19:46:45,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 156.0, 154.0, 211.0, 217.0, 189.0, 144.0, 118.0, 157.0, 89.0]
2025-09-12 19:46:45,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 11 minutes, 13 seconds)
2025-09-12 20:00:25,574 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:00:25,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:01:17,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1009.75879 ± 328.949
2025-09-12 20:01:17,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [424.52164, 1156.777, 1595.9026, 767.0134, 1462.2847, 1041.4398, 838.58044, 1119.838, 930.4129, 760.8169]
2025-09-12 20:01:17,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 220.0, 284.0, 140.0, 269.0, 194.0, 166.0, 207.0, 176.0, 136.0]
2025-09-12 20:01:17,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 58 minutes, 19 seconds)
2025-09-12 20:14:53,761 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:14:53,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:15:54,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1211.02100 ± 429.361
2025-09-12 20:15:54,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1459.5211, 555.9462, 1766.8827, 1513.1871, 1425.5438, 1239.9442, 690.25305, 848.735, 1797.7129, 812.4835]
2025-09-12 20:15:54,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [252.0, 105.0, 325.0, 283.0, 247.0, 226.0, 130.0, 150.0, 305.0, 153.0]
2025-09-12 20:15:54,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (1211.02) for latency MM1Queue_a033_s075
2025-09-12 20:15:54,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 45 minutes, 24 seconds)
2025-09-12 20:29:10,667 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:29:10,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:29:55,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 840.60986 ± 218.582
2025-09-12 20:29:55,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [851.0315, 1214.3966, 615.6865, 581.7577, 584.414, 868.9442, 1108.0255, 636.0202, 1025.5742, 920.2477]
2025-09-12 20:29:55,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [162.0, 215.0, 115.0, 109.0, 123.0, 158.0, 206.0, 132.0, 189.0, 173.0]
2025-09-12 20:29:55,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 29 minutes, 46 seconds)
2025-09-12 20:43:27,240 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:43:27,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:44:26,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1188.19458 ± 516.798
2025-09-12 20:44:26,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [992.4237, 1871.1753, 639.8551, 1232.6403, 1054.7217, 1521.1567, 874.41187, 511.51904, 939.2656, 2244.7754]
2025-09-12 20:44:26,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [173.0, 317.0, 121.0, 221.0, 186.0, 273.0, 169.0, 93.0, 191.0, 383.0]
2025-09-12 20:44:26,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 16 minutes, 44 seconds)
2025-09-12 20:58:01,133 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:58:01,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:59:05,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1224.35059 ± 365.643
2025-09-12 20:59:05,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1027.8197, 1070.0674, 931.4179, 889.78644, 791.09247, 1617.9044, 1852.475, 1685.207, 1438.6001, 939.13525]
2025-09-12 20:59:05,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [196.0, 207.0, 181.0, 163.0, 149.0, 305.0, 336.0, 314.0, 262.0, 206.0]
2025-09-12 20:59:05,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (1224.35) for latency MM1Queue_a033_s075
2025-09-12 20:59:05,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 3 minutes, 47 seconds)
2025-09-12 21:12:52,826 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:12:52,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:13:54,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1189.34692 ± 473.188
2025-09-12 21:13:54,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [820.63916, 1190.7224, 2355.7776, 945.32635, 947.64484, 1682.9532, 622.8149, 1135.1075, 955.32605, 1237.1572]
2025-09-12 21:13:54,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [167.0, 217.0, 429.0, 188.0, 183.0, 294.0, 125.0, 203.0, 184.0, 219.0]
2025-09-12 21:13:54,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 50 minutes, 25 seconds)
2025-09-12 21:27:28,432 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:27:28,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:28:38,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1312.88196 ± 605.433
2025-09-12 21:28:38,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1476.9994, 1372.5159, 1245.1915, 1375.494, 1981.0298, 2598.0876, 924.35315, 424.0226, 1171.426, 559.7006]
2025-09-12 21:28:38,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [262.0, 263.0, 234.0, 276.0, 362.0, 486.0, 188.0, 92.0, 221.0, 102.0]
2025-09-12 21:28:38,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (1312.88) for latency MM1Queue_a033_s075
2025-09-12 21:28:38,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 36 minutes, 23 seconds)
2025-09-12 21:42:04,620 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:42:04,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:43:09,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1231.11804 ± 527.018
2025-09-12 21:43:09,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1075.0385, 1566.9498, 966.2487, 1431.2075, 955.12, 779.21533, 399.78317, 2000.934, 2176.033, 960.65173]
2025-09-12 21:43:09,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [190.0, 297.0, 200.0, 271.0, 203.0, 141.0, 89.0, 369.0, 369.0, 172.0]
2025-09-12 21:43:09,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 23 minutes, 36 seconds)
2025-09-12 21:56:49,263 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:56:49,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:57:40,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 964.49963 ± 372.300
2025-09-12 21:57:40,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [664.42236, 671.19476, 849.8329, 566.7051, 663.116, 1267.3114, 1597.8229, 1611.7955, 775.23236, 977.5629]
2025-09-12 21:57:40,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 143.0, 154.0, 108.0, 139.0, 230.0, 309.0, 293.0, 139.0, 190.0]
2025-09-12 21:57:40,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 8 minutes, 58 seconds)
2025-09-12 22:11:37,069 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:11:37,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:12:38,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1113.66956 ± 887.644
2025-09-12 22:12:38,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [891.00385, 3585.235, 1160.588, 327.50696, 895.4043, 1162.064, 670.24756, 1434.4429, 577.37866, 432.82437]
2025-09-12 22:12:38,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [187.0, 696.0, 211.0, 64.0, 177.0, 223.0, 122.0, 265.0, 122.0, 90.0]
2025-09-12 22:12:38,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 55 minutes, 20 seconds)
2025-09-12 22:25:51,676 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:25:51,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:26:45,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 975.64844 ± 342.596
2025-09-12 22:26:45,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [819.07117, 594.9229, 607.568, 1143.2992, 1347.0605, 939.8082, 1594.7388, 641.9128, 717.86053, 1350.2417]
2025-09-12 22:26:45,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [162.0, 117.0, 136.0, 210.0, 275.0, 180.0, 285.0, 114.0, 141.0, 256.0]
2025-09-12 22:26:45,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 38 minutes, 33 seconds)
2025-09-12 22:40:28,202 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:40:28,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:41:20,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 998.24365 ± 287.211
2025-09-12 22:41:20,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [644.2731, 1023.95496, 708.94934, 1037.5248, 1682.3125, 671.6507, 1090.0125, 970.9101, 1170.5168, 982.3324]
2025-09-12 22:41:20,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 191.0, 129.0, 204.0, 309.0, 140.0, 199.0, 173.0, 205.0, 178.0]
2025-09-12 22:41:20,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 23 minutes, 34 seconds)
2025-09-12 22:54:55,762 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:54:55,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:55:44,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 946.92346 ± 238.718
2025-09-12 22:55:44,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [913.8579, 889.0408, 664.8945, 619.6186, 780.98175, 747.5669, 1135.9275, 1139.8806, 1239.4956, 1337.9702]
2025-09-12 22:55:44,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [166.0, 153.0, 122.0, 113.0, 155.0, 132.0, 204.0, 206.0, 226.0, 236.0]
2025-09-12 22:55:44,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 8 minutes, 43 seconds)
2025-09-12 23:09:16,251 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:09:16,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:10:21,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1278.19092 ± 408.169
2025-09-12 23:10:21,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [2007.3054, 953.21875, 1197.6399, 1582.1107, 752.42114, 1594.8372, 1179.9087, 1234.9432, 1641.457, 638.06726]
2025-09-12 23:10:21,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [370.0, 179.0, 216.0, 287.0, 134.0, 287.0, 211.0, 226.0, 283.0, 115.0]
2025-09-12 23:10:21,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 54 minutes, 26 seconds)
2025-09-12 23:23:51,687 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:23:51,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:24:56,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1231.65039 ± 795.615
2025-09-12 23:24:56,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1177.7605, 677.9762, 945.0698, 3403.9836, 1322.2604, 1754.7942, 738.3601, 886.23785, 604.469, 805.5918]
2025-09-12 23:24:56,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [212.0, 145.0, 186.0, 623.0, 242.0, 317.0, 156.0, 171.0, 130.0, 142.0]
2025-09-12 23:24:56,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 39 minutes, 3 seconds)
2025-09-12 23:38:21,378 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:38:21,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:39:30,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1297.74902 ± 469.958
2025-09-12 23:39:30,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [976.4124, 1892.1464, 645.8207, 933.37775, 955.1366, 1825.8676, 1738.2433, 1946.3932, 887.1321, 1176.9604]
2025-09-12 23:39:30,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [193.0, 351.0, 135.0, 183.0, 179.0, 328.0, 327.0, 380.0, 177.0, 232.0]
2025-09-12 23:39:30,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 25 minutes, 29 seconds)
2025-09-12 23:52:59,738 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:52:59,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:54:11,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1426.02271 ± 750.831
2025-09-12 23:54:11,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1398.2509, 1838.2844, 2301.8384, 887.1457, 1079.2157, 3159.3345, 811.95044, 1101.5775, 1084.7653, 597.86395]
2025-09-12 23:54:11,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [271.0, 346.0, 428.0, 157.0, 198.0, 586.0, 147.0, 199.0, 193.0, 116.0]
2025-09-12 23:54:11,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (1426.02) for latency MM1Queue_a033_s075
2025-09-12 23:54:11,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 11 minutes, 7 seconds)
2025-09-13 00:07:40,723 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:07:40,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:08:46,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1219.46655 ± 399.674
2025-09-13 00:08:46,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1029.7413, 1080.648, 808.9481, 1709.3514, 1185.6909, 1334.5049, 1491.0237, 751.151, 2018.116, 785.4911]
2025-09-13 00:08:46,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [193.0, 206.0, 175.0, 326.0, 213.0, 243.0, 268.0, 133.0, 355.0, 146.0]
2025-09-13 00:08:46,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 56 minutes, 51 seconds)
2025-09-13 00:23:07,712 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:23:07,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:24:10,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1175.45911 ± 387.703
2025-09-13 00:24:10,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [942.50696, 697.57324, 1109.6234, 1081.8395, 988.14325, 935.09467, 986.55774, 1636.6385, 1275.9159, 2100.6985]
2025-09-13 00:24:10,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [177.0, 131.0, 201.0, 200.0, 186.0, 175.0, 171.0, 308.0, 246.0, 374.0]
2025-09-13 00:24:10,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 43 minutes, 21 seconds)
2025-09-13 00:38:34,112 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:38:34,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:39:33,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1071.19495 ± 350.445
2025-09-13 00:39:33,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1188.6979, 900.4755, 1251.4906, 1872.8169, 565.32477, 969.375, 849.1772, 685.308, 1261.7352, 1167.5481]
2025-09-13 00:39:33,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [230.0, 165.0, 227.0, 354.0, 117.0, 179.0, 173.0, 148.0, 240.0, 214.0]
2025-09-13 00:39:33,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 29 minutes, 32 seconds)
2025-09-13 00:53:27,474 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:53:27,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:54:29,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1109.18188 ± 648.518
2025-09-13 00:54:29,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [1593.4257, 880.9587, 925.0897, 640.842, 814.0388, 890.3008, 438.71027, 874.4017, 2847.0605, 1186.9907]
2025-09-13 00:54:29,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [313.0, 164.0, 163.0, 117.0, 147.0, 178.0, 94.0, 155.0, 526.0, 216.0]
2025-09-13 00:54:29,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 14 minutes, 58 seconds)
2025-09-13 01:08:51,867 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:08:51,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:10:11,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1497.97180 ± 441.087
2025-09-13 01:10:11,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [986.30634, 2216.5813, 1755.0425, 1528.9994, 1279.8737, 1065.6473, 819.72253, 1729.842, 1506.4324, 2091.2695]
2025-09-13 01:10:11,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [175.0, 406.0, 325.0, 274.0, 250.0, 195.0, 155.0, 323.0, 270.0, 384.0]
2025-09-13 01:10:11,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (1497.97) for latency MM1Queue_a033_s075
2025-09-13 01:10:11,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 48 seconds)
2025-09-13 01:24:14,371 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:24:14,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:25:50,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1752.01587 ± 616.849
2025-09-13 01:25:50,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [2806.343, 1833.2306, 1150.5729, 2737.1123, 1666.9684, 2109.6445, 1747.2727, 1286.2515, 1346.0536, 836.7099]
2025-09-13 01:25:50,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [502.0, 334.0, 235.0, 528.0, 305.0, 426.0, 310.0, 235.0, 236.0, 158.0]
2025-09-13 01:25:50,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1226 [INFO]: New best (1752.02) for latency MM1Queue_a033_s075
2025-09-13 01:25:50,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 46 minutes, 14 seconds)
2025-09-13 01:40:04,858 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:40:04,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:41:16,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1297.28296 ± 522.908
2025-09-13 01:41:16,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [2180.1572, 608.74133, 639.83984, 1212.0958, 1022.47455, 1355.9143, 1074.3687, 1992.3352, 1003.6725, 1883.2295]
2025-09-13 01:41:16,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [400.0, 130.0, 132.0, 211.0, 200.0, 258.0, 193.0, 370.0, 189.0, 342.0]
2025-09-13 01:41:16,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 30 minutes, 50 seconds)
2025-09-13 01:55:31,397 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:55:31,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:56:30,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1054.40076 ± 438.170
2025-09-13 01:56:30,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [555.2039, 703.3469, 1245.4548, 1247.4081, 2037.3232, 823.73346, 1385.1523, 502.5499, 1160.3973, 883.43854]
2025-09-13 01:56:30,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 136.0, 225.0, 249.0, 359.0, 164.0, 286.0, 96.0, 233.0, 159.0]
2025-09-13 01:56:30,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 15 minutes, 23 seconds)
2025-09-13 02:10:48,843 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:10:48,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:12:02,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 1307.40747 ± 656.756
2025-09-13 02:12:02,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [564.07983, 1834.9778, 554.4678, 802.9841, 1985.5725, 1056.9517, 1580.7368, 1352.2932, 2618.9746, 723.0357]
2025-09-13 02:12:02,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 355.0, 111.0, 143.0, 366.0, 211.0, 313.0, 254.0, 490.0, 135.0]
2025-09-13 02:12:02,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-humanoid):1251 [DEBUG]: Training session finished
