2025-09-12 01:52:48,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc10-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-09-12 01:52:48,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc10-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-09-12 01:52:48,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14d30956cf90>}
2025-09-12 01:52:48,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1111 [DEBUG]: using device: cuda
2025-09-12 01:52:48,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1133 [INFO]: Creating new trainer
2025-09-12 01:52:48,178 baseline-mbpac-noiseperc10-humanoid:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-12 01:52:48,178 baseline-mbpac-noiseperc10-humanoid:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 01:52:48,190 baseline-mbpac-noiseperc10-humanoid:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-09-12 01:52:49,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1194 [DEBUG]: Starting training session...
2025-09-12 01:52:49,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 1/100
2025-09-12 02:05:11,577 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:05:11,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:05:26,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 280.79199 ± 43.752
2025-09-12 02:05:26,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [257.11072, 298.7992, 273.81052, 242.01396, 271.8655, 293.10068, 262.19867, 269.9253, 238.90746, 400.18787]
2025-09-12 02:05:26,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [48.0, 55.0, 51.0, 47.0, 50.0, 54.0, 48.0, 50.0, 45.0, 74.0]
2025-09-12 02:05:26,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (280.79) for latency MM1Queue_a033_s075
2025-09-12 02:05:26,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 20 hours, 49 minutes, 27 seconds)
2025-09-12 02:19:33,354 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:19:33,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:19:55,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 380.75009 ± 48.461
2025-09-12 02:19:55,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [404.59573, 314.83804, 353.33334, 404.85748, 392.91852, 279.2118, 420.52023, 425.72263, 374.71622, 436.7871]
2025-09-12 02:19:55,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 59.0, 65.0, 76.0, 84.0, 54.0, 80.0, 80.0, 71.0, 89.0]
2025-09-12 02:19:55,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (380.75) for latency MM1Queue_a033_s075
2025-09-12 02:19:55,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 22 hours, 7 minutes, 43 seconds)
2025-09-12 02:33:59,200 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:33:59,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:34:23,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 426.65445 ± 98.809
2025-09-12 02:34:23,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [482.4474, 586.9028, 495.37903, 236.70879, 319.03653, 345.43698, 517.71735, 396.84937, 451.49295, 434.5733]
2025-09-12 02:34:23,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 112.0, 91.0, 44.0, 59.0, 63.0, 97.0, 73.0, 93.0, 80.0]
2025-09-12 02:34:23,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (426.65) for latency MM1Queue_a033_s075
2025-09-12 02:34:23,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 22 hours, 23 minutes, 46 seconds)
2025-09-12 02:48:34,075 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:48:34,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:48:55,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 388.41177 ± 86.646
2025-09-12 02:48:55,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [534.0741, 419.64432, 505.26227, 467.0081, 333.11002, 358.54364, 246.34119, 347.75592, 366.26672, 306.11145]
2025-09-12 02:48:55,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 77.0, 94.0, 86.0, 66.0, 66.0, 53.0, 76.0, 68.0, 65.0]
2025-09-12 02:48:55,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 22 hours, 26 minutes, 35 seconds)
2025-09-12 03:02:59,488 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:02:59,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:03:21,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 394.72552 ± 65.554
2025-09-12 03:03:21,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [354.0364, 362.123, 376.79642, 435.14417, 341.903, 404.91385, 429.5079, 385.83746, 554.6639, 302.3293]
2025-09-12 03:03:21,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 67.0, 72.0, 79.0, 67.0, 76.0, 81.0, 69.0, 106.0, 62.0]
2025-09-12 03:03:21,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 22 hours, 20 minutes, 1 second)
2025-09-12 03:17:29,096 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:17:29,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:17:54,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 451.40356 ± 55.249
2025-09-12 03:17:54,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [533.3168, 452.0442, 512.6325, 400.89984, 452.76627, 478.53986, 347.08286, 420.64572, 507.34592, 408.76178]
2025-09-12 03:17:54,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 85.0, 100.0, 73.0, 83.0, 94.0, 70.0, 83.0, 97.0, 76.0]
2025-09-12 03:17:54,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (451.40) for latency MM1Queue_a033_s075
2025-09-12 03:17:54,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 22 hours, 42 minutes, 19 seconds)
2025-09-12 03:32:01,473 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:32:01,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:32:27,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 479.31851 ± 118.002
2025-09-12 03:32:27,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [543.69385, 359.2866, 326.61365, 421.1076, 340.40106, 505.8263, 447.72592, 545.912, 719.8332, 582.78534]
2025-09-12 03:32:27,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 65.0, 61.0, 75.0, 62.0, 97.0, 80.0, 99.0, 131.0, 106.0]
2025-09-12 03:32:27,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (479.32) for latency MM1Queue_a033_s075
2025-09-12 03:32:27,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 22 hours, 29 minutes, 3 seconds)
2025-09-12 03:46:35,616 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:46:35,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:46:57,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 402.81769 ± 63.505
2025-09-12 03:46:57,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [371.35355, 393.97247, 314.01315, 520.52454, 405.21762, 428.80765, 292.73593, 447.398, 399.18423, 454.96982]
2025-09-12 03:46:57,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 72.0, 58.0, 102.0, 74.0, 87.0, 55.0, 81.0, 76.0, 85.0]
2025-09-12 03:46:57,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 22 hours, 15 minutes, 29 seconds)
2025-09-12 04:01:09,224 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:01:09,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:01:39,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 542.65344 ± 162.560
2025-09-12 04:01:39,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [862.599, 576.60187, 535.9979, 377.56223, 771.85547, 581.99255, 343.06082, 362.51425, 554.89, 459.45996]
2025-09-12 04:01:39,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [166.0, 110.0, 100.0, 77.0, 139.0, 112.0, 62.0, 68.0, 110.0, 84.0]
2025-09-12 04:01:39,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (542.65) for latency MM1Queue_a033_s075
2025-09-12 04:01:39,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 22 hours, 3 minutes, 37 seconds)
2025-09-12 04:15:50,460 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:15:50,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:16:16,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 458.17096 ± 89.860
2025-09-12 04:16:16,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [408.54376, 437.9003, 688.6843, 470.12024, 403.72955, 374.26413, 346.89062, 478.69293, 476.7327, 496.15125]
2025-09-12 04:16:16,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 88.0, 127.0, 85.0, 81.0, 71.0, 63.0, 97.0, 94.0, 89.0]
2025-09-12 04:16:16,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 21 hours, 52 minutes, 36 seconds)
2025-09-12 04:30:29,086 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:30:29,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:30:56,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 503.10367 ± 130.767
2025-09-12 04:30:56,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [673.5171, 541.30286, 485.57953, 539.65936, 526.64685, 597.68475, 405.1698, 328.07236, 674.6444, 258.75925]
2025-09-12 04:30:56,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 100.0, 87.0, 98.0, 95.0, 109.0, 76.0, 71.0, 123.0, 52.0]
2025-09-12 04:30:56,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 21 hours, 40 minutes)
2025-09-12 04:45:03,904 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:45:03,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:45:31,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 496.18051 ± 86.955
2025-09-12 04:45:31,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [440.1848, 445.98132, 423.6229, 510.80643, 449.05405, 709.8182, 460.81018, 445.84448, 606.2488, 469.43405]
2025-09-12 04:45:31,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 81.0, 79.0, 94.0, 86.0, 132.0, 84.0, 83.0, 128.0, 87.0]
2025-09-12 04:45:31,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 21 hours, 26 minutes, 3 seconds)
2025-09-12 04:59:39,290 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:59:39,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:00:06,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 513.33167 ± 97.993
2025-09-12 05:00:06,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [493.39813, 399.3198, 469.29797, 780.2357, 489.16055, 501.95334, 567.2962, 447.08685, 510.03854, 475.52985]
2025-09-12 05:00:06,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 73.0, 84.0, 137.0, 88.0, 102.0, 105.0, 81.0, 91.0, 86.0]
2025-09-12 05:00:06,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 21 hours, 12 minutes, 47 seconds)
2025-09-12 05:14:17,202 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:14:17,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:14:53,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 661.69519 ± 140.336
2025-09-12 05:14:53,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [605.04785, 427.37274, 841.6101, 720.2712, 896.9577, 676.1074, 698.2823, 529.40106, 502.5995, 719.3011]
2025-09-12 05:14:53,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 89.0, 154.0, 130.0, 161.0, 122.0, 135.0, 95.0, 96.0, 138.0]
2025-09-12 05:14:53,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (661.70) for latency MM1Queue_a033_s075
2025-09-12 05:14:53,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 20 hours, 59 minutes, 34 seconds)
2025-09-12 05:28:59,433 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:28:59,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:29:35,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 635.38171 ± 208.561
2025-09-12 05:29:35,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [603.5485, 1195.0295, 651.3119, 397.08246, 710.93176, 593.8511, 674.45526, 530.1327, 442.44247, 555.032]
2025-09-12 05:29:35,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 221.0, 119.0, 82.0, 126.0, 118.0, 121.0, 107.0, 94.0, 106.0]
2025-09-12 05:29:35,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 20 hours, 46 minutes, 25 seconds)
2025-09-12 05:43:46,504 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:43:46,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:44:29,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 763.31085 ± 262.149
2025-09-12 05:44:29,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [439.31485, 669.89325, 1215.4773, 1028.4745, 567.77875, 565.8848, 1189.7092, 625.35223, 693.9572, 637.2668]
2025-09-12 05:44:29,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 120.0, 233.0, 207.0, 101.0, 112.0, 241.0, 114.0, 130.0, 119.0]
2025-09-12 05:44:29,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (763.31) for latency MM1Queue_a033_s075
2025-09-12 05:44:29,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 20 hours, 35 minutes, 43 seconds)
2025-09-12 05:58:45,622 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:58:45,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:59:18,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 572.05212 ± 131.391
2025-09-12 05:59:18,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [599.9666, 854.1802, 740.00916, 571.0437, 496.8542, 449.80914, 460.80344, 489.96246, 630.487, 427.40527]
2025-09-12 05:59:18,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 163.0, 151.0, 111.0, 89.0, 91.0, 93.0, 91.0, 129.0, 80.0]
2025-09-12 05:59:18,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 20 hours, 24 minutes, 49 seconds)
2025-09-12 06:13:26,062 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:13:26,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:14:00,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 625.13739 ± 142.830
2025-09-12 06:14:00,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [712.9784, 628.7717, 690.5384, 410.9066, 841.42487, 549.89526, 809.4481, 634.36646, 383.62027, 589.4241]
2025-09-12 06:14:00,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 120.0, 123.0, 81.0, 162.0, 100.0, 151.0, 114.0, 85.0, 105.0]
2025-09-12 06:14:00,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 20 hours, 11 minutes, 47 seconds)
2025-09-12 06:28:21,173 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:28:21,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:29:00,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 739.17017 ± 126.792
2025-09-12 06:29:00,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [660.21277, 579.34064, 695.4633, 826.61505, 610.4654, 701.7244, 658.2531, 745.7135, 955.2857, 958.6279]
2025-09-12 06:29:00,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 113.0, 128.0, 148.0, 110.0, 126.0, 116.0, 136.0, 173.0, 169.0]
2025-09-12 06:29:00,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 20 hours, 48 seconds)
2025-09-12 06:43:02,123 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:43:02,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:43:45,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 776.58313 ± 146.674
2025-09-12 06:43:45,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [565.96106, 980.7925, 805.68506, 815.8552, 925.4644, 637.33966, 699.20013, 925.3822, 552.1552, 857.99603]
2025-09-12 06:43:45,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 181.0, 165.0, 165.0, 172.0, 115.0, 126.0, 167.0, 100.0, 168.0]
2025-09-12 06:43:45,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (776.58) for latency MM1Queue_a033_s075
2025-09-12 06:43:45,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 19 hours, 46 minutes, 38 seconds)
2025-09-12 06:58:03,328 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:58:03,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:58:43,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 737.47559 ± 240.790
2025-09-12 06:58:43,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [559.9192, 845.62775, 554.94794, 1146.7385, 369.422, 508.82065, 1102.7129, 835.3194, 676.82043, 774.4273]
2025-09-12 06:58:43,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 156.0, 99.0, 213.0, 67.0, 104.0, 194.0, 158.0, 138.0, 138.0]
2025-09-12 06:58:43,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 19 hours, 32 minutes, 52 seconds)
2025-09-12 07:12:51,555 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:12:51,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:13:51,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 1030.24866 ± 494.098
2025-09-12 07:13:51,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [2448.0547, 912.31586, 746.5139, 846.8884, 724.1256, 1144.0188, 705.0258, 799.4846, 1118.1772, 857.88086]
2025-09-12 07:13:51,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [481.0, 185.0, 155.0, 161.0, 138.0, 233.0, 137.0, 149.0, 216.0, 160.0]
2025-09-12 07:13:51,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (1030.25) for latency MM1Queue_a033_s075
2025-09-12 07:13:51,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 19 hours, 23 minutes)
2025-09-12 07:28:20,615 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:28:20,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:29:13,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 981.00409 ± 177.016
2025-09-12 07:29:13,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [843.9513, 1183.4165, 983.0144, 1001.50385, 958.98236, 1065.9971, 717.42847, 792.9136, 913.05774, 1349.7751]
2025-09-12 07:29:13,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [150.0, 216.0, 181.0, 186.0, 188.0, 192.0, 128.0, 144.0, 174.0, 247.0]
2025-09-12 07:29:13,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 19 hours, 18 minutes, 26 seconds)
2025-09-12 07:43:20,592 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:43:20,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:44:10,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 897.26208 ± 336.816
2025-09-12 07:44:10,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [495.9884, 1666.7323, 1024.3479, 495.44617, 585.08344, 1099.0265, 686.45105, 928.73083, 974.43115, 1016.38306]
2025-09-12 07:44:10,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 310.0, 196.0, 92.0, 121.0, 212.0, 123.0, 189.0, 176.0, 182.0]
2025-09-12 07:44:10,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 19 hours, 2 minutes, 28 seconds)
2025-09-12 07:58:35,071 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:58:35,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:59:29,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 951.08008 ± 237.095
2025-09-12 07:59:29,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [1131.1946, 702.54706, 1307.5277, 945.05383, 918.83844, 1319.2162, 546.627, 913.39923, 972.3525, 754.04443]
2025-09-12 07:59:29,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [214.0, 131.0, 254.0, 183.0, 181.0, 251.0, 97.0, 179.0, 183.0, 146.0]
2025-09-12 07:59:29,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 18 hours, 55 minutes, 55 seconds)
2025-09-12 08:13:39,704 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:13:39,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:14:41,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 1111.20618 ± 137.807
2025-09-12 08:14:41,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [1037.8955, 1321.4111, 1193.2075, 931.45935, 993.59015, 1278.0302, 1144.0939, 1118.3711, 1205.5471, 888.4561]
2025-09-12 08:14:41,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [193.0, 247.0, 227.0, 174.0, 183.0, 237.0, 219.0, 214.0, 214.0, 167.0]
2025-09-12 08:14:41,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (1111.21) for latency MM1Queue_a033_s075
2025-09-12 08:14:41,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 18 hours, 44 minutes, 14 seconds)
2025-09-12 08:28:49,484 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:28:49,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:29:37,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 843.08856 ± 260.137
2025-09-12 08:29:37,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [1020.06226, 1255.6837, 942.6083, 662.52386, 1213.2017, 974.8724, 618.02356, 590.1113, 681.6266, 472.17206]
2025-09-12 08:29:37,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [197.0, 244.0, 181.0, 136.0, 235.0, 194.0, 124.0, 116.0, 138.0, 98.0]
2025-09-12 08:29:38,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 18 hours, 26 minutes, 18 seconds)
2025-09-12 08:44:03,885 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:44:03,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:45:14,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 1265.60229 ± 422.285
2025-09-12 08:45:14,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [1957.2405, 1635.9156, 803.6715, 646.78326, 1765.0798, 1091.9127, 1299.5808, 1126.3268, 810.57245, 1518.9385]
2025-09-12 08:45:14,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [369.0, 312.0, 153.0, 122.0, 335.0, 205.0, 241.0, 214.0, 158.0, 291.0]
2025-09-12 08:45:14,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (1265.60) for latency MM1Queue_a033_s075
2025-09-12 08:45:14,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 18 hours, 14 minutes, 36 seconds)
2025-09-12 08:59:20,754 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:59:20,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:00:16,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 989.42480 ± 350.586
2025-09-12 09:00:16,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [647.72107, 1258.5706, 1000.02844, 866.7196, 1369.4578, 684.07007, 1153.7413, 933.0789, 375.25128, 1605.6095]
2025-09-12 09:00:16,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 234.0, 196.0, 166.0, 261.0, 134.0, 213.0, 182.0, 72.0, 306.0]
2025-09-12 09:00:16,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 18 hours, 33 seconds)
2025-09-12 09:14:35,864 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:14:35,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:15:53,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 1361.27954 ± 349.504
2025-09-12 09:15:53,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [1100.7831, 1449.565, 1612.1395, 1006.7143, 1154.6313, 2052.804, 1168.8221, 1140.3136, 1057.583, 1869.4391]
2025-09-12 09:15:53,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [215.0, 269.0, 312.0, 202.0, 222.0, 391.0, 226.0, 227.0, 202.0, 361.0]
2025-09-12 09:15:53,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (1361.28) for latency MM1Queue_a033_s075
2025-09-12 09:15:53,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 17 hours, 49 minutes, 33 seconds)
2025-09-12 09:30:07,590 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:30:07,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:31:18,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 1240.89709 ± 283.415
2025-09-12 09:31:18,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [1051.22, 1612.426, 758.8804, 1404.7314, 1257.1498, 771.4562, 1388.2404, 1517.7074, 1186.8243, 1460.3353]
2025-09-12 09:31:18,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [202.0, 308.0, 138.0, 270.0, 244.0, 151.0, 268.0, 290.0, 224.0, 283.0]
2025-09-12 09:31:18,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 17 hours, 37 minutes, 21 seconds)
2025-09-12 09:45:40,212 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:45:40,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:46:51,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 1247.42249 ± 485.675
2025-09-12 09:46:51,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [712.72534, 1876.1519, 1755.6888, 265.77032, 1700.962, 1414.9297, 1477.7578, 883.14825, 1070.4423, 1316.6475]
2025-09-12 09:46:51,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 366.0, 339.0, 51.0, 327.0, 270.0, 280.0, 169.0, 196.0, 253.0]
2025-09-12 09:46:51,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 17 hours, 30 minutes, 10 seconds)
2025-09-12 10:01:10,838 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:01:10,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:02:38,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 1538.19983 ± 440.590
2025-09-12 10:02:38,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [1997.8813, 1237.2222, 1599.8153, 684.5127, 2390.0723, 1500.9346, 1434.9146, 1301.898, 1841.6559, 1393.0906]
2025-09-12 10:02:38,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [384.0, 233.0, 309.0, 139.0, 463.0, 287.0, 279.0, 252.0, 352.0, 271.0]
2025-09-12 10:02:38,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (1538.20) for latency MM1Queue_a033_s075
2025-09-12 10:02:38,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 17 hours, 17 minutes, 11 seconds)
2025-09-12 10:17:13,533 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:17:13,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:18:21,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 1158.17236 ± 350.096
2025-09-12 10:18:21,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [788.6985, 1050.7615, 1913.8561, 921.627, 1455.232, 1536.0161, 815.5778, 1170.5881, 1081.3, 848.06635]
2025-09-12 10:18:21,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 206.0, 371.0, 182.0, 299.0, 296.0, 156.0, 229.0, 211.0, 160.0]
2025-09-12 10:18:21,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 17 hours, 10 minutes, 42 seconds)
2025-09-12 10:33:12,101 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:33:12,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:34:37,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 1518.35547 ± 510.664
2025-09-12 10:34:37,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [2129.3867, 2068.273, 1040.439, 1867.6019, 1466.1262, 628.004, 1627.9205, 805.11694, 2033.5873, 1517.0984]
2025-09-12 10:34:37,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [404.0, 389.0, 196.0, 358.0, 277.0, 112.0, 308.0, 156.0, 387.0, 295.0]
2025-09-12 10:34:37,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 17 hours, 3 minutes, 42 seconds)
2025-09-12 10:48:21,980 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:48:21,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:50:14,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 1995.27368 ± 519.424
2025-09-12 10:50:14,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [1922.4843, 2635.024, 2916.3198, 2324.9023, 1173.3301, 1450.4183, 1466.98, 1846.1382, 1992.399, 2224.741]
2025-09-12 10:50:14,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [371.0, 494.0, 540.0, 444.0, 222.0, 280.0, 284.0, 349.0, 383.0, 436.0]
2025-09-12 10:50:14,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (1995.27) for latency MM1Queue_a033_s075
2025-09-12 10:50:14,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 16 hours, 50 minutes, 16 seconds)
2025-09-12 11:04:20,333 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:04:20,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:06:02,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 1822.77734 ± 557.732
2025-09-12 11:06:02,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [2095.3774, 2465.1577, 1942.9838, 1224.6135, 927.4, 1250.7122, 2484.3438, 1312.2278, 2399.3606, 2125.5957]
2025-09-12 11:06:02,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [401.0, 476.0, 364.0, 225.0, 175.0, 236.0, 469.0, 250.0, 456.0, 398.0]
2025-09-12 11:06:02,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 16 hours, 37 minutes, 43 seconds)
2025-09-12 11:20:48,822 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:20:48,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:22:36,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 1928.82056 ± 1374.694
2025-09-12 11:22:36,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [1140.7997, 937.5373, 1152.4503, 3391.0432, 1517.5074, 5354.5815, 881.4387, 2423.653, 814.37994, 1674.8131]
2025-09-12 11:22:36,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [225.0, 177.0, 226.0, 620.0, 295.0, 1000.0, 168.0, 451.0, 152.0, 320.0]
2025-09-12 11:22:36,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 16 hours, 31 minutes, 28 seconds)
2025-09-12 11:36:41,973 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:36:41,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:38:59,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 2459.94482 ± 773.523
2025-09-12 11:38:59,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [1589.3, 1141.1868, 2746.7827, 2864.486, 3019.0486, 3223.0374, 1523.0779, 2501.8013, 3663.4111, 2327.315]
2025-09-12 11:38:59,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [308.0, 218.0, 517.0, 536.0, 566.0, 608.0, 287.0, 481.0, 693.0, 438.0]
2025-09-12 11:38:59,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (2459.94) for latency MM1Queue_a033_s075
2025-09-12 11:38:59,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 16 hours, 23 minutes, 48 seconds)
2025-09-12 11:53:36,696 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:53:36,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:56:15,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 2764.06592 ± 1192.761
2025-09-12 11:56:15,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5100.065, 3851.9521, 2456.444, 2106.335, 1552.0809, 3036.042, 3985.107, 2424.6262, 846.4317, 2281.5737]
2025-09-12 11:56:15,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [952.0, 740.0, 468.0, 402.0, 296.0, 563.0, 768.0, 460.0, 166.0, 438.0]
2025-09-12 11:56:15,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (2764.07) for latency MM1Queue_a033_s075
2025-09-12 11:56:15,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 16 hours, 19 minutes, 26 seconds)
2025-09-12 12:10:50,012 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:10:50,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:13:28,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 2853.63379 ± 1682.006
2025-09-12 12:13:28,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [1560.447, 5460.1553, 1765.4556, 4049.468, 1983.4425, 5387.1465, 1181.9598, 4332.345, 2069.0315, 746.8854]
2025-09-12 12:13:28,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [302.0, 1000.0, 335.0, 760.0, 372.0, 1000.0, 230.0, 816.0, 397.0, 143.0]
2025-09-12 12:13:28,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (2853.63) for latency MM1Queue_a033_s075
2025-09-12 12:13:28,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 16 hours, 22 minutes, 12 seconds)
2025-09-12 12:27:19,774 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:27:19,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:30:35,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 3522.17847 ± 1216.983
2025-09-12 12:30:35,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [2814.8132, 2474.3596, 5030.8755, 2711.295, 3982.3318, 4405.7715, 4193.8354, 1327.6451, 5416.055, 2864.8042]
2025-09-12 12:30:35,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [524.0, 478.0, 949.0, 507.0, 752.0, 825.0, 799.0, 251.0, 1000.0, 543.0]
2025-09-12 12:30:35,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (3522.18) for latency MM1Queue_a033_s075
2025-09-12 12:30:35,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 16 hours, 20 minutes, 47 seconds)
2025-09-12 12:45:12,139 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:45:12,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:48:25,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 3469.50781 ± 1512.208
2025-09-12 12:48:25,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [3102.9167, 3786.9207, 1874.2982, 1729.858, 2250.7725, 4963.2954, 1395.8354, 5328.1553, 5312.6235, 4950.402]
2025-09-12 12:48:25,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [587.0, 718.0, 355.0, 322.0, 427.0, 940.0, 266.0, 1000.0, 1000.0, 927.0]
2025-09-12 12:48:25,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 16 hours, 18 minutes, 18 seconds)
2025-09-12 13:03:36,617 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:03:36,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:06:52,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 3471.24756 ± 1287.138
2025-09-12 13:06:52,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [2729.356, 1666.6073, 3117.464, 3106.1567, 2579.2488, 2180.8845, 5288.1724, 5296.392, 3431.1458, 5317.0503]
2025-09-12 13:06:52,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [522.0, 326.0, 595.0, 594.0, 489.0, 424.0, 1000.0, 1000.0, 641.0, 1000.0]
2025-09-12 13:06:52,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 16 hours, 24 minutes, 11 seconds)
2025-09-12 13:20:38,663 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:20:38,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:24:25,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 4089.32617 ± 1534.171
2025-09-12 13:24:25,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5415.5034, 3911.3933, 5425.4497, 2129.4473, 4020.108, 1443.6041, 5460.901, 5444.3022, 5442.159, 2200.394]
2025-09-12 13:24:25,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 737.0, 1000.0, 424.0, 783.0, 276.0, 1000.0, 1000.0, 1000.0, 411.0]
2025-09-12 13:24:25,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (4089.33) for latency MM1Queue_a033_s075
2025-09-12 13:24:25,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 16 hours, 9 minutes, 57 seconds)
2025-09-12 13:38:45,000 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:38:45,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:42:10,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 3664.93823 ± 1389.309
2025-09-12 13:42:10,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [2837.1934, 4519.5103, 1971.3912, 2127.8872, 4749.1304, 1613.3291, 5370.5547, 5306.893, 4891.2446, 3262.248]
2025-09-12 13:42:10,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [535.0, 838.0, 374.0, 392.0, 886.0, 306.0, 1000.0, 1000.0, 926.0, 622.0]
2025-09-12 13:42:10,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 15 hours, 57 minutes, 51 seconds)
2025-09-12 13:56:11,798 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:56:11,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:00:13,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 4405.82812 ± 1796.161
2025-09-12 14:00:13,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [3932.1978, 5425.8955, 5478.0327, 1463.6805, 5407.9297, 5473.818, 5455.2876, 5490.9585, 5476.911, 453.5704]
2025-09-12 14:00:13,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [715.0, 1000.0, 1000.0, 284.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 83.0]
2025-09-12 14:00:13,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (4405.83) for latency MM1Queue_a033_s075
2025-09-12 14:00:13,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 15 hours, 50 minutes, 5 seconds)
2025-09-12 14:15:09,662 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:15:09,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:19:51,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5173.09668 ± 764.272
2025-09-12 14:19:51,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5453.8315, 5452.5796, 5442.5986, 5479.371, 5441.3213, 5524.1157, 2901.978, 5421.27, 5123.659, 5490.2397]
2025-09-12 14:19:51,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 552.0, 1000.0, 929.0, 1000.0]
2025-09-12 14:19:51,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (5173.10) for latency MM1Queue_a033_s075
2025-09-12 14:19:51,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 15 hours, 50 minutes, 56 seconds)
2025-09-12 14:34:24,060 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:34:24,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:38:41,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 4636.06006 ± 1284.021
2025-09-12 14:38:41,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5298.841, 2133.0178, 5483.8677, 5450.8716, 5409.5107, 5333.4185, 5384.667, 2161.1528, 4334.9385, 5370.3154]
2025-09-12 14:38:41,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 404.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 410.0, 814.0, 1000.0]
2025-09-12 14:38:41,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 15 hours, 36 minutes, 30 seconds)
2025-09-12 14:52:31,192 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:52:31,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:56:38,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 4492.12500 ± 1529.490
2025-09-12 14:56:38,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5430.4395, 3509.759, 1076.6414, 5399.7437, 5478.7397, 5450.3096, 5355.2505, 5469.184, 5409.742, 2341.442]
2025-09-12 14:56:38,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 653.0, 208.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 440.0]
2025-09-12 14:56:38,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 15 hours, 22 minutes, 5 seconds)
2025-09-12 15:11:31,218 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:11:31,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:15:17,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 4129.87598 ± 1371.879
2025-09-12 15:15:17,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5460.582, 5449.7876, 5337.2812, 5468.0254, 2619.3423, 1842.6733, 5432.3755, 3390.379, 2713.779, 3584.534]
2025-09-12 15:15:17,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 487.0, 349.0, 1000.0, 645.0, 493.0, 667.0]
2025-09-12 15:15:17,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 15 hours, 12 minutes, 37 seconds)
2025-09-12 15:28:57,668 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:28:57,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:33:19,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 4688.78809 ± 926.278
2025-09-12 15:33:19,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5382.218, 5371.7573, 3746.0232, 5279.6973, 2644.718, 5387.6206, 5305.694, 5306.1772, 4755.9, 3708.0742]
2025-09-12 15:33:19,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 706.0, 1000.0, 497.0, 1000.0, 1000.0, 1000.0, 901.0, 714.0]
2025-09-12 15:33:19,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 14 hours, 53 minutes, 52 seconds)
2025-09-12 15:47:44,349 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:47:44,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:52:43,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5309.46387 ± 37.466
2025-09-12 15:52:43,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5262.0386, 5325.6943, 5322.318, 5359.097, 5249.945, 5298.0464, 5273.5156, 5370.492, 5309.676, 5323.8164]
2025-09-12 15:52:43,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:52:43,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (5309.46) for latency MM1Queue_a033_s075
2025-09-12 15:52:43,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 14 hours, 32 minutes, 53 seconds)
2025-09-12 16:07:06,370 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:07:06,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:11:11,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 4502.29590 ± 1475.007
2025-09-12 16:11:11,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5469.1357, 5439.4624, 5468.733, 5532.077, 3100.4678, 5409.8135, 1489.0087, 5311.8804, 5434.5273, 2367.8538]
2025-09-12 16:11:11,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 563.0, 1000.0, 279.0, 990.0, 1000.0, 438.0]
2025-09-12 16:11:11,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 14 hours, 11 minutes, 6 seconds)
2025-09-12 16:26:07,347 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:26:07,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:30:22,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 4656.01318 ± 1291.568
2025-09-12 16:30:24,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [2206.4539, 2175.2778, 5384.4834, 5350.6187, 5470.6543, 5501.768, 5430.507, 5458.8345, 5444.3394, 4137.196]
2025-09-12 16:30:24,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [418.0, 403.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 771.0]
2025-09-12 16:30:24,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 14 hours, 3 minutes, 56 seconds)
2025-09-12 16:44:41,803 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:44:41,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:48:21,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 3979.63037 ± 1964.783
2025-09-12 16:48:21,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [552.453, 2292.0107, 5459.2544, 5444.8525, 5453.3667, 5391.5156, 5394.9263, 655.60516, 5510.165, 3642.1562]
2025-09-12 16:48:21,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 416.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 137.0, 1000.0, 679.0]
2025-09-12 16:48:21,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 13 hours, 38 minutes, 55 seconds)
2025-09-12 17:03:14,824 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:03:14,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:07:49,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5038.98975 ± 1110.600
2025-09-12 17:07:49,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5489.9233, 5475.527, 5490.2188, 5466.221, 5403.033, 5511.2935, 4857.528, 1754.468, 5422.4585, 5519.2314]
2025-09-12 17:07:49,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 880.0, 338.0, 1000.0, 1000.0]
2025-09-12 17:07:49,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 13 hours, 32 minutes, 35 seconds)
2025-09-12 17:22:03,322 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:22:03,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:27:01,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5316.95264 ± 40.284
2025-09-12 17:27:01,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5349.7607, 5322.1904, 5314.4087, 5324.874, 5386.1113, 5340.494, 5308.12, 5308.3633, 5293.1035, 5222.0967]
2025-09-12 17:27:01,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:27:01,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (5316.95) for latency MM1Queue_a033_s075
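The "New best (...)" lines fire whenever an evaluation's mean reward exceeds the best seen so far for that latency model (note iteration 58's 5316.95 sets a new best despite earlier iterations reaching higher means on lucky episodes, because the tracked quantity is the mean over all ten eval episodes and the run's earlier best preceded this excerpt). A hypothetical sketch of that bookkeeping, not taken from the repo:

```python
# Best mean eval reward observed so far, keyed by latency-model name.
best: dict[str, float] = {}

def update_best(latency_name: str, mean_reward: float) -> bool:
    """Record and return True iff this eval beats the stored best for the latency."""
    if mean_reward > best.get(latency_name, float("-inf")):
        best[latency_name] = mean_reward
        return True
    return False
```

A checkpoint save would typically hang off the `True` branch, so only policies that improve the per-latency eval mean are persisted.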
2025-09-12 17:27:01,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 13 hours, 12 minutes, 12 seconds)
2025-09-12 17:41:14,385 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:41:14,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:46:16,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5354.96729 ± 40.153
2025-09-12 17:46:16,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5404.91, 5322.574, 5358.0146, 5375.0977, 5396.021, 5337.206, 5377.3667, 5259.2466, 5374.7236, 5344.512]
2025-09-12 17:46:16,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:46:16,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (5354.97) for latency MM1Queue_a033_s075
2025-09-12 17:46:16,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 12 hours, 59 minutes, 38 seconds)
2025-09-12 18:00:05,903 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:00:05,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:04:19,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 4549.90918 ± 1667.386
2025-09-12 18:04:19,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5390.677, 5361.803, 5365.355, 2408.3853, 5414.6245, 307.04684, 5437.1943, 5032.0747, 5374.7446, 5407.1816]
2025-09-12 18:04:19,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 451.0, 1000.0, 58.0, 1000.0, 957.0, 1000.0, 1000.0]
2025-09-12 18:04:19,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 12 hours, 31 minutes, 19 seconds)
2025-09-12 18:19:15,316 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:19:15,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:24:11,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5470.18213 ± 21.338
2025-09-12 18:24:11,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5484.466, 5499.9917, 5461.3174, 5478.1313, 5444.022, 5483.293, 5442.9565, 5492.261, 5479.157, 5436.2246]
2025-09-12 18:24:11,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:24:11,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (5470.18) for latency MM1Queue_a033_s075
2025-09-12 18:24:11,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 12 hours, 27 minutes, 34 seconds)
2025-09-12 18:38:53,550 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:38:53,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:43:28,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 4926.05469 ± 1502.660
2025-09-12 18:43:28,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5402.3604, 419.002, 5418.2417, 5381.316, 5449.8447, 5381.4297, 5429.081, 5476.1377, 5463.4204, 5439.7114]
2025-09-12 18:43:28,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 77.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:43:28,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 12 hours, 7 minutes, 2 seconds)
2025-09-12 18:57:56,392 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:57:56,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:02:55,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5473.03223 ± 33.172
2025-09-12 19:02:55,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5515.7617, 5422.0483, 5465.557, 5468.914, 5509.366, 5487.331, 5484.08, 5492.7607, 5406.3677, 5478.1323]
2025-09-12 19:02:55,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:02:55,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (5473.03) for latency MM1Queue_a033_s075
2025-09-12 19:02:55,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 11 hours, 49 minutes, 35 seconds)
2025-09-12 19:17:20,293 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:17:20,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:22:22,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5431.99609 ± 28.542
2025-09-12 19:22:22,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5438.444, 5481.523, 5422.8447, 5389.5146, 5424.622, 5462.234, 5424.5894, 5466.797, 5400.3354, 5409.056]
2025-09-12 19:22:22,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:22:22,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 11 hours, 31 minutes, 54 seconds)
2025-09-12 19:36:48,311 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:36:48,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:41:26,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5077.14355 ± 1124.560
2025-09-12 19:41:26,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5443.356, 1704.8755, 5458.1787, 5404.2944, 5416.614, 5457.3438, 5482.2627, 5521.189, 5463.9355, 5419.3867]
2025-09-12 19:41:26,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 320.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:41:26,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 11 hours, 19 minutes, 48 seconds)
2025-09-12 19:55:52,179 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:55:52,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:00:50,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5447.96191 ± 25.106
2025-09-12 20:00:50,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5474.4194, 5444.458, 5434.542, 5445.683, 5435.656, 5409.5327, 5484.847, 5414.4424, 5483.6494, 5452.393]
2025-09-12 20:00:50,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:00:50,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 10 hours, 57 minutes, 7 seconds)
2025-09-12 20:14:10,763 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:14:10,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:19:08,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5473.79150 ± 35.742
2025-09-12 20:19:08,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5507.588, 5447.7607, 5454.681, 5445.897, 5474.8022, 5563.052, 5472.463, 5469.5166, 5470.971, 5431.1836]
2025-09-12 20:19:08,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:19:08,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (5473.79) for latency MM1Queue_a033_s075
2025-09-12 20:19:08,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 10 hours, 31 minutes, 23 seconds)
2025-09-12 20:33:36,038 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:33:36,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:38:32,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5431.23877 ± 36.380
2025-09-12 20:38:32,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5413.0703, 5402.467, 5429.8584, 5452.174, 5474.335, 5424.162, 5363.5996, 5489.547, 5401.97, 5461.2036]
2025-09-12 20:38:32,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:38:32,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 10 hours, 11 minutes, 57 seconds)
2025-09-12 20:52:53,670 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:52:53,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:57:52,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5430.31543 ± 96.024
2025-09-12 20:57:52,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5417.0728, 5457.819, 5474.3516, 5419.592, 5152.1846, 5475.3447, 5451.9966, 5488.134, 5497.4893, 5469.172]
2025-09-12 20:57:52,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 970.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:57:52,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 9 hours, 52 minutes, 7 seconds)
2025-09-12 21:12:17,713 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:12:17,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:16:46,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 4886.85986 ± 1528.478
2025-09-12 21:16:46,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5390.514, 5374.791, 5359.084, 5444.9707, 5332.0225, 5384.889, 5440.4927, 5431.6943, 302.5875, 5407.551]
2025-09-12 21:16:46,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 57.0, 1000.0]
2025-09-12 21:16:46,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 9 hours, 31 minutes, 58 seconds)
2025-09-12 21:31:12,441 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:31:12,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:35:53,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5122.85449 ± 1179.326
2025-09-12 21:35:53,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5508.8306, 5474.664, 5496.9087, 1585.4996, 5499.924, 5527.7124, 5517.8184, 5522.9478, 5561.1255, 5533.1177]
2025-09-12 21:35:53,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 295.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:35:53,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 9 hours, 11 minutes, 17 seconds)
2025-09-12 21:50:22,002 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:50:22,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:55:19,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5479.90479 ± 37.070
2025-09-12 21:55:19,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5477.3936, 5389.8564, 5475.2563, 5523.9937, 5480.202, 5467.393, 5503.977, 5533.3066, 5473.8833, 5473.7866]
2025-09-12 21:55:19,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:55:19,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (5479.90) for latency MM1Queue_a033_s075
2025-09-12 21:55:19,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 8 hours, 58 minutes, 35 seconds)
2025-09-12 22:09:47,182 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:09:47,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:14:49,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5484.86963 ± 28.567
2025-09-12 22:14:49,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5498.5044, 5527.641, 5451.532, 5456.293, 5522.139, 5467.045, 5442.735, 5487.1797, 5483.756, 5511.8677]
2025-09-12 22:14:49,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:14:49,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (5484.87) for latency MM1Queue_a033_s075
2025-09-12 22:14:49,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 8 hours, 39 minutes, 56 seconds)
2025-09-12 22:29:17,092 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:29:17,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:34:01,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5125.39746 ± 867.368
2025-09-12 22:34:01,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5392.9087, 5417.6465, 2524.7644, 5387.9355, 5446.82, 5442.1787, 5466.1665, 5362.8716, 5401.4214, 5411.2627]
2025-09-12 22:34:01,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 484.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:34:01,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 8 hours, 19 minutes, 55 seconds)
2025-09-12 22:49:12,783 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:49:12,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:53:46,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 4983.00195 ± 1344.567
2025-09-12 22:53:46,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5367.2856, 5446.5767, 5463.8066, 5437.9985, 5383.462, 5494.626, 5379.393, 5454.792, 5451.0625, 951.0174]
2025-09-12 22:53:46,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 179.0]
2025-09-12 22:53:46,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 8 hours, 5 minutes)
2025-09-12 23:08:13,461 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:08:13,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:13:09,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5404.89453 ± 36.003
2025-09-12 23:13:09,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5364.186, 5420.267, 5405.4946, 5421.8555, 5403.371, 5394.6455, 5345.508, 5486.358, 5419.556, 5387.705]
2025-09-12 23:13:09,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:13:09,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 7 hours, 46 minutes, 54 seconds)
2025-09-12 23:27:35,076 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:27:35,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:32:04,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 4936.29297 ± 1513.084
2025-09-12 23:32:05,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5503.703, 5471.0117, 398.1183, 5425.0107, 5440.4087, 5467.215, 5396.15, 5401.769, 5404.2573, 5455.2837]
2025-09-12 23:32:05,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 73.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:32:05,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 7 hours, 25 minutes, 5 seconds)
2025-09-12 23:46:00,620 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:46:00,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:50:29,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 4933.41309 ± 1521.290
2025-09-12 23:50:29,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5450.144, 5422.7803, 5443.2563, 5478.055, 5436.3687, 5432.4487, 369.77408, 5447.5347, 5429.277, 5424.4937]
2025-09-12 23:50:29,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 67.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:50:30,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 7 hours, 58 seconds)
2025-09-13 00:04:58,272 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:04:58,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:09:58,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5533.25098 ± 40.028
2025-09-13 00:09:58,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5572.4307, 5542.7383, 5507.647, 5468.7217, 5477.007, 5599.5264, 5542.8413, 5567.9116, 5544.821, 5508.863]
2025-09-13 00:09:58,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:09:58,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (5533.25) for latency MM1Queue_a033_s075
2025-09-13 00:09:58,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 6 hours, 42 minutes, 59 seconds)
2025-09-13 00:24:26,829 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:24:26,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:29:23,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5495.45166 ± 29.949
2025-09-13 00:29:23,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5473.772, 5505.947, 5498.454, 5537.8916, 5471.8804, 5522.3276, 5497.512, 5466.897, 5442.9077, 5536.9307]
2025-09-13 00:29:23,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:29:24,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 6 hours, 22 minutes, 31 seconds)
2025-09-13 00:43:49,494 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:43:49,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:48:50,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5427.74268 ± 18.771
2025-09-13 00:48:50,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5435.0854, 5390.9653, 5427.7324, 5411.853, 5449.9946, 5414.413, 5413.5586, 5437.028, 5453.4565, 5443.335]
2025-09-13 00:48:50,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:48:50,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 6 hours, 3 minutes, 37 seconds)
2025-09-13 01:03:34,495 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:03:34,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:08:32,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5470.62305 ± 51.775
2025-09-13 01:08:32,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5487.3755, 5396.7344, 5497.649, 5540.758, 5444.455, 5555.183, 5460.078, 5455.673, 5386.265, 5482.0586]
2025-09-13 01:08:32,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:08:32,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 5 hours, 47 minutes, 15 seconds)
2025-09-13 01:23:17,706 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:23:17,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:28:17,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5422.48193 ± 31.146
2025-09-13 01:28:17,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5380.7036, 5395.776, 5460.798, 5434.321, 5392.3877, 5447.8516, 5476.846, 5423.1416, 5388.9272, 5424.072]
2025-09-13 01:28:17,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:28:17,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 5 hours, 32 minutes, 28 seconds)
2025-09-13 01:42:43,554 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:42:43,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:47:42,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5318.21045 ± 32.640
2025-09-13 01:47:42,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5406.5195, 5306.543, 5317.824, 5283.1064, 5286.2153, 5306.5806, 5329.729, 5317.3633, 5305.1514, 5323.073]
2025-09-13 01:47:42,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:47:42,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 5 hours, 12 minutes, 46 seconds)
2025-09-13 02:02:20,878 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:02:20,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:07:17,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5442.63574 ± 47.562
2025-09-13 02:07:17,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5344.84, 5465.487, 5471.1396, 5437.259, 5418.28, 5435.924, 5545.278, 5430.921, 5451.8623, 5425.367]
2025-09-13 02:07:17,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:07:17,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 53 minutes, 40 seconds)
2025-09-13 02:21:43,228 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:21:43,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:26:44,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5548.29199 ± 39.584
2025-09-13 02:26:44,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5549.1523, 5578.1855, 5507.7827, 5557.23, 5512.862, 5590.4097, 5578.55, 5606.3853, 5475.737, 5526.62]
2025-09-13 02:26:44,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:26:44,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1226 [INFO]: New best (5548.29) for latency MM1Queue_a033_s075
2025-09-13 02:26:44,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 4 hours, 34 minutes, 5 seconds)
2025-09-13 02:40:34,596 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:40:34,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:45:07,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5030.76807 ± 1424.505
2025-09-13 02:45:07,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5496.8735, 5506.0747, 757.9743, 5555.624, 5527.9146, 5471.608, 5478.419, 5526.831, 5469.6333, 5516.7275]
2025-09-13 02:45:07,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 144.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:45:07,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 4 hours, 11 minutes, 7 seconds)
2025-09-13 02:59:35,352 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:59:35,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:04:33,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5391.10645 ± 41.054
2025-09-13 03:04:33,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5393.4136, 5382.836, 5354.703, 5417.2627, 5436.1196, 5478.6973, 5391.778, 5353.0674, 5367.4575, 5335.7354]
2025-09-13 03:04:33,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:04:34,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 51 minutes, 4 seconds)
2025-09-13 03:19:02,385 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:19:02,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:24:05,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5407.87305 ± 41.118
2025-09-13 03:24:05,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5395.376, 5377.3945, 5371.4673, 5354.8433, 5478.3677, 5353.469, 5439.5586, 5449.1616, 5422.239, 5436.853]
2025-09-13 03:24:05,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:24:05,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 32 minutes, 2 seconds)
2025-09-13 03:38:32,980 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:38:32,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:43:08,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 4969.10303 ± 1292.432
2025-09-13 03:43:08,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5450.124, 5414.34, 5344.6094, 5382.6787, 5447.883, 5378.0327, 5393.438, 5424.5674, 1093.0745, 5362.283]
2025-09-13 03:43:08,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 217.0, 1000.0]
2025-09-13 03:43:08,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 3 hours, 11 minutes, 40 seconds)
2025-09-13 03:57:38,121 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:57:38,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 04:02:39,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5469.76807 ± 30.020
2025-09-13 04:02:39,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5528.3022, 5445.839, 5447.458, 5446.749, 5443.3735, 5486.4795, 5511.2812, 5477.2983, 5475.689, 5435.2095]
2025-09-13 04:02:39,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:02:39,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 52 minutes, 40 seconds)
2025-09-13 04:17:07,007 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:17:07,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 04:21:40,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 4961.19922 ± 1447.851
2025-09-13 04:21:40,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5429.0977, 5441.5693, 5468.989, 5432.434, 5433.7915, 5497.837, 618.12494, 5426.577, 5436.276, 5427.292]
2025-09-13 04:21:40,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 117.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:21:40,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 34 minutes, 28 seconds)
2025-09-13 04:36:12,754 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:36:12,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 04:41:12,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5493.69238 ± 43.483
2025-09-13 04:41:12,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5561.1226, 5476.4014, 5488.1265, 5509.7734, 5469.2354, 5527.154, 5474.0635, 5471.707, 5553.1304, 5406.211]
2025-09-13 04:41:12,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:41:12,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 2 hours, 15 minutes, 17 seconds)
2025-09-13 04:55:40,288 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:55:40,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 05:00:36,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5455.20605 ± 36.248
2025-09-13 05:00:36,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5475.169, 5451.9243, 5473.7827, 5445.701, 5538.9976, 5441.7686, 5437.818, 5467.723, 5424.9556, 5394.2236]
2025-09-13 05:00:36,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:00:36,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 55 minutes, 49 seconds)
2025-09-13 05:15:02,864 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:15:02,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 05:20:01,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5336.79346 ± 35.669
2025-09-13 05:20:01,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5331.5127, 5285.226, 5372.8765, 5335.6567, 5344.5996, 5315.0854, 5300.723, 5389.262, 5302.008, 5390.979]
2025-09-13 05:20:01,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:20:01,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 36 minutes, 53 seconds)
2025-09-13 05:34:28,913 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:34:28,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 05:39:30,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5431.57080 ± 45.834
2025-09-13 05:39:30,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5526.4443, 5481.03, 5433.4507, 5389.8984, 5415.3203, 5437.595, 5399.236, 5429.2905, 5353.3105, 5450.1284]
2025-09-13 05:39:30,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:39:30,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 17 minutes, 28 seconds)
2025-09-13 05:53:58,203 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:53:58,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 05:58:30,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 4971.31641 ± 1483.103
2025-09-13 05:58:30,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5476.8853, 5481.8145, 5465.294, 5471.231, 523.4003, 5389.375, 5539.0947, 5488.141, 5448.8486, 5429.0825]
2025-09-13 05:58:30,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 94.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 05:58:30,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 58 minutes, 6 seconds)
2025-09-13 06:12:57,527 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:12:57,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 06:17:11,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 4626.78223 ± 1801.304
2025-09-13 06:17:11,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5505.966, 5497.373, 192.95395, 2050.6262, 5526.6196, 5568.7427, 5449.6807, 5467.383, 5499.666, 5508.814]
2025-09-13 06:17:11,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 38.0, 382.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:17:11,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 38 minutes, 23 seconds)
2025-09-13 06:30:40,916 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:30:40,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 06:35:42,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 5401.73193 ± 24.666
2025-09-13 06:35:42,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5366.567, 5423.325, 5383.101, 5359.5396, 5438.9165, 5423.6685, 5412.246, 5391.752, 5415.0684, 5403.1313]
2025-09-13 06:35:42,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 06:35:42,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 19 minutes, 1 second)
2025-09-13 06:50:08,120 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:50:08,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 06:54:20,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 4585.75488 ± 1841.115
2025-09-13 06:54:20,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [5470.7524, 5470.385, 5564.331, 5509.4604, 5493.6606, 5522.6855, 342.14197, 1544.8077, 5463.612, 5475.711]
2025-09-13 06:54:20,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 62.0, 282.0, 1000.0, 1000.0]
2025-09-13 06:54:20,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-humanoid):1251 [DEBUG]: Training session finished
