2025-09-12 00:00:44,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc5-hopper/MM1Queue_a033_s075-mbpac_memdelay
2025-09-12 00:00:44,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc5-hopper/MM1Queue_a033_s075-mbpac_memdelay
2025-09-12 00:00:44,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14779f15abd0>}
2025-09-12 00:00:44,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1111 [DEBUG]: using device: cuda
2025-09-12 00:00:44,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1133 [INFO]: Creating new trainer
2025-09-12 00:00:44,498 baseline-mbpac-noiseperc5-hopper:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-09-12 00:00:44,498 baseline-mbpac-noiseperc5-hopper:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 00:00:44,506 baseline-mbpac-noiseperc5-hopper:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=3, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-12 00:00:45,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1194 [DEBUG]: Starting training session...
2025-09-12 00:00:45,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 1/100
2025-09-12 00:10:13,572 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:10:13,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:10:20,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 45.37455 ± 2.513
2025-09-12 00:10:20,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [41.35884, 45.927937, 48.098637, 43.03335, 45.402416, 44.4319, 42.649883, 48.955467, 44.97082, 48.916237]
2025-09-12 00:10:20,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 27.0, 27.0, 25.0, 27.0, 26.0, 25.0, 29.0, 27.0, 29.0]
2025-09-12 00:10:20,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (45.37) for latency MM1Queue_a033_s075
2025-09-12 00:10:20,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 15 hours, 48 minutes, 46 seconds)
2025-09-12 00:21:16,192 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:21:16,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:21:47,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 232.14632 ± 50.258
2025-09-12 00:21:47,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [247.7334, 295.3156, 260.18198, 204.70248, 122.190956, 269.51978, 250.30643, 165.41766, 239.95868, 266.1363]
2025-09-12 00:21:47,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [117.0, 153.0, 129.0, 103.0, 68.0, 132.0, 124.0, 87.0, 118.0, 129.0]
2025-09-12 00:21:47,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (232.15) for latency MM1Queue_a033_s075
2025-09-12 00:21:47,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 17 hours, 10 minutes, 18 seconds)
2025-09-12 00:32:40,381 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:32:40,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:33:04,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 198.14275 ± 89.148
2025-09-12 00:33:04,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [188.77832, 10.839181, 88.24705, 314.93958, 184.34834, 184.74196, 266.8476, 249.20424, 191.22476, 302.2565]
2025-09-12 00:33:04,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 12.0, 50.0, 143.0, 87.0, 85.0, 121.0, 111.0, 89.0, 132.0]
2025-09-12 00:33:04,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 17 hours, 25 minutes, 4 seconds)
2025-09-12 00:44:04,060 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:44:04,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:44:29,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 222.51822 ± 54.185
2025-09-12 00:44:29,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [250.85127, 234.15881, 237.05861, 234.61327, 61.496094, 249.83344, 239.62154, 238.42012, 251.05154, 228.0775]
2025-09-12 00:44:29,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [103.0, 103.0, 98.0, 98.0, 40.0, 104.0, 101.0, 101.0, 102.0, 98.0]
2025-09-12 00:44:29,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 17 hours, 29 minutes, 30 seconds)
2025-09-12 00:55:36,533 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:55:36,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:56:14,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 231.17685 ± 146.916
2025-09-12 00:56:14,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [445.92752, 352.6923, 92.553406, 192.2372, 91.797485, 80.366394, 98.33492, 164.95027, 484.17123, 308.7377]
2025-09-12 00:56:14,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [226.0, 168.0, 74.0, 178.0, 67.0, 57.0, 94.0, 81.0, 294.0, 139.0]
2025-09-12 00:56:14,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 17 hours, 33 minutes, 58 seconds)
2025-09-12 01:06:56,051 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:06:56,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:07:38,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 343.92398 ± 273.133
2025-09-12 01:07:38,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [12.600237, 50.057228, 61.542965, 450.527, 240.39192, 520.56635, 253.83946, 405.37677, 478.4163, 965.92145]
2025-09-12 01:07:38,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [13.0, 35.0, 41.0, 179.0, 105.0, 177.0, 124.0, 151.0, 172.0, 559.0]
2025-09-12 01:07:38,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (343.92) for latency MM1Queue_a033_s075
2025-09-12 01:07:38,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 17 hours, 57 minutes, 7 seconds)
2025-09-12 01:18:31,576 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:18:31,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:19:42,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 487.60294 ± 238.552
2025-09-12 01:19:42,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [955.0249, 425.57904, 728.7881, 673.3575, 156.28381, 497.01843, 574.20013, 245.28363, 397.07306, 223.42035]
2025-09-12 01:19:42,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [575.0, 237.0, 261.0, 269.0, 145.0, 261.0, 352.0, 159.0, 250.0, 135.0]
2025-09-12 01:19:42,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (487.60) for latency MM1Queue_a033_s075
2025-09-12 01:19:42,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 17 hours, 57 minutes, 23 seconds)
2025-09-12 01:30:38,387 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:30:38,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:31:04,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 237.46333 ± 186.199
2025-09-12 01:31:04,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [55.02888, 310.10004, 180.2268, 135.29085, 242.71602, 278.91055, 722.22296, 279.96252, 16.363483, 153.81128]
2025-09-12 01:31:04,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [34.0, 124.0, 85.0, 69.0, 105.0, 112.0, 239.0, 122.0, 16.0, 77.0]
2025-09-12 01:31:04,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 17 hours, 47 minutes, 7 seconds)
2025-09-12 01:42:02,787 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:42:02,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:42:47,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 487.67773 ± 110.392
2025-09-12 01:42:47,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [521.9575, 529.16833, 170.2522, 499.17075, 558.53186, 473.16663, 563.00464, 515.09375, 567.34314, 479.08856]
2025-09-12 01:42:47,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [176.0, 178.0, 79.0, 166.0, 200.0, 160.0, 179.0, 171.0, 190.0, 164.0]
2025-09-12 01:42:47,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (487.68) for latency MM1Queue_a033_s075
2025-09-12 01:42:47,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 17 hours, 41 minutes, 5 seconds)
2025-09-12 01:53:52,782 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:53:52,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:54:39,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 508.04059 ± 208.802
2025-09-12 01:54:39,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [635.7102, 649.09076, 602.97107, 666.24744, 370.13232, 570.39856, 658.85504, 204.00789, 665.3931, 57.59939]
2025-09-12 01:54:39,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [208.0, 203.0, 196.0, 216.0, 151.0, 193.0, 206.0, 92.0, 226.0, 39.0]
2025-09-12 01:54:39,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (508.04) for latency MM1Queue_a033_s075
2025-09-12 01:54:39,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 17 hours, 31 minutes, 31 seconds)
2025-09-12 02:05:34,065 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:05:34,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:06:26,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 566.34216 ± 252.270
2025-09-12 02:06:26,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [670.07654, 60.537052, 501.63666, 684.9693, 676.96356, 676.514, 840.17596, 114.56415, 691.8265, 746.1579]
2025-09-12 02:06:26,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [234.0, 39.0, 179.0, 226.0, 220.0, 226.0, 293.0, 62.0, 219.0, 246.0]
2025-09-12 02:06:26,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (566.34) for latency MM1Queue_a033_s075
2025-09-12 02:06:26,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 17 hours, 26 minutes, 37 seconds)
2025-09-12 02:17:27,439 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:17:27,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:18:21,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 523.41504 ± 184.067
2025-09-12 02:18:21,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [648.7574, 442.1439, 532.7866, 498.54184, 125.99323, 364.57693, 859.3322, 531.99207, 576.8195, 653.2068]
2025-09-12 02:18:21,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [237.0, 172.0, 197.0, 204.0, 66.0, 148.0, 317.0, 203.0, 208.0, 220.0]
2025-09-12 02:18:21,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 17 hours, 12 minutes, 3 seconds)
2025-09-12 02:29:02,590 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:29:02,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:29:51,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 510.98798 ± 315.136
2025-09-12 02:29:51,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1007.7835, 648.2155, 871.31757, 118.191284, 462.39853, 317.80313, 53.573463, 664.29193, 774.16724, 192.13794]
2025-09-12 02:29:51,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [326.0, 214.0, 311.0, 64.0, 171.0, 139.0, 35.0, 227.0, 254.0, 86.0]
2025-09-12 02:29:51,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 17 hours, 2 minutes, 49 seconds)
2025-09-12 02:41:00,816 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:41:00,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:41:53,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 481.34833 ± 445.333
2025-09-12 02:41:53,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [527.7042, 874.7205, 148.77272, 264.48923, 346.25775, 1490.8762, 888.6332, 86.70742, 31.372482, 153.94969]
2025-09-12 02:41:53,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [234.0, 314.0, 79.0, 133.0, 153.0, 571.0, 330.0, 51.0, 25.0, 79.0]
2025-09-12 02:41:53,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 16 hours, 56 minutes, 33 seconds)
2025-09-12 02:52:54,159 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:52:54,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:54:01,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 689.10168 ± 413.386
2025-09-12 02:54:01,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [59.3026, 1399.4243, 791.38214, 960.7411, 138.8991, 777.50604, 512.0572, 1176.1306, 315.80032, 759.77386]
2025-09-12 02:54:01,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [41.0, 460.0, 278.0, 334.0, 72.0, 264.0, 204.0, 398.0, 135.0, 295.0]
2025-09-12 02:54:01,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (689.10) for latency MM1Queue_a033_s075
2025-09-12 02:54:01,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 16 hours, 49 minutes, 22 seconds)
2025-09-12 03:04:52,384 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:04:52,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:05:54,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 655.70605 ± 423.420
2025-09-12 03:05:54,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [721.07574, 83.17313, 1029.1735, 738.6553, 252.54248, 575.6978, 576.03375, 1053.6991, 1455.4185, 71.59113]
2025-09-12 03:05:54,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [229.0, 50.0, 357.0, 229.0, 117.0, 224.0, 222.0, 341.0, 492.0, 45.0]
2025-09-12 03:05:54,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 16 hours, 39 minutes, 5 seconds)
2025-09-12 03:16:51,533 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:16:51,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:18:09,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 821.15295 ± 490.640
2025-09-12 03:18:09,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [91.784035, 442.41272, 649.0563, 1318.7345, 908.0059, 1578.3284, 1576.0325, 747.88226, 555.8254, 343.46826]
2025-09-12 03:18:09,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [53.0, 166.0, 244.0, 471.0, 309.0, 485.0, 541.0, 238.0, 214.0, 141.0]
2025-09-12 03:18:09,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (821.15) for latency MM1Queue_a033_s075
2025-09-12 03:18:09,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 16 hours, 32 minutes, 45 seconds)
2025-09-12 03:28:58,068 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:28:58,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:29:38,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 390.75943 ± 250.082
2025-09-12 03:29:38,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [141.77864, 395.59106, 823.49756, 138.41397, 295.49014, 93.508965, 777.1188, 435.79218, 581.25885, 225.14415]
2025-09-12 03:29:38,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [73.0, 163.0, 272.0, 72.0, 130.0, 54.0, 250.0, 174.0, 214.0, 107.0]
2025-09-12 03:29:38,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 16 hours, 20 minutes, 24 seconds)
2025-09-12 03:40:35,151 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:40:35,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:41:43,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 740.20581 ± 519.722
2025-09-12 03:41:43,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [651.44946, 1662.785, 1465.6525, 93.44856, 160.38857, 961.7149, 316.1865, 264.1916, 761.3396, 1064.902]
2025-09-12 03:41:43,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [239.0, 541.0, 454.0, 59.0, 79.0, 347.0, 140.0, 110.0, 236.0, 351.0]
2025-09-12 03:41:43,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 16 hours, 9 minutes, 20 seconds)
2025-09-12 03:52:33,749 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:52:33,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:53:27,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 598.70844 ± 571.407
2025-09-12 03:53:27,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1742.2245, 62.85787, 760.0425, 162.9, 834.11743, 1104.4567, 64.02173, 1130.656, 55.71524, 70.09236]
2025-09-12 03:53:27,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [538.0, 41.0, 237.0, 83.0, 257.0, 336.0, 42.0, 347.0, 40.0, 46.0]
2025-09-12 03:53:27,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 15 hours, 50 minutes, 55 seconds)
2025-09-12 04:04:28,751 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:04:28,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:05:17,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 530.23035 ± 308.183
2025-09-12 04:05:17,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [713.4336, 847.5248, 766.61847, 464.547, 53.827915, 259.45566, 466.38223, 1030.2942, 610.9045, 89.315674]
2025-09-12 04:05:17,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [227.0, 258.0, 235.0, 170.0, 37.0, 110.0, 166.0, 326.0, 203.0, 53.0]
2025-09-12 04:05:17,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 15 hours, 38 minutes, 13 seconds)
2025-09-12 04:16:21,277 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:16:21,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:17:26,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 716.10828 ± 623.569
2025-09-12 04:17:26,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [49.69969, 596.75665, 868.7825, 2129.777, 475.75443, 1163.2413, 493.3014, 67.473885, 77.85369, 1238.4421]
2025-09-12 04:17:26,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [34.0, 212.0, 269.0, 654.0, 177.0, 390.0, 187.0, 43.0, 47.0, 385.0]
2025-09-12 04:17:26,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 15 hours, 24 minutes, 49 seconds)
2025-09-12 04:28:22,641 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:28:22,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:29:31,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 744.13440 ± 309.372
2025-09-12 04:29:31,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [396.40723, 577.0765, 504.72476, 876.11615, 765.6539, 784.2542, 1103.5076, 1180.4156, 188.21266, 1064.9749]
2025-09-12 04:29:31,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [159.0, 198.0, 191.0, 275.0, 274.0, 252.0, 367.0, 383.0, 86.0, 332.0]
2025-09-12 04:29:31,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 15 hours, 22 minutes, 6 seconds)
2025-09-12 04:40:24,160 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:40:24,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:41:35,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 794.54059 ± 351.546
2025-09-12 04:41:35,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [866.32733, 1365.8998, 742.9997, 448.7778, 1197.7565, 493.53116, 494.48944, 1267.904, 335.3279, 732.3921]
2025-09-12 04:41:35,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [291.0, 416.0, 249.0, 164.0, 376.0, 183.0, 188.0, 397.0, 141.0, 232.0]
2025-09-12 04:41:35,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 15 hours, 9 minutes, 50 seconds)
2025-09-12 04:52:45,981 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:52:45,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:53:44,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 651.53888 ± 275.729
2025-09-12 04:53:44,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [895.9348, 356.305, 150.33095, 815.0715, 896.9212, 781.88, 862.9188, 425.15146, 949.953, 380.92206]
2025-09-12 04:53:44,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [274.0, 140.0, 74.0, 273.0, 284.0, 247.0, 258.0, 161.0, 296.0, 147.0]
2025-09-12 04:53:44,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 15 hours, 4 minutes, 6 seconds)
2025-09-12 05:04:54,986 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:04:54,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:06:28,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1035.47083 ± 786.174
2025-09-12 05:06:28,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1673.5132, 586.37524, 2102.7847, 176.71085, 113.25638, 2471.0, 775.7296, 657.6297, 380.92096, 1416.7874]
2025-09-12 05:06:28,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [541.0, 219.0, 689.0, 83.0, 59.0, 772.0, 281.0, 234.0, 153.0, 437.0]
2025-09-12 05:06:28,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (1035.47) for latency MM1Queue_a033_s075
2025-09-12 05:06:28,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 15 hours, 5 minutes, 27 seconds)
2025-09-12 05:17:00,889 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:17:00,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:18:15,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 848.13654 ± 451.780
2025-09-12 05:18:15,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [607.2938, 1021.2723, 793.24304, 53.22701, 1242.9291, 972.53357, 1384.2157, 100.06083, 1391.6206, 914.96967]
2025-09-12 05:18:15,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [211.0, 314.0, 255.0, 36.0, 385.0, 337.0, 438.0, 56.0, 418.0, 279.0]
2025-09-12 05:18:15,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 14 hours, 47 minutes, 54 seconds)
2025-09-12 05:29:12,096 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:29:12,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:30:51,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1073.66333 ± 1025.508
2025-09-12 05:30:51,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [111.30659, 632.18744, 992.69653, 110.43253, 171.1473, 398.3021, 3004.1975, 1435.0116, 984.2064, 2897.1445]
2025-09-12 05:30:51,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [59.0, 239.0, 349.0, 58.0, 81.0, 163.0, 1000.0, 452.0, 295.0, 954.0]
2025-09-12 05:30:51,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (1073.66) for latency MM1Queue_a033_s075
2025-09-12 05:30:51,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 14 hours, 43 minutes, 21 seconds)
2025-09-12 05:42:06,743 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:42:06,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:43:15,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 741.96582 ± 593.114
2025-09-12 05:43:15,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [250.7495, 254.87747, 74.04338, 253.56998, 1113.221, 251.69386, 1733.3981, 612.221, 1500.8339, 1375.0498]
2025-09-12 05:43:15,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [111.0, 114.0, 44.0, 116.0, 350.0, 114.0, 542.0, 225.0, 471.0, 453.0]
2025-09-12 05:43:15,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 14 hours, 35 minutes, 36 seconds)
2025-09-12 05:54:03,055 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:54:03,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:55:04,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 692.95776 ± 430.378
2025-09-12 05:55:04,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1274.5588, 242.88646, 110.75517, 828.8717, 1191.756, 941.70337, 817.5452, 411.89572, 1060.0891, 49.51633]
2025-09-12 05:55:04,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [398.0, 108.0, 61.0, 276.0, 360.0, 295.0, 248.0, 163.0, 328.0, 35.0]
2025-09-12 05:55:04,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 14 hours, 18 minutes, 50 seconds)
2025-09-12 06:05:58,991 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:05:58,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:07:16,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 847.08154 ± 740.671
2025-09-12 06:07:16,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [48.025524, 1407.084, 1890.0769, 48.62407, 266.67194, 805.58777, 166.16948, 411.08752, 2109.9304, 1317.5573]
2025-09-12 06:07:16,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [34.0, 449.0, 624.0, 35.0, 120.0, 282.0, 81.0, 163.0, 672.0, 409.0]
2025-09-12 06:07:16,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 13 hours, 59 minutes, 10 seconds)
2025-09-12 06:18:18,601 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:18:18,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:19:17,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 651.17816 ± 677.348
2025-09-12 06:19:17,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2281.3208, 1025.2806, 64.836525, 191.6985, 184.44072, 1041.5945, 199.37144, 127.22418, 1135.3032, 260.71112]
2025-09-12 06:19:17,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [711.0, 308.0, 41.0, 90.0, 88.0, 316.0, 92.0, 67.0, 344.0, 112.0]
2025-09-12 06:19:17,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 13 hours, 49 minutes, 59 seconds)
2025-09-12 06:30:17,299 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:30:17,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:31:30,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 839.75427 ± 483.161
2025-09-12 06:31:30,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [760.9592, 482.88467, 95.42258, 616.3273, 965.34814, 682.32495, 368.52527, 1486.5669, 1717.1375, 1222.0466]
2025-09-12 06:31:30,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [235.0, 179.0, 54.0, 220.0, 301.0, 218.0, 144.0, 448.0, 528.0, 393.0]
2025-09-12 06:31:30,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 13 hours, 32 minutes, 39 seconds)
2025-09-12 06:42:23,652 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:42:23,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:44:09,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1226.55737 ± 735.522
2025-09-12 06:44:09,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2576.9517, 112.33389, 919.314, 1718.5448, 1123.0015, 247.71016, 2196.8933, 915.0147, 1249.6093, 1206.2001]
2025-09-12 06:44:09,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [806.0, 62.0, 278.0, 541.0, 390.0, 109.0, 677.0, 280.0, 383.0, 376.0]
2025-09-12 06:44:09,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (1226.56) for latency MM1Queue_a033_s075
2025-09-12 06:44:09,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 13 hours, 23 minutes, 57 seconds)
2025-09-12 06:55:03,213 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:55:03,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:56:15,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 800.87439 ± 499.088
2025-09-12 06:56:15,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [370.6964, 1697.0267, 225.13057, 1019.2594, 166.86243, 650.9387, 781.10205, 919.6783, 579.83264, 1598.2162]
2025-09-12 06:56:15,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [148.0, 515.0, 105.0, 311.0, 79.0, 227.0, 236.0, 327.0, 200.0, 493.0]
2025-09-12 06:56:15,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 13 hours, 15 minutes, 13 seconds)
2025-09-12 07:07:23,251 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:07:23,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:08:45,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 983.23273 ± 331.647
2025-09-12 07:08:45,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [855.37555, 949.615, 1105.4323, 1596.6141, 961.6067, 637.54407, 780.6604, 1504.7994, 475.39902, 965.27936]
2025-09-12 07:08:45,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [256.0, 288.0, 336.0, 487.0, 300.0, 201.0, 236.0, 467.0, 174.0, 287.0]
2025-09-12 07:08:45,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 13 hours, 6 minutes, 52 seconds)
2025-09-12 07:19:40,557 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:19:40,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:21:06,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 985.12384 ± 453.915
2025-09-12 07:21:06,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1462.7451, 1333.0876, 929.5878, 738.66125, 29.319283, 965.30695, 1705.6276, 1191.0612, 920.17365, 575.66846]
2025-09-12 07:21:06,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [450.0, 444.0, 281.0, 266.0, 24.0, 327.0, 531.0, 373.0, 276.0, 223.0]
2025-09-12 07:21:06,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 12 hours, 58 minutes, 58 seconds)
2025-09-12 07:32:10,964 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:32:10,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:34:05,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1229.31921 ± 1019.252
2025-09-12 07:34:05,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2925.5332, 1815.6083, 2858.1155, 1422.7887, 503.30475, 399.7809, 169.28851, 69.01383, 1672.4055, 457.35336]
2025-09-12 07:34:05,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [952.0, 643.0, 1000.0, 428.0, 193.0, 160.0, 79.0, 44.0, 571.0, 175.0]
2025-09-12 07:34:05,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (1229.32) for latency MM1Queue_a033_s075
2025-09-12 07:34:05,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 12 hours, 56 minutes, 2 seconds)
2025-09-12 07:45:19,183 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:45:19,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:46:39,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 919.53302 ± 473.916
2025-09-12 07:46:39,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [594.08466, 822.7478, 991.37823, 1082.3213, 852.32697, 1192.0879, 116.84264, 392.8659, 1230.1113, 1920.5634]
2025-09-12 07:46:39,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [213.0, 288.0, 306.0, 329.0, 255.0, 364.0, 63.0, 155.0, 384.0, 593.0]
2025-09-12 07:46:39,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 12 hours, 42 minutes, 27 seconds)
2025-09-12 07:57:50,635 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:57:50,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:59:10,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 896.84540 ± 384.374
2025-09-12 07:59:10,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1181.2571, 233.79651, 1144.7738, 797.83594, 589.5556, 857.89056, 966.02936, 1698.8667, 970.6769, 527.7718]
2025-09-12 07:59:10,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [368.0, 104.0, 354.0, 282.0, 209.0, 285.0, 319.0, 520.0, 330.0, 198.0]
2025-09-12 07:59:10,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 12 hours, 35 minutes, 5 seconds)
2025-09-12 08:09:39,939 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:09:39,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:11:22,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1118.07104 ± 1006.009
2025-09-12 08:11:22,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [903.6979, 351.4366, 386.53912, 1425.4703, 2639.6863, 1458.8514, 96.1707, 179.35258, 3167.876, 571.63]
2025-09-12 08:11:22,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [279.0, 145.0, 158.0, 500.0, 849.0, 502.0, 52.0, 85.0, 1000.0, 208.0]
2025-09-12 08:11:22,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 12 hours, 19 minutes, 1 second)
2025-09-12 08:22:22,626 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:22:22,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:23:39,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 851.73407 ± 354.401
2025-09-12 08:23:39,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [931.733, 779.2442, 909.00775, 758.21765, 850.30975, 529.6791, 1804.4742, 526.0519, 518.0366, 910.58636]
2025-09-12 08:23:39,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [307.0, 251.0, 286.0, 259.0, 273.0, 194.0, 568.0, 191.0, 188.0, 289.0]
2025-09-12 08:23:39,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 12 hours, 5 minutes, 34 seconds)
2025-09-12 08:34:27,779 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:34:27,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:35:15,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 528.70496 ± 408.838
2025-09-12 08:35:15,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [248.40997, 889.1282, 320.56693, 1208.9587, 208.78241, 170.40247, 1007.04803, 954.6519, 58.636147, 220.46463]
2025-09-12 08:35:15,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [107.0, 267.0, 131.0, 369.0, 98.0, 84.0, 309.0, 292.0, 39.0, 98.0]
2025-09-12 08:35:15,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 11 hours, 37 minutes, 20 seconds)
2025-09-12 08:46:24,975 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:46:24,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:48:02,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1152.74536 ± 425.452
2025-09-12 08:48:02,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1357.315, 191.76797, 1658.7406, 999.0582, 949.4601, 1687.4678, 922.3089, 1426.9425, 933.57446, 1400.8186]
2025-09-12 08:48:02,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [429.0, 89.0, 506.0, 303.0, 285.0, 516.0, 276.0, 436.0, 278.0, 483.0]
2025-09-12 08:48:02,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 11 hours, 27 minutes, 35 seconds)
2025-09-12 08:58:50,305 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:58:50,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:00:07,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 909.17932 ± 235.509
2025-09-12 09:00:07,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1169.8561, 600.6929, 618.45276, 709.54004, 1307.3497, 1140.1205, 1003.06085, 742.438, 996.3099, 803.97235]
2025-09-12 09:00:07,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [361.0, 204.0, 217.0, 245.0, 427.0, 342.0, 310.0, 230.0, 326.0, 250.0]
2025-09-12 09:00:07,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 11 hours, 10 minutes, 25 seconds)
2025-09-12 09:10:47,779 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:10:47,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:12:54,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1491.51172 ± 733.444
2025-09-12 09:12:54,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1258.0752, 920.50995, 3110.8337, 1497.7363, 1765.6779, 1798.3993, 891.165, 658.0085, 2266.1814, 748.52936]
2025-09-12 09:12:54,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [437.0, 274.0, 1000.0, 515.0, 568.0, 551.0, 315.0, 240.0, 692.0, 260.0]
2025-09-12 09:12:54,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (1491.51) for latency MM1Queue_a033_s075
2025-09-12 09:12:54,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 11 hours, 4 minutes, 33 seconds)
2025-09-12 09:23:52,367 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:23:52,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:25:09,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 849.03516 ± 572.480
2025-09-12 09:25:09,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1439.6073, 1518.1692, 43.639194, 165.41211, 1170.5637, 1770.36, 405.38373, 418.0015, 678.77893, 880.43585]
2025-09-12 09:25:09,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [449.0, 523.0, 32.0, 80.0, 404.0, 537.0, 161.0, 166.0, 237.0, 318.0]
2025-09-12 09:25:09,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 10 hours, 51 minutes, 55 seconds)
2025-09-12 09:35:43,697 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:35:43,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:37:12,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1023.40656 ± 598.191
2025-09-12 09:37:12,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [277.9726, 854.1112, 1971.7522, 1720.1588, 771.48157, 486.5061, 1471.9825, 1331.6582, 75.32843, 1273.1144]
2025-09-12 09:37:12,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [119.0, 256.0, 600.0, 587.0, 233.0, 177.0, 477.0, 472.0, 46.0, 431.0]
2025-09-12 09:37:12,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 10 hours, 44 minutes, 14 seconds)
2025-09-12 09:47:35,122 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:47:35,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:48:48,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 835.00165 ± 740.407
2025-09-12 09:48:48,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [104.85966, 99.89386, 2000.9059, 754.1046, 757.5028, 115.63045, 1005.1542, 2349.5793, 764.80524, 397.58017]
2025-09-12 09:48:48,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [56.0, 54.0, 614.0, 266.0, 233.0, 63.0, 309.0, 722.0, 268.0, 154.0]
2025-09-12 09:48:48,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 10 hours, 19 minutes, 42 seconds)
2025-09-12 09:59:40,385 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:59:40,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:01:53,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1447.26392 ± 1139.122
2025-09-12 10:01:53,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [633.0244, 2946.92, 1271.0043, 1428.9043, 2972.2283, 107.442375, 92.35067, 186.31918, 2967.0278, 1867.418]
2025-09-12 10:01:53,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [240.0, 1000.0, 445.0, 494.0, 1000.0, 59.0, 55.0, 86.0, 1000.0, 645.0]
2025-09-12 10:01:53,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 10 hours, 17 minutes, 40 seconds)
2025-09-12 10:12:41,134 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:12:41,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:14:11,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1003.64276 ± 707.943
2025-09-12 10:14:11,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2098.3452, 1096.4004, 586.8422, 2285.8347, 887.20416, 1229.0166, 580.6403, 1124.0813, 96.13618, 51.92673]
2025-09-12 10:14:11,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [704.0, 384.0, 217.0, 755.0, 292.0, 393.0, 214.0, 339.0, 55.0, 36.0]
2025-09-12 10:14:11,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 10 hours, 29 seconds)
2025-09-12 10:24:40,260 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:24:40,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:27:02,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1659.95972 ± 1142.858
2025-09-12 10:27:02,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [236.10422, 2994.5034, 887.42566, 38.25572, 1805.5858, 2418.222, 3045.4358, 433.88095, 3076.6948, 1663.4906]
2025-09-12 10:27:02,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [103.0, 1000.0, 269.0, 29.0, 538.0, 731.0, 1000.0, 166.0, 1000.0, 559.0]
2025-09-12 10:27:02,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (1659.96) for latency MM1Queue_a033_s075
2025-09-12 10:27:02,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 9 hours, 54 minutes, 5 seconds)
2025-09-12 10:37:39,468 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:37:39,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:39:57,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1551.50256 ± 1153.723
2025-09-12 10:39:57,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1200.4576, 943.9221, 2996.0657, 185.63521, 2944.0493, 1682.6409, 97.7307, 2948.7458, 2413.6296, 102.14703]
2025-09-12 10:39:57,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [374.0, 328.0, 1000.0, 87.0, 1000.0, 577.0, 56.0, 984.0, 806.0, 57.0]
2025-09-12 10:39:57,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 9 hours, 49 minutes, 52 seconds)
2025-09-12 10:50:47,492 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:50:47,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:52:37,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1278.61938 ± 662.493
2025-09-12 10:52:37,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [333.2917, 1442.3054, 1754.7214, 1212.895, 2265.7976, 2179.8445, 1063.426, 1459.6759, 168.28453, 905.9518]
2025-09-12 10:52:37,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [139.0, 435.0, 580.0, 414.0, 704.0, 656.0, 384.0, 487.0, 82.0, 267.0]
2025-09-12 10:52:37,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 9 hours, 47 minutes, 7 seconds)
2025-09-12 11:03:27,916 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:03:27,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:05:32,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1407.22644 ± 765.048
2025-09-12 11:05:32,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1527.1517, 1702.8925, 505.41837, 1256.7665, 1923.7719, 193.79785, 843.42615, 1422.1423, 3085.3926, 1611.5045]
2025-09-12 11:05:32,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [514.0, 568.0, 195.0, 381.0, 636.0, 90.0, 296.0, 480.0, 1000.0, 528.0]
2025-09-12 11:05:32,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 9 hours, 32 minutes, 51 seconds)
2025-09-12 11:15:50,740 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:15:50,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:17:28,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1126.44165 ± 491.557
2025-09-12 11:17:28,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1467.3265, 1867.4463, 287.32648, 782.8432, 996.3884, 641.01526, 965.7879, 985.1327, 1872.276, 1398.8737]
2025-09-12 11:17:28,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [483.0, 599.0, 119.0, 261.0, 298.0, 226.0, 295.0, 339.0, 596.0, 465.0]
2025-09-12 11:17:28,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 9 hours, 16 minutes, 59 seconds)
2025-09-12 11:28:09,906 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:28:09,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:30:01,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1287.50366 ± 696.577
2025-09-12 11:30:01,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2382.7507, 974.79236, 1862.9382, 572.99054, 700.205, 1473.4354, 2391.3943, 1145.8699, 1115.3489, 255.31265]
2025-09-12 11:30:01,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [752.0, 321.0, 570.0, 210.0, 242.0, 482.0, 734.0, 398.0, 372.0, 111.0]
2025-09-12 11:30:01,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 9 hours, 1 minute, 41 seconds)
2025-09-12 11:40:48,511 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:40:48,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:42:38,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1215.26489 ± 1003.495
2025-09-12 11:42:38,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [343.84406, 743.5121, 2573.0166, 3028.5212, 990.3222, 2208.1008, 151.39351, 607.4535, 1457.5885, 48.89527]
2025-09-12 11:42:38,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [140.0, 275.0, 848.0, 1000.0, 340.0, 735.0, 73.0, 229.0, 491.0, 35.0]
2025-09-12 11:42:38,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 8 hours, 46 minutes, 28 seconds)
2025-09-12 11:53:53,238 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:53:53,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:55:29,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1120.60828 ± 622.094
2025-09-12 11:55:29,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [200.4503, 1178.461, 1944.6478, 1804.1332, 1344.0726, 1501.7697, 321.04547, 1536.4891, 1183.0326, 191.98112]
2025-09-12 11:55:29,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [94.0, 357.0, 638.0, 578.0, 396.0, 482.0, 129.0, 494.0, 355.0, 89.0]
2025-09-12 11:55:29,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 35 minutes, 28 seconds)
2025-09-12 12:05:42,288 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:05:42,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:07:39,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1353.51990 ± 1035.688
2025-09-12 12:07:39,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1714.6765, 1766.3745, 620.52075, 3065.7363, 3194.3625, 1082.5444, 993.03107, 906.9027, 94.85584, 96.195496]
2025-09-12 12:07:39,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [522.0, 540.0, 229.0, 1000.0, 1000.0, 372.0, 340.0, 310.0, 55.0, 53.0]
2025-09-12 12:07:39,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 8 hours, 16 minutes, 53 seconds)
2025-09-12 12:18:47,806 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:18:47,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:20:34,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1280.89160 ± 453.389
2025-09-12 12:20:34,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [981.9044, 1709.6199, 259.33856, 1498.3413, 1437.491, 1872.3707, 1683.318, 894.9288, 1269.0151, 1202.5879]
2025-09-12 12:20:34,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [298.0, 512.0, 110.0, 476.0, 458.0, 569.0, 508.0, 271.0, 379.0, 409.0]
2025-09-12 12:20:34,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 12 minutes, 4 seconds)
2025-09-12 12:31:28,051 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:31:28,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:33:49,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1575.59753 ± 1066.061
2025-09-12 12:33:49,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1767.1626, 2986.053, 1082.8479, 1583.4623, 415.46573, 3117.8037, 399.73477, 394.92297, 3059.595, 948.9278]
2025-09-12 12:33:49,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [576.0, 1000.0, 378.0, 540.0, 161.0, 1000.0, 154.0, 159.0, 1000.0, 330.0]
2025-09-12 12:33:49,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 8 hours, 4 minutes, 52 seconds)
2025-09-12 12:43:59,611 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:43:59,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:45:30,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1071.36450 ± 785.138
2025-09-12 12:45:30,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1177.0397, 1644.6829, 321.63437, 1813.6334, 838.63477, 49.78468, 513.3639, 88.8113, 1929.7338, 2336.3271]
2025-09-12 12:45:30,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [348.0, 493.0, 135.0, 553.0, 261.0, 35.0, 208.0, 51.0, 581.0, 741.0]
2025-09-12 12:45:30,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 7 hours, 45 minutes, 13 seconds)
2025-09-12 12:56:39,543 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:56:39,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:59:28,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1891.42029 ± 968.104
2025-09-12 12:59:28,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3016.127, 2337.9492, 374.4913, 3117.6006, 2432.3848, 630.21075, 2945.9353, 1749.2119, 1197.5598, 1112.7329]
2025-09-12 12:59:28,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 752.0, 155.0, 1000.0, 815.0, 234.0, 1000.0, 595.0, 425.0, 389.0]
2025-09-12 12:59:28,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (1891.42) for latency MM1Queue_a033_s075
2025-09-12 12:59:28,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 40 minutes, 42 seconds)
2025-09-12 13:09:39,869 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:09:39,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:12:27,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1905.00269 ± 1173.960
2025-09-12 13:12:27,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [333.9938, 601.1727, 3113.8572, 3093.1401, 1514.7634, 43.12289, 1700.92, 2508.6384, 3085.4849, 3054.9333]
2025-09-12 13:12:27,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [139.0, 223.0, 1000.0, 1000.0, 518.0, 32.0, 534.0, 829.0, 1000.0, 1000.0]
2025-09-12 13:12:27,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (1905.00) for latency MM1Queue_a033_s075
2025-09-12 13:12:27,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 33 minutes, 37 seconds)
2025-09-12 13:23:09,501 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:23:09,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:25:27,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1588.11597 ± 1201.661
2025-09-12 13:25:27,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3025.4832, 2005.2191, 1063.1385, 593.8878, 366.56326, 3083.8298, 2775.6902, 95.52517, 2762.932, 108.890816]
2025-09-12 13:25:27,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 633.0, 312.0, 222.0, 153.0, 1000.0, 901.0, 55.0, 894.0, 60.0]
2025-09-12 13:25:27,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 21 minutes, 15 seconds)
2025-09-12 13:36:12,268 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:36:12,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:38:08,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1387.26184 ± 975.005
2025-09-12 13:38:08,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [409.50272, 63.34772, 1991.1782, 3253.6936, 2460.3464, 1697.5802, 456.19608, 1252.4152, 553.30743, 1735.0496]
2025-09-12 13:38:08,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [156.0, 41.0, 600.0, 1000.0, 768.0, 519.0, 172.0, 377.0, 206.0, 563.0]
2025-09-12 13:38:08,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 7 hours, 4 minutes, 29 seconds)
2025-09-12 13:49:41,359 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:49:41,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:51:48,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1480.93530 ± 1155.044
2025-09-12 13:51:48,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2516.0164, 3069.42, 308.77325, 202.09796, 2362.0166, 42.866238, 1093.1084, 423.29868, 3084.9397, 1706.8167]
2025-09-12 13:51:48,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [781.0, 959.0, 129.0, 93.0, 784.0, 32.0, 334.0, 166.0, 1000.0, 526.0]
2025-09-12 13:51:48,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 7 hours, 4 minutes, 20 seconds)
2025-09-12 14:02:06,579 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:02:06,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:04:16,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1511.69666 ± 1026.151
2025-09-12 14:04:16,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1223.2631, 832.2099, 3109.5676, 132.03558, 779.9224, 560.97565, 2915.9072, 797.8857, 2169.6875, 2595.5117]
2025-09-12 14:04:16,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [391.0, 285.0, 1000.0, 68.0, 274.0, 209.0, 895.0, 277.0, 704.0, 805.0]
2025-09-12 14:04:16,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 41 minutes, 45 seconds)
2025-09-12 14:14:59,443 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:14:59,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:16:57,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1398.70923 ± 715.437
2025-09-12 14:16:57,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [843.4958, 1649.7272, 2924.673, 1939.5425, 867.4317, 1729.2291, 463.4677, 1378.0416, 1677.1144, 514.36975]
2025-09-12 14:16:57,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [257.0, 531.0, 891.0, 633.0, 289.0, 513.0, 172.0, 408.0, 506.0, 189.0]
2025-09-12 14:16:57,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 26 minutes, 58 seconds)
2025-09-12 14:27:36,184 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:27:36,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:29:18,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1203.71216 ± 755.574
2025-09-12 14:29:18,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [419.29974, 723.0926, 1230.1464, 2774.5618, 1025.9011, 1059.7231, 2089.248, 1023.20496, 49.32315, 1642.6215]
2025-09-12 14:29:18,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [165.0, 255.0, 370.0, 889.0, 303.0, 340.0, 618.0, 348.0, 35.0, 512.0]
2025-09-12 14:29:18,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 10 minutes, 19 seconds)
2025-09-12 14:40:02,899 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:40:02,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:41:55,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1334.63098 ± 769.150
2025-09-12 14:41:55,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2819.7222, 977.2975, 421.58105, 2012.1862, 2499.4448, 1174.3263, 805.2485, 911.88763, 827.9311, 896.6856]
2025-09-12 14:41:55,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [861.0, 327.0, 159.0, 594.0, 797.0, 356.0, 271.0, 274.0, 253.0, 311.0]
2025-09-12 14:41:55,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 5 hours, 57 minutes, 7 seconds)
2025-09-12 14:52:38,941 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:52:38,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:54:04,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1055.39404 ± 632.177
2025-09-12 14:54:04,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2229.1829, 92.176735, 1298.9857, 1499.4528, 1112.171, 1149.697, 1490.0969, 84.62827, 1080.9135, 516.6364]
2025-09-12 14:54:04,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [670.0, 54.0, 399.0, 451.0, 326.0, 341.0, 447.0, 52.0, 319.0, 178.0]
2025-09-12 14:54:04,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 36 minutes, 13 seconds)
2025-09-12 15:05:11,977 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:05:11,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:06:39,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1030.20911 ± 601.410
2025-09-12 15:06:39,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [997.4422, 358.2406, 1864.1708, 1643.587, 1244.868, 166.95126, 1473.4219, 95.83409, 952.2473, 1505.3285]
2025-09-12 15:06:39,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [321.0, 143.0, 557.0, 530.0, 419.0, 79.0, 441.0, 55.0, 288.0, 452.0]
2025-09-12 15:06:39,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 24 minutes, 26 seconds)
2025-09-12 15:16:41,940 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:16:41,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:18:02,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 992.07605 ± 612.850
2025-09-12 15:18:02,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [867.4219, 326.18607, 918.51074, 959.71686, 841.1895, 2652.771, 928.8645, 348.19415, 830.63025, 1247.2765]
2025-09-12 15:18:02,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [256.0, 128.0, 280.0, 291.0, 256.0, 817.0, 282.0, 138.0, 249.0, 382.0]
2025-09-12 15:18:02,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 5 minutes, 29 seconds)
2025-09-12 15:29:01,139 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:29:01,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:31:29,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1683.88049 ± 1209.194
2025-09-12 15:31:29,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3018.102, 1778.4545, 613.00085, 1232.1011, 3058.97, 3074.8074, 112.835144, 3094.89, 283.8257, 571.818]
2025-09-12 15:31:29,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 584.0, 223.0, 422.0, 1000.0, 1000.0, 59.0, 1000.0, 120.0, 215.0]
2025-09-12 15:31:29,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 4 hours, 58 minutes, 29 seconds)
2025-09-12 15:42:24,449 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:42:24,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:46:05,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2637.74097 ± 714.006
2025-09-12 15:46:05,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3066.4473, 3159.2915, 3090.6865, 1812.1663, 3055.9788, 3147.878, 3113.8396, 2172.6023, 950.08435, 2808.4355]
2025-09-12 15:46:05,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 948.0, 1000.0, 589.0, 1000.0, 1000.0, 1000.0, 662.0, 291.0, 851.0]
2025-09-12 15:46:05,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1226 [INFO]: New best (2637.74) for latency MM1Queue_a033_s075
2025-09-12 15:46:05,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 55 minutes, 9 seconds)
2025-09-12 15:57:10,924 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:57:10,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:58:57,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1272.47253 ± 984.942
2025-09-12 15:58:57,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [273.80222, 1581.493, 990.97, 375.15045, 2614.5442, 768.2626, 2055.177, 3153.427, 152.29439, 759.6057]
2025-09-12 15:58:57,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [114.0, 484.0, 296.0, 149.0, 793.0, 233.0, 619.0, 1000.0, 76.0, 245.0]
2025-09-12 15:58:57,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 45 minutes, 30 seconds)
2025-09-12 16:09:23,378 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:09:23,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:11:13,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1364.40369 ± 632.877
2025-09-12 16:11:13,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1739.7072, 1724.6771, 440.9601, 2209.7163, 1940.8684, 82.36921, 1684.8997, 1090.0514, 1283.3363, 1447.4514]
2025-09-12 16:11:13,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [534.0, 514.0, 165.0, 661.0, 587.0, 49.0, 529.0, 321.0, 386.0, 433.0]
2025-09-12 16:11:13,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 31 minutes, 7 seconds)
2025-09-12 16:21:26,718 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:21:26,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:22:55,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1035.32104 ± 605.429
2025-09-12 16:22:55,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1541.7816, 186.26451, 123.20604, 728.92847, 2088.3127, 1792.2734, 800.76483, 1151.4973, 928.7604, 1011.4207]
2025-09-12 16:22:55,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [483.0, 84.0, 63.0, 252.0, 628.0, 568.0, 277.0, 385.0, 306.0, 304.0]
2025-09-12 16:22:55,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 19 minutes, 30 seconds)
2025-09-12 16:33:45,362 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:33:45,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:36:08,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1711.00659 ± 824.808
2025-09-12 16:36:08,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [362.78528, 1278.7999, 1398.0569, 2915.1194, 1582.9722, 2766.4812, 2766.8374, 1977.0253, 960.4569, 1101.5304]
2025-09-12 16:36:08,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [145.0, 388.0, 415.0, 929.0, 484.0, 861.0, 860.0, 633.0, 292.0, 332.0]
2025-09-12 16:36:08,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 5 minutes, 38 seconds)
2025-09-12 16:46:55,868 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:46:55,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:48:39,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1251.54468 ± 739.852
2025-09-12 16:48:39,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [854.1577, 2694.4426, 895.3368, 2642.5383, 876.6532, 860.4242, 613.7477, 571.3305, 1321.207, 1185.6095]
2025-09-12 16:48:39,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [295.0, 815.0, 278.0, 825.0, 300.0, 258.0, 224.0, 205.0, 398.0, 353.0]
2025-09-12 16:48:39,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 45 minutes, 16 seconds)
2025-09-12 16:59:33,066 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:59:33,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:01:37,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1524.56189 ± 889.378
2025-09-12 17:01:37,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2876.125, 1870.7876, 2341.1887, 1210.553, 384.1843, 961.57605, 2603.5864, 1172.7313, 1790.7922, 34.094696]
2025-09-12 17:01:37,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [889.0, 562.0, 714.0, 373.0, 152.0, 292.0, 812.0, 367.0, 544.0, 25.0]
2025-09-12 17:01:37,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 33 minutes, 3 seconds)
2025-09-12 17:12:18,086 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:12:18,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:14:30,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1531.86060 ± 1085.592
2025-09-12 17:14:30,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [141.95833, 2022.6301, 374.0877, 1321.3024, 586.2015, 1538.443, 3164.1208, 3182.8225, 2478.5146, 508.52423]
2025-09-12 17:14:30,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [72.0, 665.0, 148.0, 395.0, 205.0, 492.0, 1000.0, 1000.0, 805.0, 184.0]
2025-09-12 17:14:30,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 22 minutes, 32 seconds)
2025-09-12 17:25:22,415 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:25:22,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:26:37,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 837.61328 ± 405.247
2025-09-12 17:26:37,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [120.63638, 909.0113, 958.63763, 1245.6163, 1035.6167, 705.3839, 421.91727, 1287.2046, 340.9727, 1351.1361]
2025-09-12 17:26:37,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [62.0, 307.0, 306.0, 422.0, 341.0, 246.0, 166.0, 384.0, 141.0, 456.0]
2025-09-12 17:26:37,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 11 minutes, 6 seconds)
2025-09-12 17:37:16,003 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:37:16,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:38:52,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1097.58923 ± 680.600
2025-09-12 17:38:52,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [64.77151, 2022.1777, 1642.0198, 1403.1002, 1265.3723, 1366.4558, 650.07855, 187.5798, 420.36374, 1953.9731]
2025-09-12 17:38:52,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [39.0, 623.0, 531.0, 478.0, 385.0, 450.0, 235.0, 86.0, 165.0, 645.0]
2025-09-12 17:38:52,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 55 minutes, 38 seconds)
2025-09-12 17:49:26,511 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:49:26,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:51:58,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1781.66479 ± 1323.461
2025-09-12 17:51:58,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [78.72278, 3152.506, 3178.1511, 66.220535, 892.3869, 3172.3796, 3151.6394, 2260.462, 1817.7543, 46.424995]
2025-09-12 17:51:58,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [49.0, 1000.0, 1000.0, 42.0, 312.0, 1000.0, 1000.0, 730.0, 562.0, 33.0]
2025-09-12 17:51:58,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 44 minutes, 36 seconds)
2025-09-12 18:02:47,604 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:02:47,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:04:26,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1151.76538 ± 506.095
2025-09-12 18:04:26,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2444.9011, 931.72205, 822.14996, 893.6225, 1148.8848, 498.41122, 1505.7472, 831.04535, 1241.8601, 1199.3096]
2025-09-12 18:04:26,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [799.0, 286.0, 286.0, 302.0, 391.0, 189.0, 453.0, 250.0, 383.0, 404.0]
2025-09-12 18:04:26,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 30 minutes, 45 seconds)
2025-09-12 18:15:47,347 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:15:47,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:17:54,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1512.93457 ± 879.926
2025-09-12 18:17:54,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3187.9414, 3095.0718, 774.96, 1348.5581, 484.642, 1775.6547, 1062.999, 915.88574, 1119.4806, 1364.1522]
2025-09-12 18:17:54,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 239.0, 414.0, 184.0, 582.0, 318.0, 283.0, 344.0, 418.0]
2025-09-12 18:17:54,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 19 minutes, 27 seconds)
2025-09-12 18:28:16,451 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:28:16,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:31:12,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2063.34961 ± 1056.887
2025-09-12 18:31:12,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [532.37494, 3022.9639, 3171.5518, 2273.5266, 1266.1276, 3073.2195, 1464.7291, 228.28337, 2512.7039, 3088.0166]
2025-09-12 18:31:12,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [198.0, 979.0, 1000.0, 721.0, 422.0, 1000.0, 439.0, 101.0, 833.0, 1000.0]
2025-09-12 18:31:13,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 9 minutes, 10 seconds)
2025-09-12 18:42:02,410 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:42:02,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:43:43,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1166.26379 ± 1054.603
2025-09-12 18:43:43,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1783.2745, 3117.6372, 873.7439, 2649.4507, 80.89742, 267.66415, 493.3953, 347.19556, 1890.3347, 159.04407]
2025-09-12 18:43:43,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [545.0, 1000.0, 304.0, 808.0, 48.0, 115.0, 183.0, 143.0, 573.0, 78.0]
2025-09-12 18:43:43,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 56 minutes, 44 seconds)
2025-09-12 18:54:58,239 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:54:58,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:57:54,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2058.05542 ± 1133.737
2025-09-12 18:57:54,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1146.8435, 3139.816, 1551.1272, 172.83507, 3110.737, 603.52905, 1460.2972, 3145.962, 3093.594, 3155.8145]
2025-09-12 18:57:54,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [399.0, 1000.0, 516.0, 81.0, 1000.0, 209.0, 489.0, 1000.0, 955.0, 1000.0]
2025-09-12 18:57:54,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 45 minutes, 29 seconds)
2025-09-12 19:07:55,126 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:07:55,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:09:23,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1024.41931 ± 585.401
2025-09-12 19:09:23,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [138.56337, 1505.0219, 862.49054, 1768.5715, 1231.8684, 251.21942, 745.3066, 1998.8221, 630.30676, 1112.0228]
2025-09-12 19:09:23,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [69.0, 469.0, 290.0, 550.0, 378.0, 106.0, 259.0, 651.0, 201.0, 337.0]
2025-09-12 19:09:23,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 30 minutes, 55 seconds)
2025-09-12 19:20:30,492 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:20:30,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:23:49,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2316.38013 ± 1053.587
2025-09-12 19:23:49,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2984.2861, 897.88336, 3091.1025, 3090.7024, 140.5942, 2662.8982, 2787.2927, 3150.299, 3083.1428, 1275.5991]
2025-09-12 19:23:49,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [940.0, 311.0, 1000.0, 1000.0, 70.0, 819.0, 898.0, 1000.0, 1000.0, 428.0]
2025-09-12 19:23:49,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 19 minutes, 6 seconds)
2025-09-12 19:34:24,942 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:34:24,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:37:22,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2069.23560 ± 976.944
2025-09-12 19:37:22,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [3134.9038, 2793.0264, 896.00977, 1081.0309, 3208.633, 3123.426, 873.93555, 923.14185, 1979.2883, 2678.9597]
2025-09-12 19:37:22,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 917.0, 302.0, 331.0, 1000.0, 1000.0, 295.0, 278.0, 647.0, 852.0]
2025-09-12 19:37:22,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 6 minutes, 9 seconds)
2025-09-12 19:48:24,654 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:48:24,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:51:23,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2130.17480 ± 1123.847
2025-09-12 19:51:23,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1236.496, 3162.813, 1040.7013, 2001.9484, 3170.7815, 189.51802, 952.8002, 3144.0566, 3156.3088, 3246.3235]
2025-09-12 19:51:23,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [376.0, 1000.0, 354.0, 650.0, 1000.0, 86.0, 328.0, 1000.0, 1000.0, 998.0]
2025-09-12 19:51:23,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 54 minutes, 8 seconds)
2025-09-12 20:02:28,983 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:02:28,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:04:21,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1271.02966 ± 935.635
2025-09-12 20:04:21,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2066.9563, 420.47137, 423.92432, 1938.9015, 2109.2258, 291.78958, 173.78317, 885.1762, 1335.9962, 3064.072]
2025-09-12 20:04:21,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [685.0, 159.0, 169.0, 631.0, 684.0, 123.0, 84.0, 275.0, 451.0, 1000.0]
2025-09-12 20:04:21,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 39 minutes, 52 seconds)
2025-09-12 20:15:08,858 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:15:08,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:16:42,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1062.72961 ± 989.902
2025-09-12 20:16:42,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1224.0182, 110.359215, 1809.3933, 100.5405, 856.83685, 3128.7637, 68.03603, 2284.2798, 465.0219, 580.0468]
2025-09-12 20:16:42,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [418.0, 61.0, 561.0, 57.0, 262.0, 1000.0, 43.0, 706.0, 174.0, 211.0]
2025-09-12 20:16:42,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 26 minutes, 55 seconds)
2025-09-12 20:27:38,424 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:27:38,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:29:15,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1152.89514 ± 610.645
2025-09-12 20:29:15,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2162.3762, 530.0294, 162.32132, 1360.397, 1141.2015, 1220.7277, 1359.8469, 1886.9095, 337.38348, 1367.7583]
2025-09-12 20:29:15,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [661.0, 187.0, 77.0, 407.0, 341.0, 395.0, 409.0, 616.0, 147.0, 412.0]
2025-09-12 20:29:15,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 13 minutes, 5 seconds)
2025-09-12 20:39:07,490 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:39:07,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:42:07,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 2120.44019 ± 1052.481
2025-09-12 20:42:07,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1222 [DEBUG]: All rewards: [792.8371, 704.6636, 3179.7258, 2803.274, 3196.359, 1160.7344, 918.45557, 2110.73, 3178.624, 3159.0002]
2025-09-12 20:42:07,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [272.0, 252.0, 1000.0, 879.0, 1000.0, 387.0, 316.0, 680.0, 1000.0, 1000.0]
2025-09-12 20:42:07,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-hopper):1251 [DEBUG]: Training session finished
