2025-09-12 00:36:10,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc25-hopper/MM1Queue_a033_s075-mbpac_memdelay
2025-09-12 00:36:10,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc25-hopper/MM1Queue_a033_s075-mbpac_memdelay
2025-09-12 00:36:10,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x15444761f750>}
2025-09-12 00:36:10,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1111 [DEBUG]: using device: cuda
2025-09-12 00:36:10,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1133 [INFO]: Creating new trainer
2025-09-12 00:36:10,442 baseline-mbpac-noiseperc25-hopper:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-09-12 00:36:10,442 baseline-mbpac-noiseperc25-hopper:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 00:36:10,449 baseline-mbpac-noiseperc25-hopper:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=3, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-12 00:36:11,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1194 [DEBUG]: Starting training session...
2025-09-12 00:36:11,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 1/100
2025-09-12 00:46:27,700 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:46:27,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:46:39,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 71.55099 ± 21.699
2025-09-12 00:46:39,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [104.419014, 78.424545, 80.71752, 70.42989, 48.23941, 47.32896, 49.268703, 111.8964, 56.326942, 68.45849]
2025-09-12 00:46:39,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [63.0, 46.0, 48.0, 43.0, 30.0, 29.0, 30.0, 64.0, 34.0, 42.0]
2025-09-12 00:46:39,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (71.55) for latency MM1Queue_a033_s075
2025-09-12 00:46:39,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 17 hours, 16 minutes, 36 seconds)
2025-09-12 00:58:28,534 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:58:28,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:58:45,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 117.42892 ± 73.771
2025-09-12 00:58:45,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [216.42474, 175.03539, 146.94035, 39.70411, 234.60063, 67.86255, 48.29017, 32.97058, 51.273018, 161.18759]
2025-09-12 00:58:45,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [107.0, 87.0, 78.0, 26.0, 113.0, 40.0, 30.0, 23.0, 32.0, 91.0]
2025-09-12 00:58:45,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (117.43) for latency MM1Queue_a033_s075
2025-09-12 00:58:46,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 18 hours, 26 minutes, 16 seconds)
2025-09-12 01:10:34,842 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:10:34,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:10:58,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 155.79391 ± 82.368
2025-09-12 01:10:58,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [187.08058, 161.45374, 122.760704, 288.24454, 22.335432, 176.5343, 66.47632, 174.41257, 78.01259, 280.62848]
2025-09-12 01:10:58,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [102.0, 86.0, 67.0, 137.0, 18.0, 100.0, 42.0, 97.0, 48.0, 136.0]
2025-09-12 01:10:58,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (155.79) for latency MM1Queue_a033_s075
2025-09-12 01:10:58,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 18 hours, 44 minutes, 29 seconds)
2025-09-12 01:22:37,209 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:22:37,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:23:12,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 188.95401 ± 114.339
2025-09-12 01:23:12,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [72.29265, 291.5204, 100.985825, 360.1256, 25.015657, 235.78517, 268.54428, 30.744942, 228.63383, 275.89163]
2025-09-12 01:23:12,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [43.0, 157.0, 54.0, 220.0, 24.0, 124.0, 247.0, 24.0, 130.0, 212.0]
2025-09-12 01:23:12,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (188.95) for latency MM1Queue_a033_s075
2025-09-12 01:23:12,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 18 hours, 48 minutes, 24 seconds)
2025-09-12 01:35:02,748 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:35:02,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:35:52,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 264.80637 ± 115.886
2025-09-12 01:35:52,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [443.81613, 314.0661, 150.91023, 293.02866, 297.05072, 368.77545, 104.12749, 60.527397, 269.4826, 346.27905]
2025-09-12 01:35:52,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [398.0, 164.0, 92.0, 146.0, 165.0, 234.0, 63.0, 41.0, 240.0, 202.0]
2025-09-12 01:35:52,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (264.81) for latency MM1Queue_a033_s075
2025-09-12 01:35:52,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 18 hours, 54 minutes, 5 seconds)
2025-09-12 01:47:37,544 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:47:37,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:48:07,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 207.01396 ± 94.557
2025-09-12 01:48:07,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [94.05225, 80.64254, 335.39313, 87.04515, 141.82547, 319.46667, 297.21378, 218.84242, 273.35715, 222.30106]
2025-09-12 01:48:07,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [54.0, 51.0, 186.0, 50.0, 86.0, 149.0, 137.0, 116.0, 127.0, 103.0]
2025-09-12 01:48:07,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 19 hours, 15 minutes, 28 seconds)
2025-09-12 02:00:02,808 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:00:02,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:00:35,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 241.05807 ± 92.013
2025-09-12 02:00:35,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [158.70216, 115.71992, 325.5194, 256.7589, 345.02615, 92.05656, 243.34724, 193.6888, 340.2091, 339.5525]
2025-09-12 02:00:35,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [111.0, 61.0, 165.0, 111.0, 143.0, 54.0, 138.0, 97.0, 144.0, 145.0]
2025-09-12 02:00:35,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 19 hours, 9 minutes, 57 seconds)
2025-09-12 02:12:18,902 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:12:18,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:12:57,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 280.70905 ± 84.100
2025-09-12 02:12:57,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [418.2898, 336.6192, 285.57504, 323.16052, 253.52496, 318.4017, 86.60042, 299.79633, 285.78052, 199.34195]
2025-09-12 02:12:57,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [222.0, 143.0, 136.0, 146.0, 143.0, 145.0, 61.0, 137.0, 127.0, 98.0]
2025-09-12 02:12:57,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (280.71) for latency MM1Queue_a033_s075
2025-09-12 02:12:57,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 19 hours, 28 seconds)
2025-09-12 02:24:52,859 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:24:52,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:25:37,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 338.03778 ± 259.682
2025-09-12 02:25:37,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [510.3343, 889.84216, 166.40276, 52.923977, 97.50707, 200.67958, 581.80145, 395.07574, 432.54904, 53.26183]
2025-09-12 02:25:37,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [222.0, 352.0, 83.0, 38.0, 54.0, 106.0, 241.0, 193.0, 247.0, 36.0]
2025-09-12 02:25:37,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (338.04) for latency MM1Queue_a033_s075
2025-09-12 02:25:37,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 18 hours, 55 minutes, 56 seconds)
2025-09-12 02:37:33,739 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:37:33,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:38:11,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 322.95697 ± 199.533
2025-09-12 02:38:11,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [672.73236, 412.68597, 109.04812, 400.54498, 120.9094, 186.2664, 307.56863, 311.86282, 76.33268, 631.6185]
2025-09-12 02:38:11,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [257.0, 164.0, 58.0, 164.0, 73.0, 89.0, 129.0, 135.0, 47.0, 232.0]
2025-09-12 02:38:11,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 18 hours, 41 minutes, 41 seconds)
2025-09-12 02:49:46,273 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:49:46,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:50:33,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 407.40930 ± 391.868
2025-09-12 02:50:33,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [554.77466, 205.96454, 66.282265, 103.97965, 1170.7018, 128.50511, 365.76, 125.23702, 241.50165, 1111.3862]
2025-09-12 02:50:33,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [242.0, 100.0, 40.0, 58.0, 397.0, 67.0, 164.0, 73.0, 112.0, 404.0]
2025-09-12 02:50:33,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (407.41) for latency MM1Queue_a033_s075
2025-09-12 02:50:33,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 18 hours, 31 minutes, 16 seconds)
2025-09-12 03:02:29,296 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:02:29,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:03:09,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 375.43088 ± 301.402
2025-09-12 03:03:09,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [196.82726, 576.64215, 685.8721, 30.113611, 742.7265, 26.682453, 295.8766, 129.07327, 184.68105, 885.8138]
2025-09-12 03:03:09,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [92.0, 206.0, 239.0, 24.0, 252.0, 21.0, 120.0, 72.0, 97.0, 290.0]
2025-09-12 03:03:09,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 18 hours, 21 minutes, 2 seconds)
2025-09-12 03:14:49,336 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:14:49,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:15:29,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 346.63425 ± 286.481
2025-09-12 03:15:29,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [151.35812, 123.99821, 319.92554, 33.060577, 283.73218, 1057.9243, 484.23026, 109.80285, 567.79944, 334.51096]
2025-09-12 03:15:29,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [77.0, 65.0, 141.0, 25.0, 139.0, 358.0, 187.0, 60.0, 203.0, 145.0]
2025-09-12 03:15:29,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 18 hours, 8 minutes, 10 seconds)
2025-09-12 03:27:29,470 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:27:29,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:28:17,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 372.84561 ± 313.199
2025-09-12 03:28:17,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [266.07516, 49.405933, 635.7212, 173.0278, 1186.8112, 211.38324, 160.32356, 423.15067, 233.80458, 388.75262]
2025-09-12 03:28:17,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [151.0, 37.0, 260.0, 121.0, 406.0, 98.0, 86.0, 193.0, 124.0, 210.0]
2025-09-12 03:28:17,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 17 hours, 58 minutes, 2 seconds)
2025-09-12 03:40:05,071 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:40:05,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:40:53,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 410.65973 ± 186.294
2025-09-12 03:40:53,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [668.73456, 362.38965, 398.85947, 556.67926, 60.375893, 244.19058, 682.4655, 516.8522, 304.4246, 311.62576]
2025-09-12 03:40:53,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [271.0, 172.0, 176.0, 225.0, 52.0, 118.0, 237.0, 211.0, 136.0, 134.0]
2025-09-12 03:40:53,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (410.66) for latency MM1Queue_a033_s075
2025-09-12 03:40:53,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 17 hours, 45 minutes, 46 seconds)
2025-09-12 03:52:32,034 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:52:32,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:53:19,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 397.76923 ± 261.025
2025-09-12 03:53:19,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [110.15424, 632.04614, 667.7395, 166.15755, 193.27043, 708.2122, 243.01398, 639.7639, 609.18225, 8.152096]
2025-09-12 03:53:19,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [61.0, 222.0, 232.0, 86.0, 96.0, 297.0, 121.0, 266.0, 258.0, 10.0]
2025-09-12 03:53:19,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 17 hours, 34 minutes, 30 seconds)
2025-09-12 04:06:02,613 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:06:02,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:06:54,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 457.35406 ± 371.836
2025-09-12 04:06:54,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1056.955, 275.75128, 343.31833, 136.29623, 597.9771, 163.25641, 64.046776, 431.77286, 283.7399, 1220.4266]
2025-09-12 04:06:54,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [376.0, 122.0, 149.0, 70.0, 227.0, 81.0, 43.0, 171.0, 124.0, 442.0]
2025-09-12 04:06:54,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (457.35) for latency MM1Queue_a033_s075
2025-09-12 04:06:54,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 17 hours, 38 minutes, 17 seconds)
2025-09-12 04:18:05,477 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:18:05,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:19:07,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 563.99231 ± 443.631
2025-09-12 04:19:07,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [181.06444, 94.23135, 854.5553, 1208.3264, 303.8144, 350.78723, 167.08766, 194.18112, 1251.0747, 1034.8003]
2025-09-12 04:19:07,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 52.0, 315.0, 459.0, 125.0, 158.0, 93.0, 92.0, 426.0, 357.0]
2025-09-12 04:19:07,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (563.99) for latency MM1Queue_a033_s075
2025-09-12 04:19:07,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 17 hours, 23 minutes, 35 seconds)
2025-09-12 04:30:47,286 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:30:47,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:31:37,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 462.40918 ± 384.915
2025-09-12 04:31:37,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [159.93301, 254.27124, 631.7021, 128.37177, 200.31708, 1352.7264, 156.09108, 217.91423, 681.36584, 841.3992]
2025-09-12 04:31:37,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 108.0, 239.0, 66.0, 90.0, 468.0, 82.0, 108.0, 230.0, 277.0]
2025-09-12 04:31:37,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 17 hours, 5 minutes, 49 seconds)
2025-09-12 04:43:22,631 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:43:22,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:44:33,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 679.76959 ± 254.998
2025-09-12 04:44:33,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [193.64153, 754.61884, 887.661, 857.9029, 484.2264, 328.54187, 777.878, 703.1073, 731.72565, 1078.392]
2025-09-12 04:44:33,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [90.0, 255.0, 333.0, 298.0, 180.0, 140.0, 269.0, 290.0, 259.0, 406.0]
2025-09-12 04:44:33,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (679.77) for latency MM1Queue_a033_s075
2025-09-12 04:44:33,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 16 hours, 58 minutes, 44 seconds)
2025-09-12 04:56:33,458 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:56:33,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:57:27,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 526.53137 ± 312.254
2025-09-12 04:57:27,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [68.62484, 645.3401, 769.7335, 655.10614, 959.60986, 461.5548, 964.97107, 319.3949, 366.81247, 54.166355]
2025-09-12 04:57:27,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [45.0, 213.0, 269.0, 209.0, 358.0, 166.0, 327.0, 131.0, 147.0, 39.0]
2025-09-12 04:57:27,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 16 hours, 53 minutes, 15 seconds)
2025-09-12 05:09:17,632 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:09:17,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:10:51,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 881.92706 ± 418.141
2025-09-12 05:10:51,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [501.16815, 929.37036, 1492.3914, 150.34572, 726.88464, 682.9842, 705.362, 789.84705, 1349.705, 1491.2122]
2025-09-12 05:10:51,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [194.0, 364.0, 593.0, 76.0, 272.0, 271.0, 249.0, 262.0, 494.0, 582.0]
2025-09-12 05:10:51,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (881.93) for latency MM1Queue_a033_s075
2025-09-12 05:10:51,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 16 hours, 37 minutes, 46 seconds)
2025-09-12 05:22:32,474 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:22:32,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:23:40,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 655.21735 ± 366.349
2025-09-12 05:23:40,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [74.367836, 897.94073, 228.1984, 826.0755, 873.79565, 1009.4896, 1052.776, 878.21515, 650.39386, 60.920242]
2025-09-12 05:23:40,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [48.0, 329.0, 100.0, 301.0, 291.0, 388.0, 365.0, 315.0, 217.0, 42.0]
2025-09-12 05:23:40,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 16 hours, 34 minutes, 12 seconds)
2025-09-12 05:35:31,449 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:35:31,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:36:28,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 541.05774 ± 404.319
2025-09-12 05:36:28,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1361.3466, 25.562998, 106.285965, 175.875, 153.07333, 822.53455, 732.10016, 860.81824, 610.0532, 562.9271]
2025-09-12 05:36:28,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [510.0, 21.0, 58.0, 81.0, 76.0, 258.0, 268.0, 294.0, 224.0, 197.0]
2025-09-12 05:36:28,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 16 hours, 25 minutes, 46 seconds)
2025-09-12 05:48:19,151 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:48:19,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:49:23,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 584.65839 ± 511.128
2025-09-12 05:49:23,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [695.63226, 868.55133, 145.83182, 1437.8458, 300.59976, 133.0653, 381.60455, 1546.9231, 184.57027, 151.95985]
2025-09-12 05:49:23,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [255.0, 294.0, 74.0, 557.0, 122.0, 68.0, 157.0, 551.0, 90.0, 75.0]
2025-09-12 05:49:23,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 16 hours, 12 minutes, 25 seconds)
2025-09-12 06:01:27,623 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:01:27,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:02:20,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 494.86270 ± 280.387
2025-09-12 06:02:20,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [937.5464, 284.50992, 831.3576, 446.01187, 160.35773, 419.71683, 399.7563, 179.01643, 926.9599, 363.39392]
2025-09-12 06:02:20,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [321.0, 118.0, 302.0, 159.0, 76.0, 165.0, 162.0, 84.0, 336.0, 145.0]
2025-09-12 06:02:20,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 16 hours, 17 seconds)
2025-09-12 06:14:06,900 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:14:06,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:15:30,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 833.78680 ± 412.260
2025-09-12 06:15:30,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [465.61258, 434.2851, 867.749, 690.52277, 905.90796, 745.5149, 1482.2415, 1669.9442, 344.16342, 731.9264]
2025-09-12 06:15:30,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [187.0, 185.0, 273.0, 225.0, 293.0, 275.0, 527.0, 588.0, 145.0, 241.0]
2025-09-12 06:15:30,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 15 hours, 43 minutes, 41 seconds)
2025-09-12 06:27:37,743 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:27:37,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:29:00,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 765.57886 ± 342.869
2025-09-12 06:29:00,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [672.5486, 510.5169, 874.36475, 517.9491, 1521.0781, 941.6039, 1031.6348, 872.0643, 431.12207, 282.90637]
2025-09-12 06:29:00,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [247.0, 200.0, 313.0, 207.0, 572.0, 352.0, 388.0, 286.0, 180.0, 136.0]
2025-09-12 06:29:00,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 15 hours, 40 minutes, 37 seconds)
2025-09-12 06:40:23,237 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:40:23,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:41:22,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 570.24158 ± 355.879
2025-09-12 06:41:22,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [558.6731, 121.18682, 1100.8905, 782.7284, 70.30562, 821.5792, 437.61526, 962.6381, 747.57336, 99.22522]
2025-09-12 06:41:22,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [215.0, 63.0, 375.0, 286.0, 48.0, 282.0, 167.0, 331.0, 242.0, 55.0]
2025-09-12 06:41:22,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 15 hours, 21 minutes, 30 seconds)
2025-09-12 06:53:14,667 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:53:14,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:54:18,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 602.84033 ± 324.572
2025-09-12 06:54:18,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [794.2241, 620.03516, 1156.683, 132.34608, 845.8839, 544.48914, 965.3272, 482.28305, 348.15347, 138.97855]
2025-09-12 06:54:18,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [261.0, 232.0, 420.0, 69.0, 295.0, 202.0, 377.0, 200.0, 144.0, 71.0]
2025-09-12 06:54:18,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 15 hours, 8 minutes, 54 seconds)
2025-09-12 07:06:17,819 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:06:17,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:07:24,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 674.70007 ± 255.241
2025-09-12 07:07:24,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [631.2851, 1026.4645, 836.753, 852.20404, 666.55457, 380.3075, 487.23358, 170.20888, 973.76794, 722.22235]
2025-09-12 07:07:24,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [230.0, 330.0, 274.0, 314.0, 230.0, 145.0, 178.0, 80.0, 348.0, 226.0]
2025-09-12 07:07:24,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 14 hours, 57 minutes, 57 seconds)
2025-09-12 07:19:03,950 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:19:03,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:20:07,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 613.35132 ± 681.602
2025-09-12 07:20:07,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [210.8028, 344.46863, 1714.4642, 320.06677, 594.4589, 180.16605, 320.15793, 2161.6897, 157.56009, 129.67792]
2025-09-12 07:20:07,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [97.0, 144.0, 569.0, 134.0, 197.0, 92.0, 131.0, 721.0, 76.0, 67.0]
2025-09-12 07:20:07,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 14 hours, 38 minutes, 47 seconds)
2025-09-12 07:31:57,774 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:31:57,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:33:11,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 734.04071 ± 419.958
2025-09-12 07:33:11,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [589.11115, 738.5473, 176.48209, 569.2222, 1468.7703, 782.37024, 833.18134, 1431.53, 609.1968, 141.99559]
2025-09-12 07:33:11,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [207.0, 236.0, 85.0, 199.0, 531.0, 279.0, 266.0, 474.0, 213.0, 74.0]
2025-09-12 07:33:11,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 14 hours, 20 minutes, 3 seconds)
2025-09-12 07:45:06,292 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:45:06,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:46:37,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 881.51233 ± 738.522
2025-09-12 07:46:37,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [521.2085, 74.340614, 508.6571, 1069.4537, 863.96265, 777.25305, 168.73157, 847.3819, 2861.378, 1122.7555]
2025-09-12 07:46:37,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [221.0, 50.0, 220.0, 405.0, 338.0, 237.0, 82.0, 342.0, 960.0, 372.0]
2025-09-12 07:46:37,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 14 hours, 21 minutes, 19 seconds)
2025-09-12 07:58:21,819 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:58:21,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:59:54,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 917.63788 ± 569.047
2025-09-12 07:59:54,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [368.3346, 409.5159, 1175.2623, 825.91125, 1634.7449, 147.01129, 718.3431, 2030.6271, 604.3987, 1262.2294]
2025-09-12 07:59:54,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [156.0, 161.0, 398.0, 266.0, 545.0, 76.0, 245.0, 750.0, 201.0, 464.0]
2025-09-12 07:59:54,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (917.64) for latency MM1Queue_a033_s075
2025-09-12 07:59:54,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 14 hours, 12 minutes, 42 seconds)
2025-09-12 08:11:50,088 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:11:50,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:13:27,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 968.38562 ± 601.164
2025-09-12 08:13:27,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [348.28018, 491.1353, 803.9529, 285.17087, 655.49744, 1197.6102, 1191.3717, 2038.8564, 668.3367, 2003.6447]
2025-09-12 08:13:27,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [141.0, 200.0, 263.0, 119.0, 252.0, 404.0, 418.0, 713.0, 258.0, 672.0]
2025-09-12 08:13:27,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (968.39) for latency MM1Queue_a033_s075
2025-09-12 08:13:27,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 14 hours, 5 minutes, 25 seconds)
2025-09-12 08:25:19,900 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:25:19,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:26:03,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 394.96423 ± 289.869
2025-09-12 08:26:03,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [113.6363, 879.5433, 441.34, 378.3213, 227.90178, 114.37045, 262.982, 977.64276, 404.27325, 149.63124]
2025-09-12 08:26:03,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [61.0, 286.0, 168.0, 151.0, 102.0, 59.0, 114.0, 340.0, 157.0, 73.0]
2025-09-12 08:26:03,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 13 hours, 50 minutes, 45 seconds)
2025-09-12 08:37:45,469 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:37:45,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:38:41,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 537.30786 ± 328.600
2025-09-12 08:38:41,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [981.1593, 292.2962, 904.7574, 17.129883, 822.0801, 867.2214, 256.41092, 176.23709, 622.09485, 433.69202]
2025-09-12 08:38:41,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [348.0, 125.0, 313.0, 18.0, 269.0, 288.0, 116.0, 85.0, 230.0, 168.0]
2025-09-12 08:38:41,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 13 hours, 32 minutes, 18 seconds)
2025-09-12 08:50:49,965 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:50:49,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:51:58,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 668.10925 ± 410.647
2025-09-12 08:51:58,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [904.17773, 813.0415, 768.9963, 961.27875, 39.124603, 570.00494, 845.67426, 1433.2374, 189.28685, 156.27074]
2025-09-12 08:51:58,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [336.0, 278.0, 246.0, 322.0, 31.0, 245.0, 277.0, 511.0, 90.0, 79.0]
2025-09-12 08:51:58,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 13 hours, 17 minutes, 17 seconds)
2025-09-12 09:03:44,894 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:03:44,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:04:35,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 470.76788 ± 499.043
2025-09-12 09:04:35,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [197.27975, 649.0171, 176.22832, 934.13544, 45.334396, 49.39456, 226.81781, 257.63028, 429.38034, 1742.4608]
2025-09-12 09:04:35,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [89.0, 235.0, 85.0, 298.0, 35.0, 35.0, 100.0, 112.0, 169.0, 637.0]
2025-09-12 09:04:35,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 12 hours, 56 minutes, 13 seconds)
2025-09-12 09:16:23,331 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:16:23,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:17:24,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 560.33020 ± 300.570
2025-09-12 09:17:24,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [944.9606, 268.18158, 321.03033, 97.56181, 795.4586, 404.0259, 715.1647, 921.1876, 280.90448, 854.82605]
2025-09-12 09:17:24,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [344.0, 113.0, 136.0, 58.0, 287.0, 165.0, 293.0, 296.0, 129.0, 299.0]
2025-09-12 09:17:24,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 12 hours, 34 minutes, 35 seconds)
2025-09-12 09:29:13,290 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:29:13,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:30:32,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 807.24475 ± 295.516
2025-09-12 09:30:32,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [239.97247, 753.41724, 1264.5457, 824.1867, 1298.521, 810.6958, 777.0972, 626.6892, 904.359, 572.964]
2025-09-12 09:30:32,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [112.0, 236.0, 420.0, 259.0, 460.0, 301.0, 274.0, 228.0, 299.0, 196.0]
2025-09-12 09:30:32,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 12 hours, 27 minutes, 59 seconds)
2025-09-12 09:42:25,991 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:42:25,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:44:00,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 957.43665 ± 685.512
2025-09-12 09:44:00,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1521.5253, 144.046, 915.28876, 1753.2598, 1032.4315, 2318.3335, 260.12363, 853.4187, 186.67445, 589.26355]
2025-09-12 09:44:00,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [488.0, 75.0, 336.0, 634.0, 333.0, 761.0, 110.0, 282.0, 87.0, 215.0]
2025-09-12 09:44:00,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 12 hours, 24 minutes, 37 seconds)
2025-09-12 09:56:18,691 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:56:18,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:57:29,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 634.13440 ± 470.851
2025-09-12 09:57:29,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [445.23685, 391.41183, 1167.2875, 242.94826, 495.74582, 999.6313, 53.7059, 631.82965, 1659.3756, 254.17125]
2025-09-12 09:57:29,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [172.0, 156.0, 429.0, 106.0, 203.0, 385.0, 39.0, 263.0, 593.0, 111.0]
2025-09-12 09:57:29,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 12 hours, 13 minutes, 46 seconds)
2025-09-12 10:08:51,977 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:08:51,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:10:28,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 959.64825 ± 720.783
2025-09-12 10:10:28,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [401.6996, 812.97943, 256.4294, 1060.0665, 2840.937, 1073.9426, 357.8927, 787.2515, 549.8599, 1455.4241]
2025-09-12 10:10:28,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [156.0, 256.0, 111.0, 382.0, 1000.0, 388.0, 146.0, 265.0, 208.0, 478.0]
2025-09-12 10:10:29,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 12 hours, 4 minutes, 50 seconds)
2025-09-12 10:22:43,835 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:22:43,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:23:50,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 645.78577 ± 338.282
2025-09-12 10:23:50,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [799.4308, 1329.4952, 701.771, 889.3715, 654.80554, 748.78076, 645.6286, 68.30957, 206.88985, 413.375]
2025-09-12 10:23:50,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [265.0, 480.0, 230.0, 291.0, 240.0, 271.0, 227.0, 45.0, 92.0, 161.0]
2025-09-12 10:23:50,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 11 hours, 57 minutes, 29 seconds)
2025-09-12 10:35:10,587 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:35:10,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:36:17,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 627.57812 ± 301.005
2025-09-12 10:36:17,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1270.4501, 670.5221, 575.21356, 732.8951, 165.71597, 666.98724, 156.20944, 786.4753, 711.5777, 539.735]
2025-09-12 10:36:17,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [457.0, 227.0, 218.0, 269.0, 80.0, 238.0, 75.0, 302.0, 275.0, 212.0]
2025-09-12 10:36:17,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 11 hours, 37 minutes)
2025-09-12 10:48:19,244 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:48:19,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:49:35,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 749.51758 ± 434.288
2025-09-12 10:49:35,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1115.9777, 140.75879, 1038.3844, 1407.5537, 1002.70703, 770.5294, 318.95255, 1139.8384, 385.10583, 175.3684]
2025-09-12 10:49:35,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [393.0, 73.0, 373.0, 473.0, 358.0, 260.0, 135.0, 365.0, 155.0, 83.0]
2025-09-12 10:49:35,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 11 hours, 22 minutes, 1 second)
2025-09-12 11:01:22,725 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:01:22,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:02:53,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 896.35321 ± 746.882
2025-09-12 11:02:53,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [780.0962, 191.78702, 2571.4983, 192.36017, 1942.5116, 533.74225, 878.35645, 259.89624, 1050.8284, 562.4556]
2025-09-12 11:02:53,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [260.0, 92.0, 867.0, 89.0, 735.0, 212.0, 291.0, 109.0, 359.0, 218.0]
2025-09-12 11:02:53,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 11 hours, 7 minutes, 9 seconds)
2025-09-12 11:14:53,187 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:14:53,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:16:00,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 652.92242 ± 366.895
2025-09-12 11:16:00,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1131.4979, 732.7086, 260.91693, 726.33215, 757.7506, 1228.1454, 327.92178, 235.01083, 168.32222, 960.6178]
2025-09-12 11:16:00,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [415.0, 248.0, 114.0, 291.0, 267.0, 409.0, 137.0, 102.0, 82.0, 331.0]
2025-09-12 11:16:00,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 10 hours, 55 minutes, 19 seconds)
2025-09-12 11:27:46,843 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:27:46,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:29:03,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 733.00250 ± 308.610
2025-09-12 11:29:03,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1239.1204, 531.7724, 696.37866, 262.60974, 917.9902, 1132.0615, 937.03394, 732.3184, 556.07355, 324.66602]
2025-09-12 11:29:03,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [408.0, 205.0, 280.0, 119.0, 357.0, 415.0, 305.0, 280.0, 213.0, 133.0]
2025-09-12 11:29:03,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 10 hours, 39 minutes, 4 seconds)
2025-09-12 11:41:01,413 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:41:01,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:42:29,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 865.25305 ± 743.377
2025-09-12 11:42:29,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [432.36478, 466.97992, 1624.6962, 865.6959, 276.96762, 896.94354, 759.76996, 536.5624, 2732.897, 59.653526]
2025-09-12 11:42:29,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [170.0, 170.0, 551.0, 270.0, 116.0, 337.0, 280.0, 201.0, 927.0, 43.0]
2025-09-12 11:42:29,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 10 hours, 35 minutes, 29 seconds)
2025-09-12 11:54:22,960 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:54:22,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:55:08,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 412.26416 ± 284.204
2025-09-12 11:55:08,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [166.49121, 208.04608, 77.17593, 662.50385, 478.25528, 675.56165, 261.77313, 244.48613, 1040.9407, 307.40747]
2025-09-12 11:55:08,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [78.0, 96.0, 50.0, 246.0, 185.0, 247.0, 126.0, 111.0, 364.0, 133.0]
2025-09-12 11:55:08,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 10 hours, 16 minutes, 14 seconds)
2025-09-12 12:06:50,419 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:06:50,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:07:54,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 644.78357 ± 325.582
2025-09-12 12:07:54,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [788.8131, 112.34903, 817.909, 930.2522, 225.54153, 579.5333, 809.8045, 932.68475, 205.08498, 1045.8632]
2025-09-12 12:07:54,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [283.0, 63.0, 253.0, 289.0, 109.0, 221.0, 305.0, 290.0, 94.0, 345.0]
2025-09-12 12:07:54,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 9 hours, 58 minutes, 6 seconds)
2025-09-12 12:19:46,087 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:19:46,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:21:06,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 820.51990 ± 492.877
2025-09-12 12:21:06,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [197.56781, 875.03674, 1367.8152, 253.16197, 400.3216, 716.2924, 889.3009, 1209.7367, 1803.2294, 492.73676]
2025-09-12 12:21:06,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 313.0, 488.0, 108.0, 161.0, 227.0, 325.0, 383.0, 564.0, 181.0]
2025-09-12 12:21:06,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 9 hours, 45 minutes, 49 seconds)
2025-09-12 12:33:29,127 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:33:29,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:34:47,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 783.18152 ± 305.805
2025-09-12 12:34:47,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [844.99475, 455.47488, 473.24594, 1147.4886, 1076.1803, 844.6369, 1003.67865, 138.83997, 957.5427, 889.732]
2025-09-12 12:34:47,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [264.0, 182.0, 177.0, 426.0, 362.0, 303.0, 333.0, 68.0, 321.0, 333.0]
2025-09-12 12:34:47,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 9 hours, 38 minutes, 28 seconds)
2025-09-12 12:45:52,891 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:45:52,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:46:49,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 553.79834 ± 395.106
2025-09-12 12:46:49,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [541.63763, 1162.1204, 36.453068, 472.3414, 374.92297, 184.88531, 577.40814, 68.69755, 1077.6558, 1041.8616]
2025-09-12 12:46:49,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [208.0, 367.0, 30.0, 183.0, 158.0, 92.0, 218.0, 45.0, 362.0, 349.0]
2025-09-12 12:46:49,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 9 hours, 13 minutes, 19 seconds)
2025-09-12 12:58:53,730 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:58:53,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:00:07,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 695.61414 ± 392.415
2025-09-12 13:00:07,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1086.2565, 214.21877, 374.81915, 1224.6548, 178.08524, 799.76794, 181.035, 980.2843, 1029.2802, 887.73987]
2025-09-12 13:00:07,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [384.0, 99.0, 146.0, 444.0, 82.0, 304.0, 87.0, 373.0, 379.0, 331.0]
2025-09-12 13:00:07,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 9 hours, 5 minutes, 49 seconds)
2025-09-12 13:11:37,778 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:11:37,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:12:32,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 547.79810 ± 388.272
2025-09-12 13:12:32,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [101.12666, 69.26532, 723.3224, 826.21716, 973.4359, 571.4943, 1282.7181, 117.5163, 297.41968, 515.4655]
2025-09-12 13:12:32,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [58.0, 45.0, 251.0, 269.0, 323.0, 208.0, 394.0, 62.0, 125.0, 202.0]
2025-09-12 13:12:32,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 49 minutes, 59 seconds)
2025-09-12 13:24:28,384 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:24:28,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:25:55,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 832.72784 ± 986.313
2025-09-12 13:25:55,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [953.82263, 273.01947, 149.20297, 34.90352, 60.076885, 2606.7395, 142.84044, 2831.4158, 585.0616, 690.196]
2025-09-12 13:25:55,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [356.0, 121.0, 74.0, 29.0, 42.0, 950.0, 71.0, 1000.0, 221.0, 263.0]
2025-09-12 13:25:55,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 8 hours, 38 minutes, 35 seconds)
2025-09-12 13:37:41,966 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:37:41,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:38:40,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 557.24524 ± 541.930
2025-09-12 13:38:40,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [807.3718, 2042.173, 231.73329, 393.4446, 498.5115, 40.33245, 687.564, 336.67355, 394.05588, 140.59157]
2025-09-12 13:38:40,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [299.0, 685.0, 105.0, 154.0, 197.0, 31.0, 268.0, 140.0, 154.0, 69.0]
2025-09-12 13:38:40,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 18 minutes, 21 seconds)
2025-09-12 13:50:36,346 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:50:36,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:51:41,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 599.80450 ± 530.455
2025-09-12 13:51:41,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [530.7784, 526.8222, 190.03172, 166.89488, 233.81755, 1162.4475, 1846.0055, 245.1017, 944.12555, 152.02]
2025-09-12 13:51:41,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [203.0, 207.0, 84.0, 81.0, 103.0, 421.0, 626.0, 105.0, 366.0, 75.0]
2025-09-12 13:51:41,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 8 hours, 13 minutes, 1 second)
2025-09-12 14:03:39,194 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:03:39,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:05:25,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 1077.08508 ± 655.543
2025-09-12 14:05:25,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [814.94135, 1795.0599, 847.42914, 2244.7517, 716.46295, 205.27928, 997.12964, 1363.4751, 102.335594, 1683.985]
2025-09-12 14:05:25,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [294.0, 575.0, 276.0, 785.0, 266.0, 93.0, 307.0, 474.0, 56.0, 579.0]
2025-09-12 14:05:25,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (1077.09) for latency MM1Queue_a033_s075
2025-09-12 14:05:25,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 8 hours, 3 minutes, 14 seconds)
2025-09-12 14:17:17,474 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:17:17,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:18:46,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 939.97791 ± 414.238
2025-09-12 14:18:46,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [364.8985, 424.37128, 808.0232, 1007.3945, 1971.459, 1022.98816, 962.58167, 889.06354, 871.8359, 1077.1633]
2025-09-12 14:18:46,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [150.0, 167.0, 306.0, 318.0, 663.0, 330.0, 298.0, 296.0, 294.0, 357.0]
2025-09-12 14:18:46,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 56 minutes, 55 seconds)
2025-09-12 14:30:23,006 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:30:23,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:31:51,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 859.87177 ± 462.210
2025-09-12 14:31:51,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [734.8534, 2027.4713, 1127.721, 953.79706, 773.4475, 546.6096, 498.817, 869.9841, 870.01996, 195.99661]
2025-09-12 14:31:51,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [287.0, 746.0, 361.0, 323.0, 287.0, 187.0, 187.0, 318.0, 292.0, 94.0]
2025-09-12 14:31:51,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 41 minutes, 26 seconds)
2025-09-12 14:43:42,436 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:43:42,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:44:41,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 565.03351 ± 448.979
2025-09-12 14:44:41,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [604.0041, 796.01074, 1650.2356, 156.08157, 109.02372, 326.79886, 724.9985, 829.82446, 342.1663, 111.19083]
2025-09-12 14:44:41,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [217.0, 250.0, 582.0, 75.0, 59.0, 135.0, 229.0, 291.0, 143.0, 59.0]
2025-09-12 14:44:41,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 28 minutes, 49 seconds)
2025-09-12 14:56:41,756 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:56:41,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:57:44,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 594.80792 ± 320.250
2025-09-12 14:57:44,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1299.1045, 604.4768, 689.00793, 153.25468, 757.19086, 394.39883, 479.64252, 719.3155, 720.15234, 131.53499]
2025-09-12 14:57:44,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [455.0, 218.0, 264.0, 78.0, 263.0, 146.0, 180.0, 270.0, 259.0, 70.0]
2025-09-12 14:57:44,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 7 hours, 15 minutes, 54 seconds)
2025-09-12 15:09:31,101 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:09:31,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:10:53,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 838.44324 ± 690.934
2025-09-12 15:10:53,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [2592.3906, 969.9765, 687.5489, 547.8331, 1489.6595, 846.5709, 458.40652, 353.63684, 254.65086, 183.75853]
2025-09-12 15:10:53,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [893.0, 314.0, 260.0, 197.0, 475.0, 266.0, 178.0, 139.0, 111.0, 84.0]
2025-09-12 15:10:53,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 6 hours, 58 minutes, 54 seconds)
2025-09-12 15:22:37,360 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:22:37,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:24:18,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 1011.25842 ± 691.125
2025-09-12 15:24:18,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [627.01917, 858.2427, 792.0053, 1615.899, 1467.072, 170.00206, 2574.5718, 210.83789, 610.13477, 1186.799]
2025-09-12 15:24:18,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [203.0, 273.0, 315.0, 575.0, 527.0, 79.0, 894.0, 96.0, 234.0, 376.0]
2025-09-12 15:24:18,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 46 minutes, 15 seconds)
2025-09-12 15:36:16,346 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:36:16,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:37:49,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 903.46057 ± 796.216
2025-09-12 15:37:49,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [76.420296, 1223.0728, 123.453575, 536.14404, 2731.2058, 221.19797, 1131.8782, 1557.0359, 240.02759, 1194.17]
2025-09-12 15:37:49,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [48.0, 410.0, 67.0, 188.0, 981.0, 101.0, 414.0, 546.0, 102.0, 433.0]
2025-09-12 15:37:49,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 35 minutes, 48 seconds)
2025-09-12 15:50:14,989 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:50:14,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:51:45,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 886.43585 ± 585.087
2025-09-12 15:51:45,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1248.3691, 655.4878, 107.365585, 192.67932, 2015.836, 1336.1202, 1377.8925, 777.7051, 913.6406, 239.263]
2025-09-12 15:51:45,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [465.0, 241.0, 60.0, 93.0, 686.0, 472.0, 501.0, 269.0, 280.0, 102.0]
2025-09-12 15:51:45,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 29 minutes, 1 second)
2025-09-12 16:03:10,561 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:03:10,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:04:44,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 946.94043 ± 474.777
2025-09-12 16:04:44,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [280.17944, 1003.0211, 417.59744, 1056.3445, 1807.625, 1087.6274, 238.64256, 1106.1952, 1063.7285, 1408.4434]
2025-09-12 16:04:44,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [114.0, 317.0, 166.0, 340.0, 602.0, 345.0, 103.0, 395.0, 363.0, 520.0]
2025-09-12 16:04:44,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 15 minutes, 11 seconds)
2025-09-12 16:16:23,324 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:16:23,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:17:44,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 825.97949 ± 557.596
2025-09-12 16:17:44,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [829.3322, 1268.7438, 247.11693, 857.12897, 1641.5541, 205.71408, 927.15656, 1762.1094, 282.42004, 238.51906]
2025-09-12 16:17:44,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [282.0, 409.0, 109.0, 289.0, 522.0, 90.0, 285.0, 622.0, 129.0, 101.0]
2025-09-12 16:17:44,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 59 seconds)
2025-09-12 16:29:35,712 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:29:35,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:30:25,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 483.87457 ± 358.093
2025-09-12 16:30:25,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [336.49146, 408.13046, 109.29554, 24.478798, 923.18304, 180.59514, 189.42062, 1048.336, 784.28253, 834.5321]
2025-09-12 16:30:25,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [141.0, 180.0, 58.0, 24.0, 293.0, 86.0, 86.0, 376.0, 251.0, 260.0]
2025-09-12 16:30:25,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 43 minutes, 50 seconds)
2025-09-12 16:42:18,360 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:42:18,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:43:59,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 985.81970 ± 817.109
2025-09-12 16:43:59,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [255.54726, 791.40625, 1530.0919, 119.50155, 537.8535, 1019.29407, 2814.938, 238.19507, 1901.9377, 649.4321]
2025-09-12 16:43:59,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [112.0, 270.0, 560.0, 62.0, 201.0, 375.0, 1000.0, 104.0, 654.0, 248.0]
2025-09-12 16:43:59,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 30 minutes, 52 seconds)
2025-09-12 16:55:53,228 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:55:53,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:57:06,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 743.82043 ± 494.168
2025-09-12 16:57:06,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1250.9832, 873.6595, 298.97147, 157.41736, 211.95012, 1026.6674, 700.692, 1076.0958, 1658.5598, 183.20795]
2025-09-12 16:57:06,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [455.0, 274.0, 129.0, 77.0, 101.0, 351.0, 253.0, 320.0, 568.0, 88.0]
2025-09-12 16:57:06,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 13 minutes, 43 seconds)
2025-09-12 17:08:48,751 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:08:48,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:09:50,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 634.67206 ± 278.116
2025-09-12 17:09:50,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [368.51428, 229.65314, 687.6316, 815.84436, 953.7534, 192.58902, 1077.3142, 714.4306, 690.4019, 616.5882]
2025-09-12 17:09:50,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [141.0, 101.0, 219.0, 250.0, 289.0, 89.0, 369.0, 229.0, 255.0, 219.0]
2025-09-12 17:09:50,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 59 minutes, 26 seconds)
2025-09-12 17:21:48,714 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:21:48,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:23:24,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 915.79510 ± 529.700
2025-09-12 17:23:24,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [54.459732, 907.503, 338.82733, 808.4587, 1508.6172, 1367.8165, 711.225, 848.8151, 1943.5483, 668.6801]
2025-09-12 17:23:24,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [39.0, 332.0, 144.0, 259.0, 543.0, 513.0, 264.0, 308.0, 696.0, 261.0]
2025-09-12 17:23:24,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 48 minutes, 58 seconds)
2025-09-12 17:35:19,906 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:35:19,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:36:39,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 810.27081 ± 735.375
2025-09-12 17:36:39,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [710.0705, 119.52011, 808.51697, 518.57, 815.52155, 823.6805, 939.1141, 244.49095, 263.62198, 2859.6016]
2025-09-12 17:36:39,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [257.0, 62.0, 255.0, 201.0, 268.0, 271.0, 295.0, 103.0, 116.0, 1000.0]
2025-09-12 17:36:39,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 38 minutes, 9 seconds)
2025-09-12 17:48:21,732 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:48:21,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:49:34,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 734.18756 ± 465.407
2025-09-12 17:49:34,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [809.3858, 546.2047, 718.41895, 1644.0662, 1054.6987, 27.608952, 493.51355, 152.94295, 624.914, 1270.1215]
2025-09-12 17:49:34,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [264.0, 195.0, 236.0, 574.0, 381.0, 25.0, 185.0, 73.0, 227.0, 414.0]
2025-09-12 17:49:34,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 22 minutes, 20 seconds)
2025-09-12 18:01:21,154 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:01:21,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:02:45,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 811.67285 ± 776.009
2025-09-12 18:02:46,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [924.49896, 1286.2373, 208.40396, 121.797935, 367.0661, 171.33623, 2734.506, 147.1924, 1244.9921, 910.69727]
2025-09-12 18:02:46,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [289.0, 469.0, 91.0, 66.0, 146.0, 81.0, 1000.0, 73.0, 417.0, 344.0]
2025-09-12 18:02:46,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 9 minutes, 28 seconds)
2025-09-12 18:14:45,026 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:14:45,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:16:08,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 793.23285 ± 738.952
2025-09-12 18:16:08,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1222.1407, 926.0196, 77.811516, 918.63934, 647.75653, 2739.5327, 226.50133, 572.73364, 483.50934, 117.68367]
2025-09-12 18:16:08,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [444.0, 348.0, 49.0, 351.0, 243.0, 903.0, 111.0, 228.0, 194.0, 61.0]
2025-09-12 18:16:08,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 58 minutes, 42 seconds)
2025-09-12 18:27:44,806 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:27:44,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:29:07,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 814.57489 ± 352.081
2025-09-12 18:29:07,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [869.5913, 1237.5237, 747.6936, 1103.464, 749.96454, 155.37437, 281.73483, 1307.7316, 815.45465, 877.21643]
2025-09-12 18:29:07,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [279.0, 467.0, 248.0, 379.0, 293.0, 74.0, 124.0, 427.0, 287.0, 309.0]
2025-09-12 18:29:07,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 43 minutes, 25 seconds)
2025-09-12 18:41:02,836 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:41:02,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:42:06,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 657.11633 ± 415.338
2025-09-12 18:42:06,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [161.82278, 580.4412, 665.9726, 1469.125, 874.395, 1019.1626, 158.62888, 291.94647, 318.69147, 1030.9763]
2025-09-12 18:42:06,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [78.0, 198.0, 227.0, 509.0, 277.0, 307.0, 76.0, 125.0, 124.0, 313.0]
2025-09-12 18:42:06,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 29 minutes, 26 seconds)
2025-09-12 18:54:09,693 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:54:09,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:55:39,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 929.99121 ± 367.215
2025-09-12 18:55:39,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [973.2268, 848.61475, 1326.9865, 1495.6548, 645.4861, 1118.8005, 885.3152, 105.323715, 1127.6117, 772.8928]
2025-09-12 18:55:39,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [361.0, 262.0, 500.0, 529.0, 213.0, 393.0, 268.0, 61.0, 346.0, 247.0]
2025-09-12 18:55:39,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 18 minutes, 12 seconds)
2025-09-12 19:07:16,846 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:07:16,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:08:32,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 767.21826 ± 430.365
2025-09-12 19:08:32,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1096.6865, 811.37164, 1280.7782, 906.5593, 348.24072, 212.98914, 139.4247, 1328.1116, 408.67, 1139.3508]
2025-09-12 19:08:32,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [340.0, 257.0, 470.0, 284.0, 147.0, 94.0, 69.0, 501.0, 161.0, 384.0]
2025-09-12 19:08:32,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 4 minutes, 10 seconds)
2025-09-12 19:20:00,367 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:20:00,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:21:15,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 776.83472 ± 452.424
2025-09-12 19:21:15,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [468.71957, 173.91736, 976.3943, 589.27893, 1124.8381, 1123.4254, 1631.7429, 131.1703, 1023.8612, 524.9989]
2025-09-12 19:21:15,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [168.0, 81.0, 365.0, 230.0, 344.0, 340.0, 535.0, 68.0, 333.0, 197.0]
2025-09-12 19:21:15,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 49 minutes, 16 seconds)
2025-09-12 19:33:13,188 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:33:13,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:35:08,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 1189.83521 ± 677.597
2025-09-12 19:35:08,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1608.3206, 1485.7607, 239.12779, 338.49216, 1096.1536, 1571.6735, 580.863, 2613.048, 909.5325, 1455.3794]
2025-09-12 19:35:08,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [557.0, 496.0, 107.0, 141.0, 368.0, 504.0, 224.0, 867.0, 284.0, 464.0]
2025-09-12 19:35:08,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (1189.84) for latency MM1Queue_a033_s075
2025-09-12 19:35:08,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 38 minutes, 25 seconds)
2025-09-12 19:47:26,797 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:47:26,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:49:06,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 991.30627 ± 781.916
2025-09-12 19:49:06,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [209.40945, 899.7318, 289.84573, 1786.2129, 504.51584, 1003.4112, 547.1721, 363.04123, 2800.4053, 1509.3169]
2025-09-12 19:49:06,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [99.0, 277.0, 122.0, 645.0, 186.0, 336.0, 212.0, 146.0, 1000.0, 529.0]
2025-09-12 19:49:06,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 27 minutes, 24 seconds)
2025-09-12 20:00:20,009 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:00:20,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:01:56,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 978.53967 ± 584.306
2025-09-12 20:01:56,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [450.15955, 1189.023, 799.2179, 2288.2446, 356.1654, 852.07947, 545.68054, 1417.6538, 1471.408, 415.76462]
2025-09-12 20:01:56,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [181.0, 366.0, 255.0, 799.0, 139.0, 292.0, 203.0, 485.0, 482.0, 157.0]
2025-09-12 20:01:56,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 12 minutes, 33 seconds)
2025-09-12 20:13:48,050 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:13:48,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:14:52,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 639.33093 ± 486.493
2025-09-12 20:14:52,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [846.8946, 1084.4467, 1666.0983, 619.25037, 705.2519, 58.08506, 206.7851, 93.86038, 216.99667, 895.64]
2025-09-12 20:14:52,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [272.0, 373.0, 584.0, 234.0, 224.0, 40.0, 93.0, 55.0, 96.0, 322.0]
2025-09-12 20:14:52,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 59 minutes, 24 seconds)
2025-09-12 20:26:44,957 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:26:44,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:27:55,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 663.02246 ± 505.856
2025-09-12 20:27:55,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [942.1916, 828.30505, 1663.6794, 520.1305, 453.2528, 188.70767, 134.25124, 196.76172, 1396.8663, 306.07816]
2025-09-12 20:27:55,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [363.0, 270.0, 598.0, 197.0, 172.0, 90.0, 68.0, 95.0, 462.0, 132.0]
2025-09-12 20:27:55,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 46 minutes, 39 seconds)
2025-09-12 20:39:40,979 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:39:40,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:41:00,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 795.16199 ± 556.494
2025-09-12 20:41:00,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [289.74652, 512.7862, 1115.5446, 715.12585, 1162.8743, 125.88267, 784.8136, 1904.2886, 34.663754, 1305.894]
2025-09-12 20:41:00,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [119.0, 192.0, 380.0, 229.0, 371.0, 64.0, 291.0, 627.0, 29.0, 472.0]
2025-09-12 20:41:00,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 32 minutes, 13 seconds)
2025-09-12 20:52:38,712 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:52:38,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:53:49,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 713.00787 ± 266.248
2025-09-12 20:53:49,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1046.4178, 948.4162, 1001.8443, 624.9208, 731.3368, 853.9448, 769.2958, 352.40506, 174.1409, 627.3558]
2025-09-12 20:53:49,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [346.0, 293.0, 361.0, 218.0, 227.0, 321.0, 248.0, 143.0, 81.0, 242.0]
2025-09-12 20:53:49,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 17 minutes, 39 seconds)
2025-09-12 21:05:59,308 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:05:59,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:07:21,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 791.88000 ± 729.237
2025-09-12 21:07:21,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [350.24753, 450.16006, 822.50275, 369.61655, 2821.3325, 143.064, 648.78827, 1016.59247, 943.9992, 352.49664]
2025-09-12 21:07:21,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [147.0, 168.0, 294.0, 150.0, 1000.0, 71.0, 238.0, 356.0, 309.0, 144.0]
2025-09-12 21:07:21,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 5 minutes, 25 seconds)
2025-09-12 21:18:41,210 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:18:41,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:19:33,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 539.66675 ± 291.070
2025-09-12 21:19:33,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [240.08107, 807.8604, 903.87335, 597.9146, 7.833662, 664.54504, 868.5952, 165.63882, 593.6606, 546.6646]
2025-09-12 21:19:33,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [106.0, 255.0, 289.0, 210.0, 23.0, 228.0, 265.0, 78.0, 218.0, 203.0]
2025-09-12 21:19:33,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 51 minutes, 44 seconds)
2025-09-12 21:31:27,585 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:31:27,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:32:25,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 566.21497 ± 415.761
2025-09-12 21:32:25,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [454.13812, 97.92406, 1180.2345, 836.9861, 208.5092, 185.09569, 363.1031, 213.6998, 1303.6882, 818.7712]
2025-09-12 21:32:25,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [173.0, 55.0, 418.0, 256.0, 88.0, 85.0, 147.0, 94.0, 472.0, 249.0]
2025-09-12 21:32:25,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 38 minutes, 42 seconds)
2025-09-12 21:44:17,237 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:44:17,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:45:54,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 969.74628 ± 591.008
2025-09-12 21:45:54,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [646.3605, 1956.0017, 698.24146, 578.8732, 1160.7733, 1046.462, 2117.5957, 285.57016, 387.86224, 819.72144]
2025-09-12 21:45:54,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [227.0, 699.0, 235.0, 211.0, 424.0, 372.0, 690.0, 117.0, 152.0, 253.0]
2025-09-12 21:45:54,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 25 minutes, 57 seconds)
2025-09-12 21:57:29,636 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:57:29,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:59:01,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 944.94464 ± 690.461
2025-09-12 21:59:01,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [707.12335, 1006.883, 980.43024, 161.61577, 591.7657, 753.75604, 2845.3374, 921.1891, 397.66232, 1083.6838]
2025-09-12 21:59:01,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [262.0, 371.0, 306.0, 79.0, 217.0, 238.0, 1000.0, 290.0, 147.0, 387.0]
2025-09-12 21:59:01,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 13 minutes, 2 seconds)
2025-09-12 22:10:53,915 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:10:53,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:12:23,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 915.37354 ± 593.590
2025-09-12 22:12:23,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [442.59476, 598.46075, 1086.9922, 888.75366, 1561.6179, 645.69, 97.78565, 932.14685, 601.6247, 2298.0696]
2025-09-12 22:12:23,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [168.0, 222.0, 395.0, 280.0, 501.0, 232.0, 57.0, 303.0, 207.0, 766.0]
2025-09-12 22:12:23,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1251 [DEBUG]: Training session finished
