2025-09-12 00:14:42,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc10-hopper/MM1Queue_a033_s075-mbpac_memdelay
2025-09-12 00:14:42,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc10-hopper/MM1Queue_a033_s075-mbpac_memdelay
2025-09-12 00:14:42,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14e490f95410>}
2025-09-12 00:14:42,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1111 [DEBUG]: using device: cuda
2025-09-12 00:14:42,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1133 [INFO]: Creating new trainer
2025-09-12 00:14:42,239 baseline-mbpac-noiseperc10-hopper:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-09-12 00:14:42,239 baseline-mbpac-noiseperc10-hopper:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 00:14:42,247 baseline-mbpac-noiseperc10-hopper:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=3, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-12 00:14:43,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1194 [DEBUG]: Starting training session...
2025-09-12 00:14:43,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 1/100
2025-09-12 00:24:36,433 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:24:36,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:24:59,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 155.30124 ± 57.983
2025-09-12 00:24:59,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [192.09084, 215.9246, 106.96081, 198.59375, 103.30838, 163.31645, 120.25522, 34.532642, 205.1801, 212.84976]
2025-09-12 00:24:59,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [114.0, 121.0, 61.0, 104.0, 62.0, 88.0, 69.0, 26.0, 106.0, 111.0]
2025-09-12 00:24:59,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (155.30) for latency MM1Queue_a033_s075
2025-09-12 00:24:59,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 16 hours, 57 minutes, 17 seconds)
2025-09-12 00:36:14,212 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:36:14,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:36:35,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 129.59392 ± 77.262
2025-09-12 00:36:35,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [299.39963, 96.09619, 163.89592, 51.656006, 139.41174, 86.97478, 216.29425, 143.08272, 43.792274, 55.33569]
2025-09-12 00:36:35,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [192.0, 58.0, 84.0, 33.0, 74.0, 52.0, 133.0, 78.0, 30.0, 35.0]
2025-09-12 00:36:35,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 17 hours, 51 minutes, 39 seconds)
2025-09-12 00:47:56,239 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:47:56,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:48:22,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 196.09415 ± 105.513
2025-09-12 00:48:22,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [321.39795, 199.18758, 92.23124, 31.167786, 83.50096, 206.15819, 108.3674, 292.67947, 327.86002, 298.39078]
2025-09-12 00:48:22,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [146.0, 113.0, 55.0, 24.0, 52.0, 108.0, 67.0, 128.0, 149.0, 134.0]
2025-09-12 00:48:22,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (196.09) for latency MM1Queue_a033_s075
2025-09-12 00:48:22,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 18 hours, 8 minutes, 20 seconds)
2025-09-12 00:59:43,715 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:59:43,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:00:43,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 355.90201 ± 177.825
2025-09-12 01:00:43,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [522.04126, 360.7697, 586.9482, 328.07468, 88.76977, 331.09082, 662.2799, 144.83324, 202.79602, 331.4164]
2025-09-12 01:00:43,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [354.0, 179.0, 390.0, 158.0, 54.0, 154.0, 481.0, 113.0, 143.0, 163.0]
2025-09-12 01:00:43,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (355.90) for latency MM1Queue_a033_s075
2025-09-12 01:00:43,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 18 hours, 24 minutes, 3 seconds)
2025-09-12 01:11:49,625 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:11:49,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:12:19,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 205.16110 ± 83.971
2025-09-12 01:12:19,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [152.2934, 130.19698, 212.08372, 340.9924, 326.24265, 144.66255, 311.90253, 116.918076, 187.12744, 129.1915]
2025-09-12 01:12:19,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [90.0, 68.0, 132.0, 157.0, 143.0, 72.0, 142.0, 63.0, 153.0, 68.0]
2025-09-12 01:12:19,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 18 hours, 14 minutes, 23 seconds)
2025-09-12 01:23:31,437 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:23:31,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:24:03,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 274.90701 ± 50.167
2025-09-12 01:24:03,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [264.36014, 321.3741, 288.2328, 279.03687, 295.68317, 275.37885, 318.0973, 295.13733, 133.53851, 278.23102]
2025-09-12 01:24:03,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 132.0, 125.0, 119.0, 132.0, 124.0, 139.0, 133.0, 67.0, 123.0]
2025-09-12 01:24:03,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 18 hours, 30 minutes, 32 seconds)
2025-09-12 01:35:23,525 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:35:23,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:36:02,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 304.33191 ± 81.217
2025-09-12 01:36:02,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [336.96844, 330.59055, 332.38834, 326.00702, 335.32336, 61.45184, 336.02762, 342.00525, 322.2086, 320.3482]
2025-09-12 01:36:02,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [162.0, 153.0, 148.0, 147.0, 157.0, 39.0, 151.0, 164.0, 145.0, 169.0]
2025-09-12 01:36:02,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 18 hours, 25 minutes, 51 seconds)
2025-09-12 01:47:10,923 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:47:10,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:47:45,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 297.77594 ± 61.363
2025-09-12 01:47:45,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [308.41086, 307.6795, 418.48062, 307.63995, 160.20372, 316.40683, 265.8941, 297.50162, 262.50867, 333.0331]
2025-09-12 01:47:45,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 133.0, 168.0, 132.0, 78.0, 136.0, 116.0, 129.0, 112.0, 140.0]
2025-09-12 01:47:45,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 18 hours, 12 minutes, 35 seconds)
2025-09-12 01:59:02,560 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:59:02,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:59:30,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 230.79752 ± 110.727
2025-09-12 01:59:30,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [54.642746, 227.31456, 347.64795, 163.21606, 302.69894, 211.28873, 29.450226, 313.50443, 352.34668, 305.86465]
2025-09-12 01:59:30,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [33.0, 106.0, 138.0, 83.0, 132.0, 98.0, 22.0, 134.0, 151.0, 137.0]
2025-09-12 01:59:30,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 17 hours, 49 minutes, 56 seconds)
2025-09-12 02:10:43,502 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:10:43,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:11:12,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 244.59660 ± 130.659
2025-09-12 02:11:12,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [46.3944, 329.36713, 334.84314, 142.44678, 64.557335, 108.21127, 349.24335, 296.9347, 401.38217, 372.5859]
2025-09-12 02:11:12,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [35.0, 143.0, 139.0, 75.0, 41.0, 60.0, 152.0, 126.0, 169.0, 159.0]
2025-09-12 02:11:12,673 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 17 hours, 40 minutes, 3 seconds)
2025-09-12 02:22:15,103 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:22:15,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:23:05,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 458.33075 ± 106.259
2025-09-12 02:23:05,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [530.28925, 534.1921, 456.65485, 424.4971, 434.1044, 517.5244, 172.19438, 579.22076, 460.64053, 473.9897]
2025-09-12 02:23:05,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [214.0, 190.0, 189.0, 173.0, 189.0, 232.0, 85.0, 228.0, 193.0, 176.0]
2025-09-12 02:23:05,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (458.33) for latency MM1Queue_a033_s075
2025-09-12 02:23:05,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 17 hours, 30 minutes, 35 seconds)
2025-09-12 02:34:05,599 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:34:05,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:34:50,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 425.62997 ± 189.430
2025-09-12 02:34:50,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [545.06256, 378.9559, 419.4883, 194.6749, 586.0344, 431.81357, 69.27825, 429.4761, 407.22958, 794.2862]
2025-09-12 02:34:50,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [196.0, 160.0, 193.0, 105.0, 215.0, 170.0, 42.0, 172.0, 171.0, 261.0]
2025-09-12 02:34:50,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 17 hours, 14 minutes, 55 seconds)
2025-09-12 02:45:42,338 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:45:42,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:46:22,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 358.52106 ± 179.373
2025-09-12 02:46:22,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [137.73116, 702.2568, 146.18526, 446.8598, 517.33527, 176.94508, 459.03384, 191.51854, 374.5919, 432.75293]
2025-09-12 02:46:22,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [74.0, 250.0, 77.0, 196.0, 197.0, 89.0, 176.0, 97.0, 173.0, 179.0]
2025-09-12 02:46:22,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 16 hours, 59 minutes, 50 seconds)
2025-09-12 02:57:34,414 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:57:34,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:58:25,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 519.46954 ± 266.378
2025-09-12 02:58:25,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [720.7942, 732.9273, 352.83905, 83.974045, 122.567825, 606.5509, 467.4831, 412.2402, 772.4113, 922.90784]
2025-09-12 02:58:25,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [258.0, 257.0, 150.0, 48.0, 69.0, 207.0, 172.0, 184.0, 259.0, 311.0]
2025-09-12 02:58:25,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (519.47) for latency MM1Queue_a033_s075
2025-09-12 02:58:25,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 16 hours, 53 minutes, 19 seconds)
2025-09-12 03:09:19,812 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:09:19,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:09:51,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 292.24713 ± 216.331
2025-09-12 03:09:51,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [72.771736, 212.51219, 660.9211, 166.42253, 125.695335, 407.91632, 43.869728, 549.09357, 551.4589, 131.8097]
2025-09-12 03:09:51,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [47.0, 96.0, 228.0, 83.0, 66.0, 168.0, 34.0, 200.0, 197.0, 69.0]
2025-09-12 03:09:51,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 16 hours, 37 minutes)
2025-09-12 03:20:52,618 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:20:52,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:21:19,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 212.50835 ± 183.212
2025-09-12 03:21:19,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [88.17899, 135.14828, 235.19833, 95.43452, 159.09398, 144.63506, 725.79266, 127.24356, 102.88177, 311.47656]
2025-09-12 03:21:19,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [52.0, 73.0, 115.0, 55.0, 83.0, 75.0, 283.0, 70.0, 58.0, 149.0]
2025-09-12 03:21:19,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 16 hours, 18 minutes, 19 seconds)
2025-09-12 03:32:26,186 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:32:26,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:33:21,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 559.33136 ± 213.725
2025-09-12 03:33:21,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [463.63025, 455.82172, 943.2221, 649.1189, 167.61967, 661.8401, 711.1031, 323.86392, 477.76782, 739.3261]
2025-09-12 03:33:21,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [180.0, 191.0, 303.0, 244.0, 83.0, 255.0, 242.0, 144.0, 202.0, 241.0]
2025-09-12 03:33:21,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (559.33) for latency MM1Queue_a033_s075
2025-09-12 03:33:21,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 16 hours, 11 minutes, 23 seconds)
2025-09-12 03:44:42,846 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:44:42,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:45:53,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 786.29492 ± 352.238
2025-09-12 03:45:53,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [852.60406, 376.13245, 107.91796, 691.371, 1066.4911, 694.22375, 698.14545, 815.4193, 1193.2987, 1367.3451]
2025-09-12 03:45:53,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [272.0, 146.0, 59.0, 227.0, 328.0, 241.0, 225.0, 281.0, 398.0, 467.0]
2025-09-12 03:45:53,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (786.29) for latency MM1Queue_a033_s075
2025-09-12 03:45:53,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 16 hours, 16 minutes, 11 seconds)
2025-09-12 03:56:34,613 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:56:34,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:57:30,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 537.21472 ± 438.584
2025-09-12 03:57:30,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [233.67633, 329.5114, 1278.7985, 119.76034, 259.16187, 291.63098, 335.00223, 249.91301, 1340.6895, 934.00305]
2025-09-12 03:57:30,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 145.0, 435.0, 64.0, 136.0, 145.0, 149.0, 127.0, 479.0, 317.0]
2025-09-12 03:57:30,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 15 hours, 57 minutes, 6 seconds)
2025-09-12 04:08:25,259 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:08:25,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:09:51,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 987.81769 ± 558.715
2025-09-12 04:09:51,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [85.86249, 1349.1371, 799.44434, 1637.6415, 354.03464, 755.88666, 1019.4224, 2078.295, 728.22406, 1070.2286]
2025-09-12 04:09:51,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [53.0, 440.0, 248.0, 568.0, 141.0, 233.0, 310.0, 664.0, 272.0, 330.0]
2025-09-12 04:09:51,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (987.82) for latency MM1Queue_a033_s075
2025-09-12 04:09:51,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 15 hours, 59 minutes, 59 seconds)
2025-09-12 04:20:48,646 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:20:48,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:21:52,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 659.37769 ± 319.738
2025-09-12 04:21:52,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [721.7696, 309.07477, 385.71075, 1088.0839, 427.0068, 1094.6256, 552.195, 377.52502, 464.08026, 1173.705]
2025-09-12 04:21:52,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [265.0, 132.0, 157.0, 430.0, 180.0, 332.0, 201.0, 161.0, 180.0, 428.0]
2025-09-12 04:21:52,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 15 hours, 56 minutes, 48 seconds)
2025-09-12 04:32:20,569 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:32:20,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:33:45,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 985.31543 ± 537.952
2025-09-12 04:33:45,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [86.646095, 2271.7278, 1217.7001, 1164.3136, 1222.2048, 835.0615, 950.7073, 576.8389, 745.32837, 782.626]
2025-09-12 04:33:45,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [51.0, 727.0, 398.0, 364.0, 379.0, 280.0, 311.0, 203.0, 286.0, 250.0]
2025-09-12 04:33:45,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 15 hours, 42 minutes, 3 seconds)
2025-09-12 04:44:36,083 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:44:36,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:46:23,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1211.15479 ± 781.771
2025-09-12 04:46:23,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [904.6369, 72.86282, 983.41144, 1264.4274, 1702.3433, 2359.6204, 2539.3865, 1239.4965, 78.804596, 966.55774]
2025-09-12 04:46:23,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [297.0, 42.0, 330.0, 441.0, 601.0, 773.0, 849.0, 403.0, 48.0, 327.0]
2025-09-12 04:46:23,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (1211.15) for latency MM1Queue_a033_s075
2025-09-12 04:46:23,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 15 hours, 31 minutes, 39 seconds)
2025-09-12 04:56:58,026 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:56:58,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:58:12,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 841.21545 ± 262.783
2025-09-12 04:58:12,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [770.6977, 674.3925, 247.64465, 1015.6842, 1044.4651, 1009.6823, 1239.4797, 751.177, 967.83093, 691.10016]
2025-09-12 04:58:12,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [265.0, 246.0, 113.0, 330.0, 338.0, 327.0, 390.0, 260.0, 315.0, 258.0]
2025-09-12 04:58:12,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 15 hours, 22 minutes, 48 seconds)
2025-09-12 05:08:45,845 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:08:45,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:10:15,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1033.09692 ± 434.487
2025-09-12 05:10:15,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1240.3947, 1017.3584, 122.532875, 1019.69415, 1244.1877, 1270.1816, 612.2785, 1039.0587, 886.14984, 1879.134]
2025-09-12 05:10:15,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [398.0, 333.0, 65.0, 327.0, 419.0, 408.0, 230.0, 332.0, 283.0, 619.0]
2025-09-12 05:10:15,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 15 hours, 5 minutes, 53 seconds)
2025-09-12 05:21:02,613 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:21:02,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:22:07,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 725.93311 ± 492.479
2025-09-12 05:22:07,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [375.8906, 918.4502, 136.03033, 1153.7443, 1692.4662, 647.1462, 1161.115, 133.1159, 229.74239, 811.62994]
2025-09-12 05:22:07,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [148.0, 284.0, 70.0, 367.0, 591.0, 232.0, 361.0, 69.0, 101.0, 287.0]
2025-09-12 05:22:07,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 14 hours, 51 minutes, 37 seconds)
2025-09-12 05:32:38,547 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:32:38,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:34:15,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1081.97522 ± 817.550
2025-09-12 05:34:15,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [836.942, 593.7147, 47.91191, 990.63007, 2366.681, 64.44702, 861.4336, 1942.3842, 727.07825, 2388.5298]
2025-09-12 05:34:15,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [304.0, 224.0, 33.0, 343.0, 778.0, 39.0, 305.0, 602.0, 268.0, 799.0]
2025-09-12 05:34:15,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 14 hours, 43 minutes, 15 seconds)
2025-09-12 05:45:07,163 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:45:07,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:46:45,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1118.49915 ± 672.349
2025-09-12 05:46:45,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1286.9203, 2052.176, 667.65564, 2587.6675, 966.70215, 754.49756, 967.67316, 746.4162, 1000.08264, 155.19913]
2025-09-12 05:46:45,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [454.0, 652.0, 250.0, 832.0, 314.0, 263.0, 322.0, 255.0, 336.0, 75.0]
2025-09-12 05:46:45,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 14 hours, 29 minutes, 17 seconds)
2025-09-12 05:57:50,403 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:57:50,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:59:27,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1089.26721 ± 954.760
2025-09-12 05:59:27,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [115.036125, 901.12787, 125.05446, 2234.782, 195.29857, 3028.6475, 938.67175, 2006.0989, 919.6034, 428.3511]
2025-09-12 05:59:27,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [63.0, 323.0, 66.0, 723.0, 93.0, 1000.0, 316.0, 627.0, 285.0, 174.0]
2025-09-12 05:59:27,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 14 hours, 29 minutes, 35 seconds)
2025-09-12 06:09:59,668 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:09:59,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:11:43,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1207.68384 ± 665.736
2025-09-12 06:11:43,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [2760.9797, 1145.8894, 1545.9731, 948.5932, 1403.5884, 734.4085, 202.5552, 519.5012, 1345.6646, 1469.686]
2025-09-12 06:11:43,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [877.0, 385.0, 500.0, 299.0, 445.0, 239.0, 94.0, 199.0, 427.0, 468.0]
2025-09-12 06:11:43,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 14 hours, 20 minutes, 43 seconds)
2025-09-12 06:23:10,466 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:23:10,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:25:13,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1294.80774 ± 856.071
2025-09-12 06:25:13,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1228.1951, 2878.6565, 1123.7162, 1343.4661, 741.1513, 1199.4021, 448.8553, 256.20602, 857.8567, 2870.5725]
2025-09-12 06:25:13,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [444.0, 1000.0, 394.0, 500.0, 274.0, 432.0, 177.0, 131.0, 340.0, 1000.0]
2025-09-12 06:25:13,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (1294.81) for latency MM1Queue_a033_s075
2025-09-12 06:25:13,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 14 hours, 30 minutes, 48 seconds)
2025-09-12 06:35:14,818 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:35:14,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:36:27,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 831.40149 ± 500.726
2025-09-12 06:36:27,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [968.3568, 1017.5366, 209.81662, 890.2633, 119.94209, 180.9774, 1401.071, 1697.3635, 761.6475, 1067.0404]
2025-09-12 06:36:27,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [308.0, 326.0, 95.0, 307.0, 66.0, 89.0, 443.0, 537.0, 249.0, 353.0]
2025-09-12 06:36:27,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 14 hours, 6 minutes, 5 seconds)
2025-09-12 06:47:40,972 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:47:40,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:48:59,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 895.76056 ± 579.444
2025-09-12 06:48:59,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [617.57495, 1129.9944, 962.88403, 2144.9226, 808.1157, 79.87961, 1439.29, 120.47094, 625.79663, 1028.6771]
2025-09-12 06:48:59,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [230.0, 372.0, 299.0, 667.0, 288.0, 46.0, 450.0, 64.0, 218.0, 329.0]
2025-09-12 06:48:59,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 13 hours, 53 minutes, 56 seconds)
2025-09-12 06:59:24,744 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:59:24,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:00:53,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 933.96161 ± 879.626
2025-09-12 07:00:53,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [2354.4856, 1259.1233, 180.57825, 1196.2931, 243.35907, 103.617065, 180.48007, 705.83044, 2656.2292, 459.62006]
2025-09-12 07:00:53,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [814.0, 442.0, 87.0, 374.0, 112.0, 56.0, 86.0, 256.0, 928.0, 182.0]
2025-09-12 07:00:53,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 13 hours, 30 minutes, 56 seconds)
2025-09-12 07:12:08,635 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:12:08,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:14:03,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1246.87244 ± 860.448
2025-09-12 07:14:03,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1082.7716, 723.9852, 2391.8943, 2296.3809, 2127.0247, 2098.7104, 592.2428, 107.86635, 66.02415, 981.8231]
2025-09-12 07:14:03,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [400.0, 254.0, 883.0, 721.0, 746.0, 687.0, 215.0, 60.0, 40.0, 358.0]
2025-09-12 07:14:03,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 13 hours, 30 minutes, 12 seconds)
2025-09-12 07:24:58,838 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:24:58,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:26:18,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 858.76038 ± 435.321
2025-09-12 07:26:18,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [847.28534, 976.34894, 936.3933, 1671.0217, 1110.4039, 948.1894, 692.7651, 1130.1033, 102.281044, 172.81119]
2025-09-12 07:26:18,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [294.0, 370.0, 328.0, 540.0, 403.0, 343.0, 272.0, 355.0, 57.0, 84.0]
2025-09-12 07:26:18,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 13 hours, 1 minute, 44 seconds)
2025-09-12 07:36:37,276 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:36:37,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:37:54,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 872.82452 ± 725.961
2025-09-12 07:37:54,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [861.5443, 835.6032, 420.0675, 2552.9858, 107.50994, 1179.9778, 275.03424, 34.933605, 1580.8099, 879.7787]
2025-09-12 07:37:54,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [267.0, 257.0, 164.0, 839.0, 61.0, 391.0, 120.0, 26.0, 550.0, 323.0]
2025-09-12 07:37:54,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 12 hours, 54 minutes, 6 seconds)
2025-09-12 07:48:32,763 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:48:32,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:50:44,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1617.89795 ± 493.075
2025-09-12 07:50:44,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1949.1239, 1750.086, 1372.2415, 2161.2192, 1047.9865, 1023.30817, 1159.3629, 2515.567, 1244.0508, 1956.0334]
2025-09-12 07:50:44,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [612.0, 564.0, 442.0, 687.0, 354.0, 324.0, 383.0, 799.0, 384.0, 610.0]
2025-09-12 07:50:44,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (1617.90) for latency MM1Queue_a033_s075
2025-09-12 07:50:44,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 12 hours, 45 minutes, 38 seconds)
2025-09-12 08:01:03,700 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:01:03,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:02:29,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 936.02100 ± 691.613
2025-09-12 08:02:29,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [884.23096, 1311.8362, 454.53488, 1915.2834, 2248.0796, 799.83746, 83.23829, 336.11536, 1149.7898, 177.26384]
2025-09-12 08:02:29,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [284.0, 415.0, 180.0, 602.0, 719.0, 270.0, 50.0, 152.0, 405.0, 86.0]
2025-09-12 08:02:29,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 12 hours, 31 minutes, 32 seconds)
2025-09-12 08:14:36,089 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:14:36,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:16:05,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 935.60468 ± 837.141
2025-09-12 08:16:05,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1273.6755, 81.88858, 767.7132, 967.9252, 1101.6241, 101.338036, 1515.5286, 2980.2024, 135.00162, 431.14798]
2025-09-12 08:16:05,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [439.0, 49.0, 278.0, 320.0, 355.0, 57.0, 481.0, 965.0, 68.0, 170.0]
2025-09-12 08:16:05,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 12 hours, 24 minutes, 26 seconds)
2025-09-12 08:27:20,285 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:27:20,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:29:41,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1381.83728 ± 675.687
2025-09-12 08:29:41,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1595.9417, 635.53, 1327.1166, 979.9207, 1513.6013, 2048.7842, 1517.191, 2702.5808, 1370.1366, 127.569984]
2025-09-12 08:29:41,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [576.0, 241.0, 498.0, 359.0, 513.0, 693.0, 569.0, 1000.0, 500.0, 68.0]
2025-09-12 08:29:41,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 12 hours, 28 minutes, 3 seconds)
2025-09-12 08:41:31,391 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:41:31,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:43:19,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1058.80688 ± 691.847
2025-09-12 08:43:19,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [480.36346, 598.1475, 285.06055, 1317.6442, 1572.8851, 1526.1984, 1168.007, 2627.1462, 438.51578, 574.10077]
2025-09-12 08:43:19,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [191.0, 228.0, 126.0, 468.0, 548.0, 541.0, 409.0, 909.0, 172.0, 223.0]
2025-09-12 08:43:19,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 12 hours, 38 minutes, 59 seconds)
2025-09-12 08:54:51,736 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:54:51,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:55:59,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 704.36609 ± 657.999
2025-09-12 08:55:59,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [414.55817, 252.56914, 156.18427, 933.4525, 1466.3676, 117.06604, 1943.5979, 1472.0815, 213.23154, 74.55225]
2025-09-12 08:55:59,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [165.0, 121.0, 75.0, 298.0, 471.0, 64.0, 611.0, 468.0, 109.0, 44.0]
2025-09-12 08:55:59,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 12 hours, 23 minutes, 55 seconds)
2025-09-12 09:07:59,600 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:07:59,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:09:49,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1215.78784 ± 404.619
2025-09-12 09:09:49,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1483.3646, 1061.5394, 1111.527, 1604.3022, 1529.7792, 1244.5544, 459.77512, 535.1423, 1604.4675, 1523.4263]
2025-09-12 09:09:49,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [488.0, 337.0, 356.0, 505.0, 491.0, 427.0, 181.0, 204.0, 489.0, 473.0]
2025-09-12 09:09:49,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 12 hours, 34 minutes, 5 seconds)
2025-09-12 09:21:24,599 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:21:24,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:23:13,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1158.49060 ± 600.117
2025-09-12 09:23:13,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [985.89325, 1095.628, 1765.2325, 1067.9801, 2108.1575, 278.05145, 964.52313, 124.437874, 1492.7433, 1702.259]
2025-09-12 09:23:13,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [316.0, 394.0, 551.0, 386.0, 673.0, 121.0, 341.0, 67.0, 475.0, 517.0]
2025-09-12 09:23:13,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 12 hours, 18 minutes, 24 seconds)
2025-09-12 09:34:54,653 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:34:54,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:36:37,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1067.35242 ± 767.761
2025-09-12 09:36:37,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1759.6919, 1853.7133, 1472.832, 2501.833, 269.16257, 531.95355, 260.78668, 383.96793, 1273.0177, 366.565]
2025-09-12 09:36:37,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [554.0, 626.0, 464.0, 806.0, 123.0, 195.0, 117.0, 155.0, 397.0, 153.0]
2025-09-12 09:36:37,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 12 hours, 2 minutes, 52 seconds)
2025-09-12 09:48:44,886 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:48:44,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:50:31,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1110.87646 ± 687.827
2025-09-12 09:50:31,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [372.65747, 426.19427, 2638.4434, 1186.2047, 160.08835, 1307.0853, 1683.5927, 1188.3516, 1295.3358, 850.8116]
2025-09-12 09:50:31,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [153.0, 171.0, 864.0, 412.0, 77.0, 411.0, 553.0, 424.0, 406.0, 312.0]
2025-09-12 09:50:31,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 11 hours, 52 minutes, 18 seconds)
2025-09-12 10:02:07,523 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:02:07,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:03:25,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 781.96765 ± 568.618
2025-09-12 10:03:25,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [312.77286, 1156.8893, 250.23122, 254.92921, 2134.7734, 268.72327, 1236.1189, 795.87787, 696.0793, 713.2816]
2025-09-12 10:03:25,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [137.0, 398.0, 113.0, 116.0, 721.0, 119.0, 381.0, 298.0, 257.0, 250.0]
2025-09-12 10:03:25,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 11 hours, 41 minutes, 17 seconds)
2025-09-12 10:15:26,322 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:15:26,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:17:24,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1218.36987 ± 965.701
2025-09-12 10:17:24,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [3099.64, 928.33453, 713.62836, 750.4449, 304.93945, 1352.1344, 73.492, 1968.3027, 412.69916, 2580.084]
2025-09-12 10:17:24,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 309.0, 255.0, 254.0, 143.0, 472.0, 43.0, 656.0, 160.0, 858.0]
2025-09-12 10:17:24,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 11 hours, 29 minutes, 26 seconds)
2025-09-12 10:29:05,144 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:29:05,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:30:05,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 600.16846 ± 501.135
2025-09-12 10:30:05,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1387.1617, 279.8961, 845.7062, 230.54811, 148.41719, 169.76501, 1337.9874, 339.10226, 89.01253, 1174.0878]
2025-09-12 10:30:05,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [424.0, 119.0, 322.0, 111.0, 82.0, 81.0, 422.0, 145.0, 52.0, 386.0]
2025-09-12 10:30:05,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 11 hours, 8 minutes, 46 seconds)
2025-09-12 10:41:55,162 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:41:55,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:44:04,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1430.99976 ± 634.591
2025-09-12 10:44:04,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [2512.541, 1121.3517, 1159.389, 1813.3748, 1149.6055, 1373.361, 1595.3531, 93.617966, 2217.5466, 1273.8574]
2025-09-12 10:44:04,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [771.0, 378.0, 369.0, 559.0, 347.0, 413.0, 539.0, 54.0, 710.0, 397.0]
2025-09-12 10:44:04,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 11 hours, 54 seconds)
2025-09-12 10:55:54,986 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:55:54,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:58:07,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1322.19153 ± 923.816
2025-09-12 10:58:07,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1936.5804, 254.683, 996.64667, 165.07806, 848.89404, 1215.3583, 669.6022, 2851.4998, 1357.9272, 2925.6453]
2025-09-12 10:58:07,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [664.0, 113.0, 356.0, 79.0, 317.0, 387.0, 244.0, 965.0, 492.0, 1000.0]
2025-09-12 10:58:07,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 10 hours, 48 minutes, 52 seconds)
2025-09-12 11:09:38,640 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:09:38,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:11:01,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 848.77618 ± 611.884
2025-09-12 11:11:01,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [258.77243, 1281.6531, 1207.6421, 1778.9047, 546.533, 151.25331, 227.88933, 1882.0251, 709.577, 443.51062]
2025-09-12 11:11:01,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 423.0, 368.0, 617.0, 200.0, 74.0, 101.0, 574.0, 257.0, 181.0]
2025-09-12 11:11:01,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 10 hours, 35 minutes, 21 seconds)
2025-09-12 11:23:21,223 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:23:21,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:25:33,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1421.79517 ± 975.846
2025-09-12 11:25:33,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1819.2283, 705.25653, 1059.3027, 1094.4924, 619.09717, 1939.4926, 179.1655, 3096.8254, 631.9783, 3073.1123]
2025-09-12 11:25:33,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [609.0, 260.0, 334.0, 339.0, 230.0, 627.0, 85.0, 993.0, 241.0, 1000.0]
2025-09-12 11:25:33,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 10 hours, 27 minutes)
2025-09-12 11:37:36,362 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:37:36,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:39:38,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1307.69031 ± 767.526
2025-09-12 11:39:38,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1416.9972, 1864.762, 1417.9827, 1191.9858, 1443.782, 3019.2168, 1437.579, 72.85249, 790.9274, 420.81726]
2025-09-12 11:39:38,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [475.0, 588.0, 438.0, 370.0, 484.0, 952.0, 443.0, 46.0, 281.0, 166.0]
2025-09-12 11:39:38,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 10 hours, 25 minutes, 52 seconds)
2025-09-12 11:50:55,271 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:50:55,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:52:09,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 742.20697 ± 738.065
2025-09-12 11:52:09,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [121.71818, 343.38452, 606.6736, 101.541245, 1669.9851, 895.3033, 2431.0898, 88.32639, 222.08855, 941.9595]
2025-09-12 11:52:09,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [64.0, 146.0, 236.0, 57.0, 540.0, 312.0, 802.0, 48.0, 101.0, 322.0]
2025-09-12 11:52:09,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 9 hours, 59 minutes, 14 seconds)
2025-09-12 12:04:40,556 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:04:40,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:06:11,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 954.89276 ± 696.542
2025-09-12 12:06:11,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1925.3945, 1460.5316, 1176.747, 120.33027, 367.3569, 490.33932, 2036.4562, 1431.9111, 193.2015, 346.65985]
2025-09-12 12:06:11,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [615.0, 434.0, 350.0, 66.0, 144.0, 184.0, 660.0, 487.0, 91.0, 145.0]
2025-09-12 12:06:11,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 9 hours, 45 minutes, 24 seconds)
2025-09-12 12:17:30,174 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:17:30,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:19:32,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1276.07141 ± 1027.739
2025-09-12 12:19:32,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [585.8377, 1936.8243, 238.46774, 920.2203, 2971.806, 273.54208, 3085.1123, 170.72963, 1100.8392, 1477.3353]
2025-09-12 12:19:32,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [212.0, 607.0, 105.0, 322.0, 1000.0, 119.0, 1000.0, 80.0, 378.0, 447.0]
2025-09-12 12:19:32,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 9 hours, 35 minutes, 39 seconds)
2025-09-12 12:31:12,489 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:31:12,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:33:15,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1398.51587 ± 669.470
2025-09-12 12:33:15,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [929.2555, 2343.165, 665.9165, 1016.13446, 1781.1023, 1965.566, 1429.9508, 256.6809, 1253.9756, 2343.4124]
2025-09-12 12:33:15,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [294.0, 722.0, 231.0, 345.0, 554.0, 616.0, 440.0, 112.0, 383.0, 737.0]
2025-09-12 12:33:15,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 9 hours, 15 minutes, 5 seconds)
2025-09-12 12:45:13,260 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:45:13,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:46:21,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 699.59241 ± 479.378
2025-09-12 12:46:21,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [952.70654, 48.174915, 178.53885, 81.731995, 1027.693, 1209.7185, 1355.9156, 489.4935, 450.71182, 1201.2394]
2025-09-12 12:46:21,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [321.0, 34.0, 84.0, 48.0, 349.0, 388.0, 443.0, 186.0, 172.0, 365.0]
2025-09-12 12:46:21,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 8 hours, 53 minutes, 47 seconds)
2025-09-12 12:58:15,497 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:58:15,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:00:02,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1080.95386 ± 810.554
2025-09-12 13:00:02,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1038.7666, 1129.2495, 99.49264, 111.40575, 1907.3204, 833.9661, 145.05957, 2405.363, 932.48773, 2206.4277]
2025-09-12 13:00:02,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [382.0, 365.0, 55.0, 61.0, 629.0, 294.0, 73.0, 821.0, 324.0, 739.0]
2025-09-12 13:00:02,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 49 minutes, 27 seconds)
2025-09-12 13:11:39,295 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:11:39,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:13:18,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1026.42017 ± 739.793
2025-09-12 13:13:18,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [884.68945, 1226.7264, 1703.3157, 1211.0105, 334.17358, 431.236, 126.944664, 191.48264, 1645.9803, 2508.6416]
2025-09-12 13:13:18,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [311.0, 415.0, 569.0, 398.0, 141.0, 161.0, 66.0, 90.0, 493.0, 818.0]
2025-09-12 13:13:18,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 8 hours, 30 minutes, 8 seconds)
2025-09-12 13:25:31,668 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:25:31,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:28:12,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1689.55542 ± 868.829
2025-09-12 13:28:12,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1714.9266, 3067.8718, 1734.214, 2384.04, 658.84357, 3054.1917, 1770.936, 1008.6986, 810.3055, 691.52734]
2025-09-12 13:28:12,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [542.0, 1000.0, 541.0, 785.0, 256.0, 1000.0, 593.0, 346.0, 283.0, 245.0]
2025-09-12 13:28:12,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (1689.56) for latency MM1Queue_a033_s075
2025-09-12 13:28:12,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 8 hours, 28 minutes, 1 second)
2025-09-12 13:39:49,729 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:39:49,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:41:39,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1123.36267 ± 1004.004
2025-09-12 13:41:39,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [204.41896, 78.39674, 1552.7607, 331.0907, 2975.9507, 1214.8456, 694.78143, 2788.2131, 1231.1925, 161.97696]
2025-09-12 13:41:39,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [100.0, 47.0, 478.0, 138.0, 1000.0, 434.0, 249.0, 933.0, 385.0, 79.0]
2025-09-12 13:41:39,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 8 hours, 12 minutes, 27 seconds)
2025-09-12 13:53:24,285 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:53:24,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:55:07,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1129.09070 ± 518.278
2025-09-12 13:55:07,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1357.2822, 1499.5317, 628.32117, 1281.894, 1544.5507, 798.1747, 1923.1769, 1442.55, 700.1065, 115.31888]
2025-09-12 13:55:07,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [440.0, 440.0, 224.0, 423.0, 460.0, 280.0, 616.0, 435.0, 249.0, 63.0]
2025-09-12 13:55:07,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 8 hours, 1 minute, 22 seconds)
2025-09-12 14:07:44,722 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:07:44,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:08:51,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 710.03986 ± 646.486
2025-09-12 14:08:51,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [94.80428, 538.298, 1191.2097, 2160.3923, 89.27544, 899.1045, 73.24711, 88.20957, 790.3414, 1175.5159]
2025-09-12 14:08:51,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [54.0, 190.0, 370.0, 667.0, 52.0, 308.0, 45.0, 51.0, 269.0, 363.0]
2025-09-12 14:08:51,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 47 minutes, 57 seconds)
2025-09-12 14:20:19,146 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:20:19,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:22:15,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1258.79102 ± 752.355
2025-09-12 14:22:15,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1547.5648, 1070.851, 1568.7853, 2630.086, 1509.4305, 2022.4418, 895.1734, 73.84282, 1184.5323, 85.20233]
2025-09-12 14:22:15,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [472.0, 362.0, 510.0, 797.0, 466.0, 673.0, 293.0, 43.0, 404.0, 50.0]
2025-09-12 14:22:15,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 7 hours, 35 minutes, 1 second)
2025-09-12 14:34:23,756 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:34:23,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:36:11,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1201.49609 ± 690.794
2025-09-12 14:36:11,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1786.3727, 161.13264, 819.34845, 65.93704, 1475.9412, 1643.277, 842.1885, 1187.5786, 2319.4543, 1713.731]
2025-09-12 14:36:11,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [550.0, 77.0, 279.0, 40.0, 437.0, 492.0, 279.0, 389.0, 760.0, 520.0]
2025-09-12 14:36:11,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 7 hours, 15 minutes, 10 seconds)
2025-09-12 14:47:35,141 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:47:35,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:49:35,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1318.93335 ± 703.444
2025-09-12 14:49:35,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1812.7753, 165.12323, 1092.9053, 1854.318, 1923.3191, 1354.29, 359.779, 1336.8513, 2521.3433, 768.6286]
2025-09-12 14:49:35,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [558.0, 80.0, 385.0, 593.0, 625.0, 423.0, 146.0, 445.0, 761.0, 270.0]
2025-09-12 14:49:35,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 1 minute, 14 seconds)
2025-09-12 15:01:18,930 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:01:18,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:02:38,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 877.22803 ± 491.406
2025-09-12 15:02:38,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [32.021652, 1106.0652, 408.7321, 1452.6565, 1129.3586, 1165.7206, 813.1487, 990.81006, 154.83148, 1518.9352]
2025-09-12 15:02:38,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 332.0, 159.0, 433.0, 341.0, 364.0, 269.0, 309.0, 74.0, 463.0]
2025-09-12 15:02:38,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 45 minutes, 4 seconds)
2025-09-12 15:14:50,762 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:14:50,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:15:53,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 642.73456 ± 383.077
2025-09-12 15:15:53,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [584.48596, 1254.1962, 612.95154, 180.60547, 692.074, 773.96826, 591.82935, 281.08804, 1323.1149, 133.03246]
2025-09-12 15:15:53,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [209.0, 388.0, 226.0, 85.0, 249.0, 258.0, 217.0, 119.0, 438.0, 69.0]
2025-09-12 15:15:53,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 28 minutes, 49 seconds)
2025-09-12 15:27:31,241 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:27:31,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:29:59,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1606.49182 ± 982.365
2025-09-12 15:29:59,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [654.1959, 678.5401, 3105.428, 912.9451, 1334.684, 1262.7031, 2188.764, 2407.231, 3162.9043, 357.5217]
2025-09-12 15:29:59,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [240.0, 243.0, 1000.0, 312.0, 406.0, 432.0, 668.0, 753.0, 1000.0, 148.0]
2025-09-12 15:29:59,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 19 minutes, 16 seconds)
2025-09-12 15:41:39,691 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:41:39,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:43:21,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1138.82031 ± 564.575
2025-09-12 15:43:21,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1671.6036, 1164.1609, 823.55286, 63.065357, 936.88824, 1002.2782, 623.83124, 1202.3247, 1970.2493, 1930.2494]
2025-09-12 15:43:21,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [518.0, 355.0, 281.0, 38.0, 295.0, 306.0, 223.0, 402.0, 599.0, 598.0]
2025-09-12 15:43:21,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 2 minutes, 42 seconds)
2025-09-12 15:55:19,676 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:55:19,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:57:36,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1480.23096 ± 1132.166
2025-09-12 15:57:36,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [3195.7307, 3131.314, 1796.8356, 184.99692, 280.01675, 183.09523, 1271.1229, 1143.3094, 2776.4478, 839.4407]
2025-09-12 15:57:36,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 571.0, 89.0, 119.0, 87.0, 416.0, 344.0, 906.0, 288.0]
2025-09-12 15:57:36,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 53 minutes, 40 seconds)
2025-09-12 16:10:30,689 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:10:30,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:12:20,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1184.03931 ± 724.149
2025-09-12 16:12:20,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1665.7023, 98.728546, 447.55298, 865.05896, 985.7812, 1726.2633, 905.35315, 1403.6234, 2816.2505, 926.0778]
2025-09-12 16:12:20,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [500.0, 54.0, 175.0, 270.0, 332.0, 516.0, 313.0, 457.0, 900.0, 306.0]
2025-09-12 16:12:20,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 48 minutes, 28 seconds)
2025-09-12 16:23:30,193 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:23:30,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:25:23,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1213.06421 ± 750.039
2025-09-12 16:25:23,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [353.22415, 2136.756, 1045.1096, 2526.2139, 968.7715, 560.17975, 1156.5732, 1425.9795, 74.3815, 1883.4524]
2025-09-12 16:25:23,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [144.0, 698.0, 320.0, 809.0, 324.0, 201.0, 384.0, 427.0, 45.0, 631.0]
2025-09-12 16:25:23,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 33 minutes, 32 seconds)
2025-09-12 16:36:59,206 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:36:59,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:38:08,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 749.85608 ± 697.276
2025-09-12 16:38:08,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1241.6106, 1069.0092, 2275.5203, 287.49948, 98.88833, 1373.0259, 82.05244, 106.86026, 197.54039, 766.55334]
2025-09-12 16:38:08,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [407.0, 321.0, 722.0, 124.0, 56.0, 435.0, 47.0, 57.0, 90.0, 231.0]
2025-09-12 16:38:08,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 13 minutes, 30 seconds)
2025-09-12 16:50:04,806 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:50:04,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:51:33,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 958.48401 ± 902.155
2025-09-12 16:51:33,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1857.9402, 1115.7985, 498.91156, 1209.6682, 581.95386, 121.17809, 3188.3901, 354.5216, 160.55322, 495.92618]
2025-09-12 16:51:33,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [590.0, 331.0, 186.0, 375.0, 195.0, 62.0, 1000.0, 144.0, 76.0, 185.0]
2025-09-12 16:51:33,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 3 seconds)
2025-09-12 17:03:50,645 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:03:50,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:05:48,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1258.74023 ± 629.100
2025-09-12 17:05:48,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1135.4545, 1377.1954, 810.38434, 2282.6858, 1740.7874, 450.0065, 455.66217, 812.18524, 1291.0874, 2231.953]
2025-09-12 17:05:48,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [368.0, 427.0, 277.0, 709.0, 568.0, 170.0, 170.0, 269.0, 443.0, 667.0]
2025-09-12 17:05:48,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 46 minutes, 23 seconds)
2025-09-12 17:17:06,158 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:17:06,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:18:32,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 980.99823 ± 601.393
2025-09-12 17:18:32,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [891.34985, 1420.5463, 1314.8065, 825.86426, 871.1623, 876.5174, 1156.5686, 119.80334, 67.68539, 2265.6777]
2025-09-12 17:18:32,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [267.0, 427.0, 393.0, 248.0, 258.0, 272.0, 364.0, 64.0, 42.0, 699.0]
2025-09-12 17:18:32,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 24 minutes, 47 seconds)
2025-09-12 17:30:30,707 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:30:30,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:33:01,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1641.28064 ± 969.922
2025-09-12 17:33:01,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [2113.8054, 2468.721, 837.48486, 944.2777, 861.32715, 716.1271, 3181.6338, 1833.3014, 3046.9268, 409.2021]
2025-09-12 17:33:01,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [708.0, 790.0, 288.0, 309.0, 296.0, 247.0, 1000.0, 588.0, 916.0, 157.0]
2025-09-12 17:33:01,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 17 minutes, 2 seconds)
2025-09-12 17:45:02,138 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:45:02,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:46:43,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1091.81128 ± 692.088
2025-09-12 17:46:43,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [177.95789, 933.61, 294.70676, 943.8688, 2082.1455, 2372.593, 1128.9734, 1631.5883, 757.882, 594.787]
2025-09-12 17:46:43,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 287.0, 125.0, 320.0, 665.0, 760.0, 350.0, 508.0, 252.0, 207.0]
2025-09-12 17:46:43,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 6 minutes, 53 seconds)
2025-09-12 17:58:23,543 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:58:23,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:00:34,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1455.06702 ± 1021.712
2025-09-12 18:00:34,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1382.8888, 3214.7024, 1759.7189, 3078.5476, 804.33673, 691.76324, 592.9456, 638.9459, 2207.4626, 179.35768]
2025-09-12 18:00:34,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [428.0, 993.0, 535.0, 966.0, 249.0, 228.0, 222.0, 226.0, 705.0, 83.0]
2025-09-12 18:00:34,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 54 minutes, 40 seconds)
2025-09-12 18:12:37,588 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:12:37,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:14:25,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1183.16687 ± 889.700
2025-09-12 18:14:25,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [3147.598, 154.62816, 178.61911, 1676.1775, 824.23267, 1950.2338, 849.85266, 1667.0385, 1008.63257, 374.65625]
2025-09-12 18:14:25,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 78.0, 82.0, 551.0, 261.0, 587.0, 293.0, 494.0, 322.0, 147.0]
2025-09-12 18:14:25,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 39 minutes, 37 seconds)
2025-09-12 18:26:12,454 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:26:12,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:28:18,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1364.52612 ± 1156.214
2025-09-12 18:28:18,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [2540.8455, 3165.2634, 287.88217, 702.4512, 3111.3623, 300.0199, 79.56745, 675.99695, 2036.3889, 745.48376]
2025-09-12 18:28:18,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [808.0, 1000.0, 122.0, 233.0, 1000.0, 126.0, 48.0, 234.0, 646.0, 258.0]
2025-09-12 18:28:18,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 29 minutes, 19 seconds)
2025-09-12 18:40:09,205 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:40:09,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:41:28,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 870.80261 ± 807.154
2025-09-12 18:41:28,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [822.4818, 214.9173, 986.65063, 121.53459, 881.1944, 838.52655, 459.80124, 3062.0042, 187.33076, 1133.5853]
2025-09-12 18:41:28,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [254.0, 96.0, 297.0, 62.0, 267.0, 265.0, 170.0, 920.0, 92.0, 349.0]
2025-09-12 18:41:28,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 11 minutes, 37 seconds)
2025-09-12 18:53:36,542 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:53:36,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:56:32,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1920.93384 ± 1013.857
2025-09-12 18:56:32,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [2631.7175, 3106.3096, 665.63666, 2802.14, 1307.6154, 389.1828, 2701.6743, 1409.5446, 3184.1318, 1011.3854]
2025-09-12 18:56:32,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [840.0, 1000.0, 230.0, 895.0, 437.0, 152.0, 862.0, 480.0, 1000.0, 343.0]
2025-09-12 18:56:32,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (1920.93) for latency MM1Queue_a033_s075
2025-09-12 18:56:32,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 1 minute, 32 seconds)
2025-09-12 19:08:09,281 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:08:09,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:10:06,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1257.53577 ± 714.385
2025-09-12 19:10:06,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [976.476, 2388.4648, 1004.4735, 590.41956, 643.8494, 110.90063, 1009.65405, 1782.2158, 2057.4778, 2011.4257]
2025-09-12 19:10:06,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [334.0, 772.0, 335.0, 209.0, 220.0, 60.0, 331.0, 547.0, 655.0, 641.0]
2025-09-12 19:10:06,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 46 minutes, 51 seconds)
2025-09-12 19:22:07,261 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:22:07,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:24:03,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1226.61682 ± 899.484
2025-09-12 19:24:03,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [398.2571, 742.57574, 2053.1665, 114.51704, 1644.815, 130.02907, 1534.2317, 1069.6901, 1425.0212, 3153.864]
2025-09-12 19:24:03,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [150.0, 269.0, 678.0, 60.0, 527.0, 66.0, 510.0, 369.0, 474.0, 1000.0]
2025-09-12 19:24:03,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 33 minutes, 11 seconds)
2025-09-12 19:36:19,135 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:36:19,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:38:47,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1638.20044 ± 1041.399
2025-09-12 19:38:47,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [311.37372, 1390.92, 1377.2317, 262.63998, 1985.0933, 1192.2826, 2758.7087, 757.9405, 3134.1047, 3211.708]
2025-09-12 19:38:47,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [129.0, 448.0, 430.0, 111.0, 619.0, 339.0, 885.0, 249.0, 1000.0, 1000.0]
2025-09-12 19:38:47,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 20 minutes, 58 seconds)
2025-09-12 19:50:33,754 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:50:33,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:53:44,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 2071.04102 ± 931.102
2025-09-12 19:53:44,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [2625.2903, 3113.223, 3103.9304, 1571.4785, 1066.1437, 2539.1343, 1145.1348, 2879.8796, 2365.1282, 301.06647]
2025-09-12 19:53:44,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [840.0, 1000.0, 1000.0, 509.0, 356.0, 782.0, 342.0, 928.0, 771.0, 123.0]
2025-09-12 19:53:44,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (2071.04) for latency MM1Queue_a033_s075
2025-09-12 19:53:44,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 10 minutes, 5 seconds)
2025-09-12 20:05:50,600 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:05:50,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:09:14,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 2245.07422 ± 1166.007
2025-09-12 20:09:14,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [475.36435, 3179.1125, 3169.446, 2654.9312, 3186.6816, 382.3153, 684.7354, 3187.4749, 2348.002, 3182.6807]
2025-09-12 20:09:14,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [181.0, 1000.0, 1000.0, 841.0, 1000.0, 149.0, 238.0, 1000.0, 746.0, 1000.0]
2025-09-12 20:09:14,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (2245.07) for latency MM1Queue_a033_s075
2025-09-12 20:09:14,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 56 minutes, 18 seconds)
2025-09-12 20:20:41,127 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:20:41,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:22:04,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 928.07373 ± 386.823
2025-09-12 20:22:04,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1427.6276, 96.6594, 1154.2178, 844.35406, 1461.9004, 573.53644, 1139.6439, 998.4125, 786.5923, 797.79333]
2025-09-12 20:22:04,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [436.0, 55.0, 347.0, 249.0, 466.0, 214.0, 335.0, 314.0, 262.0, 266.0]
2025-09-12 20:22:04,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 40 minutes, 45 seconds)
2025-09-12 20:34:26,802 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:34:26,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:37:02,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1720.63281 ± 1249.206
2025-09-12 20:37:02,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [3051.1272, 1018.9908, 117.546776, 1099.1641, 138.722, 2569.6033, 3220.543, 2630.2456, 251.39163, 3108.9941]
2025-09-12 20:37:02,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [945.0, 308.0, 63.0, 365.0, 69.0, 771.0, 1000.0, 852.0, 110.0, 1000.0]
2025-09-12 20:37:02,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 27 minutes, 34 seconds)
2025-09-12 20:48:18,097 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:48:18,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:50:34,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1499.24304 ± 1225.797
2025-09-12 20:50:34,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [201.77121, 47.510918, 80.96641, 111.18582, 2493.0928, 3020.1345, 2245.8972, 3166.2708, 2293.7593, 1331.8405]
2025-09-12 20:50:34,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 33.0, 48.0, 61.0, 745.0, 912.0, 726.0, 1000.0, 720.0, 427.0]
2025-09-12 20:50:34,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 11 minutes, 46 seconds)
2025-09-12 21:02:31,206 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:02:31,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:04:52,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1544.64722 ± 886.337
2025-09-12 21:04:52,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [2056.4573, 1558.9877, 1506.219, 1938.508, 1210.088, 3079.526, 303.31094, 216.59822, 909.1994, 2667.5771]
2025-09-12 21:04:52,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [650.0, 506.0, 494.0, 620.0, 362.0, 1000.0, 130.0, 96.0, 283.0, 858.0]
2025-09-12 21:04:52,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 56 minutes, 54 seconds)
2025-09-12 21:17:36,880 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:17:36,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:20:14,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1760.37085 ± 823.632
2025-09-12 21:20:14,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [3228.152, 769.41614, 1205.4691, 1594.8962, 1162.059, 3231.2249, 1252.4572, 2262.427, 1656.7166, 1240.8903]
2025-09-12 21:20:14,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 266.0, 398.0, 491.0, 369.0, 1000.0, 406.0, 690.0, 532.0, 407.0]
2025-09-12 21:20:14,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 42 minutes, 36 seconds)
2025-09-12 21:31:30,940 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:31:30,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:33:44,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1484.31262 ± 1156.346
2025-09-12 21:33:44,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1960.5205, 253.932, 292.11755, 1797.0692, 3185.0837, 543.3938, 844.2154, 2592.2786, 3206.1338, 168.3808]
2025-09-12 21:33:44,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [596.0, 107.0, 122.0, 586.0, 1000.0, 194.0, 257.0, 803.0, 1000.0, 80.0]
2025-09-12 21:33:44,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 28 minutes, 39 seconds)
2025-09-12 21:45:21,202 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:45:21,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:46:52,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 965.53192 ± 1033.106
2025-09-12 21:46:52,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [466.12973, 370.0595, 2483.2493, 131.80954, 1496.8489, 314.8872, 429.94995, 72.45745, 3241.4763, 648.45123]
2025-09-12 21:46:52,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [176.0, 146.0, 773.0, 68.0, 484.0, 127.0, 161.0, 45.0, 1000.0, 234.0]
2025-09-12 21:46:52,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 13 minutes, 57 seconds)
2025-09-12 21:58:50,205 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:58:50,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:00:56,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1397.74316 ± 943.107
2025-09-12 22:00:56,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [411.10992, 683.1165, 2236.4863, 918.09, 1159.8818, 859.15894, 3186.9307, 125.388954, 2081.9473, 2315.3208]
2025-09-12 22:00:56,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [157.0, 239.0, 664.0, 312.0, 348.0, 279.0, 1000.0, 63.0, 670.0, 701.0]
2025-09-12 22:00:56,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1251 [DEBUG]: Training session finished
