2025-09-12 00:21:26,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc20-hopper/MM1Queue_a033_s075-mbpac_memdelay
2025-09-12 00:21:26,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc20-hopper/MM1Queue_a033_s075-mbpac_memdelay
2025-09-12 00:21:26,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x1475ba7a5050>}
2025-09-12 00:21:26,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1111 [DEBUG]: using device: cuda
2025-09-12 00:21:26,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1133 [INFO]: Creating new trainer
2025-09-12 00:21:26,661 baseline-mbpac-noiseperc20-hopper:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-09-12 00:21:26,661 baseline-mbpac-noiseperc20-hopper:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 00:21:26,669 baseline-mbpac-noiseperc20-hopper:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=3, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-12 00:21:27,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1194 [DEBUG]: Starting training session...
2025-09-12 00:21:27,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 1/100
2025-09-12 00:30:59,397 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:30:59,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:31:08,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 62.12625 ± 13.241
2025-09-12 00:31:08,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [83.421776, 51.65451, 50.691475, 44.190235, 46.77426, 72.08544, 56.81463, 65.99124, 79.149536, 70.489365]
2025-09-12 00:31:08,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [45.0, 29.0, 29.0, 26.0, 27.0, 40.0, 32.0, 36.0, 42.0, 39.0]
2025-09-12 00:31:08,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (62.13) for latency MM1Queue_a033_s075
2025-09-12 00:31:08,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 15 hours, 58 minutes, 42 seconds)
2025-09-12 00:42:04,745 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:42:04,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:42:23,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 123.23440 ± 71.353
2025-09-12 00:42:23,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [56.82364, 96.796524, 132.43872, 165.08984, 10.617336, 166.1635, 262.32846, 105.8578, 48.330116, 187.89807]
2025-09-12 00:42:23,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [37.0, 61.0, 74.0, 82.0, 12.0, 86.0, 131.0, 72.0, 32.0, 94.0]
2025-09-12 00:42:23,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (123.23) for latency MM1Queue_a033_s075
2025-09-12 00:42:23,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 17 hours, 5 minutes, 18 seconds)
2025-09-12 00:53:28,802 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:53:28,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:54:54,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 308.85678 ± 364.053
2025-09-12 00:54:54,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [54.751854, 270.26645, 29.666218, 152.3439, 440.3515, 996.8867, 21.121109, 988.6068, 115.26967, 19.303541]
2025-09-12 00:54:54,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [54.0, 268.0, 36.0, 150.0, 456.0, 1000.0, 28.0, 1000.0, 122.0, 30.0]
2025-09-12 00:54:54,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (308.86) for latency MM1Queue_a033_s075
2025-09-12 00:54:54,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 18 hours, 1 minute, 17 seconds)
2025-09-12 01:05:55,875 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:05:55,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:06:49,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 193.69885 ± 130.107
2025-09-12 01:06:49,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [55.463936, 92.02, 71.90459, 325.8788, 300.7394, 287.16763, 449.23074, 76.79714, 173.25436, 104.53183]
2025-09-12 01:06:49,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [58.0, 90.0, 69.0, 367.0, 187.0, 324.0, 493.0, 84.0, 205.0, 97.0]
2025-09-12 01:06:49,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 18 hours, 8 minutes, 49 seconds)
2025-09-12 01:17:40,115 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:17:40,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:18:10,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 152.35721 ± 92.471
2025-09-12 01:18:10,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [251.44307, 46.14712, 242.32086, 47.578876, 318.21127, 122.41184, 127.785095, 53.656185, 214.21112, 99.80669]
2025-09-12 01:18:10,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [141.0, 31.0, 113.0, 31.0, 269.0, 73.0, 88.0, 34.0, 238.0, 89.0]
2025-09-12 01:18:10,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 17 hours, 57 minutes, 30 seconds)
2025-09-12 01:29:10,986 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:29:10,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:29:46,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 220.87717 ± 100.568
2025-09-12 01:29:46,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [300.53717, 263.28314, 193.837, 46.483265, 34.46383, 229.08658, 358.072, 269.72046, 297.06058, 216.22772]
2025-09-12 01:29:46,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [156.0, 218.0, 112.0, 31.0, 26.0, 104.0, 223.0, 144.0, 166.0, 114.0]
2025-09-12 01:29:46,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 18 hours, 22 minutes, 9 seconds)
2025-09-12 01:40:47,973 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:40:47,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:41:35,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 249.65251 ± 192.844
2025-09-12 01:41:35,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [105.35873, 32.52295, 286.48508, 282.31085, 20.236027, 593.3259, 328.93695, 564.74756, 112.10505, 170.49599]
2025-09-12 01:41:35,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [109.0, 25.0, 137.0, 156.0, 19.0, 403.0, 314.0, 393.0, 83.0, 90.0]
2025-09-12 01:41:35,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 18 hours, 21 minutes, 12 seconds)
2025-09-12 01:52:44,476 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:52:44,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:54:18,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 455.45087 ± 231.735
2025-09-12 01:54:18,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [898.34174, 422.53412, 310.9908, 395.46814, 635.90686, 347.102, 744.66144, 37.339935, 353.8253, 408.33847]
2025-09-12 01:54:18,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [723.0, 222.0, 284.0, 212.0, 586.0, 312.0, 620.0, 28.0, 188.0, 218.0]
2025-09-12 01:54:18,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (455.45) for latency MM1Queue_a033_s075
2025-09-12 01:54:18,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 18 hours, 12 minutes, 58 seconds)
2025-09-12 02:05:15,606 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:05:15,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:05:49,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 284.50629 ± 69.589
2025-09-12 02:05:49,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [345.51822, 195.33258, 323.65387, 341.516, 305.73636, 118.2056, 324.90186, 323.665, 301.89966, 264.63345]
2025-09-12 02:05:49,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [141.0, 91.0, 143.0, 164.0, 128.0, 61.0, 134.0, 136.0, 129.0, 129.0]
2025-09-12 02:05:49,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 17 hours, 53 minutes, 52 seconds)
2025-09-12 02:16:47,328 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:16:47,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:17:18,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 274.67795 ± 27.238
2025-09-12 02:17:18,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [276.91214, 290.25473, 275.09637, 261.38184, 290.72235, 300.1121, 203.41675, 284.23236, 302.36917, 262.28165]
2025-09-12 02:17:18,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [112.0, 116.0, 115.0, 111.0, 115.0, 125.0, 100.0, 116.0, 125.0, 113.0]
2025-09-12 02:17:18,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 17 hours, 44 minutes, 23 seconds)
2025-09-12 02:28:09,614 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:28:09,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:28:42,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 281.84418 ± 49.328
2025-09-12 02:28:42,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [257.39603, 350.15155, 211.75896, 332.79358, 356.69806, 246.13347, 313.45612, 238.54034, 270.71167, 240.8021]
2025-09-12 02:28:42,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [113.0, 146.0, 94.0, 135.0, 153.0, 109.0, 142.0, 108.0, 117.0, 105.0]
2025-09-12 02:28:42,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 17 hours, 28 minutes, 59 seconds)
2025-09-12 02:39:26,726 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:39:26,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:39:56,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 245.05417 ± 92.233
2025-09-12 02:39:56,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [175.03513, 324.67978, 328.88113, 279.73666, 170.90022, 348.02478, 131.6581, 244.73994, 357.04935, 89.83657]
2025-09-12 02:39:56,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 136.0, 143.0, 128.0, 84.0, 150.0, 70.0, 119.0, 155.0, 56.0]
2025-09-12 02:39:56,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 17 hours, 7 minutes, 1 second)
2025-09-12 02:50:37,058 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:50:37,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:51:05,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 261.59662 ± 124.810
2025-09-12 02:51:05,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [175.10681, 77.15235, 153.81218, 326.48453, 71.02475, 405.3575, 303.27713, 352.0873, 427.9675, 323.6961]
2025-09-12 02:51:05,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 45.0, 78.0, 138.0, 45.0, 145.0, 124.0, 138.0, 153.0, 138.0]
2025-09-12 02:51:05,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 16 hours, 28 minutes, 13 seconds)
2025-09-12 03:01:53,677 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:01:53,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:02:23,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 277.81216 ± 85.783
2025-09-12 03:02:23,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [165.87396, 286.91095, 286.4622, 297.02774, 359.82074, 385.48727, 174.77318, 350.02203, 342.78555, 128.95786]
2025-09-12 03:02:23,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [82.0, 121.0, 122.0, 118.0, 137.0, 142.0, 85.0, 130.0, 134.0, 69.0]
2025-09-12 03:02:23,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 16 hours, 12 minutes, 59 seconds)
2025-09-12 03:13:09,713 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:13:09,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:13:42,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 325.19388 ± 123.366
2025-09-12 03:13:42,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [154.96288, 443.95227, 330.09714, 350.49728, 113.00586, 316.24985, 458.46674, 502.26923, 368.4079, 214.02957]
2025-09-12 03:13:42,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 154.0, 129.0, 137.0, 59.0, 129.0, 157.0, 168.0, 140.0, 95.0]
2025-09-12 03:13:42,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 15 hours, 58 minutes, 55 seconds)
2025-09-12 03:24:31,854 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:24:31,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:25:06,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 343.46594 ± 127.487
2025-09-12 03:25:06,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [480.34647, 143.34187, 402.09933, 497.36685, 343.9739, 338.1754, 374.67194, 437.99435, 85.52831, 331.16095]
2025-09-12 03:25:06,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [160.0, 73.0, 144.0, 165.0, 140.0, 131.0, 141.0, 154.0, 52.0, 132.0]
2025-09-12 03:25:06,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 15 hours, 47 minutes, 31 seconds)
2025-09-12 03:35:52,319 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:35:52,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:36:32,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 402.40421 ± 176.708
2025-09-12 03:36:32,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [270.69113, 702.0508, 515.62, 276.2871, 267.09225, 111.996414, 281.78726, 486.38138, 506.43347, 605.70215]
2025-09-12 03:36:32,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [109.0, 257.0, 175.0, 111.0, 109.0, 58.0, 120.0, 166.0, 178.0, 202.0]
2025-09-12 03:36:32,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 15 hours, 39 minutes, 24 seconds)
2025-09-12 03:47:29,394 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:47:29,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:48:04,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 362.36502 ± 156.119
2025-09-12 03:48:04,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [530.68634, 444.14597, 42.15617, 370.17395, 561.04395, 497.3649, 199.77095, 340.5512, 224.73976, 413.01715]
2025-09-12 03:48:04,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [181.0, 157.0, 33.0, 140.0, 181.0, 176.0, 91.0, 136.0, 98.0, 144.0]
2025-09-12 03:48:04,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 15 hours, 34 minutes, 33 seconds)
2025-09-12 03:58:51,040 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:58:51,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:59:32,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 433.12866 ± 112.667
2025-09-12 03:59:32,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [517.28687, 275.019, 430.06573, 470.19327, 489.0657, 514.1063, 532.8568, 172.37042, 508.72134, 421.60095]
2025-09-12 03:59:32,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [170.0, 121.0, 155.0, 165.0, 170.0, 176.0, 171.0, 81.0, 175.0, 169.0]
2025-09-12 03:59:32,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 15 hours, 25 minutes, 36 seconds)
2025-09-12 04:10:14,154 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:10:14,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:10:56,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 443.56680 ± 214.735
2025-09-12 04:10:56,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [193.96541, 213.36868, 567.798, 466.4045, 97.35788, 688.18054, 575.7571, 403.1407, 805.09216, 424.60294]
2025-09-12 04:10:56,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 96.0, 195.0, 178.0, 54.0, 224.0, 181.0, 162.0, 275.0, 158.0]
2025-09-12 04:10:56,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 15 hours, 15 minutes, 49 seconds)
2025-09-12 04:21:45,373 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:21:45,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:22:43,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 633.37628 ± 157.642
2025-09-12 04:22:43,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [984.3016, 591.6984, 612.1684, 596.29535, 687.90796, 646.6595, 479.20105, 630.5187, 754.7324, 350.2793]
2025-09-12 04:22:43,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [339.0, 181.0, 232.0, 197.0, 226.0, 227.0, 192.0, 214.0, 256.0, 140.0]
2025-09-12 04:22:43,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (633.38) for latency MM1Queue_a033_s075
2025-09-12 04:22:44,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 15 hours, 10 minutes, 35 seconds)
2025-09-12 04:33:33,649 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:33:33,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:34:18,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 456.70435 ± 357.557
2025-09-12 04:34:18,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [728.1342, 717.8279, 810.9371, 1171.8846, 282.9825, 200.66164, 164.81238, 104.85755, 358.71686, 26.22857]
2025-09-12 04:34:18,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [256.0, 248.0, 251.0, 390.0, 131.0, 95.0, 84.0, 56.0, 140.0, 24.0]
2025-09-12 04:34:18,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 15 hours, 1 minute, 15 seconds)
2025-09-12 04:45:19,739 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:45:19,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:46:08,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 506.40543 ± 343.068
2025-09-12 04:46:08,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [650.1577, 515.39856, 71.3873, 764.785, 109.44377, 251.83023, 1084.3372, 63.32638, 803.0215, 750.3667]
2025-09-12 04:46:08,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [240.0, 218.0, 48.0, 260.0, 58.0, 112.0, 356.0, 44.0, 254.0, 232.0]
2025-09-12 04:46:08,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 14 hours, 54 minutes, 1 second)
2025-09-12 04:56:42,919 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:56:42,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:57:36,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 569.33130 ± 349.359
2025-09-12 04:57:36,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1195.1487, 180.93297, 353.96884, 790.6758, 191.52132, 925.10504, 392.03235, 365.5725, 311.81757, 986.5382]
2025-09-12 04:57:36,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [378.0, 91.0, 146.0, 246.0, 89.0, 302.0, 153.0, 146.0, 131.0, 326.0]
2025-09-12 04:57:36,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 14 hours, 42 minutes, 40 seconds)
2025-09-12 05:08:46,330 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:08:46,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:09:28,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 415.38950 ± 197.459
2025-09-12 05:09:28,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [161.46631, 516.7664, 415.0663, 707.09094, 289.76666, 184.41895, 525.0764, 596.4856, 617.0343, 140.72313]
2025-09-12 05:09:28,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [79.0, 202.0, 166.0, 224.0, 130.0, 86.0, 202.0, 200.0, 224.0, 72.0]
2025-09-12 05:09:28,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 14 hours, 37 minutes, 50 seconds)
2025-09-12 05:19:55,564 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:19:55,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:20:48,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 556.32117 ± 185.493
2025-09-12 05:20:48,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [734.20776, 817.6707, 612.8245, 732.4659, 297.08835, 524.11316, 200.41998, 586.2183, 608.5696, 449.63345]
2025-09-12 05:20:48,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [235.0, 246.0, 238.0, 226.0, 127.0, 200.0, 94.0, 212.0, 220.0, 171.0]
2025-09-12 05:20:48,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 14 hours, 19 minutes, 26 seconds)
2025-09-12 05:31:31,801 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:31:31,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:32:32,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 626.78656 ± 358.140
2025-09-12 05:32:32,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [240.80736, 924.28845, 686.0822, 340.76633, 787.6544, 311.25766, 214.08672, 404.6073, 1226.491, 1131.8243]
2025-09-12 05:32:32,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [107.0, 302.0, 232.0, 130.0, 257.0, 131.0, 100.0, 192.0, 392.0, 434.0]
2025-09-12 05:32:32,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 14 hours, 10 minutes, 17 seconds)
2025-09-12 05:43:25,750 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:43:25,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:44:15,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 532.26733 ± 315.965
2025-09-12 05:44:15,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [709.67255, 352.4946, 432.36063, 686.7272, 169.01566, 835.18634, 185.2316, 1157.9062, 650.57776, 143.50084]
2025-09-12 05:44:15,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [235.0, 144.0, 167.0, 214.0, 86.0, 266.0, 87.0, 385.0, 210.0, 81.0]
2025-09-12 05:44:15,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 13 hours, 56 minutes, 57 seconds)
2025-09-12 05:55:08,036 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:55:08,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:56:20,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 781.75922 ± 430.429
2025-09-12 05:56:20,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [984.49005, 613.0463, 270.1378, 814.1044, 144.98596, 247.95761, 1020.08856, 1475.6295, 1293.8141, 953.3378]
2025-09-12 05:56:20,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [350.0, 188.0, 121.0, 266.0, 69.0, 113.0, 324.0, 485.0, 422.0, 343.0]
2025-09-12 05:56:20,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (781.76) for latency MM1Queue_a033_s075
2025-09-12 05:56:20,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 13 hours, 53 minutes, 58 seconds)
2025-09-12 06:07:08,967 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:07:08,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:08:02,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 555.16467 ± 372.571
2025-09-12 06:08:02,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [537.39545, 215.10364, 599.18726, 753.0094, 1078.7457, 1039.8402, 948.98444, 213.50586, 128.68207, 37.192745]
2025-09-12 06:08:02,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [226.0, 96.0, 220.0, 255.0, 363.0, 341.0, 304.0, 100.0, 72.0, 31.0]
2025-09-12 06:08:02,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 13 hours, 39 minutes, 53 seconds)
2025-09-12 06:18:52,336 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:18:52,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:19:58,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 722.04895 ± 352.350
2025-09-12 06:19:58,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [893.73505, 211.79419, 1417.2292, 434.60553, 1231.4777, 738.9146, 527.8815, 710.89276, 604.5701, 449.3892]
2025-09-12 06:19:58,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [291.0, 93.0, 457.0, 167.0, 393.0, 241.0, 201.0, 223.0, 222.0, 174.0]
2025-09-12 06:19:58,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 13 hours, 36 minutes, 34 seconds)
2025-09-12 06:30:40,871 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:30:40,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:32:22,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1003.25061 ± 521.271
2025-09-12 06:32:22,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [563.0103, 732.0504, 2414.042, 840.76385, 800.4618, 809.7687, 1104.2139, 1295.7505, 977.28827, 495.15613]
2025-09-12 06:32:22,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [209.0, 290.0, 895.0, 266.0, 268.0, 307.0, 417.0, 508.0, 395.0, 195.0]
2025-09-12 06:32:22,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (1003.25) for latency MM1Queue_a033_s075
2025-09-12 06:32:22,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 13 hours, 33 minutes, 34 seconds)
2025-09-12 06:43:20,529 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:43:20,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:44:31,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 722.74432 ± 553.673
2025-09-12 06:44:31,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1142.4133, 129.84023, 321.07352, 684.99725, 507.32086, 864.9292, 243.42227, 967.8179, 289.68655, 2075.9421]
2025-09-12 06:44:31,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [429.0, 68.0, 135.0, 241.0, 192.0, 315.0, 109.0, 308.0, 127.0, 718.0]
2025-09-12 06:44:31,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 13 hours, 27 minutes, 38 seconds)
2025-09-12 06:55:16,917 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:55:16,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:56:14,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 596.24097 ± 397.345
2025-09-12 06:56:14,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [125.83355, 1087.0286, 1062.8788, 1063.8136, 874.991, 736.0485, 119.128784, 116.175545, 526.8846, 249.6264]
2025-09-12 06:56:14,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [65.0, 398.0, 362.0, 345.0, 288.0, 238.0, 64.0, 61.0, 201.0, 115.0]
2025-09-12 06:56:14,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 13 hours, 10 minutes, 41 seconds)
2025-09-12 07:07:11,285 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:07:11,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:08:18,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 641.19617 ± 463.123
2025-09-12 07:08:18,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [343.2983, 267.628, 507.41644, 119.70788, 1366.2869, 586.04376, 827.1277, 154.71034, 690.6933, 1549.05]
2025-09-12 07:08:18,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [145.0, 119.0, 193.0, 66.0, 482.0, 221.0, 324.0, 76.0, 262.0, 589.0]
2025-09-12 07:08:18,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 13 hours, 3 minutes, 26 seconds)
2025-09-12 07:19:13,759 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:19:13,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:19:52,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 345.90851 ± 214.980
2025-09-12 07:19:52,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [140.23752, 319.9397, 152.59921, 332.386, 430.41818, 283.736, 848.56946, 524.52545, 53.794933, 372.879]
2025-09-12 07:19:52,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [73.0, 134.0, 74.0, 141.0, 173.0, 123.0, 345.0, 192.0, 41.0, 147.0]
2025-09-12 07:19:52,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 12 hours, 46 minutes, 36 seconds)
2025-09-12 07:30:49,483 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:30:49,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:31:50,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 636.60016 ± 407.384
2025-09-12 07:31:50,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [980.2138, 60.23886, 57.89092, 568.8064, 1371.6085, 1020.8479, 703.60614, 361.74597, 395.3364, 845.70654]
2025-09-12 07:31:50,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [304.0, 37.0, 42.0, 213.0, 471.0, 364.0, 254.0, 148.0, 157.0, 283.0]
2025-09-12 07:31:50,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 12 hours, 29 minutes, 23 seconds)
2025-09-12 07:42:20,150 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:42:20,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:43:06,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 449.27924 ± 293.005
2025-09-12 07:43:06,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [865.2659, 804.8531, 580.88873, 119.08677, 265.04468, 257.65063, 227.34052, 898.7453, 135.49347, 338.4234]
2025-09-12 07:43:06,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [314.0, 292.0, 200.0, 63.0, 123.0, 113.0, 101.0, 290.0, 70.0, 155.0]
2025-09-12 07:43:06,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 12 hours, 6 minutes, 16 seconds)
2025-09-12 07:54:16,535 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:54:16,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:55:19,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 661.62292 ± 228.826
2025-09-12 07:55:19,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [618.4856, 351.78033, 801.95056, 843.1022, 489.4145, 1080.6317, 324.39008, 524.0959, 820.212, 762.16675]
2025-09-12 07:55:19,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [240.0, 155.0, 263.0, 298.0, 190.0, 360.0, 139.0, 200.0, 266.0, 248.0]
2025-09-12 07:55:19,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 12 hours, 48 seconds)
2025-09-12 08:06:05,832 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:06:05,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:07:03,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 603.48822 ± 464.400
2025-09-12 08:07:03,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [564.7748, 354.0951, 1659.7177, 1005.9587, 255.13112, 357.33548, 1041.8489, 9.761119, 404.53455, 381.72455]
2025-09-12 08:07:03,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [210.0, 148.0, 538.0, 369.0, 112.0, 153.0, 325.0, 11.0, 157.0, 159.0]
2025-09-12 08:07:03,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 11 hours, 45 minutes, 8 seconds)
2025-09-12 08:17:46,020 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:17:46,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:19:03,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 871.41486 ± 397.697
2025-09-12 08:19:03,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1412.2264, 876.25165, 783.2978, 1043.4932, 139.72354, 1079.5509, 629.2215, 889.52924, 1481.3857, 379.46838]
2025-09-12 08:19:03,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [454.0, 286.0, 246.0, 334.0, 71.0, 355.0, 226.0, 279.0, 476.0, 156.0]
2025-09-12 08:19:03,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 11 hours, 38 minutes, 26 seconds)
2025-09-12 08:29:49,535 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:29:49,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:30:57,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 738.53674 ± 283.861
2025-09-12 08:30:57,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [941.8276, 1209.8845, 991.5125, 569.56976, 507.28644, 139.26712, 708.62695, 627.6442, 870.573, 819.175]
2025-09-12 08:30:57,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [295.0, 410.0, 320.0, 225.0, 207.0, 72.0, 271.0, 239.0, 287.0, 267.0]
2025-09-12 08:30:57,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 11 hours, 25 minutes, 49 seconds)
2025-09-12 08:41:40,138 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:41:40,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:42:44,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 728.85925 ± 333.486
2025-09-12 08:42:44,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1148.247, 1032.4037, 234.66164, 797.78033, 882.0192, 950.29065, 598.6979, 789.775, 827.4671, 27.250114]
2025-09-12 08:42:44,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [363.0, 318.0, 105.0, 255.0, 270.0, 298.0, 215.0, 303.0, 258.0, 24.0]
2025-09-12 08:42:44,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 11 hours, 19 minutes, 59 seconds)
2025-09-12 08:53:45,088 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:53:45,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:54:31,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 470.00067 ± 344.783
2025-09-12 08:54:31,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [197.89923, 1294.7511, 346.48474, 402.50034, 822.8126, 364.88824, 160.5969, 113.84021, 332.28253, 663.9509]
2025-09-12 08:54:31,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 431.0, 141.0, 158.0, 285.0, 143.0, 78.0, 61.0, 140.0, 208.0]
2025-09-12 08:54:31,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 11 hours, 3 minutes, 1 second)
2025-09-12 09:05:10,360 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:05:10,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:06:17,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 766.19727 ± 341.333
2025-09-12 09:06:17,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1312.3992, 842.31415, 104.87968, 905.0628, 786.5285, 180.81447, 812.20917, 914.0409, 886.7257, 916.9983]
2025-09-12 09:06:17,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [430.0, 266.0, 56.0, 300.0, 240.0, 89.0, 266.0, 289.0, 290.0, 299.0]
2025-09-12 09:06:17,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 10 hours, 51 minutes, 34 seconds)
2025-09-12 09:16:58,136 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:16:58,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:18:02,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 709.31970 ± 446.897
2025-09-12 09:18:02,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [885.71826, 1073.788, 1567.7413, 1045.7352, 617.5593, 284.76227, 307.39902, 175.52159, 959.76715, 175.20483]
2025-09-12 09:18:02,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [277.0, 343.0, 498.0, 343.0, 231.0, 121.0, 134.0, 86.0, 285.0, 85.0]
2025-09-12 09:18:02,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 10 hours, 37 minutes)
2025-09-12 09:28:49,347 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:28:49,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:29:46,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 598.41980 ± 354.406
2025-09-12 09:29:46,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [120.24377, 143.23763, 681.41455, 916.7989, 561.37805, 197.26605, 832.3526, 880.9134, 430.21872, 1220.374]
2025-09-12 09:29:46,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [63.0, 71.0, 243.0, 283.0, 201.0, 102.0, 259.0, 320.0, 168.0, 440.0]
2025-09-12 09:29:46,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 10 hours, 23 minutes, 21 seconds)
2025-09-12 09:40:49,359 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:40:49,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:41:59,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 788.49866 ± 351.351
2025-09-12 09:41:59,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [310.72305, 1047.542, 870.15234, 1011.0576, 399.8505, 665.59906, 817.1015, 1108.7257, 1379.5692, 274.66504]
2025-09-12 09:41:59,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [128.0, 335.0, 282.0, 327.0, 162.0, 238.0, 254.0, 343.0, 424.0, 116.0]
2025-09-12 09:41:59,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 10 hours, 16 minutes, 3 seconds)
2025-09-12 09:52:31,161 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:52:31,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:54:13,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1079.66724 ± 690.262
2025-09-12 09:54:13,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1324.868, 2476.2422, 519.456, 200.97636, 1538.6779, 191.40157, 924.5671, 880.61316, 1836.9714, 902.8978]
2025-09-12 09:54:13,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [477.0, 848.0, 203.0, 97.0, 578.0, 93.0, 339.0, 281.0, 587.0, 305.0]
2025-09-12 09:54:13,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1226 [INFO]: New best (1079.67) for latency MM1Queue_a033_s075
2025-09-12 09:54:13,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 10 hours, 9 minutes)
2025-09-12 10:05:07,700 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:05:07,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:06:08,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 651.27698 ± 451.455
2025-09-12 10:06:08,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [937.03534, 1674.5356, 228.37006, 179.67555, 357.70413, 633.1872, 875.0698, 571.3992, 118.80083, 936.9917]
2025-09-12 10:06:08,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [309.0, 560.0, 105.0, 86.0, 155.0, 231.0, 273.0, 214.0, 64.0, 290.0]
2025-09-12 10:06:08,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 9 hours, 58 minutes, 30 seconds)
2025-09-12 10:16:54,212 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:16:54,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:17:56,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 687.85803 ± 379.652
2025-09-12 10:17:56,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [183.45334, 152.95363, 824.4504, 802.16046, 1482.5619, 874.3107, 513.43524, 838.3143, 863.7275, 343.21298]
2025-09-12 10:17:56,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [84.0, 82.0, 265.0, 276.0, 470.0, 279.0, 190.0, 263.0, 312.0, 143.0]
2025-09-12 10:17:56,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 9 hours, 47 minutes, 5 seconds)
2025-09-12 10:28:37,416 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:28:37,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:29:52,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 753.71875 ± 368.133
2025-09-12 10:29:52,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [923.4623, 117.24216, 149.27646, 1102.0458, 599.1367, 697.23145, 801.20056, 744.7869, 1248.1058, 1154.6987]
2025-09-12 10:29:52,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [282.0, 62.0, 74.0, 373.0, 226.0, 277.0, 314.0, 263.0, 474.0, 439.0]
2025-09-12 10:29:52,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 9 hours, 36 minutes, 55 seconds)
2025-09-12 10:40:54,104 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:40:54,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:42:00,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 685.84949 ± 381.772
2025-09-12 10:42:00,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [250.30644, 50.350254, 978.3777, 644.54736, 998.46533, 688.8311, 697.9463, 1421.4136, 331.65347, 796.6033]
2025-09-12 10:42:00,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [113.0, 30.0, 305.0, 234.0, 310.0, 270.0, 256.0, 526.0, 137.0, 305.0]
2025-09-12 10:42:00,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 9 hours, 24 minutes, 16 seconds)
2025-09-12 10:52:51,447 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:52:51,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:53:45,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 537.84924 ± 374.920
2025-09-12 10:53:45,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [250.01315, 396.574, 155.69736, 978.9762, 922.26514, 1221.5916, 187.33018, 285.45004, 211.1608, 769.4344]
2025-09-12 10:53:45,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [111.0, 162.0, 78.0, 349.0, 304.0, 442.0, 94.0, 125.0, 98.0, 250.0]
2025-09-12 10:53:45,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 9 hours, 7 minutes, 41 seconds)
2025-09-12 11:04:26,330 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:04:26,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:05:24,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 603.32117 ± 400.553
2025-09-12 11:05:24,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [180.31993, 104.19231, 573.77673, 509.58862, 804.57666, 156.79489, 827.9233, 1441.7626, 988.6887, 445.58722]
2025-09-12 11:05:24,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 60.0, 225.0, 194.0, 294.0, 81.0, 277.0, 456.0, 318.0, 180.0]
2025-09-12 11:05:24,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 8 hours, 53 minutes, 17 seconds)
2025-09-12 11:16:06,314 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:16:06,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:17:41,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1002.68683 ± 719.765
2025-09-12 11:17:41,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [57.364, 2250.3723, 223.91882, 1916.548, 937.6026, 882.0103, 984.56866, 1704.7866, 123.25216, 946.4444]
2025-09-12 11:17:41,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [40.0, 733.0, 101.0, 651.0, 338.0, 318.0, 352.0, 551.0, 66.0, 292.0]
2025-09-12 11:17:41,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 8 hours, 45 minutes, 44 seconds)
2025-09-12 11:29:00,510 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:29:00,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:30:38,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1020.16345 ± 528.634
2025-09-12 11:30:38,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1036.6515, 2417.607, 764.2121, 855.52496, 325.47797, 1333.2208, 948.2434, 979.9839, 635.0537, 905.6603]
2025-09-12 11:30:38,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [317.0, 827.0, 286.0, 313.0, 137.0, 475.0, 338.0, 332.0, 240.0, 308.0]
2025-09-12 11:30:38,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 8 hours, 42 minutes, 36 seconds)
2025-09-12 11:42:15,594 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:42:15,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:43:18,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 694.16730 ± 304.171
2025-09-12 11:43:18,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [567.20044, 969.6236, 164.65216, 837.8776, 536.0109, 779.3233, 904.7464, 1010.9835, 992.82715, 178.42798]
2025-09-12 11:43:18,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [218.0, 311.0, 80.0, 265.0, 195.0, 243.0, 285.0, 318.0, 315.0, 87.0]
2025-09-12 11:43:18,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 8 hours, 34 minutes, 52 seconds)
2025-09-12 11:54:17,646 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:54:17,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:55:52,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1036.41992 ± 572.060
2025-09-12 11:55:52,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1221.4711, 552.443, 390.808, 1737.642, 2117.1758, 968.292, 819.2685, 1287.9235, 139.86769, 1129.3071]
2025-09-12 11:55:52,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [394.0, 204.0, 164.0, 548.0, 724.0, 302.0, 270.0, 414.0, 71.0, 354.0]
2025-09-12 11:55:52,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 29 minutes, 21 seconds)
2025-09-12 12:07:57,795 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:07:57,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:09:08,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 776.13513 ± 446.700
2025-09-12 12:09:08,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [270.69336, 887.12024, 129.74199, 322.8571, 992.7996, 873.874, 511.52563, 986.1028, 1100.1415, 1686.4951]
2025-09-12 12:09:08,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [120.0, 275.0, 69.0, 134.0, 305.0, 288.0, 204.0, 301.0, 335.0, 534.0]
2025-09-12 12:09:08,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 8 hours, 29 minutes, 51 seconds)
2025-09-12 12:20:00,153 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:20:00,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:21:16,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 807.61932 ± 496.161
2025-09-12 12:21:16,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [69.85971, 1035.6377, 472.1749, 838.0277, 702.7294, 1072.3301, 2001.2798, 827.527, 740.10394, 316.5227]
2025-09-12 12:21:16,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [47.0, 319.0, 188.0, 263.0, 262.0, 354.0, 658.0, 319.0, 260.0, 132.0]
2025-09-12 12:21:16,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 15 minutes, 59 seconds)
2025-09-12 12:32:31,476 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:32:31,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:33:27,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 579.97064 ± 334.111
2025-09-12 12:33:27,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [223.71867, 492.53275, 695.8577, 133.88899, 181.07104, 525.70013, 471.6399, 1050.7441, 1004.56946, 1019.98376]
2025-09-12 12:33:27,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [101.0, 192.0, 223.0, 70.0, 86.0, 197.0, 178.0, 332.0, 352.0, 312.0]
2025-09-12 12:33:27,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 7 hours, 57 minutes, 23 seconds)
2025-09-12 12:44:59,062 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:44:59,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:45:43,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 438.55859 ± 212.853
2025-09-12 12:45:43,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [517.9592, 525.5427, 401.91913, 526.7204, 423.83673, 821.7675, 158.70512, 672.18396, 214.44077, 122.51006]
2025-09-12 12:45:43,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [196.0, 208.0, 150.0, 179.0, 167.0, 261.0, 83.0, 225.0, 95.0, 63.0]
2025-09-12 12:45:43,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 7 hours, 41 minutes, 55 seconds)
2025-09-12 12:56:55,148 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:56:55,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:58:27,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1010.11505 ± 885.735
2025-09-12 12:58:27,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [124.85572, 876.9274, 2629.6323, 339.65366, 691.1381, 1058.3103, 2716.3337, 748.35345, 830.795, 85.15023]
2025-09-12 12:58:27,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [64.0, 276.0, 865.0, 139.0, 233.0, 335.0, 916.0, 239.0, 251.0, 53.0]
2025-09-12 12:58:27,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 30 minutes, 35 seconds)
2025-09-12 13:09:46,413 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:09:46,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:11:12,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 937.28546 ± 611.232
2025-09-12 13:11:12,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [218.84047, 1240.6896, 912.84155, 172.04974, 2339.1199, 787.0693, 882.69556, 1362.9076, 352.5057, 1104.1359]
2025-09-12 13:11:12,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [100.0, 421.0, 279.0, 86.0, 743.0, 281.0, 275.0, 487.0, 146.0, 347.0]
2025-09-12 13:11:12,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 14 minutes, 32 seconds)
2025-09-12 13:22:44,903 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:22:44,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:23:31,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 430.65143 ± 239.879
2025-09-12 13:23:31,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [140.1943, 608.5784, 703.0464, 190.55736, 243.73474, 455.68976, 255.7596, 303.6561, 923.2233, 482.07404]
2025-09-12 13:23:31,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [71.0, 233.0, 263.0, 90.0, 108.0, 180.0, 116.0, 129.0, 339.0, 181.0]
2025-09-12 13:23:31,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 3 minutes, 18 seconds)
2025-09-12 13:34:33,379 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:34:33,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:35:49,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 837.61633 ± 418.452
2025-09-12 13:35:49,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [871.3629, 1532.251, 890.77057, 1340.5901, 987.13226, 179.89862, 126.21741, 996.83673, 720.432, 730.671]
2025-09-12 13:35:49,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [326.0, 495.0, 280.0, 481.0, 310.0, 87.0, 66.0, 318.0, 261.0, 247.0]
2025-09-12 13:35:49,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 6 hours, 51 minutes, 40 seconds)
2025-09-12 13:47:04,340 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:47:04,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:48:14,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 774.55261 ± 305.609
2025-09-12 13:48:14,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [839.00916, 98.31423, 1034.585, 296.968, 826.5723, 776.8718, 1086.9867, 952.9185, 962.96173, 870.33905]
2025-09-12 13:48:14,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [269.0, 58.0, 335.0, 126.0, 265.0, 257.0, 356.0, 353.0, 354.0, 276.0]
2025-09-12 13:48:14,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 6 hours, 40 minutes, 6 seconds)
2025-09-12 13:58:59,939 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:58:59,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:00:31,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1010.53418 ± 593.384
2025-09-12 14:00:31,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [872.71014, 102.36649, 742.98254, 946.9704, 1527.6779, 235.04591, 2311.0813, 1134.0107, 1072.5728, 1159.9243]
2025-09-12 14:00:31,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [281.0, 57.0, 240.0, 291.0, 498.0, 103.0, 782.0, 392.0, 332.0, 369.0]
2025-09-12 14:00:31,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 24 minutes, 47 seconds)
2025-09-12 14:11:30,805 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:11:30,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:12:26,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 616.71497 ± 372.286
2025-09-12 14:12:26,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1032.6259, 197.44795, 854.5865, 957.4806, 120.42041, 906.7159, 270.70065, 1060.8783, 615.4978, 150.79488]
2025-09-12 14:12:26,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [358.0, 92.0, 270.0, 294.0, 65.0, 283.0, 119.0, 328.0, 226.0, 74.0]
2025-09-12 14:12:26,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 7 minutes, 22 seconds)
2025-09-12 14:23:01,635 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:23:01,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:24:18,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 866.54999 ± 385.726
2025-09-12 14:24:18,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1006.14813, 1349.253, 1026.0582, 421.49356, 710.0738, 909.8135, 192.54997, 1528.1058, 572.2311, 949.77246]
2025-09-12 14:24:18,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [306.0, 490.0, 324.0, 165.0, 262.0, 296.0, 93.0, 500.0, 211.0, 292.0]
2025-09-12 14:24:18,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 5 hours, 52 minutes, 32 seconds)
2025-09-12 14:35:10,318 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:35:10,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:36:20,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 791.82452 ± 328.916
2025-09-12 14:36:20,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1074.9878, 285.0957, 825.1735, 1109.0884, 1048.0265, 1003.8529, 1020.37115, 437.71426, 207.25525, 906.679]
2025-09-12 14:36:20,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [333.0, 116.0, 278.0, 348.0, 378.0, 329.0, 323.0, 176.0, 93.0, 287.0]
2025-09-12 14:36:20,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 5 hours, 38 minutes, 55 seconds)
2025-09-12 14:47:04,241 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:47:04,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:48:06,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 702.02527 ± 540.819
2025-09-12 14:48:06,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [117.60307, 1883.8071, 676.2492, 316.22534, 133.49242, 1248.3954, 887.5431, 845.9896, 109.98762, 800.9599]
2025-09-12 14:48:06,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [64.0, 587.0, 236.0, 135.0, 68.0, 385.0, 288.0, 272.0, 59.0, 250.0]
2025-09-12 14:48:06,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 23 minutes, 13 seconds)
2025-09-12 14:58:32,451 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:58:32,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:59:35,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 761.77051 ± 304.156
2025-09-12 14:59:35,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [930.8593, 854.3281, 1037.1285, 972.4841, 446.56125, 340.8809, 950.3988, 797.6401, 1105.4916, 181.93234]
2025-09-12 14:59:35,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [292.0, 263.0, 306.0, 300.0, 172.0, 139.0, 287.0, 251.0, 331.0, 88.0]
2025-09-12 14:59:35,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 7 minutes, 11 seconds)
2025-09-12 15:10:21,621 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:10:21,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:11:10,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 505.09473 ± 319.795
2025-09-12 15:11:10,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [109.924675, 234.34096, 134.0027, 880.30164, 578.7538, 455.6187, 296.29248, 939.6991, 403.12643, 1018.88654]
2025-09-12 15:11:10,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [59.0, 104.0, 71.0, 285.0, 208.0, 184.0, 136.0, 291.0, 156.0, 337.0]
2025-09-12 15:11:10,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 4 hours, 53 minutes, 38 seconds)
2025-09-12 15:21:48,658 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:21:48,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:22:51,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 712.76154 ± 315.427
2025-09-12 15:22:51,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [567.8199, 817.88135, 1241.4692, 131.72049, 293.4186, 975.1772, 723.91144, 906.25354, 558.2332, 911.7306]
2025-09-12 15:22:51,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [210.0, 258.0, 403.0, 68.0, 124.0, 304.0, 225.0, 280.0, 202.0, 283.0]
2025-09-12 15:22:51,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 4 hours, 41 minutes)
2025-09-12 15:33:36,761 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:33:36,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:34:59,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 894.23547 ± 749.962
2025-09-12 15:34:59,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [385.10754, 565.01996, 1236.9512, 533.5287, 818.11725, 173.36441, 999.1325, 395.6982, 2949.5237, 885.9111]
2025-09-12 15:34:59,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [151.0, 205.0, 392.0, 210.0, 264.0, 84.0, 304.0, 159.0, 1000.0, 299.0]
2025-09-12 15:34:59,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 29 minutes, 44 seconds)
2025-09-12 15:45:33,201 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:45:33,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:46:27,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 605.31952 ± 399.278
2025-09-12 15:46:27,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1044.5325, 896.4345, 310.77875, 943.2865, 1035.1464, 1072.768, 222.86427, 142.64627, 266.3371, 118.40112]
2025-09-12 15:46:27,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [314.0, 322.0, 129.0, 289.0, 311.0, 339.0, 98.0, 74.0, 114.0, 65.0]
2025-09-12 15:46:27,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 16 minutes, 46 seconds)
2025-09-12 15:57:17,498 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:57:17,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:58:12,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 561.88983 ± 264.836
2025-09-12 15:58:12,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [248.09486, 392.538, 968.9292, 731.7354, 587.4015, 666.30176, 127.60906, 887.5182, 321.4834, 687.2869]
2025-09-12 15:58:12,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [109.0, 158.0, 310.0, 260.0, 220.0, 253.0, 68.0, 281.0, 145.0, 258.0]
2025-09-12 15:58:12,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 6 minutes, 10 seconds)
2025-09-12 16:09:18,253 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:09:18,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:10:09,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 558.99573 ± 380.639
2025-09-12 16:10:09,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1061.1655, 905.23395, 172.75925, 131.7245, 832.22906, 1027.4359, 450.99133, 99.13303, 775.6235, 133.66129]
2025-09-12 16:10:09,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [342.0, 282.0, 85.0, 66.0, 269.0, 318.0, 170.0, 56.0, 265.0, 70.0]
2025-09-12 16:10:09,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 3 hours, 55 minutes, 58 seconds)
2025-09-12 16:20:32,428 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:20:32,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:21:44,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 784.82648 ± 531.808
2025-09-12 16:21:44,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [120.36284, 603.5899, 1972.7102, 465.08585, 405.4137, 1109.2021, 898.65704, 265.28378, 1329.8829, 678.07605]
2025-09-12 16:21:44,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [63.0, 220.0, 701.0, 174.0, 156.0, 331.0, 274.0, 115.0, 412.0, 249.0]
2025-09-12 16:21:44,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 3 hours, 43 minutes, 45 seconds)
2025-09-12 16:32:33,863 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:32:33,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:33:31,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 656.89783 ± 335.431
2025-09-12 16:33:31,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [846.0652, 744.36584, 1061.758, 793.32135, 130.89232, 656.42004, 739.91144, 1142.7859, 295.38925, 158.0688]
2025-09-12 16:33:31,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [266.0, 232.0, 334.0, 253.0, 67.0, 242.0, 236.0, 347.0, 120.0, 77.0]
2025-09-12 16:33:31,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 30 minutes, 45 seconds)
2025-09-12 16:44:49,094 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:44:49,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:45:54,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 712.29187 ± 352.809
2025-09-12 16:45:54,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [594.80786, 1114.5966, 268.30383, 534.06775, 465.66946, 824.8031, 1131.9897, 1272.4518, 729.7566, 186.47249]
2025-09-12 16:45:54,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [230.0, 338.0, 119.0, 199.0, 178.0, 275.0, 342.0, 411.0, 256.0, 86.0]
2025-09-12 16:45:54,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 22 minutes, 5 seconds)
2025-09-12 16:56:16,254 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:56:16,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:57:29,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 809.20789 ± 654.110
2025-09-12 16:57:29,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [2432.5842, 1047.1797, 582.70496, 778.475, 508.31396, 1324.3452, 109.86144, 169.2638, 856.9862, 282.3638]
2025-09-12 16:57:29,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [799.0, 368.0, 198.0, 254.0, 179.0, 427.0, 59.0, 80.0, 261.0, 116.0]
2025-09-12 16:57:29,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 9 minutes, 43 seconds)
2025-09-12 17:08:14,993 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:08:14,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:09:22,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 777.62134 ± 411.874
2025-09-12 17:09:22,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1175.335, 890.68427, 1002.2658, 1492.9519, 570.0141, 1073.6581, 564.34406, 178.73688, 700.56793, 127.65548]
2025-09-12 17:09:22,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [366.0, 271.0, 313.0, 483.0, 192.0, 323.0, 205.0, 83.0, 260.0, 67.0]
2025-09-12 17:09:22,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 2 hours, 57 minutes, 39 seconds)
2025-09-12 17:20:09,375 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:20:09,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:21:10,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 717.07141 ± 290.367
2025-09-12 17:21:10,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [244.68843, 898.98065, 941.26465, 856.5859, 901.14496, 122.280205, 542.69495, 867.4048, 823.1826, 972.48737]
2025-09-12 17:21:10,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [104.0, 277.0, 286.0, 264.0, 284.0, 63.0, 180.0, 268.0, 255.0, 310.0]
2025-09-12 17:21:10,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 46 minutes, 24 seconds)
2025-09-12 17:31:53,258 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:31:53,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:33:08,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 805.92242 ± 426.768
2025-09-12 17:33:08,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1673.2313, 379.53333, 1332.1128, 316.6882, 974.7358, 817.3313, 331.7432, 885.2555, 870.0412, 478.55115]
2025-09-12 17:33:08,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [588.0, 145.0, 454.0, 140.0, 312.0, 253.0, 140.0, 323.0, 272.0, 185.0]
2025-09-12 17:33:08,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 34 minutes, 58 seconds)
2025-09-12 17:43:45,641 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:43:45,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:45:13,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 935.11584 ± 701.781
2025-09-12 17:45:13,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [207.64235, 1112.9856, 1168.5771, 748.33734, 214.29147, 603.4652, 698.3241, 2823.319, 984.9725, 789.2433]
2025-09-12 17:45:13,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [97.0, 399.0, 352.0, 281.0, 96.0, 227.0, 221.0, 1000.0, 350.0, 246.0]
2025-09-12 17:45:13,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 22 minutes, 21 seconds)
2025-09-12 17:56:00,775 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:56:00,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:57:04,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 707.19238 ± 386.107
2025-09-12 17:57:04,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [787.4435, 491.85892, 852.81085, 1699.8796, 590.4871, 691.54474, 812.9255, 337.4097, 197.08517, 610.4788]
2025-09-12 17:57:04,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [273.0, 192.0, 271.0, 532.0, 216.0, 224.0, 249.0, 141.0, 92.0, 205.0]
2025-09-12 17:57:04,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 11 minutes, 3 seconds)
2025-09-12 18:07:45,053 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:07:45,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:09:12,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 988.72736 ± 409.429
2025-09-12 18:09:12,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1470.4683, 1429.1249, 326.04395, 463.90814, 1284.4108, 936.8329, 1137.183, 1444.9194, 847.94, 546.4419]
2025-09-12 18:09:12,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [467.0, 502.0, 134.0, 175.0, 389.0, 281.0, 349.0, 499.0, 271.0, 196.0]
2025-09-12 18:09:12,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 1 hour, 59 minutes, 38 seconds)
2025-09-12 18:19:45,463 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:19:45,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:21:18,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1071.77417 ± 565.892
2025-09-12 18:21:18,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [709.3648, 1240.7052, 1843.1921, 1118.0186, 1126.0455, 831.80664, 301.99084, 1806.3956, 1635.3354, 104.88656]
2025-09-12 18:21:18,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [229.0, 424.0, 606.0, 343.0, 336.0, 272.0, 125.0, 592.0, 498.0, 58.0]
2025-09-12 18:21:18,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 48 minutes, 14 seconds)
2025-09-12 18:32:09,673 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:32:09,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:32:51,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 434.83261 ± 310.486
2025-09-12 18:32:51,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [181.43845, 758.109, 822.0037, 52.241077, 176.33125, 257.94257, 338.69873, 204.52861, 552.2564, 1004.77637]
2025-09-12 18:32:51,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 239.0, 248.0, 40.0, 82.0, 109.0, 131.0, 93.0, 194.0, 350.0]
2025-09-12 18:32:51,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 35 minutes, 33 seconds)
2025-09-12 18:43:31,828 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:43:31,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:44:58,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 958.44299 ± 506.914
2025-09-12 18:44:58,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [733.5052, 979.5959, 1290.398, 609.6581, 1362.8026, 133.48605, 583.77075, 1049.6074, 762.6154, 2078.9905]
2025-09-12 18:44:58,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [267.0, 293.0, 458.0, 196.0, 493.0, 68.0, 221.0, 319.0, 272.0, 684.0]
2025-09-12 18:44:58,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 23 minutes, 39 seconds)
2025-09-12 18:55:47,535 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:55:47,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:56:48,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 682.15558 ± 343.501
2025-09-12 18:56:48,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [865.39374, 406.3606, 159.6064, 598.7501, 1082.399, 169.82977, 520.20074, 1135.7255, 919.51465, 963.7753]
2025-09-12 18:56:48,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [266.0, 162.0, 80.0, 220.0, 325.0, 84.0, 188.0, 341.0, 270.0, 341.0]
2025-09-12 18:56:48,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 11 minutes, 40 seconds)
2025-09-12 19:07:40,195 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:07:40,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:08:49,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 792.25574 ± 305.589
2025-09-12 19:08:49,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1246.9158, 708.6385, 643.9486, 524.41797, 885.4419, 910.1382, 890.20557, 93.844475, 1087.1554, 931.8518]
2025-09-12 19:08:49,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [384.0, 233.0, 220.0, 188.0, 273.0, 319.0, 286.0, 57.0, 376.0, 280.0]
2025-09-12 19:08:49,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 59 minutes, 37 seconds)
2025-09-12 19:19:41,690 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:19:41,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:21:03,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 852.26672 ± 902.926
2025-09-12 19:21:03,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [270.84525, 850.0706, 367.99677, 242.24663, 2888.2842, 181.6908, 948.57495, 280.91434, 2237.793, 254.2514]
2025-09-12 19:21:03,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [119.0, 315.0, 149.0, 107.0, 1000.0, 87.0, 294.0, 123.0, 783.0, 108.0]
2025-09-12 19:21:03,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 47 minutes, 48 seconds)
2025-09-12 19:31:22,121 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:31:22,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:32:35,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 837.00305 ± 439.390
2025-09-12 19:32:35,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [670.38763, 1761.1333, 562.1274, 1282.0703, 537.45026, 927.88416, 864.80786, 941.98846, 22.101923, 800.07837]
2025-09-12 19:32:35,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [236.0, 557.0, 205.0, 406.0, 197.0, 287.0, 322.0, 288.0, 21.0, 255.0]
2025-09-12 19:32:35,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 35 minutes, 50 seconds)
2025-09-12 19:43:37,576 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:43:37,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:45:06,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 1019.97461 ± 679.951
2025-09-12 19:45:06,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1269.4832, 1892.4508, 2474.2654, 1221.0765, 363.07077, 305.10577, 340.19815, 628.82245, 976.7331, 728.53973]
2025-09-12 19:45:06,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [436.0, 601.0, 812.0, 374.0, 151.0, 127.0, 133.0, 205.0, 307.0, 229.0]
2025-09-12 19:45:06,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 24 minutes, 3 seconds)
2025-09-12 19:55:40,411 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:55:40,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:56:46,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 749.42511 ± 375.325
2025-09-12 19:56:46,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [1562.81, 674.59436, 164.77097, 1003.633, 212.76157, 859.4318, 611.2883, 872.31476, 776.0635, 756.58307]
2025-09-12 19:56:46,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [508.0, 243.0, 80.0, 301.0, 96.0, 269.0, 231.0, 272.0, 248.0, 259.0]
2025-09-12 19:56:46,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 11 minutes, 59 seconds)
2025-09-12 20:07:33,368 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:07:33,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:08:30,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 608.29462 ± 324.559
2025-09-12 20:08:30,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1222 [DEBUG]: All rewards: [858.46277, 667.0384, 596.58765, 647.90326, 1065.8799, 879.64307, 200.21872, 92.90926, 176.3468, 897.95636]
2025-09-12 20:08:30,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [316.0, 217.0, 216.0, 237.0, 316.0, 318.0, 93.0, 52.0, 82.0, 281.0]
2025-09-12 20:08:30,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-hopper):1251 [DEBUG]: Training session finished
