2025-09-12 02:43:50,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc0-walker2d/MM1Queue_a033_s075-mbpac_memdelay
2025-09-12 02:43:50,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc0-walker2d/MM1Queue_a033_s075-mbpac_memdelay
2025-09-12 02:43:50,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14d8c53c6010>}
2025-09-12 02:43:50,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1111 [DEBUG]: using device: cuda
2025-09-12 02:43:50,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1133 [INFO]: Creating new trainer
2025-09-12 02:43:50,067 baseline-mbpac-noiseperc0-walker2d:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-12 02:43:50,068 baseline-mbpac-noiseperc0-walker2d:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 02:43:50,076 baseline-mbpac-noiseperc0-walker2d:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-12 02:43:51,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1194 [DEBUG]: Starting training session...
2025-09-12 02:43:51,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 1/100
2025-09-12 02:53:45,508 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:53:45,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:54:34,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 172.80003 ± 128.309
2025-09-12 02:54:34,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [156.23761, 31.914318, 59.093735, 370.03027, 76.835266, 66.5216, 312.20422, 338.42395, 269.9353, 46.804073]
2025-09-12 02:54:34,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [85.0, 142.0, 171.0, 254.0, 192.0, 176.0, 195.0, 231.0, 150.0, 154.0]
2025-09-12 02:54:34,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (172.80) for latency MM1Queue_a033_s075
2025-09-12 02:54:34,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 17 hours, 41 minutes, 19 seconds)
2025-09-12 03:06:06,198 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:06:06,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:06:56,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 192.73337 ± 152.148
2025-09-12 03:06:56,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [292.13312, 44.142956, 41.70358, 27.211802, 262.50436, 96.49307, 464.02777, 172.05814, 423.43854, 103.62028]
2025-09-12 03:06:56,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [225.0, 53.0, 128.0, 130.0, 171.0, 99.0, 369.0, 104.0, 397.0, 161.0]
2025-09-12 03:06:56,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (192.73) for latency MM1Queue_a033_s075
2025-09-12 03:06:56,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 18 hours, 51 minutes, 39 seconds)
2025-09-12 03:18:57,452 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:18:57,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:19:49,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 280.40195 ± 117.142
2025-09-12 03:19:49,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [388.45474, 274.1564, 205.60362, 296.84598, 581.88635, 238.18167, 225.11522, 215.00975, 228.54092, 150.22452]
2025-09-12 03:19:49,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [264.0, 182.0, 127.0, 180.0, 514.0, 175.0, 131.0, 120.0, 150.0, 79.0]
2025-09-12 03:19:49,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (280.40) for latency MM1Queue_a033_s075
2025-09-12 03:19:49,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 19 hours, 23 minutes, 11 seconds)
2025-09-12 03:30:32,432 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:30:32,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:31:36,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 431.58969 ± 153.364
2025-09-12 03:31:36,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [455.25903, 495.70706, 330.1515, 489.96756, 25.88567, 592.19995, 489.49814, 582.2228, 413.92093, 441.08444]
2025-09-12 03:31:36,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [241.0, 269.0, 166.0, 246.0, 36.0, 324.0, 279.0, 301.0, 228.0, 263.0]
2025-09-12 03:31:36,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (431.59) for latency MM1Queue_a033_s075
2025-09-12 03:31:36,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 19 hours, 6 minutes, 5 seconds)
2025-09-12 03:42:38,290 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:42:38,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:43:41,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 327.88040 ± 123.624
2025-09-12 03:43:41,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [367.69965, 55.68614, 326.7856, 520.60046, 381.59863, 217.8271, 478.8404, 298.34357, 292.73422, 338.6884]
2025-09-12 03:43:41,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [210.0, 114.0, 235.0, 253.0, 184.0, 134.0, 536.0, 140.0, 200.0, 284.0]
2025-09-12 03:43:41,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 18 hours, 56 minutes, 50 seconds)
2025-09-12 03:55:04,178 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:55:04,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:55:58,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 311.35477 ± 163.331
2025-09-12 03:55:58,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [291.87064, 259.27203, 286.3642, 257.3371, 405.6175, 57.53692, 228.25134, 230.9278, 723.2223, 373.1476]
2025-09-12 03:55:58,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [138.0, 143.0, 163.0, 147.0, 277.0, 96.0, 143.0, 148.0, 465.0, 298.0]
2025-09-12 03:55:58,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 19 hours, 14 minutes, 29 seconds)
2025-09-12 04:06:57,199 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:06:57,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:07:59,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 351.55939 ± 107.374
2025-09-12 04:07:59,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [202.78554, 343.77023, 299.69556, 241.70868, 266.3329, 438.52023, 411.8109, 498.43628, 279.13174, 533.4018]
2025-09-12 04:07:59,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [148.0, 222.0, 155.0, 206.0, 167.0, 317.0, 234.0, 364.0, 187.0, 307.0]
2025-09-12 04:07:59,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 18 hours, 55 minutes, 26 seconds)
2025-09-12 04:19:29,008 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:19:29,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:20:38,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 358.45410 ± 163.741
2025-09-12 04:20:38,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [463.84097, 322.12808, 328.97308, 101.1536, 578.2273, 331.14816, 565.3492, 88.160355, 303.3472, 502.21307]
2025-09-12 04:20:38,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [341.0, 157.0, 203.0, 129.0, 613.0, 222.0, 285.0, 91.0, 207.0, 322.0]
2025-09-12 04:20:38,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 18 hours, 39 minutes, 9 seconds)
2025-09-12 04:31:55,620 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:31:55,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:33:21,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 550.38214 ± 261.181
2025-09-12 04:33:21,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [807.7733, 732.394, 392.1884, 1030.3264, 183.94408, 382.6545, 709.7003, 289.87537, 308.8224, 666.14233]
2025-09-12 04:33:21,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [484.0, 413.0, 193.0, 545.0, 83.0, 235.0, 402.0, 141.0, 205.0, 396.0]
2025-09-12 04:33:21,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (550.38) for latency MM1Queue_a033_s075
2025-09-12 04:33:21,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 18 hours, 43 minutes, 46 seconds)
2025-09-12 04:44:17,643 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:44:17,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:45:27,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 398.83197 ± 206.816
2025-09-12 04:45:27,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [54.67867, 450.0251, 544.8372, 687.0992, 465.4863, 304.26923, 592.33594, 55.362755, 545.1442, 289.0814]
2025-09-12 04:45:27,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [206.0, 245.0, 373.0, 462.0, 248.0, 201.0, 307.0, 76.0, 248.0, 153.0]
2025-09-12 04:45:27,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 18 hours, 31 minutes, 50 seconds)
2025-09-12 04:56:49,075 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:56:49,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:58:03,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 457.51807 ± 90.397
2025-09-12 04:58:03,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [467.73935, 389.3057, 594.08966, 346.74652, 591.6826, 549.70636, 322.13782, 442.87247, 432.1756, 438.72473]
2025-09-12 04:58:03,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [286.0, 176.0, 442.0, 199.0, 348.0, 382.0, 148.0, 303.0, 207.0, 215.0]
2025-09-12 04:58:03,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 18 hours, 24 minutes, 57 seconds)
2025-09-12 05:09:34,888 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:09:34,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:10:20,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 297.26663 ± 132.196
2025-09-12 05:10:20,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [506.57806, 22.080248, 348.3244, 440.31976, 282.5652, 309.0404, 410.2312, 213.94789, 226.29628, 213.28271]
2025-09-12 05:10:20,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [260.0, 43.0, 182.0, 191.0, 179.0, 182.0, 181.0, 140.0, 128.0, 165.0]
2025-09-12 05:10:20,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 18 hours, 17 minutes, 17 seconds)
2025-09-12 05:21:22,062 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:21:22,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:22:25,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 482.37973 ± 309.328
2025-09-12 05:22:25,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [82.69131, 684.4016, 327.8489, 626.6279, 519.59296, 392.40613, 594.84656, 372.69724, 43.818455, 1178.8663]
2025-09-12 05:22:25,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [78.0, 306.0, 151.0, 342.0, 234.0, 171.0, 258.0, 212.0, 53.0, 558.0]
2025-09-12 05:22:25,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 17 hours, 55 minutes, 2 seconds)
2025-09-12 05:33:39,101 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:33:39,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:34:32,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 355.71445 ± 151.973
2025-09-12 05:34:32,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [625.68396, 295.88028, 280.80435, 91.53541, 285.57037, 280.90686, 298.89023, 371.9146, 424.60272, 601.3557]
2025-09-12 05:34:32,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [240.0, 141.0, 159.0, 126.0, 147.0, 256.0, 193.0, 170.0, 185.0, 348.0]
2025-09-12 05:34:32,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 17 hours, 32 minutes, 34 seconds)
2025-09-12 05:45:38,988 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:45:39,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:46:56,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 587.65955 ± 251.855
2025-09-12 05:46:56,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [437.22214, 766.422, 599.8552, 355.54007, 674.87726, 1129.9537, 279.24197, 509.88885, 315.32672, 808.2669]
2025-09-12 05:46:56,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [212.0, 325.0, 255.0, 194.0, 303.0, 515.0, 168.0, 270.0, 158.0, 430.0]
2025-09-12 05:46:56,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (587.66) for latency MM1Queue_a033_s075
2025-09-12 05:46:56,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 17 hours, 25 minutes, 19 seconds)
2025-09-12 05:58:11,325 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:58:11,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:59:32,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 627.48712 ± 189.497
2025-09-12 05:59:32,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [451.49765, 623.1674, 567.85675, 578.59924, 859.86597, 919.88293, 718.0141, 214.83083, 655.3323, 685.8244]
2025-09-12 05:59:32,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [271.0, 228.0, 259.0, 245.0, 631.0, 390.0, 270.0, 115.0, 316.0, 282.0]
2025-09-12 05:59:32,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (627.49) for latency MM1Queue_a033_s075
2025-09-12 05:59:32,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 17 hours, 12 minutes, 50 seconds)
2025-09-12 06:10:45,731 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:10:45,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:11:45,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 387.24622 ± 177.669
2025-09-12 06:11:45,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [585.8883, 213.96954, 548.4861, 364.4897, 600.3519, 543.04346, 433.19196, 101.01904, 133.38176, 348.64047]
2025-09-12 06:11:45,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [238.0, 120.0, 245.0, 195.0, 243.0, 257.0, 201.0, 232.0, 327.0, 167.0]
2025-09-12 06:11:45,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 16 hours, 59 minutes, 38 seconds)
2025-09-12 06:23:12,395 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:23:12,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:24:11,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 453.23480 ± 196.879
2025-09-12 06:24:11,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [486.04257, 208.15712, 559.63477, 517.73895, 463.17032, 66.57812, 312.3101, 754.64294, 675.58984, 488.48337]
2025-09-12 06:24:11,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [223.0, 126.0, 272.0, 215.0, 200.0, 115.0, 130.0, 286.0, 279.0, 329.0]
2025-09-12 06:24:11,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 16 hours, 52 minutes, 45 seconds)
2025-09-12 06:35:21,149 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:35:21,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:36:28,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 513.20593 ± 181.502
2025-09-12 06:36:28,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [458.2351, 727.7877, 354.45477, 625.0221, 488.93384, 103.78783, 680.9428, 656.9511, 404.83472, 631.10956]
2025-09-12 06:36:28,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [257.0, 346.0, 253.0, 226.0, 202.0, 107.0, 262.0, 303.0, 254.0, 289.0]
2025-09-12 06:36:28,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 16 hours, 43 minutes, 17 seconds)
2025-09-12 06:47:30,955 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:47:30,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:48:28,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 515.47339 ± 116.300
2025-09-12 06:48:28,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [501.08566, 589.3781, 650.39276, 358.49445, 510.71283, 599.96027, 267.30246, 483.80222, 640.1635, 553.44147]
2025-09-12 06:48:28,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [211.0, 220.0, 271.0, 163.0, 205.0, 234.0, 122.0, 219.0, 244.0, 207.0]
2025-09-12 06:48:28,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 16 hours, 24 minutes, 38 seconds)
2025-09-12 06:59:40,278 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:59:40,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:00:37,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 517.62170 ± 210.640
2025-09-12 07:00:37,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [24.738947, 611.24603, 485.6322, 630.7808, 892.074, 365.18497, 594.2239, 517.51807, 593.1829, 461.63547]
2025-09-12 07:00:37,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [29.0, 259.0, 200.0, 262.0, 318.0, 141.0, 234.0, 234.0, 247.0, 193.0]
2025-09-12 07:00:37,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 16 hours, 5 minutes, 17 seconds)
2025-09-12 07:11:52,589 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:11:52,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:13:06,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 653.00812 ± 120.215
2025-09-12 07:13:06,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [577.9819, 599.55444, 598.7512, 712.3287, 700.485, 564.6972, 500.0046, 622.5436, 695.1047, 958.62994]
2025-09-12 07:13:06,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [220.0, 236.0, 230.0, 357.0, 289.0, 230.0, 197.0, 245.0, 338.0, 340.0]
2025-09-12 07:13:06,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (653.01) for latency MM1Queue_a033_s075
2025-09-12 07:13:06,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 15 hours, 56 minutes, 56 seconds)
2025-09-12 07:24:23,543 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:24:23,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:25:29,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 579.47681 ± 214.359
2025-09-12 07:25:29,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [588.03015, 603.8925, 387.04404, 147.66122, 745.6574, 541.3512, 939.8599, 393.90582, 737.1135, 710.25256]
2025-09-12 07:25:29,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [249.0, 241.0, 177.0, 135.0, 276.0, 213.0, 320.0, 178.0, 305.0, 296.0]
2025-09-12 07:25:29,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 15 hours, 44 minutes, 3 seconds)
2025-09-12 07:36:45,855 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:36:45,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:38:04,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 709.42346 ± 195.645
2025-09-12 07:38:04,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [333.81616, 640.28656, 947.5906, 700.0663, 720.7777, 740.52814, 1080.3225, 535.7428, 771.6726, 623.4305]
2025-09-12 07:38:04,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [158.0, 260.0, 357.0, 266.0, 275.0, 304.0, 371.0, 231.0, 377.0, 260.0]
2025-09-12 07:38:04,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (709.42) for latency MM1Queue_a033_s075
2025-09-12 07:38:04,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 15 hours, 36 minutes, 17 seconds)
2025-09-12 07:49:17,196 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:49:17,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:50:21,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 614.39270 ± 147.698
2025-09-12 07:50:21,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [467.56702, 799.7964, 633.68256, 758.9695, 720.9236, 555.9757, 603.1928, 794.6679, 352.09854, 457.05273]
2025-09-12 07:50:21,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [179.0, 288.0, 255.0, 274.0, 267.0, 204.0, 238.0, 301.0, 147.0, 178.0]
2025-09-12 07:50:21,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 15 hours, 28 minutes, 7 seconds)
2025-09-12 08:01:50,766 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:01:50,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:03:00,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 646.63483 ± 268.310
2025-09-12 08:03:00,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1205.8116, 424.2062, 458.7906, 571.1217, 1071.7865, 715.42334, 488.94324, 600.37305, 326.65805, 603.2338]
2025-09-12 08:03:00,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [517.0, 174.0, 210.0, 201.0, 399.0, 277.0, 194.0, 211.0, 135.0, 249.0]
2025-09-12 08:03:00,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 15 hours, 23 minutes, 16 seconds)
2025-09-12 08:13:56,053 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:13:56,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:15:15,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 733.47900 ± 333.623
2025-09-12 08:15:15,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [727.3054, 536.3718, 484.24783, 826.911, 489.2089, 223.4856, 702.55927, 1517.1143, 955.50824, 872.0776]
2025-09-12 08:15:15,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [301.0, 239.0, 197.0, 301.0, 205.0, 295.0, 282.0, 496.0, 351.0, 294.0]
2025-09-12 08:15:15,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (733.48) for latency MM1Queue_a033_s075
2025-09-12 08:15:15,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 15 hours, 7 minutes, 31 seconds)
2025-09-12 08:26:31,464 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:26:31,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:28:19,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 844.55029 ± 335.921
2025-09-12 08:28:19,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [856.05444, 988.6783, 975.7362, 398.04367, 1120.3586, 230.19504, 722.59845, 793.8281, 1492.2063, 867.8037]
2025-09-12 08:28:19,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [312.0, 407.0, 439.0, 181.0, 723.0, 272.0, 328.0, 382.0, 618.0, 334.0]
2025-09-12 08:28:19,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (844.55) for latency MM1Queue_a033_s075
2025-09-12 08:28:19,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 15 hours, 4 minutes, 48 seconds)
2025-09-12 08:39:44,504 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:39:44,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:41:54,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1164.03662 ± 353.398
2025-09-12 08:41:54,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [912.14355, 769.40076, 968.99945, 1202.2797, 720.28815, 1227.9148, 911.01685, 1750.9037, 1688.943, 1488.4769]
2025-09-12 08:41:54,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [339.0, 268.0, 510.0, 533.0, 265.0, 497.0, 437.0, 616.0, 615.0, 733.0]
2025-09-12 08:41:54,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (1164.04) for latency MM1Queue_a033_s075
2025-09-12 08:41:54,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 15 hours, 6 minutes, 30 seconds)
2025-09-12 08:53:00,135 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:53:00,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:54:35,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 876.52228 ± 350.736
2025-09-12 08:54:35,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [833.0165, 719.4925, 746.0523, 708.955, 1454.8486, 712.3938, 1151.637, 807.49786, 220.16197, 1411.1665]
2025-09-12 08:54:35,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [322.0, 262.0, 309.0, 263.0, 462.0, 270.0, 436.0, 329.0, 305.0, 495.0]
2025-09-12 08:54:35,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 14 hours, 59 minutes, 11 seconds)
2025-09-12 09:05:48,182 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:05:48,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:07:32,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1027.80823 ± 504.397
2025-09-12 09:07:32,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1928.7599, 1051.9868, 668.2951, 89.68276, 819.5183, 1703.455, 1351.6188, 693.6615, 1021.4616, 949.643]
2025-09-12 09:07:32,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [667.0, 373.0, 266.0, 78.0, 321.0, 478.0, 498.0, 274.0, 573.0, 357.0]
2025-09-12 09:07:32,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 14 hours, 50 minutes, 34 seconds)
2025-09-12 09:19:22,440 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:19:22,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:21:14,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1133.36401 ± 339.751
2025-09-12 09:21:14,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1095.2461, 753.02905, 1562.6285, 774.1566, 1086.0813, 1011.2285, 1053.943, 1534.4163, 735.797, 1727.1145]
2025-09-12 09:21:14,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [472.0, 288.0, 495.0, 285.0, 425.0, 353.0, 335.0, 579.0, 305.0, 595.0]
2025-09-12 09:21:14,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 14 hours, 57 minutes, 18 seconds)
2025-09-12 09:31:54,800 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:31:54,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:34:11,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1524.62134 ± 480.696
2025-09-12 09:34:11,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1280.6798, 987.6799, 1070.0037, 1794.1711, 1663.7775, 1323.7361, 1789.804, 2541.2197, 903.74023, 1891.4015]
2025-09-12 09:34:11,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [411.0, 306.0, 376.0, 591.0, 561.0, 456.0, 611.0, 831.0, 286.0, 629.0]
2025-09-12 09:34:11,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (1524.62) for latency MM1Queue_a033_s075
2025-09-12 09:34:11,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 14 hours, 42 minutes, 37 seconds)
2025-09-12 09:45:32,798 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:45:32,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:47:28,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1320.61987 ± 534.039
2025-09-12 09:47:28,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1845.0431, 2386.0286, 734.2769, 658.99615, 1190.6345, 1808.3516, 1277.3833, 795.8941, 1010.19147, 1499.3981]
2025-09-12 09:47:28,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [564.0, 661.0, 246.0, 238.0, 405.0, 584.0, 411.0, 269.0, 321.0, 489.0]
2025-09-12 09:47:28,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 14 hours, 25 minutes, 20 seconds)
2025-09-12 09:59:03,830 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:59:03,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:01:50,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1681.83240 ± 975.549
2025-09-12 10:01:50,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2628.9895, 1097.8218, 2663.737, 1544.4469, 830.1994, 264.99954, 259.56293, 2597.7964, 2951.8662, 1978.9048]
2025-09-12 10:01:50,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 391.0, 1000.0, 510.0, 342.0, 173.0, 138.0, 864.0, 1000.0, 714.0]
2025-09-12 10:01:50,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (1681.83) for latency MM1Queue_a033_s075
2025-09-12 10:01:50,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 14 hours, 34 minutes, 17 seconds)
2025-09-12 10:13:05,660 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:13:05,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:16:31,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2063.84253 ± 988.076
2025-09-12 10:16:31,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2698.8176, 59.749134, 1177.7411, 2762.3916, 658.6927, 2150.074, 2905.794, 2758.0784, 2749.2056, 2717.8838]
2025-09-12 10:16:31,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 69.0, 428.0, 1000.0, 279.0, 761.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:16:31,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (2063.84) for latency MM1Queue_a033_s075
2025-09-12 10:16:31,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 14 hours, 42 minutes, 48 seconds)
2025-09-12 10:27:38,934 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:27:38,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:31:14,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2253.28857 ± 783.032
2025-09-12 10:31:14,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2583.9998, 1216.1614, 2089.7925, 2895.7754, 2677.5054, 2956.2925, 2796.112, 3111.303, 824.2301, 1381.713]
2025-09-12 10:31:14,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [965.0, 386.0, 682.0, 1000.0, 994.0, 1000.0, 1000.0, 1000.0, 316.0, 506.0]
2025-09-12 10:31:14,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (2253.29) for latency MM1Queue_a033_s075
2025-09-12 10:31:15,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 14 hours, 42 minutes, 5 seconds)
2025-09-12 10:42:32,582 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:42:32,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:46:00,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2349.03906 ± 911.022
2025-09-12 10:46:00,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1440.9913, 899.17413, 2922.175, 1679.669, 3103.6753, 3172.0642, 1097.1326, 3048.7888, 2694.4917, 3432.2295]
2025-09-12 10:46:00,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [524.0, 312.0, 1000.0, 550.0, 951.0, 1000.0, 371.0, 901.0, 1000.0, 1000.0]
2025-09-12 10:46:00,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (2349.04) for latency MM1Queue_a033_s075
2025-09-12 10:46:00,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 14 hours, 50 minutes, 36 seconds)
2025-09-12 10:57:08,847 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:57:08,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:00:34,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2340.28809 ± 920.965
2025-09-12 11:00:34,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1125.8812, 1376.8967, 3143.796, 1740.5424, 3090.0574, 1023.4724, 2059.8198, 3453.7551, 3202.6628, 3185.9966]
2025-09-12 11:00:34,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [392.0, 491.0, 1000.0, 690.0, 1000.0, 355.0, 649.0, 982.0, 999.0, 1000.0]
2025-09-12 11:00:34,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 14 hours, 51 minutes, 52 seconds)
2025-09-12 11:11:57,455 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:11:57,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:15:00,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2104.76099 ± 956.073
2025-09-12 11:15:00,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1354.7019, 3284.268, 1323.3197, 3172.557, 1328.6847, 1989.3, 3375.272, 714.2355, 3036.5286, 1468.7429]
2025-09-12 11:15:00,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [437.0, 1000.0, 451.0, 1000.0, 444.0, 657.0, 1000.0, 252.0, 1000.0, 400.0]
2025-09-12 11:15:00,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 14 hours, 38 minutes, 1 second)
2025-09-12 11:26:01,493 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:26:01,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:28:28,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1724.95972 ± 794.203
2025-09-12 11:28:28,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2077.7234, 787.12836, 3286.7542, 2309.7512, 1430.6807, 1056.9845, 1071.4412, 2016.089, 755.12964, 2457.9143]
2025-09-12 11:28:28,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [596.0, 290.0, 1000.0, 682.0, 433.0, 348.0, 369.0, 608.0, 259.0, 760.0]
2025-09-12 11:28:28,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 14 hours, 9 minutes, 4 seconds)
2025-09-12 11:39:27,557 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:39:27,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:41:56,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1653.19995 ± 968.354
2025-09-12 11:41:56,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3327.8762, 3587.7976, 1051.5857, 831.1247, 1370.871, 1253.7999, 2133.3467, 1039.4574, 909.37354, 1026.7684]
2025-09-12 11:41:56,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 343.0, 322.0, 506.0, 477.0, 715.0, 360.0, 315.0, 371.0]
2025-09-12 11:41:56,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 13 hours, 40 minutes, 6 seconds)
2025-09-12 11:53:35,769 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:53:35,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:56:40,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2294.02856 ± 1245.120
2025-09-12 11:56:40,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1493.7366, 1423.8593, 767.7692, 2847.9902, 90.50817, 3609.9893, 2047.1244, 3543.7288, 3377.41, 3738.1697]
2025-09-12 11:56:40,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [443.0, 432.0, 305.0, 845.0, 76.0, 994.0, 624.0, 1000.0, 1000.0, 987.0]
2025-09-12 11:56:40,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 13 hours, 25 minutes, 35 seconds)
2025-09-12 12:07:45,887 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:07:45,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:12:20,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3268.03955 ± 102.864
2025-09-12 12:12:20,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3363.411, 3168.377, 3212.237, 3296.5676, 3388.947, 3224.612, 3318.7087, 3052.6604, 3255.5464, 3399.3308]
2025-09-12 12:12:20,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:12:20,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (3268.04) for latency MM1Queue_a033_s075
2025-09-12 12:12:20,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 13 hours, 23 minutes, 41 seconds)
2025-09-12 12:23:57,111 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:23:57,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:28:14,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3254.83447 ± 574.066
2025-09-12 12:28:14,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3572.854, 3319.7334, 3474.8694, 3177.3787, 3331.2847, 3381.494, 3620.673, 3556.1968, 1578.3936, 3535.464]
2025-09-12 12:28:14,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 536.0, 1000.0]
2025-09-12 12:28:14,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 13 hours, 25 minutes, 38 seconds)
2025-09-12 12:38:46,677 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:38:46,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:42:56,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3150.71191 ± 976.340
2025-09-12 12:42:56,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3441.6428, 3641.5627, 255.02663, 3641.4668, 3274.3896, 3669.8691, 3497.0896, 3206.6697, 3494.5957, 3384.8074]
2025-09-12 12:42:56,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 139.0, 1000.0, 1000.0, 1000.0, 1000.0, 906.0, 1000.0, 1000.0]
2025-09-12 12:42:56,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 13 hours, 24 minutes, 13 seconds)
2025-09-12 12:54:11,505 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:54:11,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:58:03,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2775.26416 ± 1142.333
2025-09-12 12:58:03,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3196.8213, 3209.758, 82.39465, 3460.2664, 3453.917, 3294.9583, 985.815, 3283.1558, 3458.3132, 3327.2415]
2025-09-12 12:58:03,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 79.0, 1000.0, 1000.0, 1000.0, 295.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:58:03,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 13 hours, 26 minutes, 45 seconds)
2025-09-12 13:09:44,411 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:09:44,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:14:21,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3489.42920 ± 121.535
2025-09-12 13:14:21,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3596.5466, 3408.6611, 3359.3013, 3334.765, 3362.2131, 3659.3308, 3436.5447, 3484.59, 3583.8044, 3668.5364]
2025-09-12 13:14:21,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:14:21,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (3489.43) for latency MM1Queue_a033_s075
2025-09-12 13:14:21,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 13 hours, 27 minutes, 53 seconds)
2025-09-12 13:25:40,494 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:25:40,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:30:12,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3568.95117 ± 115.924
2025-09-12 13:30:12,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3582.7905, 3529.3733, 3582.0051, 3569.419, 3389.2075, 3772.9204, 3735.4778, 3582.2607, 3394.8193, 3551.24]
2025-09-12 13:30:12,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:30:12,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (3568.95) for latency MM1Queue_a033_s075
2025-09-12 13:30:12,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 13 hours, 14 minutes, 15 seconds)
2025-09-12 13:41:23,250 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:41:23,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:45:34,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3221.69287 ± 1005.836
2025-09-12 13:45:34,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3692.1267, 3657.1038, 3471.593, 3536.5261, 3123.8435, 243.75891, 3708.8188, 3575.4475, 3525.084, 3682.626]
2025-09-12 13:45:34,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 129.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:45:34,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 12 hours, 53 minutes, 17 seconds)
2025-09-12 13:56:53,924 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:56:53,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:01:16,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3398.05542 ± 589.467
2025-09-12 14:01:16,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3517.4333, 3822.814, 3746.0923, 3652.8213, 1701.0394, 3665.1711, 3543.589, 3491.6062, 3656.5308, 3183.4565]
2025-09-12 14:01:16,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 559.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:01:16,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 12 hours, 47 minutes, 46 seconds)
2025-09-12 14:11:56,179 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:11:56,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:16:31,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3565.31836 ± 107.402
2025-09-12 14:16:31,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3544.9026, 3575.9148, 3588.6934, 3628.3962, 3676.4507, 3571.042, 3365.6487, 3628.1113, 3374.9927, 3699.0305]
2025-09-12 14:16:31,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:16:31,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 33 minutes, 18 seconds)
2025-09-12 14:28:17,720 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:28:17,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:32:24,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3215.78784 ± 1036.406
2025-09-12 14:32:24,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3568.6296, 3223.9036, 3669.3406, 3522.4214, 3744.8682, 3505.7346, 3471.3926, 3556.7698, 137.17453, 3757.6462]
2025-09-12 14:32:24,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 136.0, 1000.0]
2025-09-12 14:32:24,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 13 minutes, 41 seconds)
2025-09-12 14:43:39,334 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:43:39,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:48:14,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3824.13599 ± 134.922
2025-09-12 14:48:14,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4012.8977, 3719.105, 3767.143, 3969.0527, 3797.1877, 3916.765, 3993.2224, 3794.7515, 3661.0166, 3610.2175]
2025-09-12 14:48:14,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:48:14,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (3824.14) for latency MM1Queue_a033_s075
2025-09-12 14:48:14,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 11 hours, 58 minutes)
2025-09-12 14:59:29,493 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:59:29,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:04:05,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3781.47729 ± 203.969
2025-09-12 15:04:05,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3460.3303, 3818.641, 3561.2283, 3864.6921, 3877.4514, 3809.75, 3643.7605, 3806.8574, 4253.2173, 3718.8428]
2025-09-12 15:04:05,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:04:05,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 11 hours, 46 minutes, 38 seconds)
2025-09-12 15:15:21,993 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:15:21,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:19:56,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4045.70117 ± 133.608
2025-09-12 15:19:56,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3920.9397, 3934.332, 3873.5627, 4084.4922, 3965.602, 3902.9014, 4145.2495, 4177.2676, 4218.4644, 4234.1987]
2025-09-12 15:19:56,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:19:56,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (4045.70) for latency MM1Queue_a033_s075
2025-09-12 15:19:56,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 11 hours, 32 minutes, 16 seconds)
2025-09-12 15:31:12,591 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:31:12,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:35:48,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3939.56201 ± 123.259
2025-09-12 15:35:48,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3764.1736, 3986.5024, 3815.5547, 3825.5544, 3835.3997, 4097.2085, 4016.2346, 4129.489, 3885.7502, 4039.753]
2025-09-12 15:35:48,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:35:48,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 11 hours, 21 minutes, 47 seconds)
2025-09-12 15:47:04,313 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:47:04,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:51:31,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3765.63135 ± 386.437
2025-09-12 15:51:31,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3858.2742, 3873.5505, 3753.156, 3788.7112, 3953.6638, 3915.92, 3965.5513, 3860.6711, 4053.6624, 2633.152]
2025-09-12 15:51:31,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 683.0]
2025-09-12 15:51:31,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 11 hours, 4 minutes, 31 seconds)
2025-09-12 16:02:38,261 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:02:38,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:07:13,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3802.35547 ± 166.633
2025-09-12 16:07:13,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3723.6313, 4156.254, 3898.2307, 3881.0247, 3474.8057, 3855.2527, 3758.2778, 3852.1191, 3712.7456, 3711.2146]
2025-09-12 16:07:13,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:07:13,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 10 hours, 47 minutes, 33 seconds)
2025-09-12 16:18:29,864 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:18:29,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:23:01,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3923.20557 ± 80.827
2025-09-12 16:23:01,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3899.5488, 3895.353, 3767.4326, 4016.1611, 3929.8381, 3984.2803, 4028.323, 3955.1548, 3803.3042, 3952.6614]
2025-09-12 16:23:01,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:23:01,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 10 hours, 31 minutes, 27 seconds)
2025-09-12 16:34:14,739 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:34:14,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:38:49,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3955.35815 ± 125.793
2025-09-12 16:38:49,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3826.5022, 3843.4272, 4168.7114, 4052.419, 4032.7158, 3844.8228, 3815.1172, 4134.1064, 3945.5957, 3890.164]
2025-09-12 16:38:49,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:38:49,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 10 hours, 15 minutes, 14 seconds)
2025-09-12 16:50:01,646 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:50:01,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:54:35,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3937.70581 ± 54.519
2025-09-12 16:54:35,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3911.6528, 3875.0498, 3916.1548, 3972.675, 3961.345, 3817.6482, 3956.3767, 3995.396, 3996.2437, 3974.5144]
2025-09-12 16:54:35,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:54:35,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 9 hours, 58 minutes, 49 seconds)
2025-09-12 17:05:47,999 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:05:48,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:10:19,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3824.98047 ± 46.468
2025-09-12 17:10:19,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3778.663, 3857.8445, 3869.6108, 3720.9502, 3864.548, 3829.7603, 3868.1394, 3826.9663, 3847.965, 3785.3557]
2025-09-12 17:10:19,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:10:19,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 9 hours, 43 minutes, 11 seconds)
2025-09-12 17:21:34,156 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:21:34,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:25:57,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3751.37549 ± 503.618
2025-09-12 17:25:57,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4080.447, 3782.582, 3951.229, 3752.0928, 3835.7979, 4161.095, 3895.4255, 2305.4382, 3687.5408, 4062.107]
2025-09-12 17:25:57,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 678.0, 1000.0, 1000.0]
2025-09-12 17:25:57,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 9 hours, 26 minutes, 54 seconds)
2025-09-12 17:36:26,690 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:36:26,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:40:59,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3886.05225 ± 166.098
2025-09-12 17:40:59,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4000.0452, 3813.832, 4008.0466, 3636.2747, 3923.0596, 4090.8257, 3700.7122, 3635.4824, 4007.0896, 4045.1575]
2025-09-12 17:40:59,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:40:59,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 9 hours, 5 minutes, 45 seconds)
2025-09-12 17:52:14,149 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:52:14,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:56:45,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3926.19580 ± 73.519
2025-09-12 17:56:45,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3830.896, 3905.0386, 3941.5837, 4058.3855, 4025.7144, 3887.5586, 3920.0176, 3980.382, 3892.502, 3819.8796]
2025-09-12 17:56:45,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:56:45,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 8 hours, 49 minutes, 55 seconds)
2025-09-12 18:07:59,834 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:07:59,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:12:32,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4068.40698 ± 120.769
2025-09-12 18:12:32,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4041.048, 4274.225, 4115.252, 4128.3125, 3934.108, 4070.555, 4152.0146, 3832.6936, 3977.4717, 4158.3906]
2025-09-12 18:12:32,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:12:32,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (4068.41) for latency MM1Queue_a033_s075
2025-09-12 18:12:32,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 34 minutes, 22 seconds)
2025-09-12 18:23:46,872 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:23:46,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:28:22,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3979.95068 ± 121.676
2025-09-12 18:28:22,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4119.503, 3958.7153, 3943.433, 4142.184, 3881.841, 3797.7878, 4050.6704, 3968.5962, 4130.044, 3806.7358]
2025-09-12 18:28:22,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:28:22,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 19 minutes, 30 seconds)
2025-09-12 18:39:36,252 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:39:36,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:44:08,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4106.33057 ± 92.511
2025-09-12 18:44:08,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4178.5713, 4054.639, 3948.5308, 4324.231, 4088.1995, 4059.5845, 4083.5933, 4108.653, 4075.002, 4142.295]
2025-09-12 18:44:08,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:44:08,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (4106.33) for latency MM1Queue_a033_s075
2025-09-12 18:44:08,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 4 minutes, 44 seconds)
2025-09-12 18:55:22,491 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:55:22,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:59:59,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4139.45215 ± 95.506
2025-09-12 18:59:59,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4187.4287, 4022.102, 4227.6475, 4046.587, 4181.413, 4128.374, 4113.296, 3978.261, 4212.1533, 4297.257]
2025-09-12 18:59:59,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:59:59,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (4139.45) for latency MM1Queue_a033_s075
2025-09-12 18:59:59,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 53 minutes, 58 seconds)
2025-09-12 19:11:23,400 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:11:23,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:16:00,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4100.45215 ± 70.190
2025-09-12 19:16:00,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4076.993, 4098.0654, 4179.1646, 4054.662, 4050.5437, 4023.1733, 4034.7405, 4226.7446, 4059.8083, 4200.626]
2025-09-12 19:16:00,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:16:00,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 39 minutes, 37 seconds)
2025-09-12 19:27:15,755 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:27:15,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:31:51,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4109.96143 ± 78.006
2025-09-12 19:31:51,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3945.7554, 4061.1956, 4179.5405, 4097.365, 4140.9346, 4076.7673, 4252.484, 4077.8564, 4165.171, 4102.543]
2025-09-12 19:31:51,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:31:51,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 24 minutes, 14 seconds)
2025-09-12 19:43:02,015 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:43:02,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:47:37,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4070.52930 ± 61.138
2025-09-12 19:47:37,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4077.6125, 4131.529, 4031.6345, 4154.947, 4067.9583, 4089.4, 4028.2646, 4107.027, 4091.4758, 3925.4468]
2025-09-12 19:47:37,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:47:37,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 7 minutes, 56 seconds)
2025-09-12 19:59:00,859 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:59:00,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:03:36,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4138.73535 ± 61.067
2025-09-12 20:03:36,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4241.6353, 4122.9116, 4074.64, 4161.927, 4054.7058, 4073.3882, 4234.572, 4129.8013, 4123.6646, 4170.1104]
2025-09-12 20:03:36,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:03:36,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 53 minutes, 14 seconds)
2025-09-12 20:15:07,522 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:15:07,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:19:43,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4178.64844 ± 105.986
2025-09-12 20:19:43,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4236.343, 4268.295, 4065.29, 4225.483, 4090.6309, 4219.7495, 4253.855, 4002.8108, 4074.7498, 4349.278]
2025-09-12 20:19:43,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:19:43,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (4178.65) for latency MM1Queue_a033_s075
2025-09-12 20:19:43,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 38 minutes, 41 seconds)
2025-09-12 20:31:06,155 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:31:06,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:35:44,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4177.79980 ± 116.604
2025-09-12 20:35:44,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3995.1384, 4128.7393, 4393.2524, 4105.1685, 4197.595, 4167.4795, 4317.111, 4181.022, 4030.9426, 4261.549]
2025-09-12 20:35:44,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:35:44,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 22 minutes, 44 seconds)
2025-09-12 20:47:10,965 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:47:10,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:51:51,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4176.64111 ± 75.823
2025-09-12 20:51:51,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4208.713, 4252.795, 4176.4766, 3996.6455, 4260.7207, 4189.2925, 4185.8105, 4214.9795, 4081.891, 4199.0874]
2025-09-12 20:51:51,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:51:51,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 7 minutes, 59 seconds)
2025-09-12 21:03:17,211 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:03:17,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:07:37,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3931.89917 ± 858.475
2025-09-12 21:07:37,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4328.467, 4104.5815, 4136.3667, 1366.8428, 4308.891, 4249.983, 4315.9873, 4171.6875, 4195.3804, 4140.8066]
2025-09-12 21:07:37,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 371.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:07:37,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 51 minutes, 59 seconds)
2025-09-12 21:19:04,829 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:19:04,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:23:42,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4196.64307 ± 75.247
2025-09-12 21:23:42,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4236.8354, 4252.3857, 4030.8572, 4239.869, 4248.602, 4239.3955, 4214.3506, 4094.513, 4264.558, 4145.0645]
2025-09-12 21:23:42,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:23:42,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (4196.64) for latency MM1Queue_a033_s075
2025-09-12 21:23:42,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 36 minutes, 27 seconds)
2025-09-12 21:35:10,961 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:35:10,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:39:48,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4259.83301 ± 66.788
2025-09-12 21:39:48,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4259.152, 4341.6304, 4153.43, 4358.949, 4288.8784, 4230.358, 4285.334, 4157.357, 4305.777, 4217.464]
2025-09-12 21:39:48,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:39:48,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (4259.83) for latency MM1Queue_a033_s075
2025-09-12 21:39:48,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 20 minutes, 21 seconds)
2025-09-12 21:51:16,119 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:51:16,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:55:50,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4087.80151 ± 65.651
2025-09-12 21:55:50,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4022.9568, 4018.6157, 4024.801, 4062.8481, 4230.325, 4111.893, 4041.7573, 4081.6943, 4123.0337, 4160.0913]
2025-09-12 21:55:50,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:55:50,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 4 minutes, 23 seconds)
2025-09-12 22:07:19,360 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:07:19,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:12:06,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4283.97852 ± 110.262
2025-09-12 22:12:06,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4241.059, 4339.398, 4421.5107, 4010.1506, 4248.766, 4278.853, 4314.8745, 4229.773, 4373.1895, 4382.2134]
2025-09-12 22:12:06,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:12:06,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (4283.98) for latency MM1Queue_a033_s075
2025-09-12 22:12:06,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 48 minutes, 51 seconds)
2025-09-12 22:23:48,987 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:23:48,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:28:32,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4268.05957 ± 152.439
2025-09-12 22:28:32,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4273.576, 4388.0205, 4440.5522, 4250.7046, 4113.302, 4082.8733, 4419.423, 4475.492, 4021.1711, 4215.4814]
2025-09-12 22:28:32,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:28:32,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 35 minutes, 5 seconds)
2025-09-12 22:40:12,147 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:40:12,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:44:56,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4377.92334 ± 90.386
2025-09-12 22:44:56,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4517.0347, 4394.7144, 4379.689, 4435.6333, 4450.811, 4297.9355, 4324.4624, 4370.487, 4433.161, 4175.308]
2025-09-12 22:44:56,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:44:56,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (4377.92) for latency MM1Queue_a033_s075
2025-09-12 22:44:56,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 19 minutes, 55 seconds)
2025-09-12 22:56:34,073 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:56:34,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:01:12,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4355.34131 ± 114.071
2025-09-12 23:01:12,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4197.907, 4393.9497, 4354.6016, 4220.5186, 4517.053, 4490.102, 4328.9536, 4191.6924, 4391.02, 4467.619]
2025-09-12 23:01:12,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:01:12,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 4 minutes, 12 seconds)
2025-09-12 23:12:56,528 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:12:56,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:17:31,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4313.52930 ± 81.650
2025-09-12 23:17:31,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4527.7856, 4306.0356, 4348.2583, 4278.606, 4350.2266, 4266.416, 4278.658, 4207.162, 4304.449, 4267.695]
2025-09-12 23:17:31,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:17:31,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 48 minutes, 41 seconds)
2025-09-12 23:28:59,337 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:28:59,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:33:39,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4401.82080 ± 78.296
2025-09-12 23:33:39,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4479.2656, 4452.8276, 4394.002, 4484.436, 4413.957, 4238.559, 4290.173, 4366.2656, 4468.9214, 4429.8]
2025-09-12 23:33:39,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:33:39,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (4401.82) for latency MM1Queue_a033_s075
2025-09-12 23:33:39,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 32 minutes, 3 seconds)
2025-09-12 23:45:09,721 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:45:09,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:49:47,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4320.54590 ± 85.711
2025-09-12 23:49:47,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4455.721, 4285.57, 4313.056, 4347.43, 4370.783, 4348.501, 4381.594, 4116.548, 4254.149, 4332.1113]
2025-09-12 23:49:47,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:49:47,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 15 minutes)
2025-09-13 00:01:17,966 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:01:17,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:05:57,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4166.23340 ± 69.599
2025-09-13 00:05:57,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4097.968, 4266.831, 4081.5225, 4122.519, 4188.284, 4223.001, 4105.7783, 4097.797, 4208.962, 4269.673]
2025-09-13 00:05:57,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:05:57,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 58 minutes, 15 seconds)
2025-09-13 00:17:27,378 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:17:27,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:22:08,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4275.50293 ± 58.949
2025-09-13 00:22:08,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4381.2646, 4277.941, 4214.393, 4185.3496, 4348.1274, 4332.272, 4278.681, 4242.792, 4263.017, 4231.194]
2025-09-13 00:22:08,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:22:08,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 41 minutes, 51 seconds)
2025-09-13 00:33:43,979 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:33:43,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:38:26,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4400.14746 ± 45.108
2025-09-13 00:38:26,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4357.0947, 4422.807, 4399.732, 4415.4478, 4404.5254, 4291.6113, 4407.283, 4441.355, 4396.7656, 4464.8486]
2025-09-13 00:38:26,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:38:26,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 25 minutes, 40 seconds)
2025-09-13 00:50:07,110 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:50:07,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:54:46,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4475.45947 ± 84.005
2025-09-13 00:54:46,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4489.921, 4424.8423, 4404.3003, 4553.209, 4412.757, 4469.9424, 4610.871, 4513.6504, 4560.1387, 4314.959]
2025-09-13 00:54:46,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:54:46,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (4475.46) for latency MM1Queue_a033_s075
2025-09-13 00:54:46,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 9 minutes, 46 seconds)
2025-09-13 01:06:19,223 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:06:19,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:11:01,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4409.43604 ± 69.653
2025-09-13 01:11:01,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4432.9473, 4505.1636, 4416.994, 4359.2, 4395.4897, 4455.5557, 4286.2373, 4364.1772, 4353.0215, 4525.571]
2025-09-13 01:11:01,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:11:01,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 53 minutes, 44 seconds)
2025-09-13 01:22:36,233 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:22:36,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:27:14,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4378.32275 ± 70.418
2025-09-13 01:27:14,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4403.037, 4317.5884, 4437.58, 4402.2993, 4415.5767, 4238.9473, 4274.879, 4457.7817, 4429.2104, 4406.3276]
2025-09-13 01:27:14,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:27:14,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 37 minutes, 32 seconds)
2025-09-13 01:38:48,667 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:38:48,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:43:25,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4376.25244 ± 96.398
2025-09-13 01:43:25,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4408.448, 4299.7944, 4508.2534, 4457.923, 4155.9326, 4390.951, 4372.0566, 4470.523, 4386.9824, 4311.6567]
2025-09-13 01:43:25,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:43:25,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 21 minutes, 16 seconds)
2025-09-13 01:54:59,615 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:54:59,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:59:24,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4190.31396 ± 919.920
2025-09-13 01:59:24,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1439.105, 4555.568, 4490.88, 4367.813, 4487.4453, 4373.5435, 4501.1055, 4601.938, 4514.1396, 4571.5996]
2025-09-13 01:59:24,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [382.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:59:24,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 4 minutes, 46 seconds)
2025-09-13 02:10:58,851 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:10:58,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:15:35,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4336.51367 ± 133.791
2025-09-13 02:15:35,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3968.221, 4399.9062, 4413.2974, 4292.289, 4350.345, 4496.656, 4355.7227, 4353.142, 4328.698, 4406.859]
2025-09-13 02:15:35,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:15:35,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 48 minutes, 29 seconds)
2025-09-13 02:27:08,755 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:27:08,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:31:46,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4485.74121 ± 55.723
2025-09-13 02:31:46,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4610.2593, 4513.226, 4488.7603, 4432.4478, 4477.8687, 4526.1514, 4444.937, 4475.836, 4394.0034, 4493.926]
2025-09-13 02:31:46,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:31:46,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (4485.74) for latency MM1Queue_a033_s075
2025-09-13 02:31:46,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 32 minutes, 18 seconds)
2025-09-13 02:43:29,284 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:43:29,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:48:11,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4435.79053 ± 118.701
2025-09-13 02:48:11,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4114.2812, 4513.929, 4427.8994, 4458.7563, 4463.0654, 4505.9155, 4497.428, 4495.6055, 4534.727, 4346.2983]
2025-09-13 02:48:11,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:48:11,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 16 minutes, 11 seconds)
2025-09-13 02:59:41,789 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:59:41,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:04:22,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4280.92676 ± 60.342
2025-09-13 03:04:22,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4271.034, 4316.278, 4422.7827, 4227.6587, 4237.884, 4275.4946, 4289.2393, 4330.8164, 4220.3096, 4217.7725]
2025-09-13 03:04:22,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:04:22,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1251 [DEBUG]: Training session finished
