2025-05-09 09:43:30,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay
2025-05-09 09:43:30,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay
2025-05-09 09:43:30,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x154e84c61610>}
2025-05-09 09:43:30,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1111 [DEBUG]: using device: cuda
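The eval latency above is an `MM1QueueDelay` object; the `a033_s075` suffix plausibly encodes an M/M/1 queue with arrival rate 0.33 and service rate 0.75, though the log does not confirm this. A minimal sketch of such a delay sampler (hypothetical class, not the real `latency_env.delayed_mdp` API), using the Lindley recursion for the waiting time:

```python
import random

class MM1QueueDelay:
    """Hypothetical sketch of an M/M/1-queue delay process (NOT the real
    latency_env class). Each call returns the sojourn time (wait + service)
    of the next job, advanced via the Lindley recursion."""

    def __init__(self, arrival_rate=0.33, service_rate=0.75, seed=None):
        assert arrival_rate < service_rate, "queue must be stable"
        self.lam, self.mu = arrival_rate, service_rate
        self.rng = random.Random(seed)
        self.wait = 0.0          # waiting time of the previous job
        self.prev_service = 0.0  # service time of the previous job

    def sample(self):
        t = self.rng.expovariate(self.lam)  # interarrival time
        # Lindley recursion: W_{n+1} = max(0, W_n + S_n - T_{n+1})
        w = max(0.0, self.wait + self.prev_service - t)
        s = self.rng.expovariate(self.mu)   # service time of this job
        self.wait, self.prev_service = w, s
        return w + s  # sojourn time = observation delay experienced
```

For a stable M/M/1 queue the mean sojourn time is 1/(μ − λ), here about 2.38 time units, so long-run averages of `sample()` should hover near that.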
2025-05-09 09:43:30,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1133 [INFO]: Creating new trainer
2025-05-09 09:43:30,166 baseline-mbpac-noisy-walker2d:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
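The printed policy is a standard squashed-Gaussian actor: a shared two-layer trunk on the 384-dim input feeding separate `mu` and `log_std` heads, with a tanh squash refit onto the 6-dim Walker2d action box. A minimal PyTorch sketch of the same shape (illustrative names and clamp bounds, not the actual `NNGaussianPolicy` internals):

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Sketch matching the logged NNGaussianPolicy shape (assumed wiring)."""

    def __init__(self, obs_dim=384, act_dim=6, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Flatten(), nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu_head = nn.Linear(hidden, act_dim)
        self.log_std_head = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.trunk(obs)
        mu = self.mu_head(h)
        log_std = self.log_std_head(h).clamp(-20, 2)  # assumed bounds
        u = mu + log_std.exp() * torch.randn_like(mu)  # reparameterized sample
        return torch.tanh(u)  # squash into the [-1, 1] action box
```

The logged `NNTanhRefit(scale=2, shift=-1)` presumably performs the affine rescaling of the squashed output onto the action bounds; for a symmetric [-1, 1] box that refit is effectively the plain tanh shown here.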
2025-05-09 09:43:30,166 baseline-mbpac-noisy-walker2d:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
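The critic's `in_features=23` is consistent with the 17-dim Walker2d observation concatenated with the 6-dim action along the last dimension, mapped to a squeezed scalar Q-value. A sketch of that concat-style critic (hypothetical names, not the actual `NNLayerConcat2` machinery):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Sketch of the logged concat-style critic (assumed wiring):
    23 = 17 (state) + 6 (action), concatenated on the last dim."""

    def __init__(self, state_dim=17, act_dim=6, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        # Flatten both inputs past the batch dim, then concatenate
        x = torch.cat([state.flatten(1), action.flatten(1)], dim=-1)
        return self.net(x).squeeze(-1)  # (batch,) scalar Q-values
```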
2025-05-09 09:43:30,174 baseline-mbpac-noisy-walker2d:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
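Reading off the printed shapes: the model embeds the 17-dim state to 384 units, embeds each 6-dim action to 256 units, rolls a `GRU(256, 384)` over the action embeddings, and emits a Gaussian (mu/log_std heads) over the next 17-dim observation. One plausible wiring consistent with those shapes, sketched below (the state embedding seeding the GRU hidden state is an assumption, not confirmed by the log):

```python
import torch
import torch.nn as nn

class PredictiveRecurrentModel(nn.Module):
    """Sketch of one wiring consistent with the logged NNPredictiveRecurrent
    shapes (assumed): state embedding -> initial GRU hidden state; action
    embeddings -> GRU inputs; GRU outputs -> Gaussian over the next state."""

    def __init__(self, state_dim=17, act_dim=6, hidden=256, rnn_dim=384):
        super().__init__()
        self.embed_state = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, rnn_dim),
        )
        self.embed_action = nn.Sequential(
            nn.Linear(act_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden),
        )
        self.rnn = nn.GRU(hidden, rnn_dim, batch_first=True)
        self.mu = nn.Linear(rnn_dim, state_dim)
        self.log_std = nn.Linear(rnn_dim, state_dim)

    def forward(self, state, actions):
        # state: (B, 17); actions: (B, T, 6)
        h0 = self.embed_state(state).unsqueeze(0)  # (1, B, 384)
        out, _ = self.rnn(self.embed_action(actions), h0)  # (B, T, 384)
        return self.mu(out), self.log_std(out).clamp(-20, 2)
```

The `NNLayerClipSiLU(lower=-20.0)` activations in the log are approximated here by plain `SiLU`; the clipped variant presumably just bounds the activation from below.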
2025-05-09 09:43:30,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1194 [DEBUG]: Starting training session...
2025-05-09 09:43:30,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 1/100
2025-05-09 09:52:34,083 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 09:52:34,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:53:30,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 299.55954 ± 131.349
2025-05-09 09:53:30,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [252.30446, 570.6029, 299.90005, 264.1632, 347.3146, 100.7083, 369.41702, 97.06211, 306.11334, 388.0096]
2025-05-09 09:53:30,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [199.0, 486.0, 388.0, 157.0, 206.0, 110.0, 272.0, 133.0, 188.0, 299.0]
2025-05-09 09:53:30,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (299.56) for latency MM1Queue_a033_s075
2025-05-09 09:53:30,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 09:53:30,545 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
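Each evaluation block reports the mean and (population, ddof=0) standard deviation over the 10 rollouts, and a checkpoint is saved whenever the mean improves on the best so far. The logged 299.55954 ± 131.349 is reproducible from the per-episode rewards above; a minimal sketch of that bookkeeping (hypothetical helper, not the actual `training_loop` code):

```python
import math

def summarize(rewards):
    """Mean and population std, matching the logged 'Total Reward: m ± s'."""
    m = sum(rewards) / len(rewards)
    s = math.sqrt(sum((r - m) ** 2 for r in rewards) / len(rewards))
    return m, s

rewards = [252.30446, 570.6029, 299.90005, 264.1632, 347.3146,
           100.7083, 369.41702, 97.06211, 306.11334, 388.0096]
mean, std = summarize(rewards)  # ~299.56 +/- 131.35, as logged

best = float("-inf")
if mean > best:     # the "New best (...)" branch in the log
    best = mean     # an eval copy of the trainer would be pickled here
```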
2025-05-09 09:53:30,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 16 hours, 29 minutes, 35 seconds)
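The time-remaining estimates are consistent with (remaining iterations) × (mean wall-clock time per completed iteration): iteration 1 spanned 09:43:30,810 to 09:53:30,561, i.e. 599.751 s, and 99 × 599.751 s is exactly the logged 16 h 29 m 35 s. A sketch of that calculation (assumed formula; the real loop may use a running average and proper singular/plural forms, as in the later "1 minute" line):

```python
def format_eta(seconds):
    """Render seconds as 'H hours, M minutes, S seconds' like the log."""
    s = int(seconds)
    h, s = divmod(s, 3600)
    m, s = divmod(s, 60)
    return f"{h} hours, {m} minutes, {s} seconds"

def eta(elapsed_s, done, total):
    """Naive estimate: average time per finished iteration x iterations left."""
    return format_eta(elapsed_s / done * (total - done))

# First iteration took 599.751 s; 99 iterations remain out of 100:
print(eta(599.751, done=1, total=100))  # 16 hours, 29 minutes, 35 seconds
```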
2025-05-09 10:03:30,315 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:03:30,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:04:00,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 18.72745 ± 78.736
2025-05-09 10:04:00,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [7.992301, 236.01083, -3.2695084, -5.6278133, -14.1973715, -40.199635, -0.79209757, -68.009346, 52.01005, 23.357069]
2025-05-09 10:04:00,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [155.0, 232.0, 117.0, 76.0, 113.0, 145.0, 22.0, 198.0, 226.0, 124.0]
2025-05-09 10:04:01,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 16 hours, 44 minutes, 54 seconds)
2025-05-09 10:13:58,742 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:13:58,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:14:35,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 199.04565 ± 97.039
2025-05-09 10:14:35,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [49.87261, 179.79332, 408.07083, 186.02592, 210.9527, 186.0677, 110.76294, 260.6002, 110.30676, 288.00354]
2025-05-09 10:14:35,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [137.0, 146.0, 308.0, 103.0, 111.0, 116.0, 211.0, 148.0, 178.0, 187.0]
2025-05-09 10:14:35,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 16 hours, 44 minutes, 50 seconds)
2025-05-09 10:24:13,705 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:24:13,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:24:55,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 249.80305 ± 116.719
2025-05-09 10:24:55,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [348.35358, 245.57089, 223.80875, 53.44041, 291.8262, 509.7864, 187.02866, 230.05896, 272.63107, 135.5256]
2025-05-09 10:24:55,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [230.0, 371.0, 131.0, 189.0, 184.0, 334.0, 112.0, 121.0, 148.0, 86.0]
2025-05-09 10:24:55,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 16 hours, 33 minutes, 55 seconds)
2025-05-09 10:33:45,011 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:33:45,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:34:20,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 292.26453 ± 58.492
2025-05-09 10:34:20,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [277.05225, 283.8612, 415.40063, 275.57434, 313.13495, 283.193, 358.732, 286.9856, 188.21245, 240.49867]
2025-05-09 10:34:20,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [155.0, 142.0, 248.0, 137.0, 171.0, 171.0, 187.0, 147.0, 110.0, 116.0]
2025-05-09 10:34:20,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 16 hours, 5 minutes, 44 seconds)
2025-05-09 10:43:26,699 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:43:26,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:44:49,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 533.57349 ± 268.030
2025-05-09 10:44:49,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [974.9148, 1087.3405, 531.1133, 586.1489, 317.39227, 267.69785, 489.67474, 414.5191, 316.4356, 350.49774]
2025-05-09 10:44:49,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 951.0, 210.0, 332.0, 152.0, 330.0, 207.0, 211.0, 158.0, 149.0]
2025-05-09 10:44:49,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (533.57) for latency MM1Queue_a033_s075
2025-05-09 10:44:49,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 10:44:49,125 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 10:44:49,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 16 hours, 4 minutes, 40 seconds)
2025-05-09 10:54:07,552 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:54:07,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:54:46,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 375.75735 ± 135.174
2025-05-09 10:54:46,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [474.5023, 374.682, 359.1562, 365.13803, 9.712052, 478.1384, 310.67654, 438.36334, 453.51492, 493.68967]
2025-05-09 10:54:46,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [192.0, 180.0, 189.0, 171.0, 19.0, 246.0, 156.0, 182.0, 233.0, 201.0]
2025-05-09 10:54:46,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 15 hours, 43 minutes, 56 seconds)
2025-05-09 11:03:57,270 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:03:57,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:04:31,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 321.41006 ± 124.422
2025-05-09 11:04:31,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [489.78708, 316.84003, 280.14603, 466.22046, 328.36313, 272.48596, 37.522778, 435.66248, 238.20392, 348.86887]
2025-05-09 11:04:31,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [209.0, 163.0, 146.0, 182.0, 157.0, 134.0, 56.0, 181.0, 131.0, 179.0]
2025-05-09 11:04:31,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 15 hours, 18 minutes, 44 seconds)
2025-05-09 11:13:47,757 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:13:47,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:14:28,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 384.90353 ± 92.228
2025-05-09 11:14:28,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [379.42102, 254.42528, 371.47128, 512.77563, 465.66464, 347.649, 440.249, 316.74197, 246.50465, 514.13275]
2025-05-09 11:14:28,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [164.0, 121.0, 161.0, 251.0, 199.0, 160.0, 235.0, 149.0, 117.0, 261.0]
2025-05-09 11:14:28,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 15 hours, 1 minute, 37 seconds)
2025-05-09 11:23:35,914 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:23:36,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:24:10,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 307.43274 ± 67.265
2025-05-09 11:24:10,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [315.0208, 315.05038, 301.20538, 365.7602, 268.0004, 250.71043, 344.89514, 161.25192, 327.82922, 424.6038]
2025-05-09 11:24:10,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [158.0, 155.0, 165.0, 185.0, 127.0, 134.0, 165.0, 93.0, 177.0, 203.0]
2025-05-09 11:24:10,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 14 hours, 56 minutes, 54 seconds)
2025-05-09 11:33:26,627 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:33:26,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:34:11,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 383.29211 ± 131.596
2025-05-09 11:34:11,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [302.2297, 312.82892, 724.7925, 374.19385, 240.01067, 331.2838, 395.54166, 466.39023, 267.42944, 418.22037]
2025-05-09 11:34:11,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [159.0, 141.0, 456.0, 175.0, 113.0, 166.0, 232.0, 217.0, 133.0, 202.0]
2025-05-09 11:34:11,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 14 hours, 38 minutes, 39 seconds)
2025-05-09 11:43:25,744 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:43:25,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:44:08,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 365.67368 ± 166.212
2025-05-09 11:44:08,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [294.91968, 591.01666, 160.11256, 524.4351, 201.30495, 647.9952, 163.50098, 410.43225, 308.24365, 354.77594]
2025-05-09 11:44:08,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [161.0, 331.0, 106.0, 298.0, 126.0, 314.0, 97.0, 217.0, 150.0, 169.0]
2025-05-09 11:44:08,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 14 hours, 28 minutes, 52 seconds)
2025-05-09 11:53:20,392 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:53:20,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:53:55,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 239.79375 ± 140.358
2025-05-09 11:53:55,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [170.29427, 502.84717, 108.401344, 295.65448, 204.39122, 455.93372, 187.37523, 30.053148, 161.44261, 281.54453]
2025-05-09 11:53:55,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [256.0, 370.0, 113.0, 168.0, 110.0, 180.0, 112.0, 44.0, 82.0, 165.0]
2025-05-09 11:53:55,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 14 hours, 19 minutes, 35 seconds)
2025-05-09 12:03:14,620 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:03:14,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:03:51,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 315.39801 ± 74.283
2025-05-09 12:03:51,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [304.83615, 394.04303, 205.89435, 404.13504, 324.2712, 342.9856, 163.79643, 305.65762, 321.83734, 386.52316]
2025-05-09 12:03:51,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [158.0, 217.0, 113.0, 211.0, 164.0, 172.0, 97.0, 145.0, 175.0, 211.0]
2025-05-09 12:03:51,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 14 hours, 9 minutes, 40 seconds)
2025-05-09 12:13:06,984 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:13:07,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:13:53,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 386.91693 ± 48.095
2025-05-09 12:13:53,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [438.9435, 392.84766, 276.80634, 388.7646, 388.78674, 354.6014, 346.0644, 417.44775, 447.7125, 417.1947]
2025-05-09 12:13:53,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [208.0, 181.0, 364.0, 194.0, 198.0, 156.0, 166.0, 207.0, 244.0, 214.0]
2025-05-09 12:13:53,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 14 hours, 5 minutes, 10 seconds)
2025-05-09 12:23:06,023 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:23:06,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:24:00,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 432.98257 ± 194.402
2025-05-09 12:24:00,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [432.83664, 395.014, 53.65487, 599.0284, 433.7473, 340.46457, 628.7107, 426.9281, 239.22517, 780.21576]
2025-05-09 12:24:00,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [238.0, 212.0, 76.0, 277.0, 229.0, 172.0, 274.0, 266.0, 128.0, 598.0]
2025-05-09 12:24:00,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 13 hours, 57 minutes, 5 seconds)
2025-05-09 12:33:17,762 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:33:17,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:34:07,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 451.96234 ± 176.433
2025-05-09 12:34:07,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [505.85165, 493.34845, 708.7391, 483.1587, 373.2176, 693.3435, 380.1658, 474.76193, 51.573456, 355.46402]
2025-05-09 12:34:07,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [221.0, 215.0, 543.0, 211.0, 187.0, 305.0, 189.0, 192.0, 70.0, 165.0]
2025-05-09 12:34:07,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 13 hours, 49 minutes, 49 seconds)
2025-05-09 12:43:21,572 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:43:21,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:44:14,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 542.03131 ± 183.671
2025-05-09 12:44:14,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [624.11365, 1.2879068, 613.77527, 594.1316, 608.312, 549.088, 679.9547, 581.9192, 613.78, 553.9508]
2025-05-09 12:44:14,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [256.0, 13.0, 289.0, 268.0, 248.0, 234.0, 345.0, 234.0, 287.0, 233.0]
2025-05-09 12:44:14,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (542.03) for latency MM1Queue_a033_s075
2025-05-09 12:44:14,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 12:44:14,816 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 12:44:14,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 13 hours, 45 minutes, 18 seconds)
2025-05-09 12:54:30,561 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:54:30,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:55:25,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 574.32788 ± 54.533
2025-05-09 12:55:25,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [604.41675, 664.3791, 578.8407, 594.73285, 449.9677, 542.3399, 567.32544, 583.18994, 621.62244, 536.4639]
2025-05-09 12:55:25,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [249.0, 267.0, 256.0, 247.0, 209.0, 204.0, 235.0, 227.0, 236.0, 234.0]
2025-05-09 12:55:25,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (574.33) for latency MM1Queue_a033_s075
2025-05-09 12:55:25,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 12:55:25,806 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 12:55:25,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 13 hours, 55 minutes, 20 seconds)
2025-05-09 13:05:38,938 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:05:39,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:06:42,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 632.44617 ± 70.920
2025-05-09 13:06:42,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [668.13666, 595.43243, 672.23065, 707.34393, 634.0327, 687.1797, 462.19952, 701.76135, 572.4799, 623.66455]
2025-05-09 13:06:42,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [310.0, 246.0, 269.0, 309.0, 251.0, 274.0, 209.0, 322.0, 241.0, 276.0]
2025-05-09 13:06:42,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (632.45) for latency MM1Queue_a033_s075
2025-05-09 13:06:42,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 13:06:42,576 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 13:06:42,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 14 hours, 5 minutes, 16 seconds)
2025-05-09 13:16:17,015 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:16:17,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:17:07,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 558.69684 ± 73.851
2025-05-09 13:17:07,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [494.17426, 577.1851, 433.5027, 518.427, 605.6241, 472.03467, 588.70734, 693.39667, 612.8995, 591.0172]
2025-05-09 13:17:07,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [205.0, 225.0, 174.0, 220.0, 247.0, 195.0, 236.0, 265.0, 257.0, 234.0]
2025-05-09 13:17:07,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 13 hours, 59 minutes, 7 seconds)
2025-05-09 13:26:54,748 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:26:55,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:27:48,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 559.64441 ± 175.378
2025-05-09 13:27:48,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [723.9172, 484.59488, 654.3355, 485.761, 725.3163, 350.17383, 171.68079, 719.53613, 636.7527, 644.37616]
2025-05-09 13:27:48,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [286.0, 200.0, 255.0, 192.0, 310.0, 164.0, 91.0, 283.0, 247.0, 261.0]
2025-05-09 13:27:48,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 13 hours, 57 minutes, 27 seconds)
2025-05-09 13:37:05,290 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:37:05,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:37:58,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 660.10126 ± 93.707
2025-05-09 13:37:58,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [676.4113, 781.03436, 839.7588, 593.6031, 642.00006, 582.60394, 489.63705, 668.8201, 682.65015, 644.49426]
2025-05-09 13:37:58,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [261.0, 271.0, 292.0, 226.0, 247.0, 232.0, 199.0, 240.0, 250.0, 234.0]
2025-05-09 13:37:58,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (660.10) for latency MM1Queue_a033_s075
2025-05-09 13:37:58,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 13:37:58,395 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 13:37:58,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 13 hours, 47 minutes, 24 seconds)
2025-05-09 13:47:04,870 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:47:04,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:48:10,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 706.20392 ± 134.120
2025-05-09 13:48:10,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [615.98425, 707.13495, 790.80505, 1071.5906, 636.6672, 644.0659, 705.2347, 585.1777, 684.3395, 621.0398]
2025-05-09 13:48:10,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [251.0, 252.0, 284.0, 615.0, 257.0, 243.0, 279.0, 275.0, 254.0, 227.0]
2025-05-09 13:48:10,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (706.20) for latency MM1Queue_a033_s075
2025-05-09 13:48:10,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 13:48:10,390 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 13:48:10,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 13 hours, 21 minutes, 41 seconds)
2025-05-09 13:57:15,086 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:57:15,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:58:25,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 955.70166 ± 299.501
2025-05-09 13:58:25,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [752.57184, 799.06665, 1016.9248, 784.9627, 1278.3917, 711.81305, 770.31104, 978.023, 1705.0372, 759.9154]
2025-05-09 13:58:25,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [257.0, 260.0, 329.0, 260.0, 387.0, 273.0, 292.0, 384.0, 501.0, 251.0]
2025-05-09 13:58:25,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (955.70) for latency MM1Queue_a033_s075
2025-05-09 13:58:25,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 13:58:25,478 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 13:58:25,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 12 hours, 55 minutes, 37 seconds)
2025-05-09 14:07:12,639 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:07:12,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:09:24,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1704.74780 ± 739.163
2025-05-09 14:09:24,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [2874.0183, 1878.3378, 1116.8439, 1361.7963, 907.8725, 1960.4795, 1084.4768, 2942.1277, 802.0793, 2119.446]
2025-05-09 14:09:24,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 690.0, 407.0, 512.0, 364.0, 645.0, 394.0, 1000.0, 290.0, 731.0]
2025-05-09 14:09:24,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (1704.75) for latency MM1Queue_a033_s075
2025-05-09 14:09:24,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 14:09:24,778 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 14:09:24,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 12 hours, 53 minutes, 55 seconds)
2025-05-09 14:18:49,943 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:18:49,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:20:05,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 970.73694 ± 350.626
2025-05-09 14:20:05,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [1949.3486, 972.3358, 759.38245, 897.4025, 618.86145, 797.35345, 769.9178, 971.55414, 869.4062, 1101.8064]
2025-05-09 14:20:05,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [731.0, 319.0, 269.0, 310.0, 264.0, 289.0, 264.0, 350.0, 297.0, 368.0]
2025-05-09 14:20:05,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 12 hours, 43 minutes, 20 seconds)
2025-05-09 14:28:35,431 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:28:35,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:30:29,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1703.13318 ± 576.408
2025-05-09 14:30:29,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [1803.9564, 1230.1915, 1845.7104, 715.0165, 2332.3098, 2188.6072, 898.4059, 2344.4648, 1408.5515, 2264.1184]
2025-05-09 14:30:29,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [547.0, 438.0, 551.0, 258.0, 693.0, 649.0, 305.0, 690.0, 451.0, 652.0]
2025-05-09 14:30:29,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 12 hours, 36 minutes, 18 seconds)
2025-05-09 14:39:56,061 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:39:56,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:41:56,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1810.05200 ± 680.657
2025-05-09 14:41:56,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [890.7429, 1472.3433, 2855.8958, 1730.1735, 1084.8667, 2267.7317, 1674.3176, 2823.7136, 2259.1755, 1041.5583]
2025-05-09 14:41:56,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [303.0, 428.0, 847.0, 507.0, 371.0, 603.0, 538.0, 838.0, 665.0, 420.0]
2025-05-09 14:41:56,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (1810.05) for latency MM1Queue_a033_s075
2025-05-09 14:41:56,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 14:41:56,824 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 14:41:56,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 12 hours, 43 minutes, 35 seconds)
2025-05-09 14:50:47,211 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:50:47,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:52:37,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1675.96216 ± 911.364
2025-05-09 14:52:37,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [1974.006, 1706.1329, 812.2803, 1196.5553, 3280.7356, 3447.1272, 970.0129, 1099.4912, 1377.0277, 896.25116]
2025-05-09 14:52:37,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [558.0, 507.0, 284.0, 381.0, 1000.0, 950.0, 303.0, 338.0, 418.0, 288.0]
2025-05-09 14:52:37,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 12 hours, 38 minutes, 49 seconds)
2025-05-09 15:01:47,439 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:01:47,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:04:04,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2056.47046 ± 1038.219
2025-05-09 15:04:04,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [2549.3625, 1081.8604, 1344.1187, 1304.1206, 3427.9514, 1970.9384, 1071.772, 825.76904, 3543.714, 3445.0977]
2025-05-09 15:04:04,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [745.0, 405.0, 425.0, 396.0, 1000.0, 637.0, 334.0, 321.0, 1000.0, 931.0]
2025-05-09 15:04:04,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (2056.47) for latency MM1Queue_a033_s075
2025-05-09 15:04:04,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 15:04:04,404 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 15:04:04,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 12 hours, 34 minutes, 18 seconds)
2025-05-09 15:13:33,733 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:13:33,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:16:22,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2753.25244 ± 990.172
2025-05-09 15:16:22,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [1198.1906, 3204.8699, 2906.3496, 1936.6406, 3899.1335, 3466.6555, 2845.524, 3427.8533, 938.5099, 3708.796]
2025-05-09 15:16:22,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [371.0, 867.0, 838.0, 530.0, 1000.0, 1000.0, 764.0, 909.0, 316.0, 1000.0]
2025-05-09 15:16:22,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (2753.25) for latency MM1Queue_a033_s075
2025-05-09 15:16:22,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 15:16:22,299 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 15:16:22,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 12 hours, 45 minutes, 22 seconds)
2025-05-09 15:25:16,664 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:25:16,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:26:54,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1509.37512 ± 573.328
2025-05-09 15:26:54,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [1260.422, 875.2141, 1324.9036, 1399.0757, 1883.9318, 2416.2578, 2598.7043, 1406.5045, 1070.9214, 857.8154]
2025-05-09 15:26:54,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [386.0, 306.0, 434.0, 396.0, 511.0, 642.0, 703.0, 416.0, 336.0, 286.0]
2025-05-09 15:26:54,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 12 hours, 36 minutes, 1 second)
2025-05-09 15:36:23,033 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:36:23,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:38:51,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2286.80811 ± 911.138
2025-05-09 15:38:52,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [3624.982, 1468.6078, 2234.4333, 1491.0856, 1780.3536, 2066.8696, 1960.656, 3565.1418, 3593.967, 1081.983]
2025-05-09 15:38:52,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 455.0, 649.0, 450.0, 535.0, 601.0, 593.0, 1000.0, 1000.0, 396.0]
2025-05-09 15:38:52,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 12 hours, 31 minutes, 22 seconds)
2025-05-09 15:48:20,754 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:48:20,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:51:26,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 3052.23438 ± 807.677
2025-05-09 15:51:27,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [3381.4856, 2040.0161, 1762.3494, 3636.5151, 3706.2393, 1698.2468, 3617.952, 3421.2817, 3663.7478, 3594.5105]
2025-05-09 15:51:27,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [926.0, 587.0, 531.0, 1000.0, 1000.0, 510.0, 1000.0, 985.0, 1000.0, 1000.0]
2025-05-09 15:51:27,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (3052.23) for latency MM1Queue_a033_s075
2025-05-09 15:51:27,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 15:51:27,060 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 15:51:27,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 12 hours, 44 minutes, 43 seconds)
2025-05-09 16:00:09,911 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:00:09,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:03:50,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 3378.93677 ± 281.242
2025-05-09 16:03:50,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [3596.9565, 3575.3577, 3521.9377, 3355.6438, 3234.9045, 3407.4548, 3436.887, 2600.4958, 3468.9517, 3590.7764]
2025-05-09 16:03:50,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 917.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:03:50,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (3378.94) for latency MM1Queue_a033_s075
2025-05-09 16:03:50,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 16:03:50,355 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 16:03:50,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 12 hours, 45 minutes)
2025-05-09 16:13:40,432 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:13:40,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:16:06,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2314.37158 ± 914.090
2025-05-09 16:16:06,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [2761.4863, 3668.0474, 3026.913, 2710.1768, 1407.9196, 2026.9695, 1693.4484, 3564.2734, 1314.0646, 970.4167]
2025-05-09 16:16:06,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [733.0, 993.0, 810.0, 751.0, 460.0, 581.0, 497.0, 1000.0, 422.0, 347.0]
2025-05-09 16:16:06,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 12 hours, 32 minutes, 41 seconds)
2025-05-09 16:25:10,253 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:25:10,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:27:53,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2661.09204 ± 933.374
2025-05-09 16:27:53,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [2577.1582, 1843.4772, 2825.5828, 1514.2083, 1533.7328, 3551.1475, 3650.9263, 3790.534, 3751.891, 1572.2643]
2025-05-09 16:27:53,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [709.0, 544.0, 796.0, 447.0, 448.0, 1000.0, 1000.0, 1000.0, 1000.0, 451.0]
2025-05-09 16:27:53,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 12 hours, 36 minutes, 3 seconds)
2025-05-09 16:36:55,405 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:36:55,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:40:16,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 3284.10474 ± 555.083
2025-05-09 16:40:16,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [3606.2832, 3473.5713, 3528.2085, 3732.1426, 1875.9911, 3647.8826, 3447.363, 3543.0574, 3382.6008, 2603.9443]
2025-05-09 16:40:16,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 540.0, 1000.0, 1000.0, 1000.0, 916.0, 730.0]
2025-05-09 16:40:16,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 12 hours, 29 minutes, 12 seconds)
2025-05-09 16:49:20,634 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:49:20,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:52:22,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 3044.48242 ± 1218.956
2025-05-09 16:52:22,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [3812.368, 97.88414, 3773.2651, 3266.7183, 3594.4077, 1272.9473, 3811.5232, 3486.27, 3586.4204, 3743.0212]
2025-05-09 16:52:22,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 60.0, 1000.0, 878.0, 997.0, 415.0, 1000.0, 938.0, 1000.0, 1000.0]
2025-05-09 16:52:22,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 12 hours, 11 minutes, 1 second)
2025-05-09 17:01:54,262 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:01:54,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:05:05,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 3232.70239 ± 1244.732
2025-05-09 17:05:05,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [3708.815, 1677.8566, 3849.0234, 3919.6606, 3824.0002, 3741.5256, 3906.7454, 3857.9265, 36.41846, 3805.0532]
2025-05-09 17:05:05,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 501.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 44.0, 1000.0]
2025-05-09 17:05:05,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 12 hours, 2 minutes, 50 seconds)
2025-05-09 17:13:54,692 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:13:54,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:17:36,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 3755.92651 ± 51.724
2025-05-09 17:17:37,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [3810.47, 3691.9426, 3735.4502, 3791.4043, 3686.0266, 3789.557, 3769.4307, 3834.1526, 3769.1597, 3681.6736]
2025-05-09 17:17:37,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:17:37,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (3755.93) for latency MM1Queue_a033_s075
2025-05-09 17:17:37,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 17:17:37,498 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 17:17:38,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 11 hours, 53 minutes, 47 seconds)
2025-05-09 17:26:36,687 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:26:36,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:30:01,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 3528.62256 ± 487.674
2025-05-09 17:30:01,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [3824.8677, 3743.2441, 2729.406, 3769.1812, 3534.3735, 3811.0347, 3736.2825, 2430.626, 3790.0344, 3917.1758]
2025-05-09 17:30:01,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 737.0, 1000.0, 1000.0, 1000.0, 1000.0, 670.0, 1000.0, 1000.0]
2025-05-09 17:30:01,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 11 hours, 48 minutes, 25 seconds)
2025-05-09 17:39:12,765 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:39:12,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:42:48,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 3623.57935 ± 410.245
2025-05-09 17:42:48,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [3785.7249, 3706.7168, 3743.8474, 3824.6438, 3877.0916, 3802.4563, 3725.3132, 3555.015, 2417.9607, 3797.024]
2025-05-09 17:42:48,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 659.0, 1000.0]
2025-05-09 17:42:48,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 11 hours, 40 minutes, 17 seconds)
2025-05-09 17:52:02,994 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:52:03,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:55:23,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 3532.01709 ± 968.450
2025-05-09 17:55:23,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [3895.9685, 3925.9485, 3952.9038, 3913.1653, 3899.515, 3036.6125, 4000.6006, 3985.9475, 3965.7942, 743.7163]
2025-05-09 17:55:23,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 767.0, 1000.0, 1000.0, 1000.0, 290.0]
2025-05-09 17:55:23,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 11 hours, 33 minutes, 12 seconds)
2025-05-09 18:04:32,571 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 18:04:32,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:08:10,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 3688.92310 ± 77.564
2025-05-09 18:08:10,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [3705.351, 3684.836, 3628.9373, 3694.5571, 3633.9937, 3798.4048, 3788.3542, 3719.2292, 3718.565, 3517.0002]
2025-05-09 18:08:10,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:08:10,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 11 hours, 21 minutes, 11 seconds)
2025-05-09 18:17:25,430 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 18:17:25,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:20:49,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 3546.71924 ± 740.706
2025-05-09 18:20:49,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [3892.1987, 3743.4995, 3773.2551, 1333.3579, 3804.2153, 3734.2522, 3750.1875, 3711.6187, 3928.7686, 3795.8374]
2025-05-09 18:20:49,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 403.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:20:49,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 11 hours, 9 minutes, 46 seconds)
2025-05-09 18:30:18,204 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 18:30:18,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:34:01,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 3850.64795 ± 76.648
2025-05-09 18:34:01,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [3937.993, 3814.931, 3903.3455, 3730.9128, 3942.8208, 3862.7856, 3818.928, 3888.639, 3710.5686, 3895.5547]
2025-05-09 18:34:01,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:34:01,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (3850.65) for latency MM1Queue_a033_s075
2025-05-09 18:34:01,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 18:34:01,509 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 18:34:01,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 11 hours, 5 minutes, 32 seconds)
2025-05-09 18:43:39,118 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 18:43:39,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:47:05,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 3434.05591 ± 872.206
2025-05-09 18:47:05,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [1702.6053, 3758.6868, 3828.8542, 3854.7422, 3934.2603, 3878.211, 3757.45, 3937.6765, 3999.3145, 1688.7566]
2025-05-09 18:47:05,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [490.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 494.0]
2025-05-09 18:47:05,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 10 hours, 55 minutes, 39 seconds)
2025-05-09 18:56:46,129 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 18:56:46,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:00:37,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 3789.31689 ± 84.618
2025-05-09 19:00:37,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [3898.1082, 3775.367, 3864.6199, 3738.6543, 3874.0273, 3627.426, 3719.4265, 3888.9038, 3769.995, 3736.6416]
2025-05-09 19:00:37,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 19:00:37,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 10 hours, 52 minutes, 24 seconds)
2025-05-09 19:10:05,433 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 19:10:05,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:13:52,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 3906.89893 ± 74.381
2025-05-09 19:13:52,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [3895.1523, 3817.2861, 4002.082, 3861.455, 3990.2249, 3849.71, 3888.5732, 3799.485, 3947.0798, 4017.9424]
2025-05-09 19:13:52,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 19:13:52,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (3906.90) for latency MM1Queue_a033_s075
2025-05-09 19:13:52,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 19:13:52,467 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 19:13:52,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 10 hours, 43 minutes, 53 seconds)
2025-05-09 19:23:34,827 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 19:23:34,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:26:54,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 3442.87158 ± 957.974
2025-05-09 19:26:54,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4031.7085, 1794.4368, 3988.2415, 1457.3298, 4035.1338, 3903.4897, 4056.3225, 3036.3533, 4064.0886, 4061.6104]
2025-05-09 19:26:54,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 475.0, 1000.0, 395.0, 1000.0, 1000.0, 1000.0, 780.0, 1000.0, 1000.0]
2025-05-09 19:26:54,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 10 hours, 34 minutes, 24 seconds)
2025-05-09 19:36:40,248 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 19:36:40,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:40:18,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 3641.57617 ± 869.345
2025-05-09 19:40:18,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [3972.1294, 1039.0725, 3905.7043, 3936.0005, 3984.4692, 3960.7646, 3795.1667, 3934.2227, 3884.4915, 4003.7395]
2025-05-09 19:40:18,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 325.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 19:40:18,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 10 hours, 23 minutes, 7 seconds)
2025-05-09 19:50:00,496 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 19:50:00,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:53:53,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4093.73193 ± 98.356
2025-05-09 19:53:53,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4261.5586, 4130.3447, 4045.35, 4157.1855, 4008.001, 4007.6738, 4220.4727, 3970.4414, 4150.4126, 3985.8782]
2025-05-09 19:53:53,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 19:53:53,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (4093.73) for latency MM1Queue_a033_s075
2025-05-09 19:53:53,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 19:53:53,486 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 19:53:53,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 10 hours, 14 minutes, 36 seconds)
2025-05-09 20:03:26,600 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:03:26,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:07:14,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4022.85034 ± 65.590
2025-05-09 20:07:14,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4082.4143, 4098.8076, 4123.1113, 4027.2415, 3913.2078, 3989.997, 4046.295, 4016.2766, 4006.672, 3924.4797]
2025-05-09 20:07:14,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 20:07:14,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 9 hours, 59 minutes, 28 seconds)
2025-05-09 20:16:55,644 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:16:55,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:20:41,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4026.42773 ± 42.812
2025-05-09 20:20:41,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [3976.997, 4009.9395, 4062.1843, 3982.1409, 3990.482, 4017.1245, 4064.7832, 4112.4204, 3991.5562, 4056.6497]
2025-05-09 20:20:41,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 20:20:41,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 9 hours, 47 minutes, 59 seconds)
2025-05-09 20:30:54,119 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:30:54,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:34:45,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 3984.20068 ± 60.182
2025-05-09 20:34:45,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [3978.6423, 4020.434, 3907.0908, 4032.3354, 4019.3206, 3934.0444, 3874.2795, 4040.718, 4069.3376, 3965.8037]
2025-05-09 20:34:45,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 20:34:45,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 9 hours, 43 minutes, 24 seconds)
2025-05-09 20:44:26,755 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:44:27,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:48:15,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 3973.31372 ± 79.325
2025-05-09 20:48:15,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [3915.8066, 3931.6387, 4084.9746, 4012.989, 3848.8013, 3955.729, 3889.2917, 4074.5737, 4073.186, 3946.1438]
2025-05-09 20:48:15,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 20:48:15,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 9 hours, 30 minutes, 44 seconds)
2025-05-09 20:57:44,202 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:57:44,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:01:28,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4078.27466 ± 111.704
2025-05-09 21:01:28,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4061.2637, 4179.8843, 4232.2163, 4127.2095, 4147.99, 4107.354, 4111.8145, 3927.8384, 3837.9282, 4049.2473]
2025-05-09 21:01:28,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 21:01:28,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 9 hours, 14 minutes, 9 seconds)
2025-05-09 21:10:34,524 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 21:10:34,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:14:11,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 3921.99805 ± 250.124
2025-05-09 21:14:11,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4109.932, 4050.0725, 3894.1392, 3192.6206, 3938.588, 4013.0708, 3965.0386, 4054.306, 4019.6526, 3982.5574]
2025-05-09 21:14:11,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 818.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 21:14:11,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 8 hours, 55 minutes, 40 seconds)
2025-05-09 21:23:10,345 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 21:23:10,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:26:50,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 3983.25146 ± 85.191
2025-05-09 21:26:50,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [3992.691, 4090.4756, 4062.083, 4046.6646, 4016.5615, 3841.7874, 3983.8481, 3922.994, 3833.1802, 4042.2312]
2025-05-09 21:26:50,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 21:26:50,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 36 minutes, 1 second)
2025-05-09 21:35:48,153 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 21:35:48,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:39:28,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4089.33447 ± 105.814
2025-05-09 21:39:28,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4177.1553, 4133.0913, 4123.3384, 3947.486, 4081.216, 3854.6704, 4201.7754, 4053.3374, 4190.821, 4130.453]
2025-05-09 21:39:28,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 21:39:28,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 8 hours, 11 minutes, 53 seconds)
2025-05-09 21:48:27,017 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 21:48:27,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:52:08,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4203.20605 ± 51.336
2025-05-09 21:52:08,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4217.001, 4129.1523, 4231.767, 4199.1167, 4295.5894, 4249.4893, 4200.2905, 4168.2627, 4116.718, 4224.6743]
2025-05-09 21:52:08,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 21:52:08,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (4203.21) for latency MM1Queue_a033_s075
2025-05-09 21:52:08,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 21:52:08,147 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 21:52:08,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 7 hours, 52 minutes, 41 seconds)
2025-05-09 22:00:52,550 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:00:52,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:04:32,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4234.57129 ± 56.867
2025-05-09 22:04:32,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4304.665, 4218.167, 4135.114, 4254.674, 4289.9727, 4328.355, 4180.0913, 4225.4463, 4195.9697, 4213.2603]
2025-05-09 22:04:32,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 22:04:32,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (4234.57) for latency MM1Queue_a033_s075
2025-05-09 22:04:32,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 22:04:32,933 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 22:04:32,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 34 minutes, 9 seconds)
2025-05-09 22:13:30,626 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:13:30,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:17:09,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4149.25684 ± 87.478
2025-05-09 22:17:09,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4069.504, 4244.96, 4109.4614, 4222.8594, 3979.233, 4121.0366, 4267.2705, 4239.56, 4133.964, 4104.7197]
2025-05-09 22:17:09,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 962.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 22:17:09,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 20 minutes, 45 seconds)
2025-05-09 22:26:31,884 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:26:31,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:30:10,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4247.34619 ± 68.717
2025-05-09 22:30:11,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4324.321, 4226.609, 4342.2036, 4236.95, 4195.902, 4288.8877, 4118.8027, 4164.722, 4307.6147, 4267.4478]
2025-05-09 22:30:11,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 22:30:11,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (4247.35) for latency MM1Queue_a033_s075
2025-05-09 22:30:11,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 22:30:11,241 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 22:30:11,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 10 minutes, 46 seconds)
2025-05-09 22:38:48,934 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:38:49,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:42:42,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4180.68311 ± 74.325
2025-05-09 22:42:42,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4149.2593, 4158.1724, 4323.141, 4181.422, 4140.658, 4120.5845, 4048.8542, 4186.4507, 4222.1997, 4276.0947]
2025-05-09 22:42:42,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 22:42:42,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 6 hours, 57 minutes, 19 seconds)
2025-05-09 22:52:07,036 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:52:07,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:55:46,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4220.24902 ± 53.530
2025-05-09 22:55:46,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4281.0933, 4303.4087, 4204.953, 4175.8394, 4188.4756, 4152.229, 4155.495, 4275.933, 4196.4863, 4268.5796]
2025-05-09 22:55:46,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 22:55:47,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 6 hours, 47 minutes, 22 seconds)
2025-05-09 23:04:45,973 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 23:04:46,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:08:26,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4289.52979 ± 31.557
2025-05-09 23:08:26,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4257.272, 4279.71, 4320.5176, 4350.277, 4250.167, 4308.3296, 4249.95, 4309.793, 4295.663, 4273.6187]
2025-05-09 23:08:26,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 23:08:26,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (4289.53) for latency MM1Queue_a033_s075
2025-05-09 23:08:26,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 23:08:26,785 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 23:08:26,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 36 minutes, 9 seconds)
2025-05-09 23:17:25,092 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 23:17:25,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:21:07,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4329.08447 ± 72.649
2025-05-09 23:21:07,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4306.6504, 4418.392, 4289.7134, 4307.513, 4395.3516, 4165.7705, 4345.1216, 4431.4326, 4307.7715, 4323.127]
2025-05-09 23:21:07,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 23:21:07,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (4329.08) for latency MM1Queue_a033_s075
2025-05-09 23:21:07,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 23:21:07,408 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 23:21:07,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 23 minutes, 45 seconds)
2025-05-09 23:30:04,225 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 23:30:04,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:33:44,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4274.91064 ± 82.109
2025-05-09 23:33:44,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4204.5273, 4163.1484, 4252.3066, 4293.67, 4284.3843, 4173.618, 4343.3906, 4389.9395, 4231.5254, 4412.5967]
2025-05-09 23:33:44,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 23:33:44,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 8 minutes, 32 seconds)
2025-05-09 23:42:49,558 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 23:42:49,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:46:27,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4234.49121 ± 120.653
2025-05-09 23:46:28,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4202.764, 4281.153, 4360.3994, 4346.323, 4222.665, 4393.0806, 3954.2778, 4162.106, 4255.129, 4167.0127]
2025-05-09 23:46:28,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 23:46:28,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 5 hours, 57 minutes, 5 seconds)
2025-05-09 23:55:26,803 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 23:55:26,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:59:07,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4365.27441 ± 86.439
2025-05-09 23:59:07,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4381.5317, 4428.8794, 4380.071, 4323.1577, 4352.6997, 4127.617, 4426.198, 4444.8257, 4400.9126, 4386.852]
2025-05-09 23:59:07,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 23:59:07,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (4365.27) for latency MM1Queue_a033_s075
2025-05-09 23:59:07,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 23:59:07,692 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 23:59:07,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 42 minutes, 2 seconds)
2025-05-10 00:08:03,603 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:08:03,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:11:18,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 3844.27271 ± 1266.826
2025-05-10 00:11:18,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4375.6035, 4274.4966, 4217.359, 4323.998, 47.949883, 4194.785, 4175.9976, 4270.8286, 4323.4473, 4238.26]
2025-05-10 00:11:18,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 44.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 00:11:18,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 26 minutes, 51 seconds)
2025-05-10 00:20:14,295 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:20:14,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:23:55,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4351.22217 ± 64.754
2025-05-10 00:23:55,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4405.2666, 4239.5337, 4330.797, 4252.5527, 4377.258, 4429.5293, 4434.698, 4365.1646, 4371.098, 4306.324]
2025-05-10 00:23:55,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 00:23:55,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 13 minutes, 58 seconds)
2025-05-10 00:32:53,059 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:32:53,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:36:24,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4269.57422 ± 397.313
2025-05-10 00:36:24,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4385.306, 4414.0845, 3081.4604, 4405.362, 4427.311, 4409.343, 4396.4727, 4352.0435, 4468.081, 4356.277]
2025-05-10 00:36:24,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 709.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 00:36:24,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 47 seconds)
2025-05-10 00:45:17,828 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:45:17,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:48:47,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4137.13623 ± 636.501
2025-05-10 00:48:47,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4376.618, 4333.39, 4148.096, 4379.897, 4446.112, 4310.834, 4363.1685, 4366.6763, 4405.4824, 2241.0898]
2025-05-10 00:48:47,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 537.0]
2025-05-10 00:48:47,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 46 minutes, 42 seconds)
2025-05-10 00:57:45,855 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:57:45,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:01:02,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 3863.47192 ± 1286.352
2025-05-10 01:01:02,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4306.6704, 8.339877, 4140.343, 4253.367, 4294.6387, 4307.295, 4313.2583, 4349.9893, 4361.3735, 4299.442]
2025-05-10 01:01:02,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 19.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 01:01:02,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 32 minutes, 25 seconds)
2025-05-10 01:10:00,742 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 01:10:00,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:13:40,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4347.53418 ± 89.299
2025-05-10 01:13:40,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4364.4307, 4372.0645, 4329.031, 4276.8604, 4442.9404, 4127.307, 4355.134, 4468.5737, 4358.488, 4380.5103]
2025-05-10 01:13:40,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 01:13:40,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 21 minutes, 56 seconds)
2025-05-10 01:22:37,728 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 01:22:37,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:26:07,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4136.87793 ± 694.030
2025-05-10 01:26:07,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4391.0063, 4207.1177, 4436.809, 4434.8823, 4366.1987, 4348.7705, 4479.246, 2066.7036, 4349.2446, 4288.7974]
2025-05-10 01:26:07,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 518.0, 1000.0, 1000.0]
2025-05-10 01:26:07,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 8 minutes, 48 seconds)
2025-05-10 01:35:05,282 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 01:35:05,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:38:47,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4465.10596 ± 35.029
2025-05-10 01:38:47,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4460.036, 4459.7036, 4449.165, 4464.227, 4528.4844, 4426.081, 4428.551, 4457.4663, 4532.92, 4444.427]
2025-05-10 01:38:47,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 01:38:47,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (4465.11) for latency MM1Queue_a033_s075
2025-05-10 01:38:47,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 01:38:47,064 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 01:38:47,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 3 hours, 57 minutes, 2 seconds)
2025-05-10 01:47:47,536 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 01:47:47,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:51:26,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4310.19336 ± 87.594
2025-05-10 01:51:26,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4243.004, 4331.0713, 4326.332, 4232.0146, 4427.7163, 4395.5483, 4324.8096, 4115.084, 4387.3145, 4319.043]
2025-05-10 01:51:26,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 01:51:26,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 45 minutes, 31 seconds)
2025-05-10 02:00:23,475 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 02:00:23,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:04:02,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4364.43115 ± 121.782
2025-05-10 02:04:02,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4104.6514, 4457.709, 4311.0483, 4397.344, 4534.736, 4227.4697, 4397.461, 4317.243, 4487.2637, 4409.382]
2025-05-10 02:04:02,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 02:04:02,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 34 minutes, 12 seconds)
2025-05-10 02:13:20,988 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 02:13:20,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:17:02,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4360.73877 ± 152.229
2025-05-10 02:17:02,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4292.5503, 4513.1514, 4396.604, 4530.0503, 4607.463, 4122.341, 4171.6206, 4425.8994, 4297.1772, 4250.532]
2025-05-10 02:17:02,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 02:17:02,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 22 minutes, 47 seconds)
2025-05-10 02:26:01,002 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 02:26:01,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:29:42,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4447.67236 ± 123.986
2025-05-10 02:29:42,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4459.5933, 4522.0566, 4496.1245, 4579.356, 4544.201, 4521.3857, 4465.42, 4448.8457, 4150.929, 4288.8076]
2025-05-10 02:29:42,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 02:29:42,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 10 minutes, 45 seconds)
2025-05-10 02:38:40,858 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 02:38:40,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:42:18,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4424.63184 ± 77.989
2025-05-10 02:42:18,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4368.522, 4335.717, 4489.431, 4487.338, 4377.6396, 4386.0605, 4557.034, 4338.481, 4377.6353, 4528.4614]
2025-05-10 02:42:18,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 02:42:18,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 57 minutes, 51 seconds)
2025-05-10 02:51:18,159 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 02:51:18,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:54:46,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4152.14990 ± 813.226
2025-05-10 02:54:46,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4494.1025, 1718.4207, 4485.079, 4295.979, 4425.23, 4435.396, 4408.0767, 4480.4717, 4364.6494, 4414.093]
2025-05-10 02:54:46,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 451.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 02:54:46,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 44 minutes, 39 seconds)
2025-05-10 03:03:44,968 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 03:03:45,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:07:28,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4407.28809 ± 80.119
2025-05-10 03:07:29,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4456.383, 4388.5425, 4415.3477, 4249.3916, 4427.99, 4299.396, 4389.8813, 4544.719, 4477.2495, 4423.982]
2025-05-10 03:07:29,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 03:07:29,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 32 minutes, 16 seconds)
2025-05-10 03:15:48,872 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 03:15:48,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:19:28,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4511.08057 ± 35.319
2025-05-10 03:19:28,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4495.7637, 4478.497, 4528.3154, 4461.505, 4543.4263, 4457.8755, 4574.083, 4524.3096, 4516.7007, 4530.33]
2025-05-10 03:19:28,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 03:19:28,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (4511.08) for latency MM1Queue_a033_s075
2025-05-10 03:19:28,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 03:19:28,596 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 03:19:28,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 17 minutes, 21 seconds)
2025-05-10 03:28:26,469 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 03:28:26,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:32:04,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4429.35693 ± 67.205
2025-05-10 03:32:04,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4320.412, 4515.9663, 4520.3306, 4436.087, 4355.8496, 4400.804, 4348.42, 4473.617, 4480.2363, 4441.843]
2025-05-10 03:32:04,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 03:32:04,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 4 minutes, 43 seconds)
2025-05-10 03:41:03,217 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 03:41:03,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:44:41,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4369.70508 ± 103.246
2025-05-10 03:44:41,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4381.4707, 4355.5923, 4357.736, 4112.946, 4337.8945, 4457.3867, 4361.2246, 4443.705, 4358.7007, 4530.3994]
2025-05-10 03:44:41,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 03:44:41,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 52 minutes, 17 seconds)
2025-05-10 03:53:39,323 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 03:53:39,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:57:17,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4386.29346 ± 88.317
2025-05-10 03:57:17,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4377.084, 4518.0386, 4399.632, 4370.085, 4462.232, 4408.0903, 4192.149, 4277.1655, 4412.1035, 4446.355]
2025-05-10 03:57:17,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 03:57:17,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 40 minutes, 1 second)
2025-05-10 04:06:14,091 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 04:06:14,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 04:09:54,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4501.49121 ± 106.607
2025-05-10 04:09:54,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4530.18, 4650.9897, 4577.2383, 4456.378, 4222.1606, 4498.62, 4552.756, 4541.957, 4503.2285, 4481.407]
2025-05-10 04:09:54,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 04:09:54,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 27 minutes, 23 seconds)
2025-05-10 04:18:53,597 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 04:18:53,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 04:22:31,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4562.33008 ± 87.983
2025-05-10 04:22:32,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4648.8975, 4448.5884, 4394.926, 4633.3525, 4476.238, 4632.2153, 4541.7446, 4579.0825, 4616.78, 4651.472]
2025-05-10 04:22:32,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 04:22:32,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1226 [INFO]: New best (4562.33) for latency MM1Queue_a033_s075
2025-05-10 04:22:32,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 04:22:32,042 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 04:22:32,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 15 minutes, 40 seconds)
2025-05-10 04:31:40,518 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 04:31:40,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 04:35:31,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4425.34863 ± 73.898
2025-05-10 04:35:31,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4394.4277, 4414.6567, 4342.7944, 4525.1606, 4451.4526, 4496.128, 4262.584, 4483.6025, 4422.996, 4459.6875]
2025-05-10 04:35:31,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 04:35:31,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 3 minutes, 27 seconds)
2025-05-10 04:45:20,397 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 04:45:20,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 04:49:08,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4543.24219 ± 79.454
2025-05-10 04:49:08,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4501.455, 4575.8853, 4660.5767, 4335.613, 4547.933, 4535.936, 4573.4785, 4580.5015, 4575.7656, 4545.2764]
2025-05-10 04:49:08,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 04:49:08,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 51 minutes, 33 seconds)
2025-05-10 04:58:08,535 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 04:58:08,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 05:01:49,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4390.81787 ± 186.351
2025-05-10 05:01:49,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4079.2913, 4600.14, 4477.329, 4360.997, 4568.439, 4087.14, 4402.191, 4463.919, 4254.101, 4614.634]
2025-05-10 05:01:49,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 05:01:49,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 38 minutes, 43 seconds)
2025-05-10 05:10:47,690 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 05:10:47,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 05:14:25,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4499.00684 ± 77.793
2025-05-10 05:14:25,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4582.184, 4328.154, 4454.529, 4568.2783, 4477.477, 4448.3984, 4549.8984, 4446.8125, 4567.348, 4566.9917]
2025-05-10 05:14:25,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 05:14:25,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 25 minutes, 48 seconds)
2025-05-10 05:23:23,149 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 05:23:23,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 05:27:00,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4556.07910 ± 78.407
2025-05-10 05:27:00,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4520.9556, 4657.803, 4491.7466, 4442.7944, 4552.88, 4448.189, 4604.105, 4538.8145, 4638.3125, 4665.193]
2025-05-10 05:27:00,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 05:27:00,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 12 minutes, 53 seconds)
2025-05-10 05:35:58,674 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 05:35:58,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 05:39:36,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1221 [DEBUG]: Total Reward: 4515.53809 ± 97.732
2025-05-10 05:39:36,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1222 [DEBUG]: All rewards: [4485.539, 4580.271, 4435.0093, 4575.3096, 4256.8486, 4553.1055, 4587.472, 4582.55, 4544.865, 4554.412]
2025-05-10 05:39:36,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 05:39:36,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1251 [DEBUG]: Training session finished
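Each iteration above follows the same pattern: `train()` finishes, the policy is evaluated for 10 trajectories under the `MM1Queue_a033_s075` latency, the per-trajectory returns are summarized as mean ± std, and the network is checkpointed whenever the mean improves on the previous best. A minimal sketch of that summarize-and-checkpoint step is below; the function name `summarize_and_checkpoint` is hypothetical (the actual training loop lives in `latency_env.delayed_mdp`), but the statistics match the log: the "±" value reproduces as the population standard deviation of the listed rewards.

```python
import statistics

def summarize_and_checkpoint(rewards, best_so_far):
    """Summarize one evaluation round and decide whether to checkpoint.

    `rewards` is the per-trajectory return list the log prints as
    "All rewards"; `best_so_far` is the previous best mean for this
    latency. Returns (mean, std, new_best), where new_best corresponds
    to the "New best (...)" / "saving network" path in the log.
    """
    mean = statistics.fmean(rewards)
    # Assumption: the log's "±" value is the population std (divisor N),
    # which matches the printed numbers, e.g. 4511.08 ± 35.319.
    std = statistics.pstdev(rewards)
    return mean, std, mean > best_so_far

# Rewards from iteration 89 above; previous best was 4465.11.
rewards = [4495.7637, 4478.497, 4528.3154, 4461.505, 4543.4263,
           4457.8755, 4574.083, 4524.3096, 4516.7007, 4530.33]
mean, std, new_best = summarize_and_checkpoint(rewards, 4465.11)
```

With these inputs the mean is ~4511.08 and the std ~35.32, matching the iteration-89 line, and `new_best` is `True`, which is why that iteration triggered the save to `checkpoints/best_MM1Queue_a033_s075.pkl`.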
