2025-05-09 09:43:41,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-05-09 09:43:41,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-05-09 09:43:41,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14a9d85e1450>}
2025-05-09 09:43:41,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1111 [DEBUG]: using device: cuda
2025-05-09 09:43:41,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1133 [INFO]: Creating new trainer
2025-05-09 09:43:41,966 baseline-mbpac-noisy-ant:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-05-09 09:43:41,966 baseline-mbpac-noisy-ant:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-09 09:43:41,976 baseline-mbpac-noisy-ant:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=27, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=8, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-05-09 09:43:43,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1194 [DEBUG]: Starting training session...
2025-05-09 09:43:43,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 1/100
2025-05-09 09:54:01,703 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 09:54:01,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:55:46,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: -78.22031 ± 84.134
2025-05-09 09:55:46,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [1.1905956, -98.538536, -1.9602404, -186.71007, -5.8344398, -134.6116, 17.015266, -7.0203166, -214.64775, -151.08592]
2025-05-09 09:55:46,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 314.0, 90.0, 1000.0, 283.0, 574.0, 38.0, 47.0, 1000.0, 1000.0]
2025-05-09 09:55:46,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1226 [INFO]: New best (-78.22) for latency MM1Queue_a033_s075
2025-05-09 09:55:46,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1229 [INFO]: saving network
2025-05-09 09:55:46,073 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 09:55:46,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 19 hours, 53 minutes, 7 seconds)
2025-05-09 10:05:10,855 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:05:10,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:06:26,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 41.47369 ± 54.726
2025-05-09 10:06:26,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [-39.113663, 33.600674, 170.02905, 24.113634, 83.801605, 69.28873, 2.6053522, 9.860135, 55.168972, 5.3824105]
2025-05-09 10:06:26,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [319.0, 64.0, 1000.0, 432.0, 295.0, 72.0, 87.0, 237.0, 457.0, 144.0]
2025-05-09 10:06:26,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1226 [INFO]: New best (41.47) for latency MM1Queue_a033_s075
2025-05-09 10:06:26,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1229 [INFO]: saving network
2025-05-09 10:06:26,751 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 10:06:26,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 18 hours, 33 minutes, 46 seconds)
2025-05-09 10:16:30,984 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:16:30,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:17:47,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 51.51968 ± 64.659
2025-05-09 10:17:47,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [43.230988, -61.5711, -22.312479, 18.799622, 59.68289, 149.3998, 54.067417, 33.849697, 159.24606, 80.80391]
2025-05-09 10:17:47,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [182.0, 353.0, 27.0, 355.0, 192.0, 358.0, 295.0, 62.0, 1000.0, 311.0]
2025-05-09 10:17:47,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1226 [INFO]: New best (51.52) for latency MM1Queue_a033_s075
2025-05-09 10:17:47,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1229 [INFO]: saving network
2025-05-09 10:17:47,089 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 10:17:47,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 18 hours, 21 minutes, 35 seconds)
2025-05-09 10:27:35,286 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:27:35,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:30:05,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 164.57881 ± 87.960
2025-05-09 10:30:05,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [300.5458, 146.86168, 161.77884, 247.87619, 20.55394, 113.26568, 160.34892, 248.30582, 217.41455, 28.836723]
2025-05-09 10:30:05,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 267.0, 481.0, 1000.0, 27.0, 365.0, 1000.0, 1000.0, 1000.0, 40.0]
2025-05-09 10:30:05,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1226 [INFO]: New best (164.58) for latency MM1Queue_a033_s075
2025-05-09 10:30:05,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1229 [INFO]: saving network
2025-05-09 10:30:05,456 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 10:30:05,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 18 hours, 32 minutes, 59 seconds)
2025-05-09 10:40:27,944 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:40:28,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:41:27,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 111.80396 ± 78.616
2025-05-09 10:41:27,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [65.41954, 189.13077, 25.778782, 108.321655, 143.09088, 89.17349, 28.534552, 17.464077, 264.64294, 186.48285]
2025-05-09 10:41:27,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [83.0, 398.0, 29.0, 241.0, 243.0, 108.0, 46.0, 44.0, 1000.0, 383.0]
2025-05-09 10:41:27,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 18 hours, 17 minutes, 4 seconds)
2025-05-09 10:51:58,945 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:51:58,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:55:03,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 368.42926 ± 186.199
2025-05-09 10:55:03,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [440.55066, 483.64108, 568.8757, 24.85418, 444.58038, 175.36528, 441.66898, 90.18833, 454.4351, 560.1329]
2025-05-09 10:55:03,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 29.0, 1000.0, 179.0, 1000.0, 115.0, 1000.0, 1000.0]
2025-05-09 10:55:03,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1226 [INFO]: New best (368.43) for latency MM1Queue_a033_s075
2025-05-09 10:55:03,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1229 [INFO]: saving network
2025-05-09 10:55:03,720 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 10:55:03,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 18 hours, 34 minutes, 43 seconds)
2025-05-09 11:05:27,668 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:05:27,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:08:07,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 385.69141 ± 206.840
2025-05-09 11:08:07,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [512.9973, 67.100746, 533.6954, 202.64651, 300.52496, 535.76776, 33.89695, 586.2821, 464.30792, 619.6947]
2025-05-09 11:08:07,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 50.0, 1000.0, 194.0, 373.0, 1000.0, 35.0, 1000.0, 1000.0, 1000.0]
2025-05-09 11:08:07,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1226 [INFO]: New best (385.69) for latency MM1Queue_a033_s075
2025-05-09 11:08:07,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1229 [INFO]: saving network
2025-05-09 11:08:07,077 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 11:08:07,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 19 hours, 7 minutes, 5 seconds)
2025-05-09 11:18:21,227 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:18:21,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:20:51,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 438.42172 ± 301.872
2025-05-09 11:20:52,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [448.16794, 52.5292, 673.4382, 202.74123, 596.92584, 1043.4926, 649.202, 443.21188, 6.123705, 268.38464]
2025-05-09 11:20:52,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 61.0, 626.0, 242.0, 1000.0, 1000.0, 1000.0, 1000.0, 13.0, 265.0]
2025-05-09 11:20:52,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1226 [INFO]: New best (438.42) for latency MM1Queue_a033_s075
2025-05-09 11:20:52,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1229 [INFO]: saving network
2025-05-09 11:20:52,613 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 11:20:53,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 19 hours, 21 minutes, 3 seconds)
2025-05-09 11:30:45,769 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:30:46,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:33:05,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 505.77789 ± 407.897
2025-05-09 11:33:05,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [612.15497, 572.15894, 134.95903, 124.91229, 857.29456, 279.16125, 1313.6453, 16.695229, 953.9216, 192.87561]
2025-05-09 11:33:05,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 117.0, 97.0, 1000.0, 247.0, 1000.0, 27.0, 1000.0, 147.0]
2025-05-09 11:33:05,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1226 [INFO]: New best (505.78) for latency MM1Queue_a033_s075
2025-05-09 11:33:05,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1229 [INFO]: saving network
2025-05-09 11:33:05,247 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 11:33:05,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 19 hours, 6 minutes, 32 seconds)
2025-05-09 11:43:23,812 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:43:23,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:45:14,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 317.35880 ± 316.794
2025-05-09 11:45:14,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [48.522053, 40.27544, 188.73402, 49.01313, 529.2782, 721.23016, 859.89557, 54.59978, 30.013851, 652.0255]
2025-05-09 11:45:14,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [25.0, 41.0, 139.0, 45.0, 1000.0, 1000.0, 1000.0, 52.0, 40.0, 1000.0]
2025-05-09 11:45:14,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 19 hours, 8 minutes, 2 seconds)
2025-05-09 11:56:10,105 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:56:10,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:58:32,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 498.84839 ± 266.192
2025-05-09 11:58:32,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [344.5217, 329.32034, 196.72084, 619.3958, 756.3218, 235.7364, 176.04378, 860.58136, 550.2899, 919.5519]
2025-05-09 11:58:32,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [198.0, 263.0, 166.0, 1000.0, 1000.0, 160.0, 110.0, 1000.0, 1000.0, 1000.0]
2025-05-09 11:58:32,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 18 hours, 49 minutes, 48 seconds)
2025-05-09 12:08:28,180 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:08:28,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:11:25,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 636.70807 ± 368.656
2025-05-09 12:11:25,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [74.2773, 1165.3307, 766.70966, 236.35321, 590.42975, 601.8221, 267.91263, 673.8258, 1302.0684, 688.35095]
2025-05-09 12:11:25,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [55.0, 1000.0, 1000.0, 157.0, 1000.0, 1000.0, 241.0, 1000.0, 1000.0, 1000.0]
2025-05-09 12:11:25,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1226 [INFO]: New best (636.71) for latency MM1Queue_a033_s075
2025-05-09 12:11:25,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1229 [INFO]: saving network
2025-05-09 12:11:25,791 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 12:11:25,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 18 hours, 34 minutes, 18 seconds)
2025-05-09 12:21:37,691 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:21:37,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:24:03,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 484.35126 ± 229.493
2025-05-09 12:24:03,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [439.38602, 588.53125, 685.30975, 598.84467, 268.0772, 574.574, 2.0058591, 280.58636, 836.34564, 569.8523]
2025-05-09 12:24:03,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [351.0, 1000.0, 1000.0, 413.0, 235.0, 1000.0, 14.0, 222.0, 1000.0, 1000.0]
2025-05-09 12:24:03,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 18 hours, 19 minutes, 12 seconds)
2025-05-09 12:33:32,240 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:33:32,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:36:22,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 787.64612 ± 373.319
2025-05-09 12:36:22,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [691.0195, 1141.1686, 1467.15, 660.17395, 1295.5441, 541.3946, 568.7828, 164.56535, 689.39764, 657.2652]
2025-05-09 12:36:22,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [383.0, 1000.0, 1000.0, 1000.0, 1000.0, 415.0, 317.0, 125.0, 1000.0, 1000.0]
2025-05-09 12:36:22,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1226 [INFO]: New best (787.65) for latency MM1Queue_a033_s075
2025-05-09 12:36:22,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1229 [INFO]: saving network
2025-05-09 12:36:22,463 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 12:36:22,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 18 hours, 8 minutes, 32 seconds)
2025-05-09 12:46:06,478 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:46:06,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:49:07,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 589.09216 ± 218.520
2025-05-09 12:49:07,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [706.2452, 852.8826, 423.68527, 976.09973, 634.20685, 584.3028, 185.14911, 563.5056, 601.6397, 363.20447]
2025-05-09 12:49:07,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 519.0, 249.0, 1000.0, 1000.0, 1000.0, 140.0, 1000.0, 1000.0, 249.0]
2025-05-09 12:49:07,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 18 hours, 6 minutes, 12 seconds)
2025-05-09 12:59:09,544 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:59:09,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:02:44,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1035.17737 ± 288.195
2025-05-09 13:02:44,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [833.75385, 1416.3359, 768.68097, 679.94385, 1113.3309, 1621.0377, 993.03284, 739.1959, 1061.3014, 1125.1588]
2025-05-09 13:02:44,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [544.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 748.0, 745.0]
2025-05-09 13:02:44,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1226 [INFO]: New best (1035.18) for latency MM1Queue_a033_s075
2025-05-09 13:02:44,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1229 [INFO]: saving network
2025-05-09 13:02:44,563 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 13:02:44,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 17 hours, 58 minutes, 41 seconds)
2025-05-09 13:12:39,766 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:12:40,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:14:21,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 406.38864 ± 248.950
2025-05-09 13:14:21,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [303.85104, 127.911674, 751.9955, 418.92923, 594.6387, 878.01874, 177.44019, 153.13959, 442.31296, 215.64873]
2025-05-09 13:14:21,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [230.0, 110.0, 1000.0, 240.0, 1000.0, 1000.0, 98.0, 118.0, 342.0, 145.0]
2025-05-09 13:14:21,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 17 hours, 24 minutes, 39 seconds)
2025-05-09 13:24:48,503 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:24:48,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:26:48,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 412.65985 ± 303.953
2025-05-09 13:26:48,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [216.89076, 565.73517, 127.31174, 52.537262, 131.24551, 620.8141, 579.6867, 1003.73456, 139.35551, 689.2876]
2025-05-09 13:26:48,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [150.0, 1000.0, 83.0, 41.0, 79.0, 342.0, 1000.0, 1000.0, 104.0, 1000.0]
2025-05-09 13:26:48,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 17 hours, 9 minutes, 5 seconds)
2025-05-09 13:36:36,638 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:36:36,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:38:12,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 366.82407 ± 339.026
2025-05-09 13:38:12,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [238.67207, 104.68994, 1149.1776, 94.885574, 660.1465, 117.95845, 51.67794, 137.67995, 612.974, 500.37854]
2025-05-09 13:38:12,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [148.0, 65.0, 1000.0, 77.0, 485.0, 78.0, 26.0, 75.0, 1000.0, 1000.0]
2025-05-09 13:38:12,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 16 hours, 41 minutes, 36 seconds)
2025-05-09 13:48:48,261 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:48:48,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:51:31,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 803.54846 ± 540.670
2025-05-09 13:51:31,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [174.02983, 1693.4797, 1163.3124, 51.899235, 579.3097, 1000.85535, 1560.723, 978.94006, 570.7574, 262.1779]
2025-05-09 13:51:31,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [149.0, 1000.0, 1000.0, 24.0, 1000.0, 604.0, 1000.0, 1000.0, 1000.0, 161.0]
2025-05-09 13:51:31,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 16 hours, 38 minutes, 20 seconds)
2025-05-09 14:01:25,766 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:01:25,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:03:30,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 514.06140 ± 414.008
2025-05-09 14:03:30,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [352.3293, 576.56494, 574.5579, 132.87277, 1316.3234, 539.0148, 46.22942, 201.41956, 1202.9003, 198.40155]
2025-05-09 14:03:30,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [249.0, 1000.0, 1000.0, 51.0, 724.0, 1000.0, 31.0, 91.0, 1000.0, 94.0]
2025-05-09 14:03:30,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 16 hours, 10 seconds)
2025-05-09 14:12:33,363 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:12:33,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:14:58,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 828.44122 ± 553.401
2025-05-09 14:14:58,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [1254.9329, 814.28345, 742.51447, 1194.8281, 1274.1163, 127.14022, 1868.0164, 93.9518, 180.99667, 733.63245]
2025-05-09 14:14:58,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 431.0, 1000.0, 1000.0, 1000.0, 102.0, 1000.0, 68.0, 96.0, 444.0]
2025-05-09 14:14:58,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 15 hours, 45 minutes, 31 seconds)
2025-05-09 14:24:54,294 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:24:54,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:27:10,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 461.76465 ± 333.864
2025-05-09 14:27:10,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [588.6231, 1031.024, 521.8516, 64.195145, 66.55366, 765.7469, 24.423677, 193.52809, 630.7453, 730.95496]
2025-05-09 14:27:10,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 58.0, 61.0, 1000.0, 22.0, 136.0, 368.0, 1000.0]
2025-05-09 14:27:10,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 15 hours, 29 minutes, 37 seconds)
2025-05-09 14:36:58,066 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:36:58,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:37:46,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 356.19983 ± 399.429
2025-05-09 14:37:46,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [192.48778, 554.03314, 422.1183, 12.375862, 495.08392, 275.70746, 67.7307, 16.418001, 1415.5131, 110.53009]
2025-05-09 14:37:46,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [116.0, 290.0, 270.0, 16.0, 285.0, 149.0, 42.0, 18.0, 762.0, 80.0]
2025-05-09 14:37:46,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 15 hours, 5 minutes, 30 seconds)
2025-05-09 14:48:36,837 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:48:36,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:51:20,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 992.93860 ± 577.867
2025-05-09 14:51:21,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [1227.4636, 200.5499, 889.7226, 57.2037, 1952.0079, 470.54678, 1032.2019, 1431.0454, 1608.997, 1059.6472]
2025-05-09 14:51:21,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 86.0, 1000.0, 43.0, 1000.0, 249.0, 579.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:51:21,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 14 hours, 57 minutes, 19 seconds)
2025-05-09 15:00:41,748 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:00:42,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:02:58,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 854.23407 ± 541.413
2025-05-09 15:02:58,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [992.33215, 517.78094, 996.35724, 122.06234, 595.9804, 1877.4703, 1092.1373, 20.69981, 1468.749, 858.77167]
2025-05-09 15:02:58,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [509.0, 283.0, 557.0, 67.0, 296.0, 1000.0, 1000.0, 15.0, 1000.0, 1000.0]
2025-05-09 15:02:58,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 14 hours, 40 minutes, 3 seconds)
2025-05-09 15:12:36,154 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:12:36,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:15:25,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1066.98181 ± 575.046
2025-05-09 15:15:25,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [567.8821, 1546.0748, 326.0021, 903.8792, 929.3986, 1494.8665, 403.4905, 1908.8239, 663.0927, 1926.3074]
2025-05-09 15:15:25,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 180.0, 1000.0, 473.0, 1000.0, 203.0, 1000.0, 325.0, 993.0]
2025-05-09 15:15:25,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1226 [INFO]: New best (1066.98) for latency MM1Queue_a033_s075
2025-05-09 15:15:25,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1229 [INFO]: saving network
2025-05-09 15:15:25,450 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 15:15:25,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 14 hours, 42 minutes, 36 seconds)
2025-05-09 15:25:41,246 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:25:41,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:28:19,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 979.93829 ± 572.427
2025-05-09 15:28:19,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [2039.9729, 937.16406, 297.9751, 870.9609, 140.32504, 1412.0709, 1637.8658, 1247.5165, 631.1383, 584.3925]
2025-05-09 15:28:19,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 146.0, 413.0, 56.0, 722.0, 1000.0, 1000.0, 1000.0, 321.0]
2025-05-09 15:28:19,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 14 hours, 40 minutes, 35 seconds)
2025-05-09 15:38:34,402 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:38:34,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:41:01,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 890.01917 ± 643.007
2025-05-09 15:41:01,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [2182.8667, 1057.1013, 291.70837, 873.26465, 1849.5127, 583.6508, 947.5366, 157.68562, 175.08646, 781.7785]
2025-05-09 15:41:01,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 482.0, 134.0, 1000.0, 1000.0, 1000.0, 1000.0, 102.0, 80.0, 360.0]
2025-05-09 15:41:01,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 14 hours, 58 minutes, 12 seconds)
2025-05-09 15:50:30,988 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:50:31,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:53:31,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1415.20483 ± 628.802
2025-05-09 15:53:31,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [1466.2576, 706.32776, 943.8002, 1024.724, 2226.1863, 764.6389, 2208.9324, 2514.9963, 1074.1464, 1222.0388]
2025-05-09 15:53:31,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [680.0, 374.0, 465.0, 485.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 563.0]
2025-05-09 15:53:31,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1226 [INFO]: New best (1415.20) for latency MM1Queue_a033_s075
2025-05-09 15:53:31,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1229 [INFO]: saving network
2025-05-09 15:53:31,542 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 15:53:31,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 14 hours, 30 minutes, 27 seconds)
2025-05-09 16:03:12,319 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:03:12,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:04:35,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 637.30957 ± 440.136
2025-05-09 16:04:35,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [994.3226, 110.71406, 310.32266, 999.86163, 50.537693, 1093.8599, 246.01324, 1002.6195, 328.191, 1236.6539]
2025-05-09 16:04:35,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 54.0, 130.0, 464.0, 40.0, 461.0, 111.0, 491.0, 159.0, 547.0]
2025-05-09 16:04:35,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 14 hours, 10 minutes, 12 seconds)
2025-05-09 16:15:22,361 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:15:22,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:17:03,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 643.27490 ± 666.007
2025-05-09 16:17:03,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [42.802696, 122.22996, 1026.2913, 207.26245, 23.987286, 763.3634, 978.3975, 2253.7456, 103.619484, 911.0493]
2025-05-09 16:17:03,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [31.0, 45.0, 465.0, 126.0, 25.0, 1000.0, 1000.0, 1000.0, 47.0, 425.0]
2025-05-09 16:17:03,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 13 hours, 58 minutes, 10 seconds)
2025-05-09 16:26:01,735 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:26:01,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:29:03,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1741.58923 ± 555.677
2025-05-09 16:29:03,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [2232.7393, 1211.8351, 2422.5815, 1257.5483, 2431.831, 748.88477, 1830.0076, 1407.1884, 1612.4216, 2260.8562]
2025-05-09 16:29:03,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 524.0, 1000.0, 580.0, 1000.0, 317.0, 788.0, 611.0, 683.0, 1000.0]
2025-05-09 16:29:03,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1226 [INFO]: New best (1741.59) for latency MM1Queue_a033_s075
2025-05-09 16:29:03,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1229 [INFO]: saving network
2025-05-09 16:29:03,724 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 16:29:03,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 13 hours, 33 minutes, 50 seconds)
2025-05-09 16:39:01,155 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:39:01,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:41:34,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1021.54309 ± 754.115
2025-05-09 16:41:34,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [1980.7822, 1621.2053, 1002.5653, 433.20157, 916.91895, 213.00528, 259.38043, 2500.7886, 1057.0217, 230.56181]
2025-05-09 16:41:34,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 161.0, 1000.0, 96.0, 151.0, 1000.0, 1000.0, 105.0]
2025-05-09 16:41:34,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 13 hours, 19 minutes, 6 seconds)
2025-05-09 16:51:59,014 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:51:59,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:53:58,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1252.41736 ± 823.397
2025-05-09 16:53:58,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [989.27264, 372.65103, 1628.5011, 267.42923, 2646.7505, 1505.3217, 1857.8928, 2331.5205, 599.47107, 325.36295]
2025-05-09 16:53:58,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [388.0, 135.0, 567.0, 115.0, 1000.0, 607.0, 1000.0, 925.0, 210.0, 147.0]
2025-05-09 16:53:58,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 13 hours, 5 minutes, 51 seconds)
2025-05-09 17:03:28,749 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:03:28,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:04:56,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 537.43982 ± 386.445
2025-05-09 17:04:56,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [431.0574, 656.2367, 354.8641, 1569.9877, 587.5368, 454.56717, 86.55395, 151.50789, 482.589, 599.49756]
2025-05-09 17:04:56,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [172.0, 1000.0, 134.0, 632.0, 1000.0, 192.0, 53.0, 61.0, 183.0, 255.0]
2025-05-09 17:04:56,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 12 hours, 52 minutes, 31 seconds)
2025-05-09 17:14:50,662 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:14:50,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:18:25,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 904.76532 ± 387.361
2025-05-09 17:18:26,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [644.0217, 572.1135, 1332.3787, 903.3011, 672.03595, 1502.4952, 821.5218, 615.5691, 434.56165, 1549.6548]
2025-05-09 17:18:26,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 300.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 726.0]
2025-05-09 17:18:26,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 12 hours, 53 minutes, 21 seconds)
2025-05-09 17:28:44,980 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:28:44,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:31:07,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1176.48291 ± 803.232
2025-05-09 17:31:07,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [2548.3997, 1424.4014, 910.96875, 744.12085, 909.52747, 582.77905, 1427.8121, 2649.4497, 287.81827, 279.55167]
2025-05-09 17:31:07,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 599.0, 1000.0, 1000.0, 399.0, 252.0, 519.0, 1000.0, 109.0, 123.0]
2025-05-09 17:31:07,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 12 hours, 49 minutes, 29 seconds)
2025-05-09 17:41:19,070 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:41:19,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:42:56,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 853.51648 ± 715.430
2025-05-09 17:42:56,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [836.36426, 387.3687, 1373.1567, 731.27826, 262.93384, 2555.5085, 1486.3915, 211.43785, 202.11107, 488.61465]
2025-05-09 17:42:56,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [309.0, 150.0, 501.0, 1000.0, 97.0, 999.0, 572.0, 104.0, 84.0, 211.0]
2025-05-09 17:42:56,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 12 hours, 28 minutes, 40 seconds)
2025-05-09 17:52:47,139 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:52:47,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:55:39,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1709.00293 ± 784.257
2025-05-09 17:55:39,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [2673.123, 1591.3375, 2424.5632, 238.25648, 1713.1987, 2619.8, 1079.2075, 921.9454, 2462.9941, 1365.6041]
2025-05-09 17:55:39,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 94.0, 690.0, 1000.0, 424.0, 366.0, 1000.0, 568.0]
2025-05-09 17:55:39,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 12 hours, 20 minutes, 10 seconds)
2025-05-09 18:04:50,986 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 18:04:51,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:07:47,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1562.08716 ± 789.766
2025-05-09 18:07:47,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [711.55646, 693.0731, 1729.3132, 1624.1813, 2642.082, 1462.3339, 2731.5474, 520.4548, 2469.0217, 1037.3076]
2025-05-09 18:07:47,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 605.0, 593.0, 1000.0, 554.0, 1000.0, 197.0, 969.0, 422.0]
2025-05-09 18:07:47,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 12 hours, 21 minutes, 37 seconds)
2025-05-09 18:18:24,597 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 18:18:25,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:19:44,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 625.60394 ± 740.851
2025-05-09 18:19:44,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [225.78061, 368.4586, 106.01951, 1272.5552, 450.88367, 44.292007, 938.6557, 2539.4712, 125.362114, 184.56032]
2025-05-09 18:19:44,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [107.0, 132.0, 51.0, 543.0, 1000.0, 24.0, 372.0, 1000.0, 48.0, 82.0]
2025-05-09 18:19:44,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 11 hours, 51 minutes, 10 seconds)
2025-05-09 18:29:40,586 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 18:29:40,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:31:55,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1397.09802 ± 707.427
2025-05-09 18:31:55,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [2133.123, 715.3552, 2646.7625, 1865.6156, 1377.4358, 2085.5952, 420.24878, 848.0134, 755.3395, 1123.491]
2025-05-09 18:31:55,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 262.0, 1000.0, 720.0, 543.0, 1000.0, 161.0, 312.0, 292.0, 449.0]
2025-05-09 18:31:55,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 11 hours, 33 minutes, 13 seconds)
2025-05-09 18:41:32,211 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 18:41:32,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:44:40,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 2016.43298 ± 849.157
2025-05-09 18:44:40,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [2625.5955, 300.62253, 533.2438, 2723.3513, 1650.9104, 2565.9675, 2425.9727, 2430.2844, 2592.2976, 2316.0833]
2025-05-09 18:44:40,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 119.0, 214.0, 1000.0, 660.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:44:40,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1226 [INFO]: New best (2016.43) for latency MM1Queue_a033_s075
2025-05-09 18:44:40,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1229 [INFO]: saving network
2025-05-09 18:44:40,245 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 18:44:40,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 11 hours, 31 minutes, 26 seconds)
2025-05-09 18:54:15,846 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 18:54:15,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:55:53,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1153.56824 ± 937.336
2025-05-09 18:55:53,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [1055.0215, 29.186497, 2809.9082, 853.0816, 468.53088, 529.2366, 385.3908, 2822.7698, 1741.2562, 841.3012]
2025-05-09 18:55:53,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [421.0, 26.0, 954.0, 265.0, 172.0, 219.0, 140.0, 1000.0, 582.0, 316.0]
2025-05-09 18:55:53,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 11 hours, 2 minutes, 31 seconds)
2025-05-09 19:06:41,459 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 19:06:41,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:08:31,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1042.36414 ± 850.826
2025-05-09 19:08:31,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [1175.707, 245.40662, 2803.093, 1073.5172, 276.9605, 138.63853, 422.67426, 637.5581, 1431.7401, 2218.346]
2025-05-09 19:08:31,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [388.0, 102.0, 1000.0, 393.0, 120.0, 79.0, 200.0, 1000.0, 515.0, 792.0]
2025-05-09 19:08:31,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 10 hours, 56 minutes, 1 second)
2025-05-09 19:18:20,982 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 19:18:20,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:21:13,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1614.95679 ± 1151.845
2025-05-09 19:21:13,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [237.26237, 1239.2966, 3026.1685, 2843.9583, 2892.7334, 470.6959, 463.86145, 2976.5066, 1650.6902, 348.39468]
2025-05-09 19:21:13,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [87.0, 408.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 599.0, 164.0]
2025-05-09 19:21:13,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 10 hours, 51 minutes, 48 seconds)
2025-05-09 19:30:21,888 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 19:30:21,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:31:47,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 761.34711 ± 751.080
2025-05-09 19:31:47,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [680.2508, 793.7172, 488.7641, 36.304577, 251.89667, 567.97034, 200.65076, 1815.1698, 2512.517, 266.23013]
2025-05-09 19:31:47,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [226.0, 1000.0, 173.0, 35.0, 108.0, 235.0, 102.0, 684.0, 864.0, 88.0]
2025-05-09 19:31:47,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 10 hours, 22 minutes, 32 seconds)
2025-05-09 19:42:34,420 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 19:42:34,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:44:23,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 858.85339 ± 513.740
2025-05-09 19:44:23,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [199.08098, 1273.1268, 1427.4033, 1238.7311, 539.75366, 82.6843, 444.22574, 1481.0078, 574.11566, 1328.4049]
2025-05-09 19:44:23,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [71.0, 449.0, 1000.0, 459.0, 220.0, 44.0, 1000.0, 567.0, 196.0, 494.0]
2025-05-09 19:44:23,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 10 hours, 9 minutes, 11 seconds)
2025-05-09 19:53:31,979 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 19:53:32,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:55:54,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1251.10144 ± 903.638
2025-05-09 19:55:54,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [194.58694, 151.41608, 936.40643, 1001.57184, 1258.8176, 1870.2273, 3028.4622, 2470.5806, 1146.851, 452.09515]
2025-05-09 19:55:54,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [79.0, 59.0, 388.0, 379.0, 1000.0, 681.0, 1000.0, 898.0, 461.0, 1000.0]
2025-05-09 19:55:54,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 10 hours, 15 seconds)
2025-05-09 20:06:14,873 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:06:14,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:08:02,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1183.68030 ± 1117.862
2025-05-09 20:08:02,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [17.26546, 2119.2776, 171.75317, 3290.1987, 541.335, 2720.8184, 841.1181, 1565.3809, 553.6628, 15.991714]
2025-05-09 20:08:02,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [16.0, 727.0, 84.0, 1000.0, 180.0, 1000.0, 303.0, 1000.0, 214.0, 16.0]
2025-05-09 20:08:02,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 9 hours, 43 minutes, 11 seconds)
2025-05-09 20:17:55,980 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:17:55,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:20:39,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1760.54431 ± 961.739
2025-05-09 20:20:39,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [3010.875, 820.71625, 2951.4714, 811.32385, 1117.1027, 2886.9773, 828.8863, 1003.06934, 1330.8561, 2844.1646]
2025-05-09 20:20:39,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 302.0, 410.0, 1000.0, 338.0, 363.0, 481.0, 1000.0]
2025-05-09 20:20:40,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 9 hours, 30 minutes, 34 seconds)
2025-05-09 20:30:41,248 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:30:41,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:33:36,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1737.44788 ± 684.289
2025-05-09 20:33:37,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [1436.203, 792.0127, 2648.485, 1021.3099, 1768.1511, 2922.754, 2311.735, 1965.894, 1477.2238, 1030.7097]
2025-05-09 20:33:37,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [524.0, 1000.0, 1000.0, 397.0, 1000.0, 1000.0, 817.0, 760.0, 509.0, 444.0]
2025-05-09 20:33:37,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 9 hours, 41 minutes, 11 seconds)
2025-05-09 20:43:07,978 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:43:08,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:45:04,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1368.57642 ± 782.348
2025-05-09 20:45:04,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [1837.2589, 108.577736, 712.11633, 1983.6671, 907.9233, 1591.6655, 1757.3022, 665.1298, 2965.8179, 1156.3053]
2025-05-09 20:45:04,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [594.0, 55.0, 210.0, 594.0, 298.0, 1000.0, 570.0, 222.0, 1000.0, 415.0]
2025-05-09 20:45:04,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 9 hours, 18 minutes, 16 seconds)
2025-05-09 20:54:56,916 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:54:56,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:56:06,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 828.85236 ± 691.128
2025-05-09 20:56:06,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [1215.5461, 1752.9275, 224.3928, 196.46739, 2190.1184, 624.36346, 83.855644, 1082.0482, 127.079, 791.7243]
2025-05-09 20:56:06,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [388.0, 600.0, 93.0, 80.0, 725.0, 258.0, 41.0, 398.0, 70.0, 265.0]
2025-05-09 20:56:06,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 9 hours, 1 minute, 42 seconds)
2025-05-09 21:06:29,754 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 21:06:29,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:09:17,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1947.16736 ± 779.219
2025-05-09 21:09:18,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [1223.2936, 2245.42, 3125.2852, 1277.847, 2960.2703, 1091.75, 1923.9119, 2010.5665, 852.5253, 2760.8052]
2025-05-09 21:09:18,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [355.0, 806.0, 1000.0, 1000.0, 1000.0, 386.0, 583.0, 682.0, 317.0, 1000.0]
2025-05-09 21:09:18,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 8 hours, 59 minutes, 5 seconds)
2025-05-09 21:19:17,091 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 21:19:17,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:21:19,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1437.52368 ± 1020.343
2025-05-09 21:21:19,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [1206.0331, 272.43555, 3125.6604, 211.36975, 3047.0579, 210.20227, 1857.2856, 1034.3193, 1521.7529, 1889.1194]
2025-05-09 21:21:19,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [456.0, 104.0, 1000.0, 103.0, 966.0, 70.0, 584.0, 318.0, 1000.0, 595.0]
2025-05-09 21:21:19,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 8 hours, 41 minutes, 36 seconds)
2025-05-09 21:30:55,798 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 21:30:55,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:33:56,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1514.08130 ± 689.588
2025-05-09 21:33:56,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [2268.9424, 1243.1085, 568.0335, 1484.3636, 2424.114, 1722.0956, 1208.7882, 2608.9292, 646.9712, 965.46747]
2025-05-09 21:33:56,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [845.0, 1000.0, 249.0, 1000.0, 1000.0, 708.0, 469.0, 1000.0, 247.0, 1000.0]
2025-05-09 21:33:56,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 8 hours, 26 minutes, 45 seconds)
2025-05-09 21:44:25,822 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 21:44:25,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:47:12,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 2082.34521 ± 918.100
2025-05-09 21:47:12,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [526.1435, 337.51578, 2410.9514, 2998.4927, 1717.2081, 2762.5872, 2853.059, 2911.4666, 1899.941, 2406.088]
2025-05-09 21:47:12,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [223.0, 116.0, 793.0, 1000.0, 583.0, 881.0, 1000.0, 1000.0, 635.0, 824.0]
2025-05-09 21:47:12,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1226 [INFO]: New best (2082.35) for latency MM1Queue_a033_s075
2025-05-09 21:47:12,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1229 [INFO]: saving network
2025-05-09 21:47:12,336 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 21:47:12,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 29 minutes, 27 seconds)
2025-05-09 21:56:55,161 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 21:56:55,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:59:55,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 2009.11719 ± 755.653
2025-05-09 21:59:55,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [880.69727, 2544.9448, 718.6484, 2815.3936, 2992.9421, 2133.3499, 2506.8933, 1500.2578, 2413.76, 1584.2843]
2025-05-09 21:59:55,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [385.0, 890.0, 268.0, 1000.0, 1000.0, 797.0, 938.0, 1000.0, 836.0, 512.0]
2025-05-09 21:59:55,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 8 hours, 30 minutes, 32 seconds)
2025-05-09 22:09:49,932 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:09:50,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:13:16,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 2156.64502 ± 759.701
2025-05-09 22:13:16,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [2041.184, 1015.87067, 1481.9933, 2985.7512, 846.46625, 2719.1143, 2172.5554, 2441.1018, 2940.7092, 2921.7036]
2025-05-09 22:13:16,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [715.0, 347.0, 1000.0, 1000.0, 1000.0, 1000.0, 780.0, 944.0, 1000.0, 1000.0]
2025-05-09 22:13:16,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1226 [INFO]: New best (2156.65) for latency MM1Queue_a033_s075
2025-05-09 22:13:16,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1229 [INFO]: saving network
2025-05-09 22:13:16,576 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 22:13:16,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 19 minutes)
2025-05-09 22:23:04,426 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:23:04,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:24:57,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1311.58960 ± 1089.763
2025-05-09 22:24:58,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [809.32715, 173.92535, 2470.6453, 3294.561, 438.47195, 620.6594, 2553.449, 2005.4312, 242.67137, 506.7562]
2025-05-09 22:24:58,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [263.0, 63.0, 818.0, 1000.0, 160.0, 1000.0, 696.0, 588.0, 96.0, 165.0]
2025-05-09 22:24:58,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 8 hours, 3 minutes, 43 seconds)
2025-05-09 22:35:15,797 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:35:15,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:37:54,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1803.35034 ± 1051.914
2025-05-09 22:37:55,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [1066.7344, 3189.5989, 2592.149, 1785.4072, 3066.1135, 1487.7664, 630.02856, 2994.7349, 1177.196, 43.77823]
2025-05-09 22:37:55,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 634.0, 1000.0, 503.0, 223.0, 1000.0, 401.0, 30.0]
2025-05-09 22:37:55,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 7 hours, 53 minutes, 24 seconds)
2025-05-09 22:47:01,211 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:47:01,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:49:19,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1637.68823 ± 913.363
2025-05-09 22:49:20,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [327.8454, 1906.4014, 1681.2218, 1118.99, 1216.5099, 1031.8523, 2024.0936, 3205.9233, 716.0622, 3147.982]
2025-05-09 22:49:20,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [133.0, 669.0, 549.0, 330.0, 1000.0, 292.0, 578.0, 1000.0, 256.0, 1000.0]
2025-05-09 22:49:20,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 27 minutes, 22 seconds)
2025-05-09 22:59:24,358 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:59:24,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:02:59,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1878.39453 ± 1129.917
2025-05-09 23:02:59,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [2802.1453, 2519.129, 420.2875, 706.6111, 2998.344, 2653.139, 491.05936, 2829.6252, 2955.139, 408.46515]
2025-05-09 23:02:59,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 152.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 23:02:59,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 21 minutes, 32 seconds)
2025-05-09 23:12:39,775 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 23:12:39,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:14:25,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1414.05334 ± 1242.288
2025-05-09 23:14:25,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [1918.7733, 372.98596, 23.38899, 613.60876, 3106.9194, 3353.3423, 337.21747, 635.937, 732.3758, 3045.9846]
2025-05-09 23:14:25,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [647.0, 122.0, 16.0, 199.0, 1000.0, 1000.0, 112.0, 167.0, 242.0, 1000.0]
2025-05-09 23:14:25,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 6 hours, 55 minutes, 50 seconds)
2025-05-09 23:24:20,417 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 23:24:20,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:27:52,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1939.83569 ± 1175.820
2025-05-09 23:27:52,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [2981.802, 283.0347, 1958.9634, 2365.9702, 827.2474, 3053.46, 3386.0422, 974.68475, 323.60696, 3243.5457]
2025-05-09 23:27:52,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 648.0, 766.0, 307.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 23:27:52,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 6 hours, 55 minutes, 9 seconds)
2025-05-09 23:38:20,140 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 23:38:20,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:41:15,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 2147.50195 ± 926.064
2025-05-09 23:41:15,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [1173.4827, 1722.6678, 3187.4424, 3071.945, 1002.6445, 2693.3293, 3042.0798, 3176.914, 1559.4072, 845.1054]
2025-05-09 23:41:15,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [391.0, 1000.0, 1000.0, 1000.0, 331.0, 918.0, 1000.0, 1000.0, 491.0, 346.0]
2025-05-09 23:41:15,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 6 hours, 45 minutes, 24 seconds)
2025-05-09 23:50:50,284 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 23:50:50,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:53:11,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1894.50562 ± 1236.341
2025-05-09 23:53:12,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [3097.5732, 125.39148, 205.31107, 336.0338, 3143.4414, 1454.3597, 2376.9717, 3156.6445, 3250.0972, 1799.2333]
2025-05-09 23:53:12,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 60.0, 71.0, 123.0, 913.0, 476.0, 679.0, 1000.0, 1000.0, 604.0]
2025-05-09 23:53:12,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 35 minutes, 58 seconds)
2025-05-10 00:03:45,507 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:03:45,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:05:50,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1377.62146 ± 1137.041
2025-05-10 00:05:50,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [596.15656, 730.99896, 1160.6168, 148.50098, 354.96878, 1943.5596, 138.45567, 2347.6318, 3079.8796, 3275.4465]
2025-05-10 00:05:50,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [211.0, 264.0, 354.0, 47.0, 1000.0, 576.0, 48.0, 770.0, 1000.0, 1000.0]
2025-05-10 00:05:50,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 17 minutes, 6 seconds)
2025-05-10 00:15:17,507 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:15:17,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:16:43,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 922.34827 ± 967.849
2025-05-10 00:16:44,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [76.71317, 332.92856, 157.93423, 3267.2969, 586.88586, 1097.6467, 303.9974, 1958.3578, 1241.2239, 200.49799]
2025-05-10 00:16:44,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [61.0, 114.0, 61.0, 1000.0, 246.0, 317.0, 1000.0, 564.0, 358.0, 78.0]
2025-05-10 00:16:44,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 1 minute, 22 seconds)
2025-05-10 00:25:29,704 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:25:29,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:27:15,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1180.54797 ± 997.451
2025-05-10 00:27:15,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [851.1405, 593.7499, 526.1974, 3135.4731, 584.09283, 3006.6382, 376.44165, 272.20572, 1429.0737, 1030.4655]
2025-05-10 00:27:15,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [252.0, 1000.0, 181.0, 1000.0, 204.0, 1000.0, 109.0, 98.0, 474.0, 345.0]
2025-05-10 00:27:15,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 5 hours, 32 minutes, 36 seconds)
2025-05-10 00:37:06,991 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:37:06,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:38:58,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1384.91003 ± 904.935
2025-05-10 00:38:58,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [1410.3378, 2156.2988, 2634.594, 794.112, 1204.0491, 1348.4445, 554.0196, 333.70404, 356.5847, 3056.9553]
2025-05-10 00:38:58,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [457.0, 586.0, 746.0, 248.0, 361.0, 1000.0, 184.0, 135.0, 126.0, 1000.0]
2025-05-10 00:38:58,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 11 minutes, 39 seconds)
2025-05-10 00:47:31,360 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:47:31,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:50:05,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1590.59106 ± 1137.935
2025-05-10 00:50:05,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [3004.9556, 455.11304, 1774.6984, 1171.2385, 133.66109, 1571.9055, 3120.3794, 3380.185, 681.657, 612.1162]
2025-05-10 00:50:05,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [948.0, 1000.0, 610.0, 379.0, 1000.0, 538.0, 1000.0, 1000.0, 190.0, 185.0]
2025-05-10 00:50:05,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 4 hours, 55 minutes, 49 seconds)
2025-05-10 00:59:26,849 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:59:26,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:02:44,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 2363.39551 ± 915.746
2025-05-10 01:02:45,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [2064.9187, 3263.024, 3154.0046, 2445.6746, 2179.9683, 505.65918, 1287.1766, 3434.6565, 1978.117, 3320.756]
2025-05-10 01:02:45,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 656.0, 1000.0, 413.0, 1000.0, 609.0, 1000.0]
2025-05-10 01:02:45,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1226 [INFO]: New best (2363.40) for latency MM1Queue_a033_s075
2025-05-10 01:02:45,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1229 [INFO]: saving network
2025-05-10 01:02:45,233 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 01:02:45,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 4 hours, 44 minutes, 31 seconds)
2025-05-10 01:11:53,995 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 01:11:54,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:14:02,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1452.60315 ± 996.109
2025-05-10 01:14:02,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [1684.3612, 885.1428, 681.7427, 2385.9873, 2920.6016, 3206.567, 286.38132, 398.46506, 905.4662, 1171.3173]
2025-05-10 01:14:02,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [558.0, 297.0, 235.0, 709.0, 1000.0, 1000.0, 1000.0, 171.0, 281.0, 451.0]
2025-05-10 01:14:02,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 4 hours, 35 minutes, 6 seconds)
2025-05-10 01:23:23,572 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 01:23:23,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:26:59,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 2562.53247 ± 1017.090
2025-05-10 01:26:59,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [3339.2173, 3417.3047, 3152.2126, 3003.024, 3136.0813, 3036.1548, 3060.6355, 2243.3618, 553.066, 684.26575]
2025-05-10 01:26:59,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 619.0, 1000.0, 1000.0]
2025-05-10 01:26:59,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1226 [INFO]: New best (2562.53) for latency MM1Queue_a033_s075
2025-05-10 01:26:59,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1229 [INFO]: saving network
2025-05-10 01:26:59,746 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 01:26:59,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 34 minutes, 46 seconds)
2025-05-10 01:36:12,703 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 01:36:12,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:39:26,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 2509.56543 ± 807.462
2025-05-10 01:39:26,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [2935.8594, 3092.332, 2824.3416, 3170.086, 3156.0254, 1525.2343, 880.969, 1532.2269, 2923.6338, 3054.948]
2025-05-10 01:39:26,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 495.0, 339.0, 555.0, 1000.0, 1000.0]
2025-05-10 01:39:26,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 26 minutes)
2025-05-10 01:49:05,618 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 01:49:06,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:51:47,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 2298.32812 ± 1280.643
2025-05-10 01:51:47,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [3105.1335, 3532.1924, 449.1784, 3615.3635, 1227.5132, 3089.4365, 3335.7214, 582.22125, 772.6329, 3273.8865]
2025-05-10 01:51:47,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 984.0, 179.0, 1000.0, 403.0, 1000.0, 1000.0, 205.0, 243.0, 1000.0]
2025-05-10 01:51:47,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 19 minutes, 8 seconds)
2025-05-10 02:00:49,466 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 02:00:49,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:03:28,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1826.27502 ± 1023.580
2025-05-10 02:03:28,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [1414.6858, 2362.9912, 765.0481, 612.66156, 816.12, 3102.1082, 1505.9498, 3367.7546, 3176.045, 1139.3857]
2025-05-10 02:03:28,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 217.0, 200.0, 269.0, 1000.0, 1000.0, 1000.0, 1000.0, 378.0]
2025-05-10 02:03:28,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 2 minutes, 54 seconds)
2025-05-10 02:12:40,258 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 02:12:40,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:15:32,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 2249.84521 ± 1204.128
2025-05-10 02:15:32,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [2339.7058, 1028.3291, 3491.2432, 2142.0334, 2273.7732, 3293.7932, 762.0676, 73.42832, 3405.1353, 3688.9412]
2025-05-10 02:15:32,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [742.0, 1000.0, 1000.0, 594.0, 1000.0, 1000.0, 248.0, 35.0, 911.0, 1000.0]
2025-05-10 02:15:32,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 3 hours, 53 minutes, 39 seconds)
2025-05-10 02:24:52,528 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 02:24:52,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:26:55,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1882.33228 ± 1116.275
2025-05-10 02:26:55,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [3592.1438, 2854.606, 3448.3574, 1361.9941, 677.3647, 1879.1654, 1297.6453, 526.734, 2609.873, 575.43915]
2025-05-10 02:26:55,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 821.0, 1000.0, 413.0, 199.0, 531.0, 405.0, 167.0, 750.0, 162.0]
2025-05-10 02:26:55,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 35 minutes, 44 seconds)
2025-05-10 02:36:29,792 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 02:36:29,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:38:17,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1628.63867 ± 1474.049
2025-05-10 02:38:17,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [3337.4663, 486.28433, 3300.9648, 831.2821, 117.47743, 62.503963, 3461.0178, 1135.6168, 3463.5503, 90.223015]
2025-05-10 02:38:17,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 168.0, 1000.0, 211.0, 51.0, 35.0, 1000.0, 297.0, 1000.0, 40.0]
2025-05-10 02:38:17,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 20 minutes, 8 seconds)
2025-05-10 02:47:23,604 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 02:47:23,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:50:26,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 2586.98511 ± 741.403
2025-05-10 02:50:26,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [3278.1484, 3209.3777, 2383.723, 3209.5413, 3116.7258, 2551.7288, 1676.4113, 3412.3872, 1274.9813, 1756.8274]
2025-05-10 02:50:26,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 738.0, 1000.0, 1000.0, 788.0, 514.0, 1000.0, 359.0, 578.0]
2025-05-10 02:50:26,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1226 [INFO]: New best (2586.99) for latency MM1Queue_a033_s075
2025-05-10 02:50:26,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1229 [INFO]: saving network
2025-05-10 02:50:26,322 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 02:50:26,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 7 minutes, 39 seconds)
2025-05-10 02:59:41,431 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 02:59:41,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:01:39,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1049.70862 ± 1097.680
2025-05-10 03:01:39,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [1804.8961, 73.791855, 253.43979, 406.42938, 311.5813, 3565.386, 2275.9722, 198.55125, 446.04483, 1160.9928]
2025-05-10 03:01:39,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [497.0, 48.0, 98.0, 115.0, 101.0, 1000.0, 1000.0, 1000.0, 1000.0, 366.0]
2025-05-10 03:01:39,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 2 hours, 54 minutes, 32 seconds)
2025-05-10 03:11:10,730 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 03:11:10,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:13:08,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1762.61621 ± 1183.323
2025-05-10 03:13:08,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [949.51306, 2249.0437, 1963.1904, 363.5394, 396.98865, 2818.316, 163.33707, 1879.1256, 3583.2234, 3259.8848]
2025-05-10 03:13:08,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [314.0, 630.0, 551.0, 114.0, 125.0, 839.0, 70.0, 551.0, 1000.0, 1000.0]
2025-05-10 03:13:08,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 41 minutes, 15 seconds)
2025-05-10 03:21:51,724 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 03:21:51,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:24:17,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1646.60547 ± 1005.034
2025-05-10 03:24:17,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [1235.7999, 1871.0424, 3221.4294, 541.0965, 1458.2754, 2525.311, 3161.1956, 31.229855, 1000.6758, 1419.9995]
2025-05-10 03:24:17,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [420.0, 562.0, 1000.0, 1000.0, 478.0, 795.0, 888.0, 22.0, 299.0, 1000.0]
2025-05-10 03:24:17,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 29 minutes, 9 seconds)
2025-05-10 03:33:26,555 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 03:33:26,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:36:09,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 2157.72607 ± 1334.833
2025-05-10 03:36:09,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [3273.4634, 1776.2401, 3410.6846, 3480.742, 3459.3376, 195.46434, 1482.0272, 114.78427, 981.9076, 3402.609]
2025-05-10 03:36:09,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 555.0, 1000.0, 1000.0, 1000.0, 79.0, 436.0, 46.0, 1000.0, 1000.0]
2025-05-10 03:36:09,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 18 minutes, 51 seconds)
2025-05-10 03:45:51,684 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 03:45:51,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:48:49,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 2417.70435 ± 827.698
2025-05-10 03:48:49,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [3379.8801, 2046.0626, 1404.5654, 3196.0469, 1396.7662, 1435.1278, 3179.9304, 1859.3866, 3465.0752, 2814.2031]
2025-05-10 03:48:49,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 401.0, 984.0, 432.0, 508.0, 1000.0, 553.0, 1000.0, 978.0]
2025-05-10 03:48:49,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 8 minutes, 26 seconds)
2025-05-10 03:57:56,058 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 03:57:56,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 04:00:32,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 2055.24048 ± 1340.414
2025-05-10 04:00:32,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [459.25848, 3621.3608, 325.82654, 3096.5225, 3211.576, 3376.5154, 1287.619, 1776.1792, 135.4544, 3262.0896]
2025-05-10 04:00:32,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [139.0, 1000.0, 121.0, 1000.0, 1000.0, 1000.0, 448.0, 1000.0, 45.0, 1000.0]
2025-05-10 04:00:32,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 1 hour, 57 minutes, 45 seconds)
2025-05-10 04:10:07,838 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 04:10:07,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 04:12:09,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1557.53320 ± 1294.047
2025-05-10 04:12:09,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [1657.3394, 197.97945, 115.18095, 1393.5096, 3341.433, 732.744, 1327.072, 2966.1707, 3740.5205, 103.38217]
2025-05-10 04:12:09,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [460.0, 66.0, 48.0, 492.0, 1000.0, 234.0, 1000.0, 1000.0, 1000.0, 51.0]
2025-05-10 04:12:09,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 46 minutes, 14 seconds)
2025-05-10 04:20:58,487 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 04:20:58,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 04:23:01,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1626.14551 ± 1144.471
2025-05-10 04:23:02,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [1404.8883, 1040.4464, 53.415184, 2497.631, 1445.3965, 270.76923, 3464.0374, 3556.5176, 1608.577, 919.77606]
2025-05-10 04:23:02,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [415.0, 280.0, 45.0, 809.0, 1000.0, 100.0, 1000.0, 1000.0, 508.0, 254.0]
2025-05-10 04:23:02,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 33 minutes, 59 seconds)
2025-05-10 04:32:42,273 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 04:32:42,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 04:35:41,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 2606.58960 ± 1089.216
2025-05-10 04:35:41,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [3438.8755, 1171.7295, 3720.1719, 1610.3551, 3354.943, 3456.8342, 3378.031, 481.03485, 3203.8328, 2250.0884]
2025-05-10 04:35:41,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 365.0, 1000.0, 446.0, 919.0, 1000.0, 1000.0, 154.0, 1000.0, 659.0]
2025-05-10 04:35:41,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1226 [INFO]: New best (2606.59) for latency MM1Queue_a033_s075
2025-05-10 04:35:41,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1229 [INFO]: saving network
2025-05-10 04:35:41,198 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 04:35:41,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 23 minutes, 20 seconds)
2025-05-10 04:46:21,633 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 04:46:22,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 04:49:11,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 2108.84521 ± 1384.756
2025-05-10 04:49:12,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [269.73895, 841.22424, 1720.5282, 3157.616, 3353.2993, 3763.3374, 687.43024, 3401.1934, 3513.35, 380.73456]
2025-05-10 04:49:12,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [105.0, 247.0, 531.0, 1000.0, 1000.0, 1000.0, 188.0, 1000.0, 1000.0, 1000.0]
2025-05-10 04:49:12,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 12 minutes, 27 seconds)
2025-05-10 04:57:58,882 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 04:57:58,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 05:00:18,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1545.38892 ± 1325.762
2025-05-10 05:00:18,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [1763.7762, 355.89136, 3463.1292, 487.76382, 666.88525, 3304.7627, 3475.8752, 136.00114, 1572.1877, 227.61676]
2025-05-10 05:00:18,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 118.0, 1000.0, 1000.0, 197.0, 1000.0, 1000.0, 66.0, 587.0, 81.0]
2025-05-10 05:00:18,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 59 minutes, 46 seconds)
2025-05-10 05:09:48,784 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 05:09:49,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 05:12:04,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 2048.84790 ± 952.888
2025-05-10 05:12:04,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [2331.295, 804.068, 1150.2954, 883.4001, 3544.728, 3417.8364, 1383.6671, 2890.1843, 2148.3394, 1934.6649]
2025-05-10 05:12:04,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [743.0, 228.0, 379.0, 282.0, 1000.0, 1000.0, 382.0, 839.0, 608.0, 500.0]
2025-05-10 05:12:04,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 47 minutes, 56 seconds)
2025-05-10 05:21:07,564 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 05:21:07,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 05:22:51,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1238.85815 ± 1175.583
2025-05-10 05:22:51,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [636.4738, 2231.5198, 3520.5845, 2959.9814, 664.38446, 304.62433, 127.75375, 165.55663, 1373.0061, 404.69644]
2025-05-10 05:22:51,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [191.0, 627.0, 1000.0, 847.0, 190.0, 104.0, 51.0, 1000.0, 438.0, 156.0]
2025-05-10 05:22:51,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 35 minutes, 53 seconds)
2025-05-10 05:32:04,545 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 05:32:04,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 05:33:01,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 862.72656 ± 933.172
2025-05-10 05:33:01,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [1633.8799, 3259.3435, 468.99454, 69.8647, 1016.0444, 990.8537, 445.7077, 45.08091, 644.7796, 52.71644]
2025-05-10 05:33:01,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [418.0, 907.0, 149.0, 39.0, 311.0, 305.0, 152.0, 29.0, 180.0, 32.0]
2025-05-10 05:33:01,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 22 minutes, 55 seconds)
2025-05-10 05:42:55,720 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 05:42:55,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 05:46:13,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 2898.45776 ± 838.075
2025-05-10 05:46:13,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [3286.238, 2105.4385, 3182.9153, 744.2475, 3604.7075, 3392.7812, 3364.5283, 2562.0298, 3331.805, 3409.8862]
2025-05-10 05:46:13,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 608.0, 1000.0, 249.0, 1000.0, 1000.0, 1000.0, 736.0, 1000.0, 1000.0]
2025-05-10 05:46:13,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1226 [INFO]: New best (2898.46) for latency MM1Queue_a033_s075
2025-05-10 05:46:13,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1229 [INFO]: saving network
2025-05-10 05:46:13,384 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 05:46:13,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 11 minutes, 24 seconds)
2025-05-10 05:55:11,616 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 05:55:11,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 05:57:31,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1221 [DEBUG]: Total Reward: 1785.25720 ± 1038.673
2025-05-10 05:57:31,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1222 [DEBUG]: All rewards: [2095.5771, 2617.0151, 3334.9556, 1778.4951, 658.4797, 1170.0571, 3492.8684, 1569.6877, 459.60828, 675.82794]
2025-05-10 05:57:31,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [621.0, 783.0, 1000.0, 573.0, 1000.0, 352.0, 1000.0, 489.0, 143.0, 189.0]
2025-05-10 05:57:31,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1251 [DEBUG]: Training session finished
