2025-05-06 08:22:56,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay
2025-05-06 08:22:56,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay
2025-05-06 08:22:56,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1008 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7fd3a840bf40>}
2025-05-06 08:22:56,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1009 [DEBUG]: using device: cuda
2025-05-06 08:22:56,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1031 [INFO]: Creating new trainer
2025-05-06 08:22:56,283 baseline-mbpac-noisy-ant:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-05-06 08:22:56,283 baseline-mbpac-noisy-ant:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-06 08:22:56,312 baseline-mbpac-noisy-ant:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=27, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=8, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-05-06 08:22:57,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1092 [DEBUG]: Starting training session...
2025-05-06 08:22:57,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 1/100
2025-05-06 08:39:45,595 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 08:39:45,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 08:44:36,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: -109.92186 ± 96.159
2025-05-06 08:44:36,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [0.44145662, 2.042767, -127.7349, -205.41325, -25.132942, -190.88434, -242.75024, -203.98805, -128.9725, 23.1733]
2025-05-06 08:44:36,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [72.0, 53.0, 1000.0, 1000.0, 97.0, 1000.0, 1000.0, 1000.0, 1000.0, 127.0]
2025-05-06 08:44:36,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (-109.92) for latency ExtremeSparseL4U32
2025-05-06 08:44:36,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-05-06 08:44:36,499 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 08:44:36,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 2/100 (estimated time remaining: 35 hours, 43 minutes, 57 seconds)
2025-05-06 09:01:38,569 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 09:01:38,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 09:05:58,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 182.56387 ± 105.496
2025-05-06 09:05:58,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [68.28843, 242.13483, 295.9747, 28.510418, 339.32635, 134.77634, 262.39038, 265.05228, 130.50705, 58.67814]
2025-05-06 09:05:58,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [114.0, 1000.0, 1000.0, 89.0, 1000.0, 279.0, 1000.0, 1000.0, 349.0, 99.0]
2025-05-06 09:05:58,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (182.56) for latency ExtremeSparseL4U32
2025-05-06 09:05:58,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-05-06 09:05:58,137 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 09:05:58,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 3/100 (estimated time remaining: 35 hours, 7 minutes, 48 seconds)
2025-05-06 09:20:51,229 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 09:20:51,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 09:26:17,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 357.00287 ± 143.780
2025-05-06 09:26:17,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [479.64, 464.5114, 364.19812, 525.3027, 458.31503, 50.5635, 331.97098, 454.69897, 227.4291, 213.39877]
2025-05-06 09:26:17,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 685.0, 1000.0, 1000.0, 71.0, 383.0, 1000.0, 633.0, 364.0]
2025-05-06 09:26:17,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (357.00) for latency ExtremeSparseL4U32
2025-05-06 09:26:17,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-05-06 09:26:17,623 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 09:26:17,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 4/100 (estimated time remaining: 34 hours, 8 minutes, 2 seconds)
2025-05-06 09:41:40,024 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 09:41:40,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 09:48:13,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 570.34973 ± 204.751
2025-05-06 09:48:13,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [638.87976, 527.2096, 725.04877, 758.32587, 437.29404, 20.249733, 664.87854, 579.7225, 705.6865, 646.20215]
2025-05-06 09:48:13,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 23.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:48:13,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (570.35) for latency ExtremeSparseL4U32
2025-05-06 09:48:13,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-05-06 09:48:13,211 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 09:48:13,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 5/100 (estimated time remaining: 34 hours, 6 minutes, 26 seconds)
2025-05-06 10:04:17,123 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 10:04:17,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 10:11:18,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 673.69519 ± 219.006
2025-05-06 10:11:18,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [614.3941, 885.171, 911.923, 545.3452, 735.8695, 621.32, 653.29504, 916.7384, 717.8345, 135.06123]
2025-05-06 10:11:18,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 144.0]
2025-05-06 10:11:18,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (673.70) for latency ExtremeSparseL4U32
2025-05-06 10:11:18,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-05-06 10:11:18,645 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 10:11:18,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 6/100 (estimated time remaining: 34 hours, 18 minutes, 48 seconds)
2025-05-06 10:26:27,628 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 10:26:27,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 10:32:38,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 814.68158 ± 220.813
2025-05-06 10:32:38,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [868.01166, 621.5162, 371.50403, 880.6763, 1149.5637, 578.5257, 1071.769, 848.80725, 836.8891, 919.5532]
2025-05-06 10:32:38,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 283.0, 806.0, 1000.0, 1000.0, 1000.0, 1000.0, 746.0, 697.0]
2025-05-06 10:32:38,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (814.68) for latency ExtremeSparseL4U32
2025-05-06 10:32:38,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-05-06 10:32:38,932 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 10:32:38,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 7/100 (estimated time remaining: 33 hours, 51 minutes, 9 seconds)
2025-05-06 10:49:24,605 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 10:49:24,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 10:54:51,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 616.73456 ± 323.695
2025-05-06 10:54:51,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [480.99075, 1125.9526, 1207.074, 720.3028, 266.7374, 622.4567, 564.92896, 667.3744, 361.71527, 149.81302]
2025-05-06 10:54:51,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 885.0, 1000.0, 1000.0, 199.0, 1000.0, 1000.0, 1000.0, 287.0, 130.0]
2025-05-06 10:54:51,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 8/100 (estimated time remaining: 33 hours, 45 minutes, 26 seconds)
2025-05-06 11:09:06,106 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 11:09:06,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 11:13:39,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 702.05359 ± 318.221
2025-05-06 11:13:39,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [479.16086, 538.2437, 707.37805, 589.9126, 594.9312, 540.9976, 1241.8435, 536.41156, 399.7401, 1391.916]
2025-05-06 11:13:39,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [534.0, 1000.0, 530.0, 471.0, 448.0, 379.0, 884.0, 337.0, 309.0, 1000.0]
2025-05-06 11:13:39,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 9/100 (estimated time remaining: 32 hours, 55 minutes, 22 seconds)
2025-05-06 11:30:00,841 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 11:30:00,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 11:36:02,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 955.62878 ± 487.226
2025-05-06 11:36:02,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1384.5752, 1257.7026, 157.5228, 1400.3132, 1281.3146, 178.51817, 1312.8723, 1318.7406, 425.62292, 839.10516]
2025-05-06 11:36:02,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 945.0, 130.0, 1000.0, 1000.0, 162.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:36:02,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (955.63) for latency ExtremeSparseL4U32
2025-05-06 11:36:02,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-05-06 11:36:02,541 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 11:36:02,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 10/100 (estimated time remaining: 32 hours, 42 minutes, 21 seconds)
2025-05-06 11:49:42,068 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 11:49:42,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 11:55:52,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 950.91180 ± 410.497
2025-05-06 11:55:52,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [901.0606, 1115.0745, 289.35596, 835.3732, 1139.7737, 1542.9302, 624.30273, 1168.9541, 369.8554, 1522.4373]
2025-05-06 11:55:52,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 244.0, 1000.0, 921.0, 1000.0, 1000.0, 818.0, 267.0, 1000.0]
2025-05-06 11:55:52,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 11/100 (estimated time remaining: 31 hours, 22 minutes, 4 seconds)
2025-05-06 12:11:08,602 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 12:11:08,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 12:16:09,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 769.25647 ± 560.795
2025-05-06 12:16:09,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [27.905138, 1404.8441, 228.06567, 1089.056, 1408.6885, 1677.3901, 639.7175, 97.85257, 500.11752, 618.9274]
2025-05-06 12:16:09,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [23.0, 913.0, 153.0, 1000.0, 1000.0, 1000.0, 1000.0, 100.0, 1000.0, 1000.0]
2025-05-06 12:16:09,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 12/100 (estimated time remaining: 30 hours, 42 minutes, 18 seconds)
2025-05-06 12:30:13,007 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 12:30:13,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 12:34:43,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 632.25146 ± 396.713
2025-05-06 12:34:43,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [766.4221, 536.01306, 1233.1123, 498.11957, 666.6016, 13.74655, 1076.1903, 1058.0483, 452.96475, 21.296082]
2025-05-06 12:34:43,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [474.0, 1000.0, 1000.0, 1000.0, 1000.0, 22.0, 646.0, 672.0, 1000.0, 37.0]
2025-05-06 12:34:43,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 13/100 (estimated time remaining: 29 hours, 17 minutes, 34 seconds)
2025-05-06 12:47:57,679 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 12:47:57,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 12:51:55,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 818.52441 ± 376.634
2025-05-06 12:51:55,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [984.4108, 1554.5881, 76.60603, 1101.7588, 1026.9917, 876.5594, 661.4569, 700.6839, 462.12527, 740.06274]
2025-05-06 12:51:55,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 62.0, 634.0, 582.0, 1000.0, 1000.0, 475.0, 249.0, 393.0]
2025-05-06 12:51:55,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 14/100 (estimated time remaining: 28 hours, 30 minutes, 2 seconds)
2025-05-06 13:04:19,795 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 13:04:19,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 13:08:10,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 734.94025 ± 539.367
2025-05-06 13:08:10,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [739.23096, 1594.9332, 659.46783, 700.70245, 136.87999, 315.28537, 928.26373, 37.27509, 482.1745, 1755.1895]
2025-05-06 13:08:10,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [399.0, 1000.0, 1000.0, 375.0, 125.0, 218.0, 1000.0, 39.0, 1000.0, 1000.0]
2025-05-06 13:08:10,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 15/100 (estimated time remaining: 26 hours, 24 minutes, 47 seconds)
2025-05-06 13:21:38,808 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 13:21:38,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 13:25:11,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 775.97363 ± 502.090
2025-05-06 13:25:11,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [935.2029, 1773.7258, 977.30206, 356.41373, 219.25043, 636.55023, 505.59177, 135.87476, 1462.2396, 757.5851]
2025-05-06 13:25:11,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 527.0, 177.0, 238.0, 411.0, 269.0, 139.0, 817.0, 1000.0]
2025-05-06 13:25:11,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 16/100 (estimated time remaining: 25 hours, 18 minutes, 27 seconds)
2025-05-06 13:37:41,067 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 13:37:41,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 13:41:05,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 992.19104 ± 729.324
2025-05-06 13:41:05,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [285.52573, 1765.0878, 1000.0984, 7.898312, 4.430523, 1520.1477, 1807.2113, 2073.3445, 846.30884, 611.8576]
2025-05-06 13:41:05,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [179.0, 805.0, 553.0, 16.0, 19.0, 732.0, 864.0, 1000.0, 363.0, 1000.0]
2025-05-06 13:41:05,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (992.19) for latency ExtremeSparseL4U32
2025-05-06 13:41:05,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-05-06 13:41:05,484 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 13:41:05,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 17/100 (estimated time remaining: 23 hours, 47 minutes)
2025-05-06 13:54:20,370 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 13:54:20,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 13:58:10,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 767.36951 ± 327.351
2025-05-06 13:58:10,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1014.48145, 420.36176, 145.22093, 1284.0082, 913.2768, 608.01, 515.87286, 838.0529, 823.45905, 1110.9508]
2025-05-06 13:58:10,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 99.0, 530.0, 421.0, 1000.0, 244.0, 1000.0, 394.0, 585.0]
2025-05-06 13:58:10,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 18/100 (estimated time remaining: 23 hours, 5 minutes, 16 seconds)
2025-05-06 14:10:03,259 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 14:10:03,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 14:15:31,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1351.98242 ± 639.200
2025-05-06 14:15:31,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [720.23914, 872.97906, 2316.76, 842.8293, 864.2803, 1902.4884, 548.26294, 2231.7751, 1315.9769, 1904.233]
2025-05-06 14:15:31,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 249.0, 1000.0, 1000.0, 1000.0]
2025-05-06 14:15:31,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (1351.98) for latency ExtremeSparseL4U32
2025-05-06 14:15:31,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-05-06 14:15:31,492 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 14:15:31,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 19/100 (estimated time remaining: 22 hours, 50 minutes, 57 seconds)
2025-05-06 14:27:41,779 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 14:27:41,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 14:30:20,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 605.94373 ± 303.179
2025-05-06 14:30:20,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1156.243, 824.1075, 369.66876, 409.6366, 234.00014, 297.2247, 470.08063, 928.50006, 455.0551, 914.92114]
2025-05-06 14:30:20,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [643.0, 372.0, 1000.0, 272.0, 160.0, 169.0, 197.0, 407.0, 285.0, 1000.0]
2025-05-06 14:30:20,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 20/100 (estimated time remaining: 22 hours, 11 minutes, 4 seconds)
2025-05-06 14:45:28,033 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 14:45:28,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 14:48:16,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 342.20380 ± 203.039
2025-05-06 14:48:16,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [236.33578, 16.13169, 591.9944, 411.8144, 102.35511, 343.8225, 549.864, 623.23267, 134.7992, 411.68848]
2025-05-06 14:48:16,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [123.0, 18.0, 1000.0, 194.0, 63.0, 193.0, 266.0, 287.0, 257.0, 1000.0]
2025-05-06 14:48:16,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 21/100 (estimated time remaining: 22 hours, 9 minutes, 22 seconds)
2025-05-06 15:01:50,718 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 15:01:50,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 15:08:13,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1817.22778 ± 840.385
2025-05-06 15:08:13,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2106.278, 2377.7014, 2462.931, 347.4292, 2384.2896, 394.4356, 2395.806, 2188.8516, 963.0435, 2551.5105]
2025-05-06 15:08:13,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 326.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 15:08:13,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (1817.23) for latency ExtremeSparseL4U32
2025-05-06 15:08:13,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-05-06 15:08:13,334 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 15:08:13,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 22/100 (estimated time remaining: 22 hours, 56 minutes, 40 seconds)
2025-05-06 15:25:59,893 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 15:25:59,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 15:29:38,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 829.73059 ± 598.399
2025-05-06 15:29:38,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [634.4729, 547.4208, 705.4533, 773.4385, 1099.0195, 477.70245, 140.09212, 721.2917, 711.4043, 2487.01]
2025-05-06 15:29:38,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 284.0, 300.0, 1000.0, 476.0, 209.0, 80.0, 402.0, 291.0, 1000.0]
2025-05-06 15:29:38,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 23/100 (estimated time remaining: 23 hours, 46 minutes, 51 seconds)
2025-05-06 15:45:09,826 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 15:45:09,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 15:49:25,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1244.46411 ± 811.267
2025-05-06 15:49:25,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [338.07114, 1420.4873, 1083.162, 1099.9303, 2420.1006, 1254.6923, 299.0092, 2454.112, 73.92251, 2001.1528]
2025-05-06 15:49:25,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [160.0, 588.0, 407.0, 1000.0, 1000.0, 1000.0, 167.0, 952.0, 105.0, 880.0]
2025-05-06 15:49:25,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 24/100 (estimated time remaining: 24 hours, 6 minutes, 7 seconds)
2025-05-06 16:05:44,172 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 16:05:44,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 16:12:47,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1843.70923 ± 874.018
2025-05-06 16:12:47,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2559.2568, 2393.2717, 2478.098, 2400.28, 1528.5802, 680.10223, 1195.4502, 2423.8984, 68.22565, 2709.9285]
2025-05-06 16:12:47,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 328.0, 1000.0, 1000.0, 60.0, 1000.0]
2025-05-06 16:12:47,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (1843.71) for latency ExtremeSparseL4U32
2025-05-06 16:12:47,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-05-06 16:12:47,425 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 16:12:47,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 25/100 (estimated time remaining: 25 hours, 57 minutes, 9 seconds)
2025-05-06 16:27:51,218 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 16:27:51,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 16:32:58,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 992.89392 ± 752.150
2025-05-06 16:32:58,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [83.72818, 397.60562, 778.4437, 972.0759, 2029.5422, 2409.1868, 1073.4485, 445.18417, 158.29488, 1581.4294]
2025-05-06 16:32:58,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [53.0, 1000.0, 1000.0, 361.0, 1000.0, 1000.0, 416.0, 1000.0, 76.0, 734.0]
2025-05-06 16:32:58,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 26/100 (estimated time remaining: 26 hours, 10 minutes, 26 seconds)
2025-05-06 16:48:45,435 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 16:48:45,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 16:55:18,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1910.36951 ± 730.018
2025-05-06 16:55:18,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2413.7732, 1109.9567, 2100.5698, 2517.723, 1946.5509, 2222.396, 2319.7827, 2090.2546, 29.989, 2352.699]
2025-05-06 16:55:18,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 553.0, 1000.0, 1000.0, 900.0, 1000.0, 1000.0, 1000.0, 30.0, 1000.0]
2025-05-06 16:55:18,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (1910.37) for latency ExtremeSparseL4U32
2025-05-06 16:55:18,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-05-06 16:55:18,161 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 16:55:18,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 27/100 (estimated time remaining: 26 hours, 24 minutes, 47 seconds)
2025-05-06 17:11:52,615 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 17:11:52,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 17:19:02,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1838.43518 ± 986.310
2025-05-06 17:19:02,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2565.504, 2425.4478, 2607.0415, 2485.6995, 312.74954, 2452.559, 91.322105, 2372.953, 2435.1729, 635.9015]
2025-05-06 17:19:02,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 69.0, 1000.0, 1000.0, 285.0]
2025-05-06 17:19:02,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 28/100 (estimated time remaining: 26 hours, 37 minutes, 10 seconds)
2025-05-06 17:33:40,610 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 17:33:40,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 17:37:57,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1342.38159 ± 846.124
2025-05-06 17:37:57,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1385.9543, 2185.19, 818.3129, 488.46024, 1582.5402, 23.348627, 1126.3228, 2524.9736, 639.8406, 2648.8726]
2025-05-06 17:37:57,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [519.0, 742.0, 332.0, 214.0, 698.0, 43.0, 1000.0, 1000.0, 326.0, 1000.0]
2025-05-06 17:37:57,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 29/100 (estimated time remaining: 26 hours, 2 minutes, 52 seconds)
2025-05-06 17:54:35,670 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 17:54:35,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 17:58:04,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1124.87207 ± 978.899
2025-05-06 17:58:04,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [733.1494, 2536.1697, 21.934265, 2312.2368, 483.30533, 2748.0312, 830.0962, 362.589, 77.80507, 1143.4047]
2025-05-06 17:58:04,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [265.0, 1000.0, 44.0, 1000.0, 203.0, 1000.0, 1000.0, 174.0, 54.0, 550.0]
2025-05-06 17:58:04,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 30/100 (estimated time remaining: 24 hours, 54 minutes, 57 seconds)
2025-05-06 18:13:56,157 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 18:13:56,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 18:18:46,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1325.94812 ± 870.347
2025-05-06 18:18:46,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [662.9889, 1219.6099, 1326.6669, 611.0338, 386.02002, 2522.5513, 2919.1843, 207.16277, 2037.2993, 1366.9642]
2025-05-06 18:18:46,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 209.0, 208.0, 1000.0, 1000.0, 115.0, 793.0, 455.0]
2025-05-06 18:18:46,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 31/100 (estimated time remaining: 24 hours, 41 minutes, 13 seconds)
2025-05-06 18:33:53,631 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 18:33:53,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 18:38:18,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1456.62024 ± 1078.062
2025-05-06 18:38:18,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2647.8977, 2835.4626, 281.77036, 344.7355, 2769.3958, 1748.8009, 2.1710718, 2349.1304, 587.4527, 999.3849]
2025-05-06 18:38:18,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 165.0, 1000.0, 783.0, 22.0, 925.0, 248.0, 404.0]
2025-05-06 18:38:18,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 32/100 (estimated time remaining: 23 hours, 41 minutes, 30 seconds)
2025-05-06 18:54:49,168 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 18:54:49,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 19:00:42,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1754.21460 ± 924.366
2025-05-06 19:00:42,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1104.7112, 314.6579, 1833.0963, 1117.3192, 988.4378, 1218.0106, 1955.8855, 2871.3296, 2977.771, 3160.9285]
2025-05-06 19:00:42,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [352.0, 1000.0, 1000.0, 406.0, 397.0, 1000.0, 783.0, 1000.0, 1000.0, 1000.0]
2025-05-06 19:00:42,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 33/100 (estimated time remaining: 23 hours, 2 minutes, 44 seconds)
2025-05-06 19:16:59,852 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 19:16:59,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 19:22:45,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1543.15588 ± 1050.826
2025-05-06 19:22:45,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [450.45752, 1124.4109, 223.73637, 2454.1663, 2715.3562, 2381.09, 2500.1863, 316.384, 2770.6575, 495.11414]
2025-05-06 19:22:45,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 466.0, 107.0, 1000.0, 1000.0, 1000.0, 1000.0, 150.0, 1000.0, 1000.0]
2025-05-06 19:22:45,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 34/100 (estimated time remaining: 23 hours, 24 minutes, 21 seconds)
2025-05-06 19:38:53,015 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 19:38:53,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 19:43:51,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1832.95081 ± 1153.399
2025-05-06 19:43:51,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2593.2603, 2568.2322, 185.33487, 2857.9712, 2338.8423, 2740.1714, 51.51647, 2541.7488, 16.22781, 2436.2017]
2025-05-06 19:43:51,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 110.0, 1000.0, 1000.0, 1000.0, 40.0, 1000.0, 15.0, 1000.0]
2025-05-06 19:43:51,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 35/100 (estimated time remaining: 23 hours, 16 minutes, 30 seconds)
2025-05-06 19:58:51,523 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 19:58:51,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 20:04:19,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1579.52112 ± 1108.811
2025-05-06 20:04:19,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2230.298, 759.14813, 180.5943, 386.92203, 3288.537, 977.79706, 741.5137, 3027.807, 1329.3798, 2873.214]
2025-05-06 20:04:19,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [759.0, 1000.0, 83.0, 1000.0, 1000.0, 1000.0, 336.0, 1000.0, 501.0, 1000.0]
2025-05-06 20:04:19,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 36/100 (estimated time remaining: 22 hours, 52 minutes, 14 seconds)
2025-05-06 20:19:58,477 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 20:19:58,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 20:25:03,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1959.94470 ± 851.837
2025-05-06 20:25:03,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2762.9458, 2716.6135, 1569.467, 3018.8752, 672.48987, 2590.8064, 1147.8452, 996.9348, 1319.4578, 2804.0117]
2025-05-06 20:25:03,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [891.0, 1000.0, 580.0, 1000.0, 328.0, 1000.0, 473.0, 348.0, 459.0, 1000.0]
2025-05-06 20:25:03,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (1959.94) for latency ExtremeSparseL4U32
2025-05-06 20:25:03,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-05-06 20:25:03,975 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 20:25:04,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 37/100 (estimated time remaining: 22 hours, 46 minutes, 28 seconds)
2025-05-06 20:40:12,450 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 20:40:12,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 20:44:57,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1727.21216 ± 926.318
2025-05-06 20:44:57,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2827.0657, 2833.192, 2530.2651, 535.6845, 178.74002, 2445.7334, 761.6023, 2041.1047, 1357.0822, 1761.6505]
2025-05-06 20:44:57,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 923.0, 204.0, 69.0, 1000.0, 263.0, 1000.0, 432.0, 648.0]
2025-05-06 20:44:57,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 38/100 (estimated time remaining: 21 hours, 53 minutes, 29 seconds)
2025-05-06 21:00:04,488 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 21:00:04,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 21:04:09,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1469.10962 ± 1132.528
2025-05-06 21:04:09,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1483.8556, 2991.0457, 767.742, 405.45532, 78.10245, 2946.5002, 3007.0269, 407.58078, 2083.1978, 520.59064]
2025-05-06 21:04:09,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [488.0, 1000.0, 1000.0, 196.0, 53.0, 1000.0, 944.0, 158.0, 1000.0, 195.0]
2025-05-06 21:04:09,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 39/100 (estimated time remaining: 20 hours, 57 minutes, 14 seconds)
2025-05-06 21:19:48,169 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 21:19:48,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 21:22:00,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 665.46674 ± 805.666
2025-05-06 21:22:00,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2380.6365, 1037.2313, 539.4145, 514.91296, 1896.2534, 119.978, 19.4928, 14.240359, 82.36921, 50.138252]
2025-05-06 21:22:00,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [742.0, 349.0, 1000.0, 258.0, 679.0, 69.0, 27.0, 14.0, 59.0, 41.0]
2025-05-06 21:22:00,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 40/100 (estimated time remaining: 19 hours, 57 minutes, 24 seconds)
2025-05-06 21:36:13,501 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 21:36:13,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 21:40:50,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1697.98535 ± 1185.408
2025-05-06 21:40:50,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [262.47324, 2965.443, 2508.7249, 555.18036, 218.90833, 982.0947, 3089.1199, 667.8567, 2843.6895, 2886.364]
2025-05-06 21:40:50,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [129.0, 1000.0, 1000.0, 207.0, 1000.0, 335.0, 1000.0, 231.0, 1000.0, 1000.0]
2025-05-06 21:40:50,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 41/100 (estimated time remaining: 19 hours, 18 minutes, 3 seconds)
2025-05-06 21:55:30,682 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 21:55:30,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 22:02:04,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2434.88916 ± 758.213
2025-05-06 22:02:04,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1300.2479, 3059.9333, 3095.323, 2757.5818, 692.5736, 2532.0464, 2886.3896, 2695.5425, 2432.2285, 2897.0256]
2025-05-06 22:02:04,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [514.0, 1000.0, 1000.0, 1000.0, 240.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 22:02:04,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (2434.89) for latency ExtremeSparseL4U32
2025-05-06 22:02:04,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-05-06 22:02:04,719 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 22:02:04,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 42/100 (estimated time remaining: 19 hours, 4 minutes, 44 seconds)
2025-05-06 22:16:06,810 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 22:16:06,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 22:21:28,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1664.45667 ± 1246.408
2025-05-06 22:21:28,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2613.1223, 14.362101, 145.85, 2690.7922, 1626.5048, 2815.5618, 2903.439, 3128.4663, 119.23053, 587.2375]
2025-05-06 22:21:28,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 13.0, 1000.0, 1000.0, 572.0, 1000.0, 1000.0, 1000.0, 1000.0, 202.0]
2025-05-06 22:21:28,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 43/100 (estimated time remaining: 18 hours, 39 minutes, 36 seconds)
2025-05-06 22:37:21,525 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 22:37:21,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 22:44:45,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2394.56494 ± 657.448
2025-05-06 22:44:45,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1350.2382, 2068.2336, 2295.252, 3272.5823, 2642.767, 3185.987, 3069.8552, 2621.9673, 1375.0973, 2063.6707]
2025-05-06 22:44:45,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [538.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 694.0]
2025-05-06 22:44:45,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 44/100 (estimated time remaining: 19 hours, 6 minutes, 47 seconds)
2025-05-06 23:00:01,743 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 23:00:01,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 23:06:32,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2130.69946 ± 985.676
2025-05-06 23:06:32,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1073.7664, 3055.2522, 2954.2139, 3143.6223, 524.44, 1844.3323, 2682.4856, 694.73224, 2200.4172, 3133.7322]
2025-05-06 23:06:32,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [332.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 277.0, 1000.0, 1000.0]
2025-05-06 23:06:32,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 45/100 (estimated time remaining: 19 hours, 30 minutes, 43 seconds)
2025-05-06 23:21:20,300 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 23:21:20,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 23:26:15,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1731.02832 ± 1079.965
2025-05-06 23:26:15,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2007.8718, 2317.7231, 1070.5272, 3022.0366, 872.2997, 688.7886, 1088.9413, 14.559515, 3371.9148, 2855.6204]
2025-05-06 23:26:15,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [680.0, 808.0, 329.0, 1000.0, 333.0, 1000.0, 374.0, 35.0, 1000.0, 1000.0]
2025-05-06 23:26:15,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 46/100 (estimated time remaining: 19 hours, 19 minutes, 32 seconds)
2025-05-06 23:42:57,553 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 23:42:57,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 23:48:25,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1952.90662 ± 1076.873
2025-05-06 23:48:25,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [332.73718, 2837.9963, 3039.6333, 3020.702, 473.06232, 1005.90607, 2780.5808, 2775.945, 853.12897, 2409.3743]
2025-05-06 23:48:25,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 993.0, 1000.0, 1000.0, 231.0, 315.0, 1000.0, 871.0, 301.0, 882.0]
2025-05-06 23:48:25,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 47/100 (estimated time remaining: 19 hours, 8 minutes, 26 seconds)
2025-05-07 00:03:15,269 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 00:03:15,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 00:08:03,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1374.22168 ± 1209.373
2025-05-07 00:08:03,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [638.2823, 784.39075, 2928.759, 1129.0603, 3402.1008, 1451.258, 84.237434, 2979.901, 104.961555, 239.26562]
2025-05-07 00:08:03,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [219.0, 248.0, 1000.0, 1000.0, 1000.0, 496.0, 66.0, 1000.0, 133.0, 1000.0]
2025-05-07 00:08:03,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 48/100 (estimated time remaining: 18 hours, 49 minutes, 45 seconds)
2025-05-07 00:22:55,942 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 00:22:55,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 00:26:22,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 646.30713 ± 714.601
2025-05-07 00:26:22,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [374.03122, 2353.9346, 22.625116, 1121.2109, 167.44267, 98.56167, 56.809307, 153.51065, 1205.9716, 908.97363]
2025-05-07 00:26:22,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 689.0, 38.0, 370.0, 1000.0, 86.0, 37.0, 95.0, 342.0, 1000.0]
2025-05-07 00:26:22,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 49/100 (estimated time remaining: 17 hours, 36 minutes, 48 seconds)
2025-05-07 00:43:01,582 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 00:43:01,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 00:48:00,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2068.71802 ± 1226.451
2025-05-07 00:48:00,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [67.804016, 3098.5427, 2123.7893, 2714.3896, 3103.6814, 125.6852, 3346.4712, 782.4707, 3262.0479, 2062.2976]
2025-05-07 00:48:00,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [55.0, 1000.0, 681.0, 988.0, 1000.0, 69.0, 1000.0, 302.0, 1000.0, 664.0]
2025-05-07 00:48:00,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 50/100 (estimated time remaining: 17 hours, 14 minutes, 54 seconds)
2025-05-07 01:01:22,170 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 01:01:22,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 01:05:42,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1972.32129 ± 1339.768
2025-05-07 01:05:42,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3412.4854, 3011.6362, 219.0096, 3317.0984, 1690.7843, 2910.2698, 55.892277, 18.019627, 3179.7544, 1908.2628]
2025-05-07 01:05:42,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 80.0, 1000.0, 552.0, 1000.0, 59.0, 24.0, 1000.0, 564.0]
2025-05-07 01:05:42,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 51/100 (estimated time remaining: 16 hours, 34 minutes, 34 seconds)
2025-05-07 01:21:14,880 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 01:21:14,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 01:26:57,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2349.36230 ± 1011.622
2025-05-07 01:26:57,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1454.3118, 3242.112, 3049.0127, 644.21436, 1876.0508, 3195.4814, 3128.1113, 3355.1506, 769.6232, 2779.555]
2025-05-07 01:26:57,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [475.0, 1000.0, 1000.0, 273.0, 592.0, 1000.0, 1000.0, 1000.0, 256.0, 1000.0]
2025-05-07 01:26:57,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 52/100 (estimated time remaining: 16 hours, 5 minutes, 40 seconds)
2025-05-07 01:43:20,523 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 01:43:20,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 01:47:19,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1475.59546 ± 1104.401
2025-05-07 01:47:19,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2004.2104, 3446.822, 3107.7085, 1258.4766, 898.6678, 763.4727, 104.73409, 1076.6681, 28.72826, 2066.467]
2025-05-07 01:47:19,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 347.0, 330.0, 1000.0, 67.0, 338.0, 31.0, 569.0]
2025-05-07 01:47:19,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 53/100 (estimated time remaining: 15 hours, 53 minutes)
2025-05-07 02:03:29,651 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 02:03:29,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 02:09:38,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2492.71240 ± 952.188
2025-05-07 02:09:38,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3298.178, 3208.2246, 1547.3573, 2817.7966, 2310.8174, 2880.3613, 3015.5564, 3082.6655, 2732.0088, 34.160015]
2025-05-07 02:09:38,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 464.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 943.0, 32.0]
2025-05-07 02:09:38,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (2492.71) for latency ExtremeSparseL4U32
2025-05-07 02:09:38,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-05-07 02:09:38,450 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 02:09:38,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 54/100 (estimated time remaining: 16 hours, 10 minutes, 46 seconds)
2025-05-07 02:25:52,250 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 02:25:52,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 02:31:37,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1606.65479 ± 1488.311
2025-05-07 02:31:37,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [808.65247, 678.286, 363.09924, 3098.6028, 3628.3816, 3525.955, 257.76285, 150.75781, 178.92259, 3376.127]
2025-05-07 02:31:37,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [237.0, 220.0, 1000.0, 1000.0, 1000.0, 1000.0, 140.0, 1000.0, 1000.0, 1000.0]
2025-05-07 02:31:37,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 55/100 (estimated time remaining: 15 hours, 53 minutes, 20 seconds)
2025-05-07 02:45:48,601 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 02:45:48,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 02:52:02,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2504.39307 ± 844.036
2025-05-07 02:52:02,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2103.0781, 2528.41, 946.7744, 3095.3538, 3068.4358, 2725.7578, 3262.9392, 2907.8794, 998.5978, 3406.7058]
2025-05-07 02:52:02,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [712.0, 821.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 300.0, 1000.0]
2025-05-07 02:52:02,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (2504.39) for latency ExtremeSparseL4U32
2025-05-07 02:52:02,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-05-07 02:52:02,976 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 02:52:03,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 56/100 (estimated time remaining: 15 hours, 57 minutes, 4 seconds)
2025-05-07 03:06:16,265 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 03:06:16,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 03:11:21,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2040.29944 ± 1299.285
2025-05-07 03:11:21,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2540.513, 2549.9678, 2849.3342, 29.227991, 29.717236, 3156.0266, 2820.26, 3017.5046, 189.07602, 3221.3665]
2025-05-07 03:11:21,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 32.0, 25.0, 1000.0, 1000.0, 1000.0, 102.0, 1000.0]
2025-05-07 03:11:21,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 57/100 (estimated time remaining: 15 hours, 18 minutes, 44 seconds)
2025-05-07 03:24:26,001 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 03:24:26,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 03:27:07,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 906.93976 ± 1099.008
2025-05-07 03:27:07,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [709.7279, 76.10231, 684.6356, 1342.389, 3923.3384, 404.6918, 232.36232, 200.48369, 1361.9564, 133.70975]
2025-05-07 03:27:07,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [303.0, 57.0, 244.0, 401.0, 1000.0, 187.0, 169.0, 106.0, 517.0, 1000.0]
2025-05-07 03:27:07,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 58/100 (estimated time remaining: 14 hours, 18 minutes, 17 seconds)
2025-05-07 03:41:12,698 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 03:41:12,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 03:44:54,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1582.12732 ± 1238.886
2025-05-07 03:44:54,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2871.0215, 2240.3066, 128.27759, 28.311356, 2026.6525, 25.153036, 3262.6243, 1299.4934, 3173.8232, 765.6103]
2025-05-07 03:44:54,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 710.0, 1000.0, 52.0, 597.0, 26.0, 1000.0, 426.0, 1000.0, 208.0]
2025-05-07 03:44:54,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 59/100 (estimated time remaining: 13 hours, 20 minutes, 13 seconds)
2025-05-07 03:56:45,281 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 03:56:45,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 04:01:43,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2687.37744 ± 937.841
2025-05-07 04:01:43,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3412.0664, 165.87007, 1905.9158, 2842.9941, 3429.1104, 3052.407, 2871.7705, 2981.7844, 2858.0073, 3353.8499]
2025-05-07 04:01:43,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 81.0, 626.0, 785.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 04:01:43,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (2687.38) for latency ExtremeSparseL4U32
2025-05-07 04:01:43,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-05-07 04:01:43,936 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 04:01:43,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 60/100 (estimated time remaining: 12 hours, 18 minutes, 51 seconds)
2025-05-07 04:14:34,554 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 04:14:34,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 04:18:57,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2317.55371 ± 1326.216
2025-05-07 04:18:57,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2463.342, 3620.0583, 1958.3997, 3466.7517, 31.502447, 4.2413745, 1621.6133, 3319.5068, 2995.8962, 3694.2249]
2025-05-07 04:18:57,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 637.0, 1000.0, 45.0, 15.0, 489.0, 1000.0, 1000.0, 1000.0]
2025-05-07 04:18:57,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 61/100 (estimated time remaining: 11 hours, 35 minutes, 15 seconds)
2025-05-07 04:31:59,935 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 04:31:59,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 04:36:31,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2510.06274 ± 1316.365
2025-05-07 04:36:31,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2325.405, 1056.8507, 3596.05, 3476.503, 3680.0195, 291.42328, 3428.4475, 3729.423, 3046.081, 470.42416]
2025-05-07 04:36:31,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 426.0, 1000.0, 1000.0, 1000.0, 136.0, 1000.0, 1000.0, 1000.0, 171.0]
2025-05-07 04:36:31,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 62/100 (estimated time remaining: 11 hours, 4 minutes, 20 seconds)
2025-05-07 04:48:02,314 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 04:48:02,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 04:52:51,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2570.46338 ± 1098.014
2025-05-07 04:52:51,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [994.5844, 3519.3015, 1678.8397, 3592.4475, 3105.638, 2851.784, 3463.0432, 2391.6003, 456.0616, 3651.3318]
2025-05-07 04:52:51,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [343.0, 1000.0, 548.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 280.0, 1000.0]
2025-05-07 04:52:51,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 63/100 (estimated time remaining: 10 hours, 51 minutes, 29 seconds)
2025-05-07 05:05:22,165 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 05:05:22,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 05:09:41,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2217.89111 ± 1265.190
2025-05-07 05:09:41,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2769.571, 3443.6433, 2170.3188, 1467.1045, 331.58658, 3688.1196, 3283.7651, 323.1205, 3606.4243, 1095.2552]
2025-05-07 05:09:41,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 130.0, 1000.0, 1000.0, 123.0, 1000.0, 321.0]
2025-05-07 05:09:41,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 64/100 (estimated time remaining: 10 hours, 27 minutes, 27 seconds)
2025-05-07 05:21:49,782 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 05:21:49,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 05:26:54,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2824.07275 ± 939.345
2025-05-07 05:26:54,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3316.515, 3606.192, 3056.9873, 3086.4446, 3421.0305, 3242.7107, 1511.0094, 3368.2576, 538.8341, 3092.7454]
2025-05-07 05:26:54,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 436.0, 1000.0, 186.0, 1000.0]
2025-05-07 05:26:54,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (2824.07) for latency ExtremeSparseL4U32
2025-05-07 05:26:54,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-05-07 05:26:54,811 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 05:26:54,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 65/100 (estimated time remaining: 10 hours, 13 minutes, 18 seconds)
2025-05-07 05:39:49,405 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 05:39:49,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 05:42:32,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1209.73999 ± 1041.742
2025-05-07 05:42:32,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2980.6802, 1958.9877, 1598.6866, 1404.3926, 150.26619, 3.0089424, 196.45517, 2746.6284, 718.3903, 339.90448]
2025-05-07 05:42:32,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 501.0, 435.0, 355.0, 1000.0, 17.0, 94.0, 845.0, 221.0, 120.0]
2025-05-07 05:42:32,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 66/100 (estimated time remaining: 9 hours, 45 minutes, 3 seconds)
2025-05-07 05:53:50,482 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 05:53:50,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 05:57:23,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1693.51526 ± 785.048
2025-05-07 05:57:23,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1320.9552, 717.37823, 1166.4413, 1579.6992, 2608.064, 1595.5343, 2656.5242, 1392.4845, 768.77814, 3129.2917]
2025-05-07 05:57:23,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [449.0, 290.0, 455.0, 486.0, 717.0, 452.0, 744.0, 405.0, 1000.0, 1000.0]
2025-05-07 05:57:23,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 67/100 (estimated time remaining: 9 hours, 9 minutes, 47 seconds)
2025-05-07 06:10:25,003 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 06:10:25,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 06:13:56,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1949.47046 ± 1254.831
2025-05-07 06:13:56,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2278.4116, 847.2696, 3563.9973, 2891.1836, 13.198095, 2050.015, 1249.2758, 173.76117, 3619.8296, 2807.764]
2025-05-07 06:13:56,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [675.0, 305.0, 1000.0, 1000.0, 14.0, 606.0, 378.0, 96.0, 1000.0, 1000.0]
2025-05-07 06:13:56,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 55 minutes, 9 seconds)
2025-05-07 06:25:30,203 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 06:25:30,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 06:28:02,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1337.90833 ± 1305.254
2025-05-07 06:28:02,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2039.3798, 10.485399, 3433.3743, 6.2120996, 1399.9741, 1046.1051, 35.57628, 3799.415, 484.29755, 1124.2635]
2025-05-07 06:28:02,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 21.0, 1000.0, 13.0, 416.0, 305.0, 39.0, 1000.0, 141.0, 371.0]
2025-05-07 06:28:02,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 21 minutes, 21 seconds)
2025-05-07 06:41:17,339 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 06:41:17,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 06:43:31,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1162.03528 ± 1092.026
2025-05-07 06:43:31,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2780.0784, 1193.1124, 324.2295, 3531.9954, 399.70898, 589.5357, 19.0178, 1317.3945, 297.16425, 1168.115]
2025-05-07 06:43:31,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [808.0, 549.0, 124.0, 1000.0, 131.0, 271.0, 29.0, 561.0, 133.0, 326.0]
2025-05-07 06:43:31,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 54 minutes, 57 seconds)
2025-05-07 06:55:01,282 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 06:55:01,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 06:58:37,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1490.04004 ± 1222.635
2025-05-07 06:58:37,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2175.489, 607.4329, 419.832, 3330.542, 2240.4507, 684.4527, 3716.8271, 267.6506, 1143.016, 314.70844]
2025-05-07 06:58:37,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [593.0, 188.0, 158.0, 987.0, 757.0, 192.0, 951.0, 1000.0, 355.0, 1000.0]
2025-05-07 06:58:37,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 36 minutes, 28 seconds)
2025-05-07 07:10:47,973 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 07:10:47,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 07:14:52,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1771.42322 ± 1358.703
2025-05-07 07:14:52,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1296.6637, 1100.3911, 3678.8757, 2376.1572, 50.593002, 17.082611, 3000.3528, 3360.3804, 102.42234, 2731.3137]
2025-05-07 07:14:52,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [512.0, 1000.0, 1000.0, 743.0, 36.0, 15.0, 1000.0, 1000.0, 1000.0, 851.0]
2025-05-07 07:14:52,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 29 minutes, 27 seconds)
2025-05-07 07:27:54,655 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 07:27:54,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 07:32:43,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2581.03345 ± 755.273
2025-05-07 07:32:43,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3166.6855, 3436.4553, 1707.7749, 2209.3396, 2246.4912, 3948.3499, 2931.4973, 1371.2296, 2611.5266, 2180.9827]
2025-05-07 07:32:43,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 584.0, 783.0, 591.0, 1000.0, 1000.0, 423.0, 784.0, 1000.0]
2025-05-07 07:32:43,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 21 minutes, 12 seconds)
2025-05-07 07:44:02,886 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 07:44:02,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 07:46:54,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 702.52887 ± 1097.585
2025-05-07 07:46:54,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1472.4851, 411.51733, 172.23833, 223.0113, 701.02216, 21.144537, 3745.5564, 46.468693, 28.379429, 203.46521]
2025-05-07 07:46:54,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [398.0, 141.0, 67.0, 119.0, 213.0, 31.0, 975.0, 1000.0, 1000.0, 1000.0]
2025-05-07 07:46:54,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 5 minutes, 53 seconds)
2025-05-07 07:58:53,895 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 07:58:53,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 08:04:16,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2295.45654 ± 1331.228
2025-05-07 08:04:16,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3215.1367, 39.903885, 314.36133, 2852.8901, 3209.356, 483.15982, 3329.8992, 3204.9395, 2969.2932, 3335.6248]
2025-05-07 08:04:16,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 46.0, 1000.0, 1000.0, 994.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 08:04:16,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 59 minutes, 56 seconds)
2025-05-07 08:16:19,321 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 08:16:19,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 08:20:17,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2182.72437 ± 1356.472
2025-05-07 08:20:17,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [16.77352, 2224.6548, 3622.075, 3422.8926, 2689.3335, 3613.5188, 2166.3562, 2.8755624, 3258.6077, 810.15594]
2025-05-07 08:20:17,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [25.0, 662.0, 1000.0, 1000.0, 1000.0, 1000.0, 818.0, 16.0, 1000.0, 254.0]
2025-05-07 08:20:17,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 48 minutes, 22 seconds)
2025-05-07 08:33:33,363 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 08:33:33,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 08:38:09,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2512.65259 ± 945.933
2025-05-07 08:38:09,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1079.198, 3208.2285, 2258.6597, 1443.1885, 3387.3184, 1184.6411, 2248.4065, 3359.9617, 3576.0024, 3380.9194]
2025-05-07 08:38:09,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [330.0, 925.0, 650.0, 533.0, 1000.0, 1000.0, 636.0, 1000.0, 1000.0, 1000.0]
2025-05-07 08:38:09,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 39 minutes, 44 seconds)
2025-05-07 08:49:26,229 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 08:49:26,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 08:54:33,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2681.44214 ± 995.595
2025-05-07 08:54:33,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2288.0627, 3620.0894, 3110.027, 3245.6736, 1192.4927, 3221.543, 607.64703, 3807.2012, 2519.3496, 3202.3333]
2025-05-07 08:54:33,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [595.0, 1000.0, 1000.0, 1000.0, 348.0, 1000.0, 1000.0, 1000.0, 688.0, 1000.0]
2025-05-07 08:54:33,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 16 minutes, 28 seconds)
2025-05-07 09:07:27,115 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 09:07:27,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 09:11:38,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2245.86108 ± 1289.048
2025-05-07 09:11:38,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [6.067268, 1252.8613, 3346.6135, 2271.5728, 3060.3599, 2612.9082, 3806.5935, 3384.7236, 74.826836, 2642.0845]
2025-05-07 09:11:38,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [11.0, 333.0, 1000.0, 1000.0, 833.0, 918.0, 1000.0, 1000.0, 113.0, 777.0]
2025-05-07 09:11:38,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 79/100 (estimated time remaining: 6 hours, 12 minutes, 48 seconds)
2025-05-07 09:23:46,733 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 09:23:46,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 09:28:40,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2449.12842 ± 1313.206
2025-05-07 09:28:40,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3146.0557, 2461.081, 3721.3052, 14.373135, 3355.4873, 1947.467, 2811.406, 46.877533, 3660.467, 3326.7666]
2025-05-07 09:28:40,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 703.0, 1000.0, 39.0, 1000.0, 612.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 09:28:40,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 54 minutes, 27 seconds)
2025-05-07 09:41:03,466 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 09:41:03,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 09:43:47,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1564.98181 ± 1144.778
2025-05-07 09:43:47,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [329.6667, 2226.3123, 1458.9058, 32.993763, 33.452774, 1826.329, 2311.3425, 2862.7712, 1040.1726, 3527.8723]
2025-05-07 09:43:47,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [113.0, 653.0, 549.0, 27.0, 35.0, 623.0, 578.0, 857.0, 295.0, 960.0]
2025-05-07 09:43:47,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 34 minutes)
2025-05-07 09:55:44,551 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 09:55:44,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 09:58:10,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1400.84741 ± 954.009
2025-05-07 09:58:10,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1251.1976, 1716.6075, 1135.6841, 16.495945, 1547.3011, 1754.2357, 640.74036, 1923.9202, 390.9818, 3631.3096]
2025-05-07 09:58:10,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [418.0, 486.0, 363.0, 30.0, 579.0, 520.0, 194.0, 549.0, 139.0, 997.0]
2025-05-07 09:58:10,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 4 minutes, 2 seconds)
2025-05-07 10:10:45,810 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 10:10:45,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 10:15:24,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2492.80005 ± 1285.398
2025-05-07 10:15:24,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [900.732, 1865.7993, 3482.7935, 14.511126, 3458.0198, 3583.299, 3635.3796, 1261.6926, 3454.0466, 3271.7268]
2025-05-07 10:15:24,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [354.0, 582.0, 1000.0, 26.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 10:15:24,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 51 minutes, 3 seconds)
2025-05-07 10:27:10,014 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 10:27:10,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 10:31:42,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2444.88672 ± 1574.727
2025-05-07 10:31:42,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [757.4995, 3521.625, 3797.034, 1345.7032, 3814.5747, 3768.8853, 3624.9653, 37.07877, 112.93757, 3668.5645]
2025-05-07 10:31:42,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [223.0, 1000.0, 986.0, 454.0, 1000.0, 1000.0, 1000.0, 1000.0, 69.0, 1000.0]
2025-05-07 10:31:42,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 32 minutes, 14 seconds)
2025-05-07 10:44:00,716 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 10:44:00,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 10:47:10,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1678.88550 ± 1248.867
2025-05-07 10:47:10,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [9.244163, 2718.7979, 3775.4292, 557.58276, 3397.768, 40.003284, 1298.4257, 1553.9963, 2086.7258, 1350.8829]
2025-05-07 10:47:10,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [15.0, 813.0, 1000.0, 180.0, 1000.0, 36.0, 417.0, 1000.0, 769.0, 381.0]
2025-05-07 10:47:10,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 11 minutes, 11 seconds)
2025-05-07 11:00:02,706 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 11:00:02,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:03:11,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1529.96411 ± 1208.611
2025-05-07 11:03:11,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1377.8978, 488.62228, 1892.0013, 3034.7917, 1355.3964, 22.341879, 3327.3877, 704.7899, -15.219126, 3111.6309]
2025-05-07 11:03:11,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [430.0, 145.0, 516.0, 802.0, 441.0, 34.0, 1000.0, 190.0, 1000.0, 910.0]
2025-05-07 11:03:11,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 58 minutes, 10 seconds)
2025-05-07 11:14:44,936 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 11:14:44,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:18:16,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2080.81079 ± 1607.962
2025-05-07 11:18:16,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3701.758, 3653.0718, 100.17822, 377.66318, 3246.5737, 3500.7795, 30.554531, 3027.9487, 9.3038645, 3160.275]
2025-05-07 11:18:16,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 59.0, 140.0, 1000.0, 1000.0, 28.0, 1000.0, 30.0, 1000.0]
2025-05-07 11:18:16,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 44 minutes, 18 seconds)
2025-05-07 11:30:39,140 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 11:30:39,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:35:40,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2866.93066 ± 1172.464
2025-05-07 11:35:40,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3307.1904, 3310.4731, 3619.5005, 3741.4087, 103.691475, 3495.8474, 3670.4692, 1089.5112, 3092.335, 3238.8813]
2025-05-07 11:35:40,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 52.0, 1000.0, 1000.0, 384.0, 1000.0, 1000.0]
2025-05-07 11:35:40,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (2866.93) for latency ExtremeSparseL4U32
2025-05-07 11:35:40,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-05-07 11:35:40,530 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-mbpac-highdim-memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 11:35:40,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 28 minutes, 41 seconds)
2025-05-07 11:48:31,597 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 11:48:31,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:52:07,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1897.85522 ± 1391.398
2025-05-07 11:52:07,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2648.8901, 3096.5864, 79.62433, 191.98352, 3053.0708, 3497.1882, 1725.9025, 687.10657, 3692.4016, 305.79868]
2025-05-07 11:52:07,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [689.0, 1000.0, 94.0, 64.0, 775.0, 1000.0, 480.0, 231.0, 949.0, 1000.0]
2025-05-07 11:52:07,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 13 minutes)
2025-05-07 12:04:06,511 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 12:04:06,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:07:32,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1922.97485 ± 1439.477
2025-05-07 12:07:32,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3653.7302, 3807.2852, 3167.1794, 940.6246, 2444.8538, 68.831245, 37.021057, 2108.9465, 2915.45, 85.82667]
2025-05-07 12:07:32,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 875.0, 246.0, 1000.0, 46.0, 59.0, 543.0, 1000.0, 60.0]
2025-05-07 12:07:32,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 56 minutes, 49 seconds)
2025-05-07 12:19:04,815 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 12:19:04,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:23:09,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2544.72290 ± 1386.134
2025-05-07 12:23:09,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4050.9832, 3836.9377, 3112.907, 3963.2327, 2487.7646, 2095.9253, 168.51457, 3559.1956, 2035.469, 136.29977]
2025-05-07 12:23:09,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 924.0, 1000.0, 630.0, 596.0, 96.0, 1000.0, 599.0, 61.0]
2025-05-07 12:23:09,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 39 minutes, 55 seconds)
2025-05-07 12:36:04,877 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 12:36:04,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:39:21,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1374.71204 ± 784.281
2025-05-07 12:39:21,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1565.7356, 908.2481, 400.5115, 2098.6028, 1388.7551, 153.84941, 1853.6614, 1135.4324, 1245.9258, 2996.3982]
2025-05-07 12:39:21,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [504.0, 246.0, 163.0, 577.0, 422.0, 1000.0, 1000.0, 311.0, 611.0, 796.0]
2025-05-07 12:39:21,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 25 minutes, 55 seconds)
2025-05-07 12:51:13,426 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 12:51:13,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:54:54,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1942.57947 ± 1335.091
2025-05-07 12:54:54,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2111.4854, 336.21713, 3368.1753, 3538.4402, 3138.233, 3507.7678, 412.15173, 1096.7942, 19.664886, 1896.8666]
2025-05-07 12:54:54,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [643.0, 129.0, 1000.0, 1000.0, 785.0, 1000.0, 138.0, 1000.0, 29.0, 604.0]
2025-05-07 12:54:54,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 6 minutes, 45 seconds)
2025-05-07 13:07:13,788 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 13:07:13,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:10:23,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1596.01562 ± 1506.901
2025-05-07 13:10:23,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [463.44553, -1.7692606, 3039.2217, 4139.228, 2551.925, 1538.7375, 273.64236, 15.323848, 3531.7327, 408.66904]
2025-05-07 13:10:23,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [216.0, 13.0, 1000.0, 1000.0, 755.0, 392.0, 97.0, 14.0, 1000.0, 1000.0]
2025-05-07 13:10:23,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 49 minutes, 34 seconds)
2025-05-07 13:22:58,319 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 13:22:58,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:27:53,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1887.48401 ± 1215.755
2025-05-07 13:27:53,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3233.0137, 2517.5981, 3330.9302, 3394.7732, 2558.7532, 895.2571, 22.74151, 1548.9785, 1077.2076, 295.58835]
2025-05-07 13:27:53,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 825.0, 1000.0, 1000.0, 1000.0, 265.0, 1000.0, 466.0, 1000.0, 1000.0]
2025-05-07 13:27:53,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 36 minutes, 25 seconds)
2025-05-07 13:40:13,835 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 13:40:13,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:43:31,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1834.48108 ± 1453.746
2025-05-07 13:43:31,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2395.229, 1186.8865, 50.176533, 3797.615, 2434.941, 3515.0898, 12.865434, 3668.5435, 71.66958, 1211.7927]
2025-05-07 13:43:31,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 359.0, 50.0, 1000.0, 1000.0, 1000.0, 13.0, 1000.0, 46.0, 331.0]
2025-05-07 13:43:31,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 20 minutes, 22 seconds)
2025-05-07 13:55:29,432 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 13:55:29,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:59:36,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1763.72339 ± 1565.556
2025-05-07 13:59:36,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3935.6729, 829.6588, 23.558765, 3782.3367, 841.51416, 3246.6274, 37.74012, 3354.91, -35.145252, 1620.3606]
2025-05-07 13:59:36,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 286.0, 1000.0, 1000.0, 289.0, 1000.0, 38.0, 1000.0, 1000.0, 496.0]
2025-05-07 13:59:36,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 4 minutes, 12 seconds)
2025-05-07 14:11:48,377 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 14:11:48,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:15:57,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2343.35791 ± 1153.456
2025-05-07 14:15:57,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1453.3146, 1945.4617, 3423.278, 1750.1353, 3396.4792, 1419.1018, 3843.1965, 3017.27, 4.918215, 3180.421]
2025-05-07 14:15:57,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 668.0, 1000.0, 430.0, 880.0, 433.0, 1000.0, 871.0, 16.0, 944.0]
2025-05-07 14:15:57,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 98/100 (estimated time remaining: 48 minutes, 38 seconds)
2025-05-07 14:27:49,080 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 14:27:49,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:32:23,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2065.03882 ± 1113.079
2025-05-07 14:32:23,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3543.9722, 3171.3428, 1827.9069, 28.339237, 2271.354, 1839.5739, 462.77408, 3514.7324, 1964.3577, 2026.0347]
2025-05-07 14:32:23,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [941.0, 1000.0, 1000.0, 48.0, 671.0, 1000.0, 1000.0, 885.0, 687.0, 637.0]
2025-05-07 14:32:23,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 99/100 (estimated time remaining: 32 minutes, 48 seconds)
2025-05-07 14:45:31,604 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 14:45:31,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:49:43,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2605.02612 ± 1121.245
2025-05-07 14:49:43,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1660.2662, 1018.2671, 3804.6643, 3759.999, 1256.1803, 3657.235, 3620.4614, 1183.2291, 2939.695, 3150.2659]
2025-05-07 14:49:43,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [446.0, 307.0, 1000.0, 1000.0, 379.0, 1000.0, 931.0, 458.0, 812.0, 786.0]
2025-05-07 14:49:43,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 100/100 (estimated time remaining: 16 minutes, 22 seconds)
2025-05-07 15:02:04,832 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 15:02:04,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:06:24,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1923.61206 ± 1376.024
2025-05-07 15:06:24,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2624.9604, 809.51337, 3915.0264, 677.5951, 3517.9395, 3751.4287, 2023.0027, 1386.4691, 250.10907, 280.07712]
2025-05-07 15:06:24,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [811.0, 1000.0, 1000.0, 229.0, 1000.0, 1000.0, 746.0, 414.0, 1000.0, 107.0]
2025-05-07 15:06:24,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1149 [DEBUG]: Training session finished
