2025-04-30 08:29:09,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-mbpac-highdim-memdelay
2025-04-30 08:29:09,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-mbpac-highdim-memdelay
2025-04-30 08:29:09,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1008 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7f0ee9083b50>}
2025-04-30 08:29:09,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1009 [DEBUG]: using device: cuda
2025-04-30 08:29:09,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1031 [INFO]: Creating new trainer
2025-04-30 08:29:09,896 baseline-mbpac-noisy-ant:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-04-30 08:29:09,896 baseline-mbpac-noisy-ant:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-04-30 08:29:09,911 baseline-mbpac-noisy-ant:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=27, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=27, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=8, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-04-30 08:29:10,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1092 [DEBUG]: Starting training session...
2025-04-30 08:29:10,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 1/100
2025-04-30 08:41:51,356 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 08:41:51,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 08:46:04,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: -322.43484 ± 211.735
2025-04-30 08:46:04,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [-476.37784, -17.236345, -315.60254, -551.3497, -58.64366, -383.09872, -314.20584, -599.4062, -498.82388, -9.603599]
2025-04-30 08:46:04,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 40.0, 333.0, 792.0, 140.0, 1000.0, 394.0, 1000.0, 439.0, 11.0]
2025-04-30 08:46:04,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (-322.43) for latency ExtremeClogL1U23
2025-04-30 08:46:04,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-04-30 08:46:04,522 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-04-30 08:46:04,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 2/100 (estimated time remaining: 27 hours, 52 minutes, 56 seconds)
2025-04-30 09:01:53,119 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 09:01:53,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 09:06:14,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 131.69083 ± 74.162
2025-04-30 09:06:14,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [229.63889, 125.377785, 108.02512, 51.163822, 39.15354, 172.53671, 68.99662, 249.91771, 207.26917, 64.82868]
2025-04-30 09:06:14,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [936.0, 475.0, 1000.0, 159.0, 198.0, 1000.0, 171.0, 1000.0, 678.0, 456.0]
2025-04-30 09:06:14,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (131.69) for latency ExtremeClogL1U23
2025-04-30 09:06:14,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-04-30 09:06:14,438 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-04-30 09:06:14,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 3/100 (estimated time remaining: 30 hours, 16 minutes, 7 seconds)
2025-04-30 09:22:30,611 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 09:22:30,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 09:27:34,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 248.19290 ± 127.744
2025-04-30 09:27:34,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [173.64635, 315.82877, 102.03815, 416.8093, 96.12998, 285.32132, 479.4166, 308.88336, 112.69433, 191.16081]
2025-04-30 09:27:34,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [351.0, 767.0, 191.0, 1000.0, 492.0, 518.0, 1000.0, 1000.0, 1000.0, 561.0]
2025-04-30 09:27:34,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (248.19) for latency ExtremeClogL1U23
2025-04-30 09:27:34,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-04-30 09:27:34,384 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-04-30 09:27:34,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 4/100 (estimated time remaining: 31 hours, 28 minutes, 8 seconds)
2025-04-30 09:45:26,201 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 09:45:26,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 09:51:06,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 405.75903 ± 176.906
2025-04-30 09:51:06,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [298.08002, 529.38135, 164.68764, 128.03697, 567.2588, 344.92337, 471.46863, 747.4039, 402.88037, 403.46924]
2025-04-30 09:51:06,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [392.0, 1000.0, 712.0, 213.0, 1000.0, 603.0, 737.0, 1000.0, 1000.0, 1000.0]
2025-04-30 09:51:06,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (405.76) for latency ExtremeClogL1U23
2025-04-30 09:51:06,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-04-30 09:51:06,861 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-04-30 09:51:06,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 5/100 (estimated time remaining: 32 hours, 46 minutes, 30 seconds)
2025-04-30 10:06:18,907 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 10:06:18,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 10:13:35,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 602.88574 ± 95.028
2025-04-30 10:13:35,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [714.7009, 533.61145, 684.8557, 429.17267, 643.7311, 607.849, 454.9314, 623.7774, 625.8176, 710.41003]
2025-04-30 10:13:35,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 848.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 10:13:35,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (602.89) for latency ExtremeClogL1U23
2025-04-30 10:13:35,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-04-30 10:13:35,348 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-04-30 10:13:35,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 6/100 (estimated time remaining: 33 hours, 3 minutes, 49 seconds)
2025-04-30 10:30:00,274 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 10:30:00,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 10:36:21,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 600.41986 ± 186.382
2025-04-30 10:36:21,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [186.71214, 722.8434, 507.01666, 699.50793, 730.56226, 700.3906, 479.35092, 771.7069, 792.47125, 413.63678]
2025-04-30 10:36:21,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [203.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 497.0]
2025-04-30 10:36:21,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 7/100 (estimated time remaining: 34 hours, 33 minutes, 25 seconds)
2025-04-30 10:53:07,998 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 10:53:08,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 10:56:58,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 500.01495 ± 338.524
2025-04-30 10:56:58,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [564.14655, 408.63608, 1092.2339, 16.607706, 861.89746, 339.54648, 105.89889, 409.0975, 923.84436, 278.24097]
2025-04-30 10:56:58,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [513.0, 370.0, 1000.0, 18.0, 1000.0, 296.0, 101.0, 1000.0, 1000.0, 250.0]
2025-04-30 10:56:58,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 8/100 (estimated time remaining: 34 hours, 19 minutes, 41 seconds)
2025-04-30 11:12:07,030 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 11:12:07,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 11:15:07,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 332.02805 ± 269.559
2025-04-30 11:15:07,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [522.5378, 228.20403, 909.53845, 163.18405, 226.45996, 594.3403, 11.956165, 132.60655, 473.76477, 57.6887]
2025-04-30 11:15:07,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 168.0, 1000.0, 114.0, 154.0, 475.0, 14.0, 112.0, 1000.0, 142.0]
2025-04-30 11:15:07,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 9/100 (estimated time remaining: 32 hours, 59 minutes, 5 seconds)
2025-04-30 11:32:30,776 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 11:32:30,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 11:37:36,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 766.01428 ± 445.025
2025-04-30 11:37:36,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1070.882, 465.40567, 1357.6365, 257.6857, 330.64905, 209.53757, 1248.1036, 1217.492, 395.8, 1106.9504]
2025-04-30 11:37:36,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 155.0, 196.0, 166.0, 1000.0, 1000.0, 334.0, 931.0]
2025-04-30 11:37:36,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (766.01) for latency ExtremeClogL1U23
2025-04-30 11:37:36,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-04-30 11:37:36,245 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-04-30 11:37:36,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 10/100 (estimated time remaining: 32 hours, 18 minutes, 6 seconds)
2025-04-30 11:52:32,986 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 11:52:32,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 11:57:31,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 409.12122 ± 221.301
2025-04-30 11:57:31,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [15.981877, 282.76324, 422.38608, 652.01996, 68.19456, 708.1125, 630.7825, 447.84225, 468.79874, 394.33032]
2025-04-30 11:57:31,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [29.0, 203.0, 1000.0, 1000.0, 42.0, 447.0, 1000.0, 1000.0, 366.0, 1000.0]
2025-04-30 11:57:31,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 11/100 (estimated time remaining: 31 hours, 10 minutes, 43 seconds)
2025-04-30 12:12:48,947 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 12:12:48,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 12:19:28,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1128.21313 ± 415.978
2025-04-30 12:19:28,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [847.6542, 1693.3037, 619.54004, 1600.6049, 683.9984, 1685.7373, 1171.5342, 700.29626, 1419.9298, 859.5321]
2025-04-30 12:19:28,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 417.0, 1000.0, 714.0, 411.0, 942.0, 662.0]
2025-04-30 12:19:28,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (1128.21) for latency ExtremeClogL1U23
2025-04-30 12:19:28,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-04-30 12:19:28,634 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-04-30 12:19:28,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 12/100 (estimated time remaining: 30 hours, 35 minutes, 25 seconds)
2025-04-30 12:35:36,915 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 12:35:36,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 12:40:49,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 642.81982 ± 384.960
2025-04-30 12:40:49,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [631.91235, 384.3622, 802.208, 330.1467, 1635.9125, 768.18335, 401.19006, 820.278, 310.4603, 343.5444]
2025-04-30 12:40:49,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 484.0, 1000.0, 1000.0, 1000.0, 242.0, 430.0, 164.0, 177.0]
2025-04-30 12:40:49,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 13/100 (estimated time remaining: 30 hours, 27 minutes, 49 seconds)
2025-04-30 12:58:23,577 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 12:58:23,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 13:04:14,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 686.28845 ± 476.789
2025-04-30 13:04:14,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [757.14325, 894.23914, 85.07904, 1200.4146, 346.38403, 271.0834, 374.34985, 624.7142, 533.7245, 1775.7527]
2025-04-30 13:04:14,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 451.0, 50.0, 595.0, 1000.0, 119.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 13:04:14,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 14/100 (estimated time remaining: 31 hours, 38 minutes, 33 seconds)
2025-04-30 13:19:44,521 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 13:19:44,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 13:25:07,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 784.36975 ± 355.621
2025-04-30 13:25:07,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1632.8798, 476.5548, 895.74664, 548.0586, 912.7629, 724.2257, 375.56436, 753.3378, 1072.5538, 452.0139]
2025-04-30 13:25:07,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 225.0, 446.0, 281.0, 1000.0, 1000.0, 1000.0, 1000.0, 567.0, 267.0]
2025-04-30 13:25:07,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 15/100 (estimated time remaining: 30 hours, 49 minutes, 12 seconds)
2025-04-30 13:41:00,104 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 13:41:00,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 13:44:46,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 676.17444 ± 429.848
2025-04-30 13:44:46,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [456.31082, 447.5571, 1344.9701, 528.45013, 590.23236, 49.49457, 1034.4211, 932.5701, 104.29094, 1273.4469]
2025-04-30 13:44:46,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [277.0, 1000.0, 613.0, 1000.0, 316.0, 27.0, 532.0, 1000.0, 61.0, 547.0]
2025-04-30 13:44:46,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 16/100 (estimated time remaining: 30 hours, 23 minutes, 26 seconds)
2025-04-30 14:02:29,134 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 14:02:29,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 14:07:17,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1021.82373 ± 854.459
2025-04-30 14:07:17,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [198.39304, 2170.3862, 2499.3853, 48.84129, 2054.3586, 158.26097, 560.2403, 683.4052, 836.79346, 1008.17316]
2025-04-30 14:07:17,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [100.0, 1000.0, 1000.0, 30.0, 1000.0, 66.0, 273.0, 1000.0, 1000.0, 1000.0]
2025-04-30 14:07:17,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 17/100 (estimated time remaining: 30 hours, 11 minutes, 16 seconds)
2025-04-30 14:22:58,531 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 14:22:58,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 14:29:12,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1834.07690 ± 744.270
2025-04-30 14:29:12,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2084.6753, 2414.2048, 72.1919, 2416.1953, 1651.8611, 2489.6316, 2311.0884, 1638.1698, 2296.9126, 965.8392]
2025-04-30 14:29:12,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 34.0, 1000.0, 763.0, 1000.0, 1000.0, 674.0, 1000.0, 1000.0]
2025-04-30 14:29:12,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (1834.08) for latency ExtremeClogL1U23
2025-04-30 14:29:12,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-04-30 14:29:12,656 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-04-30 14:29:12,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 18/100 (estimated time remaining: 29 hours, 59 minutes, 6 seconds)
2025-04-30 14:46:47,734 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 14:46:47,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 14:52:15,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1637.94727 ± 966.964
2025-04-30 14:52:15,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [626.859, 2360.4255, 2398.789, 2474.8835, 2275.2637, 166.35114, 2415.2383, 795.5063, 302.37656, 2563.7795]
2025-04-30 14:52:15,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 87.0, 1000.0, 323.0, 113.0, 1000.0]
2025-04-30 14:52:15,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 19/100 (estimated time remaining: 29 hours, 31 minutes, 17 seconds)
2025-04-30 15:07:31,678 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 15:07:31,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 15:11:25,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1011.75861 ± 985.676
2025-04-30 15:11:25,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [93.47733, 107.85343, 2416.4707, 84.00737, 831.6527, 535.56903, 465.956, 577.81274, 2628.7915, 2375.9954]
2025-04-30 15:11:25,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [49.0, 45.0, 1000.0, 43.0, 349.0, 1000.0, 1000.0, 290.0, 1000.0, 858.0]
2025-04-30 15:11:25,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 20/100 (estimated time remaining: 28 hours, 42 minutes, 16 seconds)
2025-04-30 15:27:43,845 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 15:27:43,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 15:32:43,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1256.39661 ± 916.630
2025-04-30 15:32:43,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1912.4088, 640.75604, 51.109535, 560.86664, 887.43146, 387.69705, 2923.8572, 1126.644, 2649.2935, 1423.9011]
2025-04-30 15:32:43,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 259.0, 27.0, 1000.0, 419.0, 264.0, 1000.0, 1000.0, 1000.0, 531.0]
2025-04-30 15:32:43,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 21/100 (estimated time remaining: 28 hours, 47 minutes, 9 seconds)
2025-04-30 15:49:26,256 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 15:49:26,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 15:54:16,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1347.94141 ± 858.232
2025-04-30 15:54:16,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2210.7363, 2459.3, 1534.1014, 1633.3317, 2640.4753, 77.43296, 1182.368, 823.97797, 684.9234, 232.76746]
2025-04-30 15:54:16,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 782.0, 551.0, 608.0, 961.0, 41.0, 402.0, 306.0, 1000.0, 94.0]
2025-04-30 15:54:16,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 22/100 (estimated time remaining: 28 hours, 10 minutes, 23 seconds)
2025-04-30 16:10:25,380 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 16:10:25,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 16:16:29,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1690.12915 ± 1196.297
2025-04-30 16:16:29,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3087.5144, 2663.496, 95.704506, 429.14044, 2819.1116, 807.7989, 3093.7803, 2664.0896, 525.5645, 715.0899]
2025-04-30 16:16:29,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 49.0, 139.0, 1000.0, 325.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 16:16:29,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 23/100 (estimated time remaining: 27 hours, 53 minutes, 37 seconds)
2025-04-30 16:32:05,830 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 16:32:05,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 16:36:49,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1013.88544 ± 749.076
2025-04-30 16:36:49,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [579.9385, 239.21408, 2499.6682, 753.8935, 1823.1014, 951.4072, 668.6799, 22.836966, 1858.4943, 741.61993]
2025-04-30 16:36:49,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 86.0, 1000.0, 234.0, 630.0, 1000.0, 1000.0, 22.0, 578.0, 239.0]
2025-04-30 16:36:49,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 24/100 (estimated time remaining: 26 hours, 50 minutes, 31 seconds)
2025-04-30 16:54:04,415 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 16:54:04,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 17:00:16,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2247.94043 ± 904.527
2025-04-30 17:00:16,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2830.0886, 2416.2258, 897.93823, 3077.8882, 1721.5258, 3287.4424, 2625.4548, 521.58044, 1985.2092, 3116.051]
2025-04-30 17:00:16,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 772.0, 312.0, 1000.0, 578.0, 1000.0, 1000.0, 201.0, 1000.0, 1000.0]
2025-04-30 17:00:16,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (2247.94) for latency ExtremeClogL1U23
2025-04-30 17:00:16,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-04-30 17:00:16,025 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-04-30 17:00:16,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 25/100 (estimated time remaining: 27 hours, 34 minutes, 19 seconds)
2025-04-30 17:15:44,284 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 17:15:44,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 17:19:22,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1048.40259 ± 804.630
2025-04-30 17:19:22,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2281.5513, 406.65414, 2229.2117, 988.0658, 200.84113, 141.80746, 1512.6708, 178.93666, 756.4227, 1787.8639]
2025-04-30 17:19:22,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 117.0, 893.0, 279.0, 101.0, 73.0, 762.0, 62.0, 388.0, 1000.0]
2025-04-30 17:19:22,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 26/100 (estimated time remaining: 26 hours, 39 minutes, 50 seconds)
2025-04-30 17:33:39,467 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 17:33:39,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 17:38:53,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2098.58423 ± 894.041
2025-04-30 17:38:53,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [915.6011, 2282.767, 3103.7656, 2791.2812, 1772.9675, 2025.6865, 2305.9482, 3298.3164, 2249.184, 240.32483]
2025-04-30 17:38:53,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [277.0, 1000.0, 918.0, 902.0, 580.0, 699.0, 1000.0, 1000.0, 719.0, 95.0]
2025-04-30 17:38:53,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 27/100 (estimated time remaining: 25 hours, 48 minutes, 19 seconds)
2025-04-30 17:54:07,466 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 17:54:07,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 17:59:32,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2290.57227 ± 1095.245
2025-04-30 17:59:32,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3329.6348, 3133.5037, 2955.27, 367.60098, 778.52246, 2003.2502, 3220.7878, 1010.2977, 3173.0781, 2933.7766]
2025-04-30 17:59:32,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 137.0, 292.0, 812.0, 1000.0, 406.0, 1000.0, 1000.0]
2025-04-30 17:59:32,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (2290.57) for latency ExtremeClogL1U23
2025-04-30 17:59:32,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-04-30 17:59:32,998 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-04-30 17:59:33,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 28/100 (estimated time remaining: 25 hours, 4 minutes, 36 seconds)
2025-04-30 18:14:27,824 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 18:14:27,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 18:18:43,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1885.72388 ± 934.032
2025-04-30 18:18:43,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1644.7208, 2352.8213, 288.6171, 2802.4146, 1163.3154, 2363.8054, 2462.707, 1467.9293, 3507.012, 803.89594]
2025-04-30 18:18:43,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [540.0, 678.0, 122.0, 1000.0, 392.0, 691.0, 892.0, 421.0, 1000.0, 225.0]
2025-04-30 18:18:43,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 29/100 (estimated time remaining: 24 hours, 27 minutes, 17 seconds)
2025-04-30 18:33:08,009 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 18:33:08,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 18:37:20,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1769.08337 ± 1159.865
2025-04-30 18:37:20,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [251.02272, 749.72296, 2253.145, 774.98975, 3420.3508, 3097.6362, 2091.906, 245.54774, 1639.9517, 3166.5608]
2025-04-30 18:37:20,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [95.0, 419.0, 733.0, 225.0, 1000.0, 1000.0, 678.0, 103.0, 1000.0, 1000.0]
2025-04-30 18:37:20,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 30/100 (estimated time remaining: 22 hours, 58 minutes, 23 seconds)
2025-04-30 18:51:00,110 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 18:51:00,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 18:54:55,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 669.11230 ± 466.350
2025-04-30 18:54:55,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [480.46515, 1655.3053, 563.85803, 417.32407, 430.76108, 1019.18604, 78.393196, 70.04124, 958.36707, 1017.422]
2025-04-30 18:54:55,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [142.0, 481.0, 1000.0, 147.0, 1000.0, 1000.0, 43.0, 64.0, 1000.0, 275.0]
2025-04-30 18:54:55,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 31/100 (estimated time remaining: 22 hours, 17 minutes, 35 seconds)
2025-04-30 19:09:46,555 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 19:09:46,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 19:16:42,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2829.85889 ± 1110.552
2025-04-30 19:16:42,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1127.4735, 262.0391, 3728.236, 2979.4194, 3575.96, 3238.826, 3018.9358, 3286.4956, 3669.9944, 3411.209]
2025-04-30 19:16:42,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [311.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 866.0, 1000.0, 1000.0, 1000.0]
2025-04-30 19:16:42,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (2829.86) for latency ExtremeClogL1U23
2025-04-30 19:16:42,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-04-30 19:16:42,181 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-04-30 19:16:42,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 32/100 (estimated time remaining: 22 hours, 29 minutes, 45 seconds)
2025-04-30 19:29:39,417 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 19:29:39,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 19:34:12,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1769.64807 ± 1290.944
2025-04-30 19:34:12,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1392.0226, 2310.4456, 665.9848, 3734.2185, 3398.7253, 210.89548, 3300.7205, 112.443436, 1790.3944, 780.6299]
2025-04-30 19:34:12,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [399.0, 648.0, 1000.0, 1000.0, 937.0, 62.0, 919.0, 55.0, 544.0, 201.0]
2025-04-30 19:34:12,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 33/100 (estimated time remaining: 21 hours, 27 minutes, 23 seconds)
2025-04-30 19:47:41,278 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 19:47:41,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 19:50:40,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1313.55334 ± 1144.505
2025-04-30 19:50:40,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3677.4387, 14.026641, 1167.1166, 2592.534, 734.32166, 713.2134, 198.08191, 909.0491, 581.30975, 2548.4424]
2025-04-30 19:50:40,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 17.0, 268.0, 859.0, 274.0, 187.0, 116.0, 279.0, 193.0, 793.0]
2025-04-30 19:50:40,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 34/100 (estimated time remaining: 20 hours, 32 minutes, 13 seconds)
2025-04-30 20:05:21,679 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 20:05:21,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 20:10:47,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2458.94336 ± 1292.108
2025-04-30 20:10:47,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3455.664, 1849.4004, 3729.458, 3619.8982, 2486.678, 4056.998, 2883.8923, 270.01056, 1909.1069, 328.3261]
2025-04-30 20:10:47,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 467.0, 1000.0, 1000.0, 665.0, 1000.0, 757.0, 1000.0, 512.0, 119.0]
2025-04-30 20:10:47,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 35/100 (estimated time remaining: 20 hours, 33 minutes, 30 seconds)
2025-04-30 20:24:52,115 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 20:24:52,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 20:28:50,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1889.08069 ± 1264.174
2025-04-30 20:28:50,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2104.3298, 1770.4474, 39.35951, 842.7115, 3797.2732, 614.549, 2100.5806, 917.0114, 2693.6978, 4010.8481]
2025-04-30 20:28:50,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 507.0, 27.0, 287.0, 1000.0, 159.0, 598.0, 279.0, 716.0, 1000.0]
2025-04-30 20:28:50,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 36/100 (estimated time remaining: 20 hours, 20 minutes, 57 seconds)
2025-04-30 20:42:42,781 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 20:42:42,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 20:46:39,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2067.67261 ± 1087.229
2025-04-30 20:46:39,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1590.654, 1537.186, 1974.9878, 3748.8481, 3482.7476, 1918.5997, 1001.21387, 61.869965, 3200.3503, 2160.2703]
2025-04-30 20:46:39,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [418.0, 366.0, 448.0, 1000.0, 1000.0, 495.0, 368.0, 30.0, 897.0, 539.0]
2025-04-30 20:46:39,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 37/100 (estimated time remaining: 19 hours, 11 minutes, 21 seconds)
2025-04-30 20:59:48,707 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 20:59:48,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 21:05:17,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2787.30908 ± 965.683
2025-04-30 21:05:17,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2574.2646, 4181.1313, 3610.936, 2561.1042, 2389.253, 3676.8296, 3110.7983, 1139.882, 3408.115, 1220.7758]
2025-04-30 21:05:17,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 768.0, 567.0, 1000.0, 832.0, 337.0, 866.0, 339.0]
2025-04-30 21:05:17,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 38/100 (estimated time remaining: 19 hours, 7 minutes, 39 seconds)
2025-04-30 21:19:18,898 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 21:19:18,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 21:24:28,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2226.72168 ± 1658.554
2025-04-30 21:24:28,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3553.4504, 252.42459, 3869.111, 2776.3076, 4050.6445, 4130.979, 311.98203, 314.60764, 175.30252, 2832.406]
2025-04-30 21:24:28,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 756.0, 1000.0, 1000.0, 131.0, 161.0, 62.0, 1000.0]
2025-04-30 21:24:28,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 39/100 (estimated time remaining: 19 hours, 22 minutes, 58 seconds)
2025-04-30 21:38:47,896 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 21:38:47,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 21:44:49,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2622.49414 ± 1385.159
2025-04-30 21:44:49,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [641.69006, 739.98737, 3859.3806, 3735.6228, 3459.269, 3647.4856, 3702.2485, 363.7658, 3579.4592, 2496.0332]
2025-04-30 21:44:49,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 202.0, 1000.0, 1000.0, 1000.0, 926.0, 1000.0, 158.0, 1000.0, 751.0]
2025-04-30 21:44:49,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 40/100 (estimated time remaining: 19 hours, 7 minutes, 18 seconds)
2025-04-30 21:58:27,399 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 21:58:27,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 22:04:07,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2466.73975 ± 1358.450
2025-04-30 22:04:07,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [837.5689, 4054.8672, 2945.0808, 4144.084, 1083.0283, 1306.906, 3503.8113, 3984.5957, 586.4288, 2221.023]
2025-04-30 22:04:07,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 266.0, 360.0, 1000.0, 1000.0, 154.0, 565.0]
2025-04-30 22:04:07,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 41/100 (estimated time remaining: 19 hours, 3 minutes, 17 seconds)
2025-04-30 22:18:26,962 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 22:18:26,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 22:23:33,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2538.44727 ± 1641.641
2025-04-30 22:23:33,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3684.391, 4252.832, 3300.4402, 943.74536, 47.78375, 4127.028, 166.74942, 1193.3329, 3667.9197, 4000.2512]
2025-04-30 22:23:33,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 797.0, 231.0, 50.0, 1000.0, 60.0, 342.0, 990.0, 1000.0]
2025-04-30 22:23:33,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 42/100 (estimated time remaining: 19 hours, 3 minutes, 28 seconds)
2025-04-30 22:36:46,574 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 22:36:46,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 22:41:31,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2135.45166 ± 1522.522
2025-04-30 22:41:31,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1663.0352, 454.217, 19.441277, 243.18536, 1534.8871, 4510.989, 3878.4473, 3459.6812, 3246.163, 2344.471]
2025-04-30 22:41:31,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [506.0, 151.0, 20.0, 82.0, 1000.0, 1000.0, 1000.0, 850.0, 1000.0, 599.0]
2025-04-30 22:41:31,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 43/100 (estimated time remaining: 18 hours, 36 minutes, 16 seconds)
2025-04-30 22:55:09,666 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 22:55:09,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 23:01:26,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2600.04712 ± 1530.568
2025-04-30 23:01:26,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4148.8794, 3777.2827, 202.89789, 2084.685, 3944.4531, 1293.4292, 4004.163, 44.032185, 3868.612, 2632.036]
2025-04-30 23:01:26,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 581.0, 1000.0, 335.0, 1000.0, 1000.0, 1000.0, 641.0]
2025-04-30 23:01:26,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 44/100 (estimated time remaining: 18 hours, 25 minutes, 22 seconds)
2025-04-30 23:16:15,891 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 23:16:15,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 23:21:33,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2874.98828 ± 1339.346
2025-04-30 23:21:33,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4202.435, 3904.7961, 3390.823, 1793.9689, 1077.1226, 1657.8925, 654.34174, 3856.665, 3960.7031, 4251.1343]
2025-04-30 23:21:33,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 846.0, 572.0, 284.0, 488.0, 178.0, 1000.0, 1000.0, 1000.0]
2025-04-30 23:21:33,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (2874.99) for latency ExtremeClogL1U23
2025-04-30 23:21:33,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-04-30 23:21:33,579 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-04-30 23:21:33,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 45/100 (estimated time remaining: 18 hours, 3 minutes, 25 seconds)
2025-04-30 23:35:36,700 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 23:35:36,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 23:40:10,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2081.28857 ± 1706.923
2025-04-30 23:40:10,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4105.9126, 9.793913, 3600.6863, 31.49066, 4189.4824, 287.7017, 362.23288, 3127.9143, 3570.1594, 1527.5122]
2025-04-30 23:40:10,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 13.0, 1000.0, 21.0, 1000.0, 1000.0, 114.0, 902.0, 923.0, 435.0]
2025-04-30 23:40:10,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 46/100 (estimated time remaining: 17 hours, 36 minutes, 42 seconds)
2025-04-30 23:54:08,320 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 23:54:08,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 23:59:50,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2928.82300 ± 1351.300
2025-04-30 23:59:50,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2798.2598, 3960.7852, 691.7578, 3043.7612, 3924.2405, 663.80945, 3992.2695, 1756.7616, 4508.849, 3947.737]
2025-04-30 23:59:50,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [660.0, 881.0, 1000.0, 830.0, 1000.0, 239.0, 1000.0, 405.0, 1000.0, 1000.0]
2025-04-30 23:59:50,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (2928.82) for latency ExtremeClogL1U23
2025-04-30 23:59:50,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-04-30 23:59:50,583 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-04-30 23:59:50,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 47/100 (estimated time remaining: 17 hours, 19 minutes, 53 seconds)
2025-05-01 00:13:34,504 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 00:13:34,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 00:18:35,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2546.48779 ± 1480.596
2025-05-01 00:18:35,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3983.4204, 4087.0876, 4334.405, 3074.1523, 242.7749, 2436.2227, 766.6248, 1356.9512, 1159.4679, 4023.7715]
2025-05-01 00:18:35,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 726.0, 73.0, 645.0, 1000.0, 348.0, 314.0, 1000.0]
2025-05-01 00:18:35,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 48/100 (estimated time remaining: 17 hours, 8 minutes, 53 seconds)
2025-05-01 00:33:08,001 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 00:33:08,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 00:39:10,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3090.74878 ± 1517.831
2025-05-01 00:39:10,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3235.5498, 4322.0874, 831.07666, 886.56836, 4240.3076, 3956.6257, 4134.6597, 743.3483, 4131.989, 4425.2783]
2025-05-01 00:39:10,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 249.0, 1000.0, 1000.0, 1000.0, 1000.0, 204.0, 1000.0, 1000.0]
2025-05-01 00:39:10,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (3090.75) for latency ExtremeClogL1U23
2025-05-01 00:39:10,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-05-01 00:39:10,457 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-01 00:39:10,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 49/100 (estimated time remaining: 16 hours, 56 minutes, 30 seconds)
2025-05-01 00:53:13,578 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 00:53:13,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 00:59:41,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3214.24170 ± 1053.657
2025-05-01 00:59:41,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2999.0337, 3188.947, 1729.5374, 860.4013, 3764.7302, 3436.2324, 3801.013, 4342.393, 3817.6274, 4202.5015]
2025-05-01 00:59:41,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 670.0, 470.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 923.0, 1000.0]
2025-05-01 00:59:41,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (3214.24) for latency ExtremeClogL1U23
2025-05-01 00:59:41,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-05-01 00:59:41,404 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-01 00:59:41,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 50/100 (estimated time remaining: 16 hours, 40 minutes, 55 seconds)
2025-05-01 01:13:44,410 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 01:13:44,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 01:19:20,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2297.81372 ± 1449.806
2025-05-01 01:19:20,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1126.7766, 1051.4187, 3813.9854, 1696.4332, 4208.57, 816.04987, 1288.2512, 4412.296, 3713.3972, 850.9619]
2025-05-01 01:19:20,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [318.0, 268.0, 1000.0, 1000.0, 1000.0, 1000.0, 380.0, 1000.0, 929.0, 1000.0]
2025-05-01 01:19:20,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 51/100 (estimated time remaining: 16 hours, 31 minutes, 39 seconds)
2025-05-01 01:32:51,683 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 01:32:51,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 01:36:36,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1591.35291 ± 1319.992
2025-05-01 01:36:36,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2540.6995, 792.7495, 684.16986, 741.5554, 3964.9797, 127.57442, 3707.3367, 1106.3097, 283.38312, 1964.7716]
2025-05-01 01:36:36,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [701.0, 208.0, 166.0, 203.0, 1000.0, 53.0, 1000.0, 1000.0, 79.0, 511.0]
2025-05-01 01:36:36,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 52/100 (estimated time remaining: 15 hours, 48 minutes, 16 seconds)
2025-05-01 01:49:56,755 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 01:49:56,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 01:54:30,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2386.63721 ± 1376.050
2025-05-01 01:54:30,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2281.7512, 3704.135, 2078.8684, 304.79996, 2425.7932, 1858.9198, 4291.4917, 38.239834, 2695.5454, 4186.8286]
2025-05-01 01:54:30,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [560.0, 1000.0, 492.0, 82.0, 544.0, 404.0, 1000.0, 25.0, 681.0, 1000.0]
2025-05-01 01:54:30,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 53/100 (estimated time remaining: 15 hours, 20 minutes, 51 seconds)
2025-05-01 02:08:00,313 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 02:08:00,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 02:12:49,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2438.53467 ± 1291.733
2025-05-01 02:12:49,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [893.2225, 3998.8909, 1231.3579, 2957.4456, 3146.314, 2855.3745, 3936.886, 3743.0884, 479.99304, 1142.7737]
2025-05-01 02:12:49,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [230.0, 1000.0, 276.0, 844.0, 784.0, 720.0, 1000.0, 1000.0, 137.0, 311.0]
2025-05-01 02:12:49,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 54/100 (estimated time remaining: 14 hours, 40 minutes, 14 seconds)
2025-05-01 02:27:03,739 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 02:27:03,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 02:32:21,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2965.57129 ± 1550.328
2025-05-01 02:32:21,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4438.425, 3817.5967, 4042.9739, 3546.6577, 1707.9819, 320.07672, 98.88842, 3489.054, 4164.049, 4030.01]
2025-05-01 02:32:21,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 827.0, 406.0, 78.0, 38.0, 1000.0, 948.0, 1000.0]
2025-05-01 02:32:21,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 55/100 (estimated time remaining: 14 hours, 12 minutes, 28 seconds)
2025-05-01 02:46:09,110 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 02:46:09,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 02:50:53,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2722.70728 ± 1657.820
2025-05-01 02:50:53,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4355.29, 3782.9553, 660.99603, 1017.0611, 3642.6514, 3718.4473, 4534.067, 1030.4973, 231.22878, 4253.879]
2025-05-01 02:50:53,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 207.0, 238.0, 1000.0, 812.0, 1000.0, 270.0, 64.0, 1000.0]
2025-05-01 02:50:53,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 56/100 (estimated time remaining: 13 hours, 43 minutes, 49 seconds)
2025-05-01 03:05:08,016 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 03:05:08,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 03:09:54,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1811.85583 ± 1530.328
2025-05-01 03:09:54,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2789.1606, 38.639793, 1332.9274, 419.491, 4306.3545, 25.615967, 146.76572, 2935.4531, 2524.821, 3599.33]
2025-05-01 03:09:54,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [640.0, 22.0, 368.0, 163.0, 1000.0, 1000.0, 1000.0, 694.0, 1000.0, 857.0]
2025-05-01 03:09:54,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 57/100 (estimated time remaining: 13 hours, 41 minutes, 1 second)
2025-05-01 03:24:31,644 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 03:24:31,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 03:31:06,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3747.55322 ± 854.839
2025-05-01 03:31:06,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4239.565, 3546.4766, 3838.9846, 3991.7505, 3674.918, 1308.6964, 4211.6875, 4058.7249, 4497.0776, 4107.65]
2025-05-01 03:31:06,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 333.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 03:31:06,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (3747.55) for latency ExtremeClogL1U23
2025-05-01 03:31:06,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-05-01 03:31:06,685 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-01 03:31:06,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 58/100 (estimated time remaining: 13 hours, 50 minutes, 45 seconds)
2025-05-01 03:44:37,035 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 03:44:37,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 03:50:41,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3497.17114 ± 1208.096
2025-05-01 03:50:41,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3969.9324, 4262.457, 1287.2323, 4572.5283, 3252.8572, 3809.4978, 1111.5094, 3821.907, 4498.8774, 4384.9116]
2025-05-01 03:50:41,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [953.0, 1000.0, 379.0, 1000.0, 1000.0, 1000.0, 251.0, 1000.0, 1000.0, 1000.0]
2025-05-01 03:50:41,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 59/100 (estimated time remaining: 13 hours, 42 minutes, 4 seconds)
2025-05-01 04:04:50,094 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 04:04:50,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 04:09:57,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3164.12524 ± 1321.330
2025-05-01 04:09:57,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3296.8245, 4325.1567, 3968.8926, 4601.465, 3989.3667, 299.76593, 2539.4094, 1568.3712, 4330.9087, 2721.0906]
2025-05-01 04:09:57,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [686.0, 1000.0, 905.0, 1000.0, 917.0, 86.0, 639.0, 386.0, 903.0, 603.0]
2025-05-01 04:09:57,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 60/100 (estimated time remaining: 13 hours, 20 minutes, 19 seconds)
2025-05-01 04:24:09,951 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 04:24:09,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 04:30:06,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3443.35742 ± 1392.328
2025-05-01 04:30:06,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4447.2407, 3823.85, 15.867302, 4278.708, 4490.9697, 3757.215, 4295.284, 3869.7847, 3880.298, 1574.3557]
2025-05-01 04:30:06,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 19.0, 1000.0, 1000.0, 1000.0, 1000.0, 929.0, 1000.0, 445.0]
2025-05-01 04:30:06,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 61/100 (estimated time remaining: 13 hours, 13 minutes, 46 seconds)
2025-05-01 04:43:57,135 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 04:43:57,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 04:48:07,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2369.78760 ± 1733.683
2025-05-01 04:48:07,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4343.107, 156.46623, 3486.4053, 4194.6826, 3865.5857, 128.76683, 1211.7085, 122.827934, 2097.2405, 4091.0862]
2025-05-01 04:48:07,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 79.0, 812.0, 1000.0, 889.0, 44.0, 286.0, 68.0, 559.0, 1000.0]
2025-05-01 04:48:07,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 62/100 (estimated time remaining: 12 hours, 46 minutes, 5 seconds)
2025-05-01 05:01:40,589 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 05:01:40,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 05:08:30,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3679.58667 ± 953.816
2025-05-01 05:08:30,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4302.331, 4202.4697, 1017.60583, 3455.5068, 3990.028, 4617.5264, 3771.9683, 3394.912, 4032.7783, 4010.739]
2025-05-01 05:08:30,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 265.0, 1000.0, 1000.0, 1000.0, 951.0, 818.0, 1000.0, 1000.0]
2025-05-01 05:08:30,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 63/100 (estimated time remaining: 12 hours, 20 minutes, 12 seconds)
2025-05-01 05:23:10,067 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 05:23:10,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 05:26:36,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1410.62268 ± 1429.492
2025-05-01 05:26:36,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1481.7694, 4540.1836, 572.62195, 63.583115, 2293.431, 3266.8635, 281.4323, 152.23265, 951.0794, 503.03055]
2025-05-01 05:26:36,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [397.0, 1000.0, 141.0, 34.0, 534.0, 761.0, 81.0, 1000.0, 277.0, 146.0]
2025-05-01 05:26:36,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 64/100 (estimated time remaining: 11 hours, 49 minutes, 53 seconds)
2025-05-01 05:39:03,761 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 05:39:03,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 05:43:32,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2342.00732 ± 1537.166
2025-05-01 05:43:32,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1075.7699, 612.85095, 4130.4766, 4304.3916, 408.71744, 3119.376, 3354.1216, 290.99597, 2231.9087, 3891.465]
2025-05-01 05:43:32,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [308.0, 213.0, 1000.0, 1000.0, 120.0, 747.0, 696.0, 105.0, 672.0, 1000.0]
2025-05-01 05:43:32,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 65/100 (estimated time remaining: 11 hours, 13 minutes, 47 seconds)
2025-05-01 05:58:34,114 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 05:58:34,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 06:03:42,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2709.84180 ± 1380.031
2025-05-01 06:03:42,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1430.5282, 4633.097, 1602.5865, 4046.3206, 4403.3364, 2555.2046, 2855.1814, 3730.7178, 1333.0707, 508.37534]
2025-05-01 06:03:42,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [366.0, 991.0, 452.0, 1000.0, 1000.0, 619.0, 1000.0, 1000.0, 324.0, 157.0]
2025-05-01 06:03:42,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 66/100 (estimated time remaining: 10 hours, 55 minutes, 12 seconds)
2025-05-01 06:17:25,494 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 06:17:25,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 06:22:54,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2446.97949 ± 1580.784
2025-05-01 06:22:54,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4662.141, 89.1732, 2691.1335, 4201.8315, 1390.6796, 805.38116, 2516.2395, 3250.5476, 4291.6914, 570.9763]
2025-05-01 06:22:54,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 31.0, 1000.0, 945.0, 294.0, 1000.0, 671.0, 826.0, 1000.0, 1000.0]
2025-05-01 06:22:54,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 67/100 (estimated time remaining: 10 hours, 44 minutes, 34 seconds)
2025-05-01 06:36:59,993 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 06:36:59,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 06:42:31,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3103.82202 ± 1335.121
2025-05-01 06:42:31,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4452.471, 1444.5051, 3631.7673, 4146.758, 1205.8477, 701.50964, 3570.2197, 4030.0825, 3726.3184, 4128.7427]
2025-05-01 06:42:31,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 359.0, 1000.0, 1000.0, 290.0, 188.0, 1000.0, 927.0, 1000.0, 1000.0]
2025-05-01 06:42:31,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 68/100 (estimated time remaining: 10 hours, 20 minutes, 29 seconds)
2025-05-01 06:56:46,173 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 06:56:46,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 07:03:00,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3521.75537 ± 1227.504
2025-05-01 07:03:00,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3488.6523, 4384.361, 4144.9146, 4048.182, 404.61487, 4005.7256, 4323.226, 3887.2075, 4448.3047, 2082.368]
2025-05-01 07:03:00,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 121.0, 1000.0, 1000.0, 1000.0, 1000.0, 541.0]
2025-05-01 07:03:00,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 69/100 (estimated time remaining: 10 hours, 16 minutes, 51 seconds)
2025-05-01 07:16:54,003 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 07:16:54,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 07:22:28,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2130.92578 ± 1320.567
2025-05-01 07:22:28,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [827.00964, 1622.622, 4167.699, 3812.8018, 2123.0098, 165.94055, 1427.0914, 909.4276, 3706.0222, 2547.6355]
2025-05-01 07:22:28,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 385.0, 939.0, 1000.0, 533.0, 1000.0, 1000.0, 230.0, 1000.0, 742.0]
2025-05-01 07:22:28,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 70/100 (estimated time remaining: 10 hours, 13 minutes, 27 seconds)
2025-05-01 07:36:13,817 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 07:36:13,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 07:41:45,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3439.89331 ± 1571.657
2025-05-01 07:41:45,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1556.5521, 4381.395, 4928.0083, 4630.017, 4542.27, 982.36536, 3980.2688, 4307.124, 709.98364, 4380.949]
2025-05-01 07:41:45,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [399.0, 1000.0, 1000.0, 1000.0, 1000.0, 228.0, 1000.0, 954.0, 189.0, 1000.0]
2025-05-01 07:41:45,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 71/100 (estimated time remaining: 9 hours, 48 minutes, 15 seconds)
2025-05-01 07:56:33,544 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 07:56:33,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 08:02:22,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3493.45264 ± 940.238
2025-05-01 08:02:22,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3594.6062, 2588.6543, 4331.857, 4387.9824, 3246.952, 1694.9572, 4494.332, 4328.6743, 3925.182, 2341.3298]
2025-05-01 08:02:22,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [758.0, 672.0, 990.0, 1000.0, 707.0, 513.0, 1000.0, 1000.0, 1000.0, 536.0]
2025-05-01 08:02:22,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 72/100 (estimated time remaining: 9 hours, 36 minutes, 52 seconds)
2025-05-01 08:15:12,660 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 08:15:12,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 08:20:21,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2766.27979 ± 1474.411
2025-05-01 08:20:21,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3376.84, 61.276867, 3986.172, 4298.7847, 1392.9159, 848.0778, 4329.2544, 2842.7864, 2340.3496, 4186.3394]
2025-05-01 08:20:21,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [742.0, 134.0, 878.0, 1000.0, 1000.0, 180.0, 1000.0, 674.0, 503.0, 1000.0]
2025-05-01 08:20:21,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 73/100 (estimated time remaining: 9 hours, 7 minutes, 52 seconds)
2025-05-01 08:34:58,294 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 08:34:58,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 08:40:09,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2789.20850 ± 1175.904
2025-05-01 08:40:09,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2398.2104, 2393.5046, 4314.9897, 4337.78, 3770.8352, 1819.2305, 3739.9475, 2170.509, 439.4086, 2507.67]
2025-05-01 08:40:09,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 526.0, 920.0, 1000.0, 1000.0, 375.0, 1000.0, 505.0, 122.0, 544.0]
2025-05-01 08:40:09,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 74/100 (estimated time remaining: 8 hours, 44 minutes, 37 seconds)
2025-05-01 08:52:49,402 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 08:52:49,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 08:57:54,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2340.30078 ± 1238.360
2025-05-01 08:57:54,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4203.401, 1736.2966, 1715.1869, 3065.5378, 1381.0988, 308.47977, 4044.6733, 2527.8774, 1078.7703, 3341.6865]
2025-05-01 08:57:54,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 452.0, 1000.0, 1000.0, 348.0, 92.0, 1000.0, 550.0, 247.0, 738.0]
2025-05-01 08:57:54,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 75/100 (estimated time remaining: 8 hours, 16 minutes, 13 seconds)
2025-05-01 09:12:08,811 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 09:12:08,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 09:19:03,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3655.08203 ± 1113.880
2025-05-01 09:19:03,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4534.8496, 4166.2637, 3061.9954, 4106.6177, 4065.0542, 3906.4927, 4049.7864, 4249.3545, 3918.6538, 491.75406]
2025-05-01 09:19:03,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 593.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 133.0]
2025-05-01 09:19:03,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 76/100 (estimated time remaining: 8 hours, 6 minutes, 33 seconds)
2025-05-01 09:32:04,837 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 09:32:04,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 09:38:09,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2712.35010 ± 1430.614
2025-05-01 09:38:09,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2099.2178, 4299.765, 3963.9507, 2056.1265, 3951.4885, 1539.3083, 4237.4375, 899.7489, 3790.6294, 285.82742]
2025-05-01 09:38:09,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [509.0, 1000.0, 1000.0, 1000.0, 1000.0, 379.0, 1000.0, 1000.0, 903.0, 96.0]
2025-05-01 09:38:09,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 77/100 (estimated time remaining: 7 hours, 39 minutes, 45 seconds)
2025-05-01 09:51:58,450 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 09:51:58,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 09:57:26,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3059.54590 ± 1222.650
2025-05-01 09:57:26,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4046.7625, 4195.0166, 1631.4789, 3975.195, 4128.7124, 1982.2164, 4055.742, 3853.115, 1109.8899, 1617.3306]
2025-05-01 09:57:26,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 338.0, 873.0, 1000.0, 475.0, 1000.0, 1000.0, 250.0, 402.0]
2025-05-01 09:57:26,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 78/100 (estimated time remaining: 7 hours, 26 minutes, 36 seconds)
2025-05-01 10:11:38,630 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 10:11:38,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 10:17:53,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3676.56299 ± 1164.347
2025-05-01 10:17:53,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4430.1807, 4402.0923, 4469.7524, 4758.3433, 3783.054, 4389.757, 4247.0664, 3210.5737, 2011.0721, 1063.7382]
2025-05-01 10:17:53,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 992.0, 991.0, 1000.0, 1000.0, 1000.0, 1000.0, 798.0, 516.0, 233.0]
2025-05-01 10:17:53,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 79/100 (estimated time remaining: 7 hours, 10 minutes, 4 seconds)
2025-05-01 10:31:21,923 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 10:31:21,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 10:35:42,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2171.99146 ± 1825.492
2025-05-01 10:35:42,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2420.694, 4424.8594, 1194.4469, 447.23694, 3883.7844, 4670.372, 513.40906, 4030.1653, -39.095917, 174.04091]
2025-05-01 10:35:42,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [511.0, 1000.0, 282.0, 100.0, 1000.0, 1000.0, 159.0, 927.0, 1000.0, 114.0]
2025-05-01 10:35:42,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 80/100 (estimated time remaining: 6 hours, 50 minutes, 45 seconds)
2025-05-01 10:50:06,393 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 10:50:06,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 10:55:14,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3115.96997 ± 1258.583
2025-05-01 10:55:14,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1158.3932, 1375.0325, 2359.1023, 2756.1707, 2185.2693, 3954.031, 3634.36, 4478.289, 4638.0933, 4620.9575]
2025-05-01 10:55:14,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [232.0, 335.0, 515.0, 608.0, 499.0, 1000.0, 895.0, 1000.0, 1000.0, 1000.0]
2025-05-01 10:55:14,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 81/100 (estimated time remaining: 6 hours, 24 minutes, 43 seconds)
2025-05-01 11:08:43,334 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 11:08:43,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 11:15:05,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3820.14258 ± 1023.629
2025-05-01 11:15:05,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3682.5476, 4279.6943, 4700.7183, 4320.175, 1664.6443, 4482.557, 3965.4836, 4483.396, 2054.4517, 4567.7573]
2025-05-01 11:15:05,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 894.0, 502.0, 1000.0, 1000.0, 1000.0, 516.0, 1000.0]
2025-05-01 11:15:05,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (3820.14) for latency ExtremeClogL1U23
2025-05-01 11:15:05,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-05-01 11:15:05,418 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-01 11:15:05,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 82/100 (estimated time remaining: 6 hours, 8 minutes, 21 seconds)
2025-05-01 11:29:10,145 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 11:29:10,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 11:32:45,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2171.63037 ± 1728.725
2025-05-01 11:32:45,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2181.0522, 65.08577, 4483.708, 2725.5583, 4461.157, 639.82635, 561.4198, 1785.5544, 243.80177, 4569.138]
2025-05-01 11:32:45,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [465.0, 32.0, 1000.0, 800.0, 1000.0, 167.0, 130.0, 443.0, 91.0, 1000.0]
2025-05-01 11:32:45,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 83/100 (estimated time remaining: 5 hours, 43 minutes, 7 seconds)
2025-05-01 11:47:05,888 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 11:47:05,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 11:53:33,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3198.52759 ± 1508.917
2025-05-01 11:53:33,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2300.768, -171.96115, 4183.9365, 4694.9585, 1433.1493, 3783.0725, 4378.7275, 3703.3594, 2984.123, 4695.1426]
2025-05-01 11:53:33,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [673.0, 1000.0, 1000.0, 1000.0, 351.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 11:53:33,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 84/100 (estimated time remaining: 5 hours, 25 minutes, 14 seconds)
2025-05-01 12:08:14,921 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 12:08:14,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 12:13:20,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2937.65088 ± 1527.568
2025-05-01 12:13:20,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4220.0874, 1716.5494, 2931.5806, 539.9918, 4072.5264, 4584.197, 4488.795, 4378.717, 1019.9769, 1424.0905]
2025-05-01 12:13:20,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 503.0, 663.0, 262.0, 1000.0, 1000.0, 1000.0, 1000.0, 300.0, 391.0]
2025-05-01 12:13:20,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 85/100 (estimated time remaining: 5 hours, 12 minutes, 24 seconds)
2025-05-01 12:27:11,680 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 12:27:11,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 12:31:08,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1816.18823 ± 1660.847
2025-05-01 12:31:08,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4349.777, 158.41808, 4346.388, 149.78534, 689.4674, 977.99963, 1618.6279, 399.39093, 1407.957, 4064.0718]
2025-05-01 12:31:08,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 58.0, 1000.0, 58.0, 164.0, 238.0, 353.0, 1000.0, 332.0, 1000.0]
2025-05-01 12:31:08,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 47 minutes, 42 seconds)
2025-05-01 12:43:47,177 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 12:43:47,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 12:50:25,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3280.39697 ± 1301.734
2025-05-01 12:50:25,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4566.961, 4596.5464, 4330.4478, 1827.192, 1147.637, 4655.5103, 3874.2048, 3453.3628, 1440.5505, 2911.5554]
2025-05-01 12:50:25,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 458.0, 1000.0, 1000.0, 840.0, 1000.0, 352.0, 750.0]
2025-05-01 12:50:25,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 87/100 (estimated time remaining: 4 hours, 26 minutes, 56 seconds)
2025-05-01 13:03:46,113 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 13:03:46,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 13:07:01,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 1452.44299 ± 1602.761
2025-05-01 13:07:01,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4092.2815, 1521.9495, 207.94727, 1907.853, 327.90027, 138.83109, 58.0054, 4687.5464, 1375.5704, 206.54459]
2025-05-01 13:07:01,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 481.0, 64.0, 429.0, 100.0, 55.0, 29.0, 1000.0, 1000.0, 66.0]
2025-05-01 13:07:01,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 88/100 (estimated time remaining: 4 hours, 5 minutes, 6 seconds)
2025-05-01 13:21:09,376 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 13:21:09,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 13:28:08,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3525.19263 ± 1213.350
2025-05-01 13:28:08,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1487.3969, 2996.6462, 4263.955, 3922.0874, 4511.821, 4310.3496, 1118.391, 4272.65, 4807.503, 3561.1243]
2025-05-01 13:28:08,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 686.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 763.0]
2025-05-01 13:28:08,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 46 minutes, 59 seconds)
2025-05-01 13:42:38,088 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 13:42:38,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 13:48:20,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2993.37256 ± 1626.164
2025-05-01 13:48:20,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4004.628, 4385.4487, 4323.353, 1010.0714, 4228.857, 4387.2603, 1657.3975, 547.59546, 910.36847, 4478.748]
2025-05-01 13:48:20,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 244.0, 1000.0, 1000.0, 441.0, 142.0, 1000.0, 1000.0]
2025-05-01 13:48:20,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 29 minutes)
2025-05-01 14:02:28,114 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 14:02:28,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 14:09:03,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3733.48486 ± 1001.285
2025-05-01 14:09:03,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3835.3433, 1227.1729, 3984.1877, 4640.3633, 4505.8364, 4363.485, 4176.133, 2558.445, 4258.551, 3785.3325]
2025-05-01 14:09:03,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 334.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 584.0, 1000.0, 1000.0]
2025-05-01 14:09:03,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 91/100 (estimated time remaining: 3 hours, 15 minutes, 48 seconds)
2025-05-01 14:24:03,111 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 14:24:03,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 14:28:49,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2899.42261 ± 1441.260
2025-05-01 14:28:49,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [1886.1537, 4342.886, 4529.5635, 1568.3997, 1297.4565, 3452.2583, 2439.3262, 604.31085, 4159.4644, 4714.4067]
2025-05-01 14:28:49,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [394.0, 1000.0, 1000.0, 354.0, 278.0, 833.0, 515.0, 202.0, 1000.0, 1000.0]
2025-05-01 14:28:49,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 57 minutes, 6 seconds)
2025-05-01 14:46:12,440 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 14:46:12,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 14:52:17,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3491.07617 ± 1392.655
2025-05-01 14:52:17,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4623.0273, 887.45306, 4699.5864, 4757.231, 4273.4487, 4436.4473, 1990.2451, 3574.0537, 1484.8174, 4184.4526]
2025-05-01 14:52:17,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 383.0, 1000.0, 1000.0, 1000.0, 1000.0, 517.0, 1000.0, 360.0, 1000.0]
2025-05-01 14:52:17,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 48 minutes, 25 seconds)
2025-05-01 15:08:58,131 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 15:08:58,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 15:14:36,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3001.71045 ± 1360.417
2025-05-01 15:14:36,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4017.2942, 2116.523, 2799.825, 4322.638, 1999.5088, 3831.6653, 4741.0195, 4077.8794, 1907.955, 202.79784]
2025-05-01 15:14:36,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 622.0, 1000.0, 591.0, 1000.0, 1000.0, 1000.0, 445.0, 82.0]
2025-05-01 15:14:36,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 94/100 (estimated time remaining: 2 hours, 29 minutes, 3 seconds)
2025-05-01 15:31:26,668 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 15:31:26,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 15:38:13,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 4112.77783 ± 818.438
2025-05-01 15:38:13,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4295.9297, 4211.601, 4782.553, 4475.501, 1749.3165, 4722.1826, 4336.8667, 4340.1147, 4195.289, 4018.421]
2025-05-01 15:38:13,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 968.0, 1000.0, 1000.0, 377.0, 1000.0, 1000.0, 1000.0, 953.0, 1000.0]
2025-05-01 15:38:13,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1124 [INFO]: New best (4112.78) for latency ExtremeClogL1U23
2025-05-01 15:38:13,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1127 [INFO]: saving network
2025-05-01 15:38:13,652 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-mbpac-highdim-memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-01 15:38:13,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 95/100 (estimated time remaining: 2 hours, 11 minutes, 51 seconds)
2025-05-01 15:53:48,597 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 15:53:48,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 16:00:34,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3185.81934 ± 1036.867
2025-05-01 16:00:34,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [2116.094, 4172.4062, 3691.3098, 1419.7249, 3554.8264, 4204.1475, 1671.7562, 3880.4924, 4284.2153, 2863.2214]
2025-05-01 16:00:34,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [560.0, 1000.0, 1000.0, 1000.0, 869.0, 1000.0, 1000.0, 1000.0, 1000.0, 627.0]
2025-05-01 16:00:34,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 51 minutes, 30 seconds)
2025-05-01 16:18:14,739 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 16:18:14,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 16:24:31,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3425.17456 ± 1102.211
2025-05-01 16:24:31,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4442.4927, 4236.1978, 4264.074, 4201.1836, 1616.0638, 1621.0104, 3011.053, 4332.7915, 4150.7334, 2376.1453]
2025-05-01 16:24:31,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 416.0, 494.0, 788.0, 1000.0, 1000.0, 1000.0]
2025-05-01 16:24:31,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 32 minutes, 33 seconds)
2025-05-01 16:40:25,918 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 16:40:25,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 16:46:09,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2988.52295 ± 1468.955
2025-05-01 16:46:09,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4325.1733, 4341.55, 564.595, 2585.3296, 1841.4509, 4692.804, 3891.3875, 1611.9039, 1435.8115, 4595.222]
2025-05-01 16:46:09,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 618.0, 501.0, 1000.0, 1000.0, 490.0, 327.0, 1000.0]
2025-05-01 16:46:09,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 98/100 (estimated time remaining: 1 hour, 8 minutes, 18 seconds)
2025-05-01 17:04:03,543 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 17:04:03,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 17:08:28,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 2342.97144 ± 1561.285
2025-05-01 17:08:28,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [3906.124, 670.0882, 4282.5723, 3772.6267, 4320.6323, 149.1656, 2601.649, 807.0468, 903.73785, 2016.071]
2025-05-01 17:08:28,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 196.0, 1000.0, 875.0, 1000.0, 54.0, 595.0, 226.0, 220.0, 471.0]
2025-05-01 17:08:28,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 99/100 (estimated time remaining: 45 minutes, 33 seconds)
2025-05-01 17:26:31,440 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 17:26:31,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 17:34:56,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3181.12354 ± 1343.119
2025-05-01 17:34:56,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4413.9644, 2037.4062, 594.0425, 1337.4463, 4468.337, 4324.192, 3260.299, 4015.3767, 4371.4126, 2988.757]
2025-05-01 17:34:56,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 467.0, 176.0, 314.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 17:34:56,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1097 [INFO]: Iteration 100/100 (estimated time remaining: 23 minutes, 20 seconds)
2025-05-01 17:55:49,216 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 17:55:49,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 18:03:40,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1119 [DEBUG]: Total Reward: 3510.17725 ± 1627.997
2025-05-01 18:03:40,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1120 [DEBUG]: All rewards: [4285.6973, 4293.7295, 3588.054, 335.61703, 4336.028, 4720.9604, 4464.0547, 293.49753, 4741.3877, 4042.7463]
2025-05-01 18:03:40,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 936.0, 92.0, 1000.0, 1000.0, 1000.0, 91.0, 1000.0, 1000.0]
2025-05-01 18:03:40,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-ant):1149 [DEBUG]: Training session finished
