2025-09-11 23:28:06,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc10-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay
2025-09-11 23:28:06,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc10-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay
2025-09-11 23:28:06,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x146769ebe1d0>}
2025-09-11 23:28:06,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1111 [DEBUG]: using device: cuda
2025-09-11 23:28:06,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-11 23:28:06,988 baseline-mbpac-noiseperc10-halfcheetah:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-11 23:28:06,988 baseline-mbpac-noiseperc10-halfcheetah:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 23:28:06,996 baseline-mbpac-noiseperc10-halfcheetah:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-11 23:28:07,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-11 23:28:07,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-11 23:39:01,383 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:39:01,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:43:43,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: -329.40186 ± 18.418
2025-09-11 23:43:43,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [-327.096, -306.61438, -313.19562, -357.67163, -346.7874, -311.5163, -324.68802, -309.19122, -351.2336, -346.02417]
2025-09-11 23:43:43,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:43:43,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (-329.40) for latency MM1Queue_a033_s075
2025-09-11 23:43:43,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 25 hours, 43 minutes, 19 seconds)
2025-09-11 23:55:40,833 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:55:40,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:00:25,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: -224.49899 ± 37.651
2025-09-12 00:00:25,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [-263.13507, -215.35303, -289.02057, -201.14082, -251.4965, -192.13652, -153.89865, -202.63803, -249.97977, -226.1911]
2025-09-12 00:00:25,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:00:25,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (-224.50) for latency MM1Queue_a033_s075
2025-09-12 00:00:25,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 26 hours, 21 minutes, 59 seconds)
2025-09-12 00:12:21,847 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:12:21,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:17:04,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1.04219 ± 63.926
2025-09-12 00:17:04,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [104.915436, 33.933273, -7.96998, -32.556084, 13.381866, -27.548988, 42.243443, -154.7491, 28.689604, 10.082406]
2025-09-12 00:17:04,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:17:04,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (1.04) for latency MM1Queue_a033_s075
2025-09-12 00:17:04,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 26 hours, 22 minutes, 15 seconds)
2025-09-12 00:29:00,615 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:29:00,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:33:41,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 210.58879 ± 59.001
2025-09-12 00:33:41,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [204.29094, 202.1396, 91.820114, 303.67645, 214.18236, 161.63707, 245.09378, 159.89755, 244.83102, 278.3189]
2025-09-12 00:33:41,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:33:41,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (210.59) for latency MM1Queue_a033_s075
2025-09-12 00:33:41,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 26 hours, 13 minutes, 22 seconds)
2025-09-12 00:45:29,925 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:45:29,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:50:09,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 484.79184 ± 174.522
2025-09-12 00:50:09,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [-6.061472, 462.25464, 632.61914, 511.24136, 560.17303, 543.6986, 591.78864, 583.3484, 556.0759, 412.7805]
2025-09-12 00:50:09,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:50:09,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (484.79) for latency MM1Queue_a033_s075
2025-09-12 00:50:09,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 25 hours, 58 minutes, 28 seconds)
2025-09-12 01:01:55,874 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:01:55,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:06:34,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 931.80353 ± 196.715
2025-09-12 01:06:34,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [957.73145, 1033.8486, 980.02454, 1012.9691, 388.21066, 1000.2942, 799.8648, 1096.557, 973.6004, 1074.9347]
2025-09-12 01:06:34,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:06:34,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (931.80) for latency MM1Queue_a033_s075
2025-09-12 01:06:34,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 25 hours, 57 minutes, 34 seconds)
2025-09-12 01:18:19,391 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:18:19,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:23:00,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1294.91467 ± 164.438
2025-09-12 01:23:00,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1294.8972, 1579.303, 1331.6306, 972.1157, 1213.352, 1237.0468, 1361.8685, 1107.3842, 1419.6423, 1431.9058]
2025-09-12 01:23:00,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:23:00,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (1294.91) for latency MM1Queue_a033_s075
2025-09-12 01:23:00,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 25 hours, 36 minutes, 6 seconds)
2025-09-12 01:34:47,678 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:34:47,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:39:25,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1489.96680 ± 780.668
2025-09-12 01:39:25,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [44.833424, 2177.1025, 1768.1847, 1258.1602, -5.4975777, 2248.8433, 1648.632, 1834.3722, 1964.3386, 1960.6986]
2025-09-12 01:39:25,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:39:25,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (1489.97) for latency MM1Queue_a033_s075
2025-09-12 01:39:25,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 25 hours, 15 minutes, 28 seconds)
2025-09-12 01:51:14,356 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:51:14,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:55:53,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2086.01709 ± 702.477
2025-09-12 01:55:53,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2719.3665, 2445.5178, 2132.8088, 2332.5676, 1932.0765, 2312.218, 2366.2568, 60.62473, 2353.543, 2205.1887]
2025-09-12 01:55:53,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:55:53,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (2086.02) for latency MM1Queue_a033_s075
2025-09-12 01:55:53,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 24 hours, 56 minutes, 6 seconds)
2025-09-12 02:07:40,114 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:07:40,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:12:18,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2811.12451 ± 216.554
2025-09-12 02:12:18,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3061.6714, 3083.139, 2462.862, 2926.0515, 2755.657, 2834.0579, 3025.7776, 2432.3591, 2776.4114, 2753.256]
2025-09-12 02:12:18,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:12:18,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (2811.12) for latency MM1Queue_a033_s075
2025-09-12 02:12:18,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 24 hours, 38 minutes, 48 seconds)
2025-09-12 02:24:06,898 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:24:06,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:28:46,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2541.27783 ± 656.265
2025-09-12 02:28:46,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1008.22577, 1738.5657, 2511.6125, 3022.6497, 2648.1301, 3209.493, 2500.612, 3108.9995, 2578.838, 3085.652]
2025-09-12 02:28:46,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:28:46,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 24 hours, 23 minutes, 4 seconds)
2025-09-12 02:40:34,336 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:40:34,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:45:15,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2784.55273 ± 867.698
2025-09-12 02:45:15,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3372.6926, 3357.0242, 2963.0103, 2447.5808, 2138.808, 3542.9875, 3004.008, 516.529, 3463.2139, 3039.6719]
2025-09-12 02:45:15,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:45:15,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 24 hours, 7 minutes, 35 seconds)
2025-09-12 02:57:02,969 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:57:02,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:01:46,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3010.90942 ± 649.144
2025-09-12 03:01:46,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3563.761, 1746.9316, 3181.8674, 2201.159, 3542.9517, 3412.0232, 2205.5444, 3568.0981, 3371.3093, 3315.4502]
2025-09-12 03:01:46,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:01:46,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (3010.91) for latency MM1Queue_a033_s075
2025-09-12 03:01:46,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 23 hours, 52 minutes, 47 seconds)
2025-09-12 03:13:32,914 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:13:32,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:18:15,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2903.65918 ± 786.773
2025-09-12 03:18:15,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3370.0505, 2958.487, 3097.003, 3451.2908, 3257.7336, 792.69904, 3424.2927, 2174.789, 3228.392, 3281.8516]
2025-09-12 03:18:15,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:18:15,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 23 hours, 36 minutes, 32 seconds)
2025-09-12 03:30:04,261 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:30:04,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:34:48,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2943.47583 ± 864.222
2025-09-12 03:34:48,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3530.7273, 1325.4342, 1986.2532, 3129.7043, 3448.0598, 3631.2886, 3395.2654, 3788.5708, 3514.2178, 1685.2334]
2025-09-12 03:34:48,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:34:48,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 23 hours, 22 minutes, 25 seconds)
2025-09-12 03:46:39,453 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:46:39,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:51:19,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3326.14185 ± 492.309
2025-09-12 03:51:19,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2918.0273, 4001.3716, 3540.9126, 3522.8691, 2950.3218, 3373.0735, 3687.4448, 3719.834, 2191.8965, 3355.6667]
2025-09-12 03:51:19,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:51:19,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (3326.14) for latency MM1Queue_a033_s075
2025-09-12 03:51:19,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 23 hours, 6 minutes, 59 seconds)
2025-09-12 04:03:08,878 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:03:08,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:07:50,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2914.17114 ± 1095.514
2025-09-12 04:07:50,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1578.9075, 3353.452, 3348.673, 3345.1436, 3168.8027, 134.48921, 3704.72, 3507.9817, 3186.568, 3812.9724]
2025-09-12 04:07:50,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:07:50,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 22 hours, 50 minutes, 56 seconds)
2025-09-12 04:19:39,356 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:19:39,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:24:25,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2904.40405 ± 1017.507
2025-09-12 04:24:25,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3278.356, 3529.2832, 3308.817, 3301.742, 3757.876, 2971.273, 1074.6709, 860.4937, 2942.8523, 4018.6772]
2025-09-12 04:24:25,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:24:25,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 22 hours, 35 minutes, 23 seconds)
2025-09-12 04:36:14,376 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:36:14,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:40:53,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3677.10034 ± 625.868
2025-09-12 04:40:53,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3920.971, 3679.536, 3411.3806, 3677.4563, 4061.6494, 4120.913, 4051.8462, 4091.6963, 1915.4832, 3840.072]
2025-09-12 04:40:53,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:40:53,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (3677.10) for latency MM1Queue_a033_s075
2025-09-12 04:40:53,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 22 hours, 18 minutes, 42 seconds)
2025-09-12 04:52:41,893 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:52:41,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:57:21,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2719.50269 ± 1629.342
2025-09-12 04:57:21,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3671.1697, 3553.6614, 4097.5654, 3505.806, 362.9866, 3870.9019, 477.31122, 3973.9304, -83.30998, 3765.0059]
2025-09-12 04:57:21,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:57:21,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 22 hours, 41 seconds)
2025-09-12 05:09:10,167 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:09:10,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:13:55,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3680.13672 ± 247.944
2025-09-12 05:13:55,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3369.2715, 3582.6853, 3540.7898, 3479.9011, 3991.1345, 3952.4253, 3580.4128, 3433.9016, 4132.0684, 3738.7742]
2025-09-12 05:13:55,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:13:55,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (3680.14) for latency MM1Queue_a033_s075
2025-09-12 05:13:55,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 21 hours, 45 minutes, 7 seconds)
2025-09-12 05:25:44,861 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:25:44,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:30:24,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2816.03564 ± 1332.708
2025-09-12 05:30:24,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3967.5034, 4011.2559, 3787.1606, 2463.7222, 4152.8555, 1517.948, 3506.2, 231.8171, 3422.1077, 1099.7867]
2025-09-12 05:30:24,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:30:24,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 21 hours, 28 minutes, 4 seconds)
2025-09-12 05:42:13,291 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:42:13,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:46:51,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3399.15869 ± 669.122
2025-09-12 05:46:51,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3892.4797, 3823.9912, 3210.6462, 1909.4323, 3875.4436, 3789.3367, 3850.0515, 3512.7708, 2364.3484, 3763.0867]
2025-09-12 05:46:51,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:46:51,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 21 hours, 9 minutes, 29 seconds)
2025-09-12 05:58:38,716 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:58:38,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:03:20,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3332.20850 ± 1045.083
2025-09-12 06:03:20,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1588.8347, 972.5313, 3728.92, 3779.3179, 3844.7056, 3814.8083, 4215.6455, 3967.0422, 3698.3313, 3711.9497]
2025-09-12 06:03:20,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:03:20,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 20 hours, 53 minutes, 23 seconds)
2025-09-12 06:15:09,567 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:15:09,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:19:50,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3782.49072 ± 165.804
2025-09-12 06:19:50,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3733.6235, 3448.7815, 4032.2769, 3952.6694, 3710.3352, 3938.017, 3604.4707, 3733.6465, 3854.3228, 3816.7603]
2025-09-12 06:19:50,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:19:50,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (3782.49) for latency MM1Queue_a033_s075
2025-09-12 06:19:50,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 20 hours, 37 minutes, 15 seconds)
2025-09-12 06:31:39,186 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:31:39,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:36:18,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3325.02588 ± 950.158
2025-09-12 06:36:18,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1221.8524, 4127.8984, 3826.4226, 4102.688, 3784.6555, 1996.6158, 2818.4424, 3575.0146, 4173.544, 3623.1262]
2025-09-12 06:36:18,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:36:18,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 20 hours, 19 minutes, 15 seconds)
2025-09-12 06:48:07,086 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:48:07,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:52:50,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3274.50415 ± 761.690
2025-09-12 06:52:50,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3667.9524, 3983.9956, 3511.9338, 2881.5986, 2842.7505, 4117.1094, 4110.1772, 2232.598, 3589.76, 1807.1653]
2025-09-12 06:52:50,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:52:50,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 20 hours, 3 minutes, 26 seconds)
2025-09-12 07:04:39,635 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:04:39,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:09:20,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3477.82568 ± 895.577
2025-09-12 07:09:20,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4110.004, 3969.762, 3645.4548, 4093.3438, 1662.1266, 1952.3143, 4172.776, 3049.5984, 4077.0315, 4045.8462]
2025-09-12 07:09:20,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:09:20,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 19 hours, 47 minutes, 49 seconds)
2025-09-12 07:21:10,210 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:21:10,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:25:49,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3622.18311 ± 1095.846
2025-09-12 07:25:49,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4024.4624, 3797.1184, 4067.3857, 3474.7615, 3880.5818, 4271.807, 3887.5186, 415.52054, 3996.7644, 4405.9116]
2025-09-12 07:25:49,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:25:49,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 19 hours, 31 minutes, 10 seconds)
2025-09-12 07:37:36,993 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:37:37,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:42:18,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3543.27222 ± 1141.715
2025-09-12 07:42:18,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4059.3105, 3881.6694, 3862.1118, 3794.992, 4099.192, 3943.5854, 3370.793, 4078.9155, 4163.247, 178.90471]
2025-09-12 07:42:18,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:42:18,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 19 hours, 14 minutes, 36 seconds)
2025-09-12 07:54:07,786 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:54:07,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:58:49,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2644.05640 ± 1520.852
2025-09-12 07:58:49,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1409.5865, 2346.3513, 1379.5634, 1122.8026, -92.92923, 4352.321, 3978.5042, 3815.1157, 4122.8516, 4006.396]
2025-09-12 07:58:49,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:58:49,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 18 hours, 58 minutes, 36 seconds)
2025-09-12 08:10:39,174 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:10:39,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:15:23,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4109.73340 ± 293.183
2025-09-12 08:15:23,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3946.2214, 4395.9907, 4307.756, 4523.141, 4222.681, 4135.1914, 4052.1667, 3708.4717, 3530.405, 4275.307]
2025-09-12 08:15:23,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:15:23,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (4109.73) for latency MM1Queue_a033_s075
2025-09-12 08:15:23,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 18 hours, 42 minutes, 38 seconds)
2025-09-12 08:27:12,865 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:27:12,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:31:55,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4120.45361 ± 286.941
2025-09-12 08:31:55,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3624.8262, 4394.172, 3926.0557, 4181.6953, 4360.044, 4077.8162, 4492.8374, 4082.1487, 4383.469, 3681.4695]
2025-09-12 08:31:55,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:31:55,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (4120.45) for latency MM1Queue_a033_s075
2025-09-12 08:31:55,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 18 hours, 26 minutes, 36 seconds)
2025-09-12 08:43:45,231 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:43:45,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:48:26,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2903.16455 ± 1407.161
2025-09-12 08:48:26,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2743.1504, 488.6125, 4233.516, 3999.8127, 1752.9683, 457.0424, 4012.0725, 3858.1462, 3981.433, 3504.89]
2025-09-12 08:48:26,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:48:26,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 18 hours, 10 minutes, 33 seconds)
2025-09-12 09:00:15,530 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:00:15,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:05:00,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3285.35742 ± 1374.252
2025-09-12 09:05:00,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3857.0408, 1295.0527, 4248.608, 3544.81, 3294.23, 4252.8906, 4520.1987, 3581.8408, 82.35307, 4176.5483]
2025-09-12 09:05:00,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:05:00,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 17 hours, 55 minutes, 12 seconds)
2025-09-12 09:16:49,600 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:16:49,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:21:28,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3196.52808 ± 1195.582
2025-09-12 09:21:28,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [691.8485, 4317.651, 3570.1033, 4099.3857, 3679.5122, 2555.6277, 4134.809, 3957.6963, 3636.1682, 1322.4796]
2025-09-12 09:21:28,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:21:28,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 17 hours, 38 minutes)
2025-09-12 09:33:17,853 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:33:17,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:37:57,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4133.92676 ± 342.818
2025-09-12 09:37:57,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4409.8877, 4281.5864, 4265.4546, 4256.8306, 4228.1787, 3845.5513, 4258.37, 3295.2075, 3927.0972, 4571.1035]
2025-09-12 09:37:57,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:37:57,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (4133.93) for latency MM1Queue_a033_s075
2025-09-12 09:37:57,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 17 hours, 20 minutes, 20 seconds)
2025-09-12 09:49:46,406 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:49:46,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:54:28,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4009.61084 ± 234.973
2025-09-12 09:54:28,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3936.9468, 3367.099, 4325.8145, 4076.5745, 4003.575, 4076.1907, 4050.3503, 4100.7095, 4027.57, 4131.2793]
2025-09-12 09:54:28,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:54:28,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 17 hours, 3 minutes, 35 seconds)
2025-09-12 10:06:17,996 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:06:18,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:10:57,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4252.50830 ± 181.207
2025-09-12 10:10:57,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4215.5547, 4289.689, 4038.886, 4235.3765, 4264.6855, 4488.018, 4086.958, 4370.6978, 4569.47, 3965.7473]
2025-09-12 10:10:57,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:10:57,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (4252.51) for latency MM1Queue_a033_s075
2025-09-12 10:10:57,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 16 hours, 46 minutes, 37 seconds)
2025-09-12 10:22:46,887 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:22:46,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:27:27,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3436.28125 ± 1178.280
2025-09-12 10:27:27,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3874.0234, 623.13904, 4341.019, 4299.2925, 3881.1619, 3705.7598, 4257.22, 3045.695, 1948.4379, 4387.064]
2025-09-12 10:27:27,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:27:27,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 16 hours, 29 minutes, 13 seconds)
2025-09-12 10:39:16,287 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:39:16,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:43:55,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4180.55762 ± 299.240
2025-09-12 10:43:55,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4140.631, 4120.46, 4628.9697, 4515.3296, 3990.9133, 4297.593, 4172.816, 3767.4863, 4497.235, 3674.1392]
2025-09-12 10:43:55,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:43:55,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 16 hours, 12 minutes, 54 seconds)
2025-09-12 10:55:44,955 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:55:44,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:00:24,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3795.97192 ± 1071.070
2025-09-12 11:00:24,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4363.7783, 4340.6094, 4322.0293, 4252.5103, 807.6179, 4308.306, 3479.3843, 4222.0957, 4586.0283, 3277.358]
2025-09-12 11:00:24,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:00:24,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 15 hours, 56 minutes, 27 seconds)
2025-09-12 11:12:12,564 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:12:12,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:16:51,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3648.58057 ± 1108.766
2025-09-12 11:16:51,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4002.5837, 4226.963, 2176.2322, 4286.291, 4581.067, 898.1317, 4193.24, 4084.08, 4198.3774, 3838.8389]
2025-09-12 11:16:51,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:16:51,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 15 hours, 39 minutes, 11 seconds)
2025-09-12 11:28:39,388 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:28:39,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:33:20,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3787.08057 ± 974.434
2025-09-12 11:33:20,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4711.732, 4426.903, 2282.06, 3814.8833, 4459.552, 3805.2996, 1597.6583, 4438.144, 4113.033, 4221.541]
2025-09-12 11:33:20,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:33:20,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 15 hours, 22 minutes, 39 seconds)
2025-09-12 11:45:09,245 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:45:09,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:49:50,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4101.87012 ± 554.680
2025-09-12 11:49:50,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4201.2993, 4077.651, 4195.832, 4306.498, 4471.0625, 4634.7217, 3325.9634, 4319.983, 2824.1794, 4661.509]
2025-09-12 11:49:50,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:49:50,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 15 hours, 6 minutes, 13 seconds)
2025-09-12 12:01:36,887 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:01:36,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:06:20,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3991.59033 ± 1048.530
2025-09-12 12:06:20,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4270.902, 3521.912, 4400.672, 4443.056, 4616.4434, 1124.9135, 3666.1072, 4888.8423, 4100.877, 4882.175]
2025-09-12 12:06:20,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:06:20,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 14 hours, 50 minutes, 5 seconds)
2025-09-12 12:18:10,007 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:18:10,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:22:54,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3657.74609 ± 1181.331
2025-09-12 12:22:54,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4099.272, 4450.2456, 2807.6946, 4193.8467, 3769.3057, 4379.811, 370.71805, 4182.883, 4167.8467, 4155.8345]
2025-09-12 12:22:54,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:22:54,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 14 hours, 34 minutes, 33 seconds)
2025-09-12 12:34:41,712 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:34:41,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:39:20,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4345.47119 ± 257.964
2025-09-12 12:39:20,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4286.255, 3976.925, 4229.398, 4664.5776, 4811.4233, 4439.088, 4053.3313, 4090.5432, 4373.829, 4529.341]
2025-09-12 12:39:20,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:39:20,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (4345.47) for latency MM1Queue_a033_s075
2025-09-12 12:39:20,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 14 hours, 17 minutes, 48 seconds)
2025-09-12 12:51:07,588 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:51:07,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:55:47,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3673.37036 ± 1742.682
2025-09-12 12:55:47,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4570.701, 4760.5664, 4161.6113, 4524.426, 4402.68, 503.59518, -54.62288, 4953.688, 4256.189, 4654.8677]
2025-09-12 12:55:47,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:55:47,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 14 hours, 1 minute, 8 seconds)
2025-09-12 13:07:34,766 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:07:34,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:12:15,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4482.09326 ± 242.512
2025-09-12 13:12:15,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4547.4434, 4286.9604, 4303.985, 4756.1064, 4460.7627, 4864.1035, 4307.4497, 4748.2407, 4500.7046, 4045.174]
2025-09-12 13:12:15,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:12:15,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (4482.09) for latency MM1Queue_a033_s075
2025-09-12 13:12:15,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 13 hours, 44 minutes, 18 seconds)
2025-09-12 13:24:02,975 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:24:02,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:28:47,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3735.31689 ± 1497.016
2025-09-12 13:28:47,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4651.262, 1597.148, 4747.3, 4749.308, 4548.607, 3938.557, 4492.4624, 4037.5261, 123.62579, 4467.372]
2025-09-12 13:28:47,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:28:47,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 13 hours, 27 minutes, 56 seconds)
2025-09-12 13:40:35,427 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:40:35,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:45:13,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3713.10107 ± 1288.154
2025-09-12 13:45:13,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4402.8403, 4197.3433, 3616.7524, 4643.283, 4117.255, 4453.4077, 148.0149, 4660.6567, 3968.505, 2922.9546]
2025-09-12 13:45:13,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:45:13,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 13 hours, 10 minutes, 16 seconds)
2025-09-12 13:57:02,994 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:57:03,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:01:43,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4106.81543 ± 341.254
2025-09-12 14:01:43,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4503.575, 4536.1836, 4528.2485, 3934.7026, 4138.0654, 3490.0615, 3790.9414, 4120.4126, 3771.5232, 4254.4385]
2025-09-12 14:01:43,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:01:43,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 54 minutes, 20 seconds)
2025-09-12 14:13:30,345 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:13:30,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:18:14,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3304.84717 ± 2030.786
2025-09-12 14:18:14,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4757.354, 271.83188, 4764.3647, 4626.06, 462.50974, 4417.434, 4746.219, -93.222725, 4437.669, 4658.2534]
2025-09-12 14:18:14,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:18:14,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 12 hours, 38 minutes, 29 seconds)
2025-09-12 14:30:02,802 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:30:02,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:34:45,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4044.99756 ± 1013.220
2025-09-12 14:34:45,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4713.046, 4633.1904, 4195.86, 1056.2749, 4459.909, 4338.4497, 4044.369, 4285.993, 4393.4263, 4329.4575]
2025-09-12 14:34:45,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:34:45,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 12 hours, 22 minutes, 22 seconds)
2025-09-12 14:46:35,203 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:46:35,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:51:16,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4033.33984 ± 1114.954
2025-09-12 14:51:16,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4980.787, 4594.76, 4529.1177, 849.7208, 4259.683, 4117.58, 4233.6846, 3669.1833, 4357.9224, 4740.9595]
2025-09-12 14:51:16,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:51:16,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 12 hours, 5 minutes, 52 seconds)
2025-09-12 15:03:06,085 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:03:06,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:07:47,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3982.01416 ± 1214.564
2025-09-12 15:07:47,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4780.6807, 4708.593, 4131.474, 4901.254, 4389.795, 2989.1833, 676.04865, 4552.626, 4499.612, 4190.8706]
2025-09-12 15:07:47,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:07:47,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 11 hours, 49 minutes, 57 seconds)
2025-09-12 15:19:35,782 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:19:35,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:24:17,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4057.75635 ± 1381.094
2025-09-12 15:24:17,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5104.1187, 4755.312, 3885.6465, 4995.6274, 4793.088, 4730.025, 281.91003, 3076.2053, 4384.2466, 4571.385]
2025-09-12 15:24:17,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:24:17,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 11 hours, 33 minutes, 39 seconds)
2025-09-12 15:36:05,285 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:36:05,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:40:43,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3921.59766 ± 1306.962
2025-09-12 15:40:43,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4207.507, 3671.5256, 4799.702, 4760.0356, 4386.3057, 165.34113, 4860.2314, 4419.6025, 4014.4214, 3931.3044]
2025-09-12 15:40:43,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:40:43,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 11 hours, 16 minutes, 22 seconds)
2025-09-12 15:52:31,161 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:52:31,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:57:09,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3627.32227 ± 1095.736
2025-09-12 15:57:09,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [981.0045, 2028.0316, 4287.978, 3998.6301, 3965.0647, 4119.6655, 4055.8433, 4468.4546, 4143.572, 4224.9795]
2025-09-12 15:57:09,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:57:09,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 10 hours, 59 minutes, 14 seconds)
2025-09-12 16:08:57,075 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:08:57,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:13:39,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4315.57227 ± 542.599
2025-09-12 16:13:39,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4447.9077, 4702.3877, 4372.211, 4250.9717, 4533.483, 4525.4863, 4450.1655, 2765.8457, 4822.8506, 4284.4136]
2025-09-12 16:13:39,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:13:39,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 10 hours, 42 minutes, 38 seconds)
2025-09-12 16:25:26,630 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:25:26,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:30:08,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4376.17090 ± 542.790
2025-09-12 16:30:08,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4770.846, 4395.703, 4565.0337, 4685.175, 4483.395, 4573.2725, 4644.913, 4158.9854, 2824.2305, 4660.1543]
2025-09-12 16:30:08,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:30:08,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 10 hours, 25 minutes, 58 seconds)
2025-09-12 16:41:57,824 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:41:57,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:46:36,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3821.30542 ± 1356.370
2025-09-12 16:46:36,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2402.2605, 5043.611, 4933.681, 2255.7212, 4288.7773, 4190.2275, 4729.258, 4836.5107, 927.1597, 4605.8457]
2025-09-12 16:46:36,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:46:36,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 10 hours, 9 minutes, 9 seconds)
2025-09-12 16:58:23,992 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:58:23,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:03:04,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3991.47388 ± 914.617
2025-09-12 17:03:04,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1293.2201, 4094.2578, 4156.3335, 4410.3486, 4504.7397, 4488.0967, 4163.9165, 4225.91, 4065.687, 4512.229]
2025-09-12 17:03:04,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:03:04,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 9 hours, 52 minutes, 51 seconds)
2025-09-12 17:14:51,276 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:14:51,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:19:31,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4505.48682 ± 923.768
2025-09-12 17:19:31,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4679.978, 1763.0579, 4800.4277, 4731.641, 5016.757, 4977.623, 4866.453, 4539.7886, 4789.703, 4889.4355]
2025-09-12 17:19:31,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:19:31,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (4505.49) for latency MM1Queue_a033_s075
2025-09-12 17:19:31,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 9 hours, 36 minutes, 36 seconds)
2025-09-12 17:31:19,170 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:31:19,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:36:00,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4254.10693 ± 471.778
2025-09-12 17:36:00,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4012.9578, 4382.0034, 4437.0996, 4011.7893, 3126.2673, 4280.6235, 4834.4253, 4047.0225, 4727.357, 4681.527]
2025-09-12 17:36:00,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:36:00,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 9 hours, 19 minutes, 58 seconds)
2025-09-12 17:47:47,906 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:47:47,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:52:26,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4596.49365 ± 274.931
2025-09-12 17:52:26,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4630.084, 5003.837, 3962.774, 4923.046, 4586.1826, 4450.366, 4612.9473, 4609.018, 4764.614, 4422.066]
2025-09-12 17:52:26,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:52:26,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (4596.49) for latency MM1Queue_a033_s075
2025-09-12 17:52:26,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 9 hours, 3 minutes, 9 seconds)
2025-09-12 18:04:11,628 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:04:11,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:08:54,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4106.43750 ± 918.477
2025-09-12 18:08:54,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3863.4531, 4815.344, 4801.3447, 1564.6359, 4200.658, 4538.698, 4136.0703, 4658.6763, 4685.4775, 3800.0203]
2025-09-12 18:08:54,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:08:54,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 46 minutes, 38 seconds)
2025-09-12 18:20:39,215 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:20:39,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:25:17,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4714.59863 ± 229.258
2025-09-12 18:25:17,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4974.933, 4900.065, 4373.375, 4923.0527, 4673.725, 4675.357, 5022.6743, 4374.2144, 4735.598, 4492.9937]
2025-09-12 18:25:17,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:25:17,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (4714.60) for latency MM1Queue_a033_s075
2025-09-12 18:25:17,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 29 minutes, 44 seconds)
2025-09-12 18:37:02,926 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:37:02,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:41:41,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4087.25781 ± 1278.835
2025-09-12 18:41:41,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1934.2305, 4057.7705, 4591.028, 4700.433, 5149.593, 1279.9562, 4609.008, 4656.2603, 4990.156, 4904.1445]
2025-09-12 18:41:41,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:41:41,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 8 hours, 12 minutes, 56 seconds)
2025-09-12 18:53:26,387 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:53:26,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:58:04,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4501.33105 ± 880.821
2025-09-12 18:58:04,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5019.428, 4936.3135, 5031.3115, 2146.5361, 4703.88, 5216.1953, 3928.184, 4099.942, 4995.916, 4935.605]
2025-09-12 18:58:04,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:58:04,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 55 minutes, 59 seconds)
2025-09-12 19:09:47,800 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:09:47,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:14:26,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3952.40039 ± 1570.952
2025-09-12 19:14:27,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4667.661, 4880.8037, 4542.8765, 786.44867, 4668.396, 4954.231, 850.0954, 4673.0337, 4765.685, 4734.774]
2025-09-12 19:14:27,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:14:27,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 39 minutes, 13 seconds)
2025-09-12 19:26:10,411 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:26:10,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:30:48,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4648.85791 ± 277.566
2025-09-12 19:30:48,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4468.1533, 4372.895, 4564.594, 4487.582, 4341.438, 5284.3174, 4781.3486, 4951.759, 4715.761, 4520.7305]
2025-09-12 19:30:48,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:30:48,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 22 minutes, 17 seconds)
2025-09-12 19:42:32,674 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:42:32,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:47:15,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4648.36475 ± 911.407
2025-09-12 19:47:15,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5130.8823, 4250.926, 4684.0933, 5162.101, 4959.7114, 4873.7007, 4997.1616, 5145.867, 2042.955, 5236.25]
2025-09-12 19:47:15,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:47:15,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 7 hours, 6 minutes, 15 seconds)
2025-09-12 19:58:59,450 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:58:59,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:03:38,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4042.15869 ± 1015.252
2025-09-12 20:03:38,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4680.588, 4476.673, 2134.0312, 2029.5103, 4166.322, 4603.1626, 4195.3306, 4251.3623, 4916.113, 4968.493]
2025-09-12 20:03:38,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:03:38,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 49 minutes, 48 seconds)
2025-09-12 20:15:22,230 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:15:22,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:20:04,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4131.50684 ± 1442.135
2025-09-12 20:20:04,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2341.9683, 4692.883, 4663.5684, 5125.512, 5076.279, 5074.484, 471.67178, 4368.5825, 4675.238, 4824.8794]
2025-09-12 20:20:04,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:20:04,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 33 minutes, 34 seconds)
2025-09-12 20:31:48,520 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:31:48,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:36:26,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4666.27979 ± 637.782
2025-09-12 20:36:26,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5025.5884, 4912.585, 4864.7134, 4535.2026, 2800.752, 4807.302, 4904.1006, 4903.56, 5099.8403, 4809.153]
2025-09-12 20:36:26,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:36:26,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 17 minutes, 10 seconds)
2025-09-12 20:48:11,059 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:48:11,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:52:48,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3871.93091 ± 1549.564
2025-09-12 20:52:48,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5379.196, 4895.801, 4792.903, 2427.9937, 2157.3862, 4451.5366, 4701.524, 396.46844, 5026.9497, 4489.5513]
2025-09-12 20:52:48,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:52:48,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 6 hours, 48 seconds)
2025-09-12 21:04:32,812 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:04:32,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:09:11,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4717.86963 ± 1030.005
2025-09-12 21:09:11,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4528.327, 4189.7573, 5278.717, 4933.13, 4969.072, 1893.5908, 5636.8784, 4974.784, 5595.898, 5178.544]
2025-09-12 21:09:11,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:09:11,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (4717.87) for latency MM1Queue_a033_s075
2025-09-12 21:09:11,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 44 minutes, 6 seconds)
2025-09-12 21:20:56,310 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:20:56,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:25:34,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4723.42871 ± 236.876
2025-09-12 21:25:34,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4812.4614, 5221.7476, 4833.58, 4504.834, 4913.5923, 4549.022, 4393.997, 4740.534, 4794.9175, 4469.6]
2025-09-12 21:25:34,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:25:34,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (4723.43) for latency MM1Queue_a033_s075
2025-09-12 21:25:34,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 27 minutes, 42 seconds)
2025-09-12 21:37:18,804 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:37:18,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:41:56,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4662.35156 ± 360.137
2025-09-12 21:41:56,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4521.558, 4858.4263, 4801.018, 5008.2686, 3705.2551, 4933.003, 4578.2236, 4796.145, 4911.8984, 4509.7207]
2025-09-12 21:41:56,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:41:56,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 11 minutes, 4 seconds)
2025-09-12 21:53:41,021 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:53:41,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:58:19,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4125.58545 ± 1552.727
2025-09-12 21:58:19,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5030.598, 4686.052, 4940.5215, 4568.3647, 194.83243, 4477.732, 2182.8179, 5144.6323, 5285.3325, 4744.9717]
2025-09-12 21:58:19,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:58:19,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 54 minutes, 45 seconds)
2025-09-12 22:10:00,919 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:10:00,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:14:38,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3914.89771 ± 1438.913
2025-09-12 22:14:38,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [535.3061, 5177.809, 4587.841, 4662.707, 4378.7915, 4446.9375, 4634.3535, 4494.638, 1693.0887, 4537.506]
2025-09-12 22:14:38,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:14:38,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 38 minutes, 14 seconds)
2025-09-12 22:26:35,130 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:26:35,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:31:20,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4778.10938 ± 223.645
2025-09-12 22:31:20,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4743.226, 4953.4805, 4649.7144, 4881.369, 4642.424, 4831.8535, 4907.497, 4976.4033, 4211.237, 4983.8887]
2025-09-12 22:31:20,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:31:20,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (4778.11) for latency MM1Queue_a033_s075
2025-09-12 22:31:20,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 22 minutes, 51 seconds)
2025-09-12 22:43:16,615 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:43:16,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:47:57,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3258.29028 ± 1700.194
2025-09-12 22:47:57,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2805.7415, 4899.8193, 4983.6455, 1481.275, 4861.6016, 555.4375, 2000.0139, 4153.237, 1476.9275, 5365.206]
2025-09-12 22:47:57,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:47:57,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 7 minutes, 8 seconds)
2025-09-12 22:59:53,242 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:59:53,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:04:35,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4755.86572 ± 690.035
2025-09-12 23:04:35,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5256.607, 4966.9326, 2818.7786, 4672.3047, 5108.3247, 4844.7334, 4721.819, 5345.0674, 5210.1045, 4613.9834]
2025-09-12 23:04:35,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:04:35,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 51 minutes, 25 seconds)
2025-09-12 23:16:31,988 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:16:32,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:21:13,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4654.85010 ± 692.945
2025-09-12 23:21:13,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5423.68, 5276.999, 4487.0303, 4904.0356, 4603.3213, 2750.6257, 4858.1455, 4674.1685, 4665.1104, 4905.382]
2025-09-12 23:21:13,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:21:13,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 35 minutes, 33 seconds)
2025-09-12 23:33:09,635 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:33:09,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:37:50,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4279.47998 ± 958.621
2025-09-12 23:37:50,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4932.826, 1823.4204, 4656.193, 4854.3604, 4553.3237, 4176.9653, 4731.4204, 3217.2134, 4986.5596, 4862.5225]
2025-09-12 23:37:50,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:37:50,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 19 minutes, 39 seconds)
2025-09-12 23:49:46,275 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:49:46,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:54:28,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3995.70508 ± 1460.005
2025-09-12 23:54:28,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4241.6914, 5380.347, 4435.1904, 4340.676, 4209.7383, 4457.708, 521.00714, 2038.773, 4916.052, 5415.867]
2025-09-12 23:54:28,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:54:28,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 2 minutes, 55 seconds)
2025-09-13 00:06:27,663 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:06:27,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:11:10,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4912.27051 ± 558.860
2025-09-13 00:11:10,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3362.5183, 4776.528, 5237.506, 4966.8145, 5078.1206, 5603.434, 4991.131, 5145.112, 4882.537, 5078.998]
2025-09-13 00:11:10,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:11:10,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (4912.27) for latency MM1Queue_a033_s075
2025-09-13 00:11:10,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 46 minutes, 26 seconds)
2025-09-13 00:23:05,251 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:23:05,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:27:44,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4896.48486 ± 354.964
2025-09-13 00:27:44,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4632.037, 5057.1616, 4500.328, 5359.691, 4500.6274, 5556.017, 5079.6265, 4909.297, 4882.4204, 4487.637]
2025-09-13 00:27:44,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:27:44,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 29 minutes, 40 seconds)
2025-09-13 00:39:37,154 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:39:37,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:44:17,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4282.29150 ± 1388.102
2025-09-13 00:44:17,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4844.905, 4879.198, 5264.1807, 4674.025, 4719.7935, 4843.346, 5367.4946, 4930.9766, 2518.4106, 780.5851]
2025-09-13 00:44:17,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:44:17,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 12 minutes, 53 seconds)
2025-09-13 00:56:17,227 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:56:17,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:00:58,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4129.46191 ± 1323.320
2025-09-13 01:00:58,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4627.166, 5069.789, 4504.4614, 2471.6401, 4811.58, 4651.3926, 807.0971, 4280.4136, 5194.1367, 4876.941]
2025-09-13 01:00:58,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:00:58,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 56 minutes, 23 seconds)
2025-09-13 01:12:50,723 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:12:50,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:17:33,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4720.02441 ± 713.422
2025-09-13 01:17:33,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4427.069, 4585.899, 4966.366, 4835.833, 2877.896, 4956.333, 4458.0415, 5677.412, 5236.4355, 5178.957]
2025-09-13 01:17:33,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:17:33,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 39 minutes, 42 seconds)
2025-09-13 01:29:23,624 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:29:23,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:34:02,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3875.21875 ± 1776.965
2025-09-13 01:34:02,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4514.1357, 4377.6416, 5035.567, 164.19662, 3990.1218, 657.49164, 4592.169, 4939.4834, 5405.967, 5075.4146]
2025-09-13 01:34:02,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:34:02,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 22 minutes, 51 seconds)
2025-09-13 01:45:55,538 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:45:55,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:50:36,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4115.10254 ± 1751.014
2025-09-13 01:50:36,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5172.589, 1126.0873, 352.0588, 4987.599, 5285.9243, 5519.3516, 5389.4214, 4927.7754, 4042.7808, 4347.441]
2025-09-13 01:50:36,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:50:36,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 6 minutes, 17 seconds)
2025-09-13 02:02:29,161 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:02:29,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:07:08,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4533.07275 ± 997.040
2025-09-13 02:07:09,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5349.2124, 4879.304, 4997.5127, 4963.896, 4728.5117, 4563.818, 1625.8038, 4957.9604, 4808.439, 4456.27]
2025-09-13 02:07:09,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:07:09,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 49 minutes, 42 seconds)
2025-09-13 02:19:01,191 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:19:01,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:23:42,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4233.30371 ± 1387.731
2025-09-13 02:23:42,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5137.0254, 5199.1455, 5042.9487, 4683.025, 2100.406, 3901.6597, 4758.0244, 1098.9785, 5197.4336, 5214.3857]
2025-09-13 02:23:42,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:23:42,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 33 minutes, 5 seconds)
2025-09-13 02:35:35,384 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:35:35,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:40:16,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4998.39551 ± 414.147
2025-09-13 02:40:16,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5122.394, 5175.541, 5164.292, 4724.175, 5336.0176, 4959.532, 5335.533, 5328.2754, 3891.5784, 4946.618]
2025-09-13 02:40:16,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:40:16,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (4998.40) for latency MM1Queue_a033_s075
2025-09-13 02:40:16,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 16 minutes, 32 seconds)
2025-09-13 02:52:08,284 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:52:08,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:56:51,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3930.49536 ± 2020.101
2025-09-13 02:56:51,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5082.745, 3986.8748, 4505.015, 1.7730247, 5285.004, 5373.2607, 5507.5527, 4959.2183, -37.59035, 4641.0957]
2025-09-13 02:56:51,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:56:51,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1251 [DEBUG]: Training session finished
