INFO - train_rl - Running command 'train_rl'
INFO - train_rl - Started run with ID "2"
INFO - imitation.scripts.ingredients.logging - Logging to /home/luotianjiao/imitation/quickstart/rl
INFO - imitation.scripts.ingredients.rl - RL algorithm: <class 'stable_baselines3.ppo.ppo.PPO'>
INFO - imitation.scripts.ingredients.rl - Policy network summary:
 FeedForward32Policy(
  (features_extractor): FlattenExtractor(
    (flatten): Flatten(start_dim=1, end_dim=-1)
  )
  (pi_features_extractor): FlattenExtractor(
    (flatten): Flatten(start_dim=1, end_dim=-1)
  )
  (vf_features_extractor): FlattenExtractor(
    (flatten): Flatten(start_dim=1, end_dim=-1)
  )
  (mlp_extractor): MlpExtractor(
    (shared_net): Sequential(
      (0): Linear(in_features=3, out_features=32, bias=True)
      (1): Tanh()
      (2): Linear(in_features=32, out_features=32, bias=True)
      (3): Tanh()
    )
    (policy_net): Sequential()
    (value_net): Sequential()
  )
  (action_net): Linear(in_features=32, out_features=1, bias=True)
  (value_net): Linear(in_features=32, out_features=1, bias=True)
)
INFO - root - Saved policy to /home/luotianjiao/imitation/quickstart/rl/policies/000000000002
--------------------------
| time/              |   |
|    fps             | 5 |
|    iterations      | 1 |
|    time_elapsed    | 0 |
|    total_timesteps | 2 |
--------------------------
INFO - root - Saved policy to /home/luotianjiao/imitation/quickstart/rl/policies/000000000004
------------------------------------------
| time/                   |              |
|    fps                  | 9            |
|    iterations           | 2            |
|    time_elapsed         | 0            |
|    total_timesteps      | 4            |
| train/                  |              |
|    approx_kl            | 0.0072758496 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -1.41        |
|    explained_variance   | 0.0212       |
|    learning_rate        | 0.001        |
|    loss                 | 0.689        |
|    n_updates            | 10           |
|    policy_gradient_loss | -0.0301      |
|    std                  | 0.99         |
|    value_loss           | 2.01         |
------------------------------------------
INFO - root - Rollout stats: {'n_traj': 400, 'return_min': -48.70681795780929, 'return_mean': -18.63557135956911, 'return_std': 15.350654150591792, 'return_max': -0.02330578205247302, 'len_min': 5, 'len_mean': 5.0, 'len_std': 0.0, 'len_max': 5}
INFO - root - Dumped demonstrations to /home/luotianjiao/imitation/quickstart/rl/rollouts/final.npz.
INFO - root - Saved policy to /home/luotianjiao/imitation/quickstart/rl/policies/final
INFO - train_rl - Result: {'n_traj': 2, 'monitor_return_len': 2, 'return_min': -3.470819592475891, 'return_mean': -1.9774596244096756, 'return_std': 1.4933599680662155, 'return_max': -0.4840996563434601, 'len_min': 5, 'len_mean': 5.0, 'len_std': 0.0, 'len_max': 5, 'monitor_return_min': -39.21028, 'monitor_return_mean': -22.3400695, 'monitor_return_std': 16.8702105, 'monitor_return_max': -5.469859}
INFO - train_rl - Completed after 0:00:03
