This folder contains the complete results of the convergence analysis on the synthetic example.
1. Subfolder "kl_divergence" contains the KL divergence plots for each mesh resolution.
2. Subfolder "reward" contains the comparison between the ground truth reward and the estimated reward for each mesh resolution.
3. Subfolder "state_action_probability" contains the animation of the state-action probability evolution over time for each mesh resolution.
4. Subfolder "value_policy_reward" contains an overview of the value function, policy, and reward for each mesh resolution.


## KL divergence plot
1. The first plot, on the left, shows the KL divergence between the ground truth state-action probability and the state-action probability obtained using the estimated transition and the policy induced from the estimated value function.
2. The second plot, in the middle, shows the KL divergence between the ground truth policy and the estimated policy. The value on each grid cell is the KL divergence for the policy conditioned on the state $s$: $D_{KL} (\pi(\cdot | s) \| \hat{\pi} (\cdot | s))$.
3. The third plot, on the right, shows the KL divergence between the ground truth transition and the estimated transition. Each grid cell contains $n_a$ subgrids ($n_a$ denotes the number of discretized actions); the value of each subgrid is the KL divergence for the transition conditioned on the state-action pair $(s, a)$: $D_{KL} (T(\cdot | s, a) \| \hat{T} (\cdot | s, a))$.
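The per-state policy KL and per-state-action transition KL above can be sketched as follows. This is a minimal illustration, not the code used to produce the plots; the grid sizes `n_s` and `n_a` and the random distributions are placeholders.

```python
import numpy as np

# Illustrative sizes only; the actual mesh resolutions differ.
rng = np.random.default_rng(0)
n_s, n_a = 4, 3

def kl(p, q, eps=1e-12):
    """D_KL(p || q) along the last axis, with clipping for stability."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return np.sum(p * np.log(p / q), axis=-1)

def normalize(x):
    return x / x.sum(axis=-1, keepdims=True)

# Policies: pi[s, a] = pi(a | s); one KL value per state s.
pi_true = normalize(rng.random((n_s, n_a)))
pi_hat = normalize(rng.random((n_s, n_a)))
policy_kl = kl(pi_true, pi_hat)        # shape (n_s,)

# Transitions: T[s, a, s'] = T(s' | s, a); one KL value per pair (s, a).
T_true = normalize(rng.random((n_s, n_a, n_s)))
T_hat = normalize(rng.random((n_s, n_a, n_s)))
transition_kl = kl(T_true, T_hat)      # shape (n_s, n_a)
```

The `policy_kl` array maps onto the middle plot (one value per grid cell) and `transition_kl` onto the right plot (one value per subgrid).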


## Reward plot
1. The first plot, on the left, shows the ground truth reward at the corresponding mesh resolution.
2. The second plot, in the middle, shows the estimated reward when using the joint transition from the Fokker-Planck equation.
3. The third plot, on the right, shows the estimated reward when using the marginal transition and the Boltzmann policy.
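For reference, a Boltzmann policy as mentioned in the third case is a softmax over an action-value table. The sketch below assumes a tabular $Q(s, a)$ and an inverse temperature `beta`; both are illustrative placeholders, not values from the experiment.

```python
import numpy as np

def boltzmann_policy(Q, beta=1.0):
    """pi(a | s) proportional to exp(beta * Q[s, a]), computed stably."""
    logits = beta * Q
    logits = logits - logits.max(axis=-1, keepdims=True)  # avoid overflow
    p = np.exp(logits)
    return p / p.sum(axis=-1, keepdims=True)

# Hypothetical 2-state, 3-action value table.
Q = np.array([[1.0, 0.5, -0.2],
              [0.0, 2.0, 1.0]])
pi = boltzmann_policy(Q, beta=2.0)  # each row sums to 1
```

Larger `beta` concentrates probability on the highest-value action; `beta = 0` recovers the uniform policy.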


## State action probability animation
1. This plot shows the animation of the state-action probability density evolving over time for different mesh resolutions. Each grid cell contains $n_a$ subgrids; the color of each subgrid represents the probability of that state-action pair, $p_t(s, a)$.
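The quantity colored in each subgrid factors as $p_t(s, a) = p_t(s)\,\pi(a \mid s)$ when the actions are drawn from a policy $\pi$. A minimal sketch of that factorization, with placeholder sizes and random distributions:

```python
import numpy as np

# Illustrative sizes; the real mesh resolutions differ.
n_s, n_a = 4, 3
rng = np.random.default_rng(1)

# Marginal state distribution p_t(s) at one time step.
p_s = rng.random(n_s)
p_s /= p_s.sum()

# Policy pi(a | s), one distribution over actions per state.
pi = rng.random((n_s, n_a))
pi /= pi.sum(axis=-1, keepdims=True)

# Joint state-action probability p_t(s, a) = p_t(s) * pi(a | s);
# this is the (n_s, n_a) array that the animation colors per subgrid.
p_sa = p_s[:, None] * pi
```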


## Value policy reward
This plot contains an overall comparison of the value functions, policies, and rewards. The top row shows the ground truth; the bottom row shows the estimates from VSI. The first column compares the value functions, the second column the policies, and the third column the rewards.
