This code is for the paper: Achieve Performatively Optimal Policy for Performative Reinforcement Learning 

Simply run “main.py”. It runs both our 0-FW algorithm and the repeated training algorithm and, for every policy they generate, computes the entropy value and the unregularized and regularized performative value functions. These three quantities are related as follows:

Regularized performative value = Unregularized performative value + lambda * Entropy (lambda = 0.5)
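
As a small illustration of the relationship above, the sketch below applies it elementwise to a few values. The function name `regularized_value` and the numbers are hypothetical and are not taken from “main.py”; only the formula and lambda = 0.5 come from this README.

```python
import numpy as np

LAMBDA = 0.5  # regularization weight stated in this README


def regularized_value(v_unreg, entropy, lam=LAMBDA):
    """Return unregularized value + lam * entropy, elementwise.

    Hypothetical helper for illustration; not part of main.py.
    """
    v_unreg = np.asarray(v_unreg, dtype=float)
    entropy = np.asarray(entropy, dtype=float)
    return v_unreg + lam * entropy


# Made-up values for three policies (illustration only, not experiment results).
v_unreg = [1.0, 1.5, 2.0]
entropy = [0.8, 0.6, 0.4]
v_reg = regularized_value(v_unreg, entropy)
print(v_reg)
```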

A new folder called “Results_PerformativeRL” will be created. In this folder, the code generates the figures "simulation_lambda0.5.png" and "simulation_unreg.png", which show the learning curves of the regularized and unregularized performative value functions respectively, together with the following files, which store the result data as NumPy arrays.

"RepeatedTraining_V_unreg.npy": The unregularized performative value function of the policies generated by the repeated training algorithm.
"RepeatedTraining_V_entropy.npy": The entropy values of the policies generated by the repeated training algorithm.
"RepeatedTraining_V_reg_lambda0.5.npy": The regularized performative value function of the policies generated by the repeated training algorithm.
"0FW_V_unreg.npy": The unregularized performative value function of the policies generated by our 0-FW algorithm.
"0FW_V_entropy.npy": The entropy values of the policies generated by our 0-FW algorithm.
"0FW_V_reg_lambda0.5.npy": The regularized performative value function of the policies generated by our 0-FW algorithm.

Our experiment was run with Python 3.9 on a MacBook Pro laptop with 500 GB of storage, an 8-core CPU, and 16 GB of memory, and took about 110 minutes.


