Highlights

• Proposed an unbiased, reduced-variance multifidelity estimator for the state–action value function in multifidelity reinforcement learning (RL).
• Theoretically analyzed how variance reduction in the state–action value estimate affects both policy evaluation and policy improvement.
• Designed a multifidelity Monte Carlo RL algorithm, MFMCRL, to improve policy learning for RL agents operating in high-fidelity environments.
• Demonstrated empirical performance gains in synthetic multifidelity RL environments and in a neural architecture search (NAS) use case.
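To make the first highlight concrete, the sketch below shows the generic multifidelity control-variate construction that estimators of this kind are typically built on: a few expensive high-fidelity returns are combined with many cheap, correlated low-fidelity returns so that the estimate stays unbiased while its variance shrinks. This is an illustrative sketch under assumed interfaces, not the paper's MFMCRL algorithm; the function `mf_q_estimate` and its arguments are hypothetical names chosen for the example.

```python
# Illustrative multifidelity control-variate Monte Carlo estimator for Q(s, a).
# NOT the paper's MFMCRL algorithm; a generic sketch of the underlying idea.
import numpy as np

def mf_q_estimate(q_hi, q_lo_paired, q_lo_extra, alpha=None):
    """Multifidelity estimate of a state-action value Q(s, a).

    q_hi        : high-fidelity returns (N samples, expensive to collect).
    q_lo_paired : low-fidelity returns on the SAME N sample paths as q_hi.
    q_lo_extra  : additional cheap low-fidelity returns (M >> N samples).
    alpha       : control-variate weight; if None, use the empirical
                  variance-minimizing choice Cov(hi, lo) / Var(lo).
    """
    q_hi = np.asarray(q_hi, dtype=float)
    q_lo_paired = np.asarray(q_lo_paired, dtype=float)
    q_lo_all = np.concatenate([q_lo_paired, np.asarray(q_lo_extra, dtype=float)])

    if alpha is None:
        cov = np.cov(q_hi, q_lo_paired)  # 2x2 sample covariance matrix
        alpha = cov[0, 1] / cov[1, 1]    # = rho * sigma_hi / sigma_lo

    # For a fixed alpha the correction term has zero mean,
    # E[mean(q_lo_all) - mean(q_lo_paired)] = 0, so the estimator is unbiased;
    # it reduces variance only when the two fidelities are correlated.
    return q_hi.mean() + alpha * (q_lo_all.mean() - q_lo_paired.mean())

# Toy usage: correlated high-/low-fidelity returns for one (s, a) pair.
rng = np.random.default_rng(0)
lo = 0.7 + 0.5 * rng.standard_normal(1000)        # cheap, biased low fidelity
hi = lo[:50] + 0.3 + 0.1 * rng.standard_normal(50)  # expensive high fidelity
print(mf_q_estimate(hi, lo[:50], lo[50:]))        # close to the hi-fi mean of 1.0
```

Note that estimating `alpha` from the same paired samples introduces a small practical bias; the unbiasedness claim holds exactly for a fixed, data-independent weight.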