Abstract: Highlights
• A novel pre-training DRL algorithm simplifies the pre-training phase.
• The algorithm reduces the format requirements on the demonstration data.
• The algorithm simplifies the dominance term.
• A novel priority formula fulfills the algorithm's needs for replaying experience.
• Double target-networks achieve more reliable training.