On-PolicyDeepReinforcementLearningfortheAverage-RewardCriterionYimingZhang1KeithW.Ross21AbstractHaarnojaetal.,2018)orinaqueuingscenario(Tadepalli&Ok,1994;Sutton&Barto,2018),thereisnonaturalsep-Wede...