On-Policy Deep Reinforcement Learning for the Average-reward Criterion. Yiming Zhang, Keith W. Ross. Abstract ... Haarnoja et al., 2018) or in a queuing scenario (Tadepalli & Ok, 1994; Sutton & Barto, 2018), there is no natural sep- ... We de...
Learning and Planning in Average-reward Markov Decision Processes. Yi Wan, Abhishek Naik, Richard S. Sutton. Abstract ... with it. For learning and combined methods, both control and prediction problems can be further subdivided into...
Average-reward Off-Policy Policy Evaluation with Function Approximation. Shangtong Zhang, Yi Wan, Richard S. Sutton, Shimon Whiteson. Abstract ... which aim to generate a policy that maximizes the reward rate by iteratively improvin...
Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes. Chen-Yu Wei, Mehdi Jafarnia-Jahromi, Haipeng Luo, Hiteshi Sharma, Rahul Jain. Abstract ... and model-free. Model-based algorithms...