Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning. Gen Li¹, Changxiao Cai², Yuxin Chen², Yuantao Gu¹, Yuting Wei³, Yuejie Chi⁴. Abstract: Q-learning (Borkar & Meyn, 2000; Jaakkola et al., 1994; Szepesvári, 1998;...
Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling. Yao Liu¹, Pierre-Luc Bacon², Emma Brunskill¹. Abstract: increasing interest in developing accurate and efficient algorithms for off-...
Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning. Silviu Pitis¹ ², Harris Chan¹ ², Stephen Zhao¹, Bradly Stadie², Jimmy Ba¹ ². Abstract: In this paper, we improve upon existing approaches to intrinsic g...