Principled Exploration via Optimistic Bootstrapping and Backward Induction. Chenjia Bai 1, Lingxiao Wang 2, Lei Han 3, Jianye Hao 4, Animesh Garg 5, Peng Liu 1, Zhaoran Wang 2. Abstract 2007; Jin et al., 2018) is a principled approach for effici...
Ensemble Bootstrapping for Q-Learning. Oren Peer 1, Chen Tessler 1, Nadav Merlis 1, Ron Meir 1. Abstract focuses on learning the value-function. The value represents the expected, discounted, reward-to-go that the agent will Q-learning...
Bootstrapping Fitted Q-Evaluation for Off-Policy Inference. Botao Hao 1, Xiang Ji 2, Yaqi Duan 2, Hao Lu 2, Csaba Szepesvári 1 3, Mengdi Wang 1 2. Abstract et al., 2013; Munos & Szepesvári, 2008; Le et al., 2019). In practice, FQE has demonst...
Safe Policy Improvement with Baseline Bootstrapping. Romain Laroche 1, Paul Trichelair 1, Remi Tachet des Combes 1. Abstract is a key challenge of modern RL that needs to be tackled before any wide-scale adoption. This paper considers Saf...
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits. Branislav Kveton 1, Csaba Szepesvári 2 3, Sharan Vaswani 4, Zheng Wen 5, Mohammad Ghavamzadeh 6, Tor Lattimore 2. Abstract 2013b) is a generalization of a multi-ar...