Accountable Off-Policy Evaluation With Kernel Bellman Statistics. Yihao Feng, Tongzheng Ren, Ziyang Tang, Qiang Liu. Abstract: decisions. Off-policy evaluation plays an important role in Importance sampling (IS) provides a basic...
Revisiting the Softmax Bellman Operator: New Benefits and New Perspective. Zhao Song, Ronald E. Parr, Lawrence Carin. Abstract: tivates the use of exploratory and potentially sub-optimal actions during learning, and one commonly-u...
The Uncertainty Bellman Equation and Exploration. Brendan O’Donoghue, Ian Osband, Remi Munos, Volodymyr Mnih. Abstract: tions that maximize rewards given its current knowledge? We consider the exploration/exploitation prob- Se...
An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning. Dhruv Malik, Malayandi Palaniappan, Jaime F. Fisac, Dylan Hadfield-Menell, Stuart Russell, Anca D. Dragan. Abstract: Figure 1. A CIRL game....
Contextual Decision Processes with low Bellman rank are PAC-Learnable. Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire. Abstract: eralize MDPs where the state forms the context (Ex. 1) and POMDPs wh...