Accountable Off-Policy Evaluation With Kernel Bellman Statistics. Yihao Feng1, Tongzheng Ren1, Ziyang Tang1, Qiang Liu1. Abstract: ... decisions. Off-policy evaluation plays an important role in ... Importance sampling (IS) provides a basic...
Policy Certificates: Towards Accountable Reinforcement Learning. Christoph Dann1, Lihong Li2, Wei Wei2, Emma Brunskill3. Abstract: ... exploration. Even sharp drops in policy performance during learning are common, e.g., when the agent sta...