Adaptive Estimator Selection for Off-Policy Evaluation. Yi Su¹, Pavithra Srinath², Akshay Krishnamurthy². Abstract: high quality estimation as has been demonstrated in recent empirical studies (Voloshin et al., 2019). However, data...
Accountable Off-Policy Evaluation With Kernel Bellman Statistics. Yihao Feng¹, Tongzheng Ren¹, Ziyang Tang¹, Qiang Liu¹. Abstract: decisions. Off-Policy evaluation plays an important role in Importance sampling (IS) provides a basic...
Trajectory-Based Off-Policy Deep Reinforcement Learning. Andreas Doerr¹ ² ³, Michael Volpp¹, Marc Toussaint³, Sebastian Trimpe², Christian Daniel¹. Abstract: standard algorithms are vastly data-inefficient and rely on millions of da...
Off-Policy Deep Reinforcement Learning without Exploration. Scott Fujimoto¹ ², David Meger¹ ², Doina Precup¹ ². Abstract: require further interactions with the environment to compensate (Hester et al., 2017; Sun et al., 2018; Cheng et...
More Efficient Off-Policy Evaluation through Regularized Targeted Learning. Aurélien F. Bibaut¹, Ivana Malenica¹, Nikos Vlassis², Mark J. van der Laan¹. Abstract: inference, and has led to many methodological developments. One of t...
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables. Kate Rakelly¹, Aurick Zhou¹, Deirdre Quillen¹, Chelsea Finn¹, Sergey Levine¹. Abstract: Fortunately, many of the problems we would like our auton...
Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models. Michael Oberst¹, David Sontag¹. Abstract: optimistically, are there lives which could have been saved? This question becomes increasingly relevant, w...
Combining Parametric and Nonparametric Models for Off-Policy Evaluation. Omer Gottesman¹, Yao Liu², Scott Sussex¹, Emma Brunskill², Finale Doshi-Velez¹. Abstract: jectories under the evaluation policy via stitching together actual t...
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. Tuomas Haarnoja¹, Aurick Zhou¹, Pieter Abbeel¹, Sergey Levine¹. Abstract: networks holds the promise of automating a wide range of deci...
More Robust Doubly Robust Off-Policy Evaluation. Mehrdad Farajtabar¹, Yinlam Chow², Mohammad Ghavamzadeh². Abstract: Swaminathan et al. 2017) and reinforcement learning (RL) (e.g., Precup et al. 2000a; 2001; Paduraru 2013; Mahmood W...
Optimal and Adaptive Off-Policy Evaluation in Contextual Bandits. Yu-Xiang Wang¹, Alekh Agarwal², Miroslav Dudík². Abstract: not scale to evaluating many different target policies. We study the Off-Policy evaluation problem—Off-...
Consistent On-Line Off-Policy Evaluation. Assaf Hallak¹, Shie Mannor¹. Abstract: the testing population, and sub-optimal policies can have life threatening effects (Hochberg et al., 2016). OPE can The problem of on-line Off-Policy e...