State Relevance for Off-Policy Evaluation Simon P. Shen 1 Yecheng Jason Ma 2 Omer Gottesman 3 Finale Doshi-Velez 1 Abstract important as many domains have trajectories with different lengths: in health settings, patients’ length o...
Self-Paced Context Evaluation for Contextual Reinforcement Learning Theresa Eimer 1 André Biedenkapp 2 Frank Hutter 2 3 Marius Lindauer 1 Abstract Figure 1: Example instances of the contextual PointMass environment. The agent...
Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot Joel Z. Leibo 1 Edgar Duéñez-Guzmán 1 Alexander Sasha Vezhnevets 1 John P. Agapiou 1 Peter Sunehag 1 Raphael Koster 1 Jayd Matyas 1 Charles Beattie 1 Ig...
Optimal Off-Policy Evaluation from Multiple Logging Policies Nathan Kallus 1 Yuta Saito 1 Masatoshi Uehara 1 Abstract In most of the above studies, the observations used to evaluate a new policy are assumed generated by a single logg...
Model-Free and Model-Based Policy Evaluation when Causality is Uncertain David Bruns-Smith 1 Abstract unobserved shocks are often assumed to be drawn iid every period. Consider the Federal Reserve Board adjusting When decision-...
MANDOLINE: Model Evaluation under Distribution Shift Mayee Chen 1 Karan Goel 1 Nimit Sohoni 2 Fait Poms 1 Kayvon Fatahalian 1 Christopher Ré 1 Abstract tioners to determine if their models will perform well when deployed. Unfortuna...
GeomCA: Geometric Evaluation of Data Representations Petra Poklukar 1 Anastasia Varava 1 Danica Kragic 1 Abstract learning and robotics, usefulness of representations is evaluated on the performance of the policy (Ghadirzadeh e...
Average-Reward Off-Policy Policy Evaluation with Function Approximation Shangtong Zhang 1 Yi Wan 2 Richard S. Sutton 2 Shimon Whiteson 1 Abstract which aim to generate a policy that maximizes the reward rate by iteratively improvin...
Active Testing: Sample–Efficient Model Evaluation Jannik Kossen 1 Sebastian Farquhar 1 Yarin Gal 1 Tom Rainforth 2 Abstract [Figure: Difference to Full Test Loss (×10⁻²); I.I.D. Acquisition vs. Active Testing] We introduce a new framework for s...
Reliable Evaluation of Adversarial Robustness with an Ensemble of Diverse Parameter-free Attacks Francesco Croce 1 Matthias Hein 1 Abstract variations are using other losses (Zhang et al., 2019b) and boost robustness via generati...
On the Relation between Quality-Diversity Evaluation and Distribution-Fitting Goal in Text Generation Jianing Li 1 2 Yanyan Lan 1 2 Jiafeng Guo 1 2 Xueqi Cheng 1 2 Abstract by maximum likelihood estimation (MLE) (Mikolov et al., 2010)...
Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation Yaqi Duan 1 Zeyu Jia 2 Mengdi Wang 3 4 Abstract value) to be earned by a new policy based on logged history. This paper studies the statistical theory of off- Int...
Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions Omer Gottesman 1 Joseph Futoma 1 Yao Liu 2 Sonali Parbhoo 1 Leo Anthony Celi 3 Emma Brunskill 2 Finale Doshi-Velez 1 Abstract an...
Doubly robust off-policy evaluation with shrinkage Yi Su 1 Maria Dimakopoulou 2 Akshay Krishnamurthy 3 Miroslav Dudík 3 Abstract subroutines for optimizing a policy (Dudík et al., 2011). We propose a new framework for designi...
Distributionally Robust Policy Evaluation and Learning in Offline Contextual Bandits Nian Si 1 Fan Zhang 1 Zhengyuan Zhou 2 Jose Blanchet 1 Abstract nomenon in these applications, can be intelligently exploited to achieve better ou...
Adaptive Estimator Selection for Off-Policy Evaluation Yi Su 1 Pavithra Srinath 2 Akshay Krishnamurthy 2 Abstract high quality estimation as has been demonstrated in recent empirical studies (Voloshin et al., 2019). However, data...
Accountable Off-Policy Evaluation With Kernel Bellman Statistics Yihao Feng 1 Tongzheng Ren 1 Ziyang Tang 1 Qiang Liu 1 Abstract decisions. Off-policy evaluation plays an important role in Importance sampling (IS) provides a basic...
Rehashing Kernel Evaluation in High Dimensions Paris Siminelakis 1 Kexin Rong 1 Peter Bailis 1 Moses Charikar 1 Philip Levis 1 Abstract [Figure panels: (a) kernel, (b) difficult case, (c) simple case] Kernel methods are effective but do not scale well Fig...
More Efficient Off-Policy Evaluation through Regularized Targeted Learning Aurélien F. Bibaut 1 Ivana Malenica 1 Nikos Vlassis 2 Mark J. van der Laan 1 Abstract inference, and has led to many methodological developments. One of t...
Importance Sampling Policy Evaluation with an Estimated Behavior Policy Josiah P. Hanna 1 Scott Niekum 1 Peter Stone 1 Abstract determine the expected return – sum of rewards – that an evaluation policy, πe, will obtain when deploy...