Provably Optimal Algorithms for Generalized Linear Contextual Bandits. Lihong Li¹, Yu Lu², Dengyong Zhou¹. Abstract. et al., 2009; Li et al., 2010; 2012). In the problem of personalized news recommendation, the website must recom- Cont...
Optimal and Adaptive Off-policy Evaluation in Contextual Bandits. Yu-Xiang Wang¹, Alekh Agarwal², Miroslav Dudík². Abstract. not scale to evaluating many different target policies. We study the off-policy evaluation problem— Off-...
Contextual Decision Processes with low Bellman rank are PAC-Learnable. Nan Jiang¹, Akshay Krishnamurthy², Alekh Agarwal³, John Langford³, Robert E. Schapire³. Abstract. eralize MDPs where the state forms the context (Ex. 1) and POMDPs wh...