Sparsity-Agnostic Lasso Bandit. Min-hwan Oh (1), Garud Iyengar (2), Assaf Zeevi (2). Abstract: the traditional MAB problem, here pulling any one arm provides some information about the unknown parameter ... We consider a stochastic contextual ban...
Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism. Brijen Thananjeyan (1), Kirthevasan Kandasamy (1), Ion Stoica (1), Michael I. Jordan (1), Ken Goldberg (1), Joseph E. Gonzalez (1). Abstr...
Problem Dependent View on Structured Thresholding Bandit Problems. James Cheshire (1), Pierre Ménard (1), Alexandra Carpentier (1). Abstract: of error (i.e. the probability that the learner misclassifies at least one arm) and consider the...
Parametric Graph for Unimodal Ranking Bandit. Camille-Sovanneary Gauthier (1, 2), Romaric Gaudel (3), Elisa Fromont (4, 5, 2), Boammani Aser Lompo (6). Abstract: user attention. Typical examples of such displays are (i) a list of news, visible one by on...
Optimal regret algorithm for Pseudo-1d Bandit Convex Optimization. Aadirupa Saha (1), Nagarajan Natarajan (2), Praneeth Netrapalli (2, 3), Prateek Jain (2, 3). Abstract: the problem has a "pseudo-1d" structure in the loss functions f_t(w) = ℓ_t(g_t(w; ...
Incentivized Bandit Learning with Self-Reinforcing User Preferences. Tianchen Zhou (1), Jia Liu (1), Chaosheng Dong (2), Jingyuan Deng (2). Abstract: accumulates more positive feedback. For example, on a movie rental website, current customer...
Best Model Identification: A Rested Bandit Formulation. Leonardo Cella (1), Massimiliano Pontil (1, 2), Claudio Gentile (3). Abstract: 2002), the feedback generated when pulling an arm is modeled ... We introduce and analyze a best arm identifica...
Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound. Lin F. Yang (1), Mengdi Wang (2). Abstract: play an action a ∈ A, where S and A are the state and action spaces. Then the system transitions to another state ... Explo...
Optimistic Policy Optimization with Bandit Feedback. Yonathan Efroni (1), Lior Shani (1), Aviv Rosenberg (2), Shie Mannor (1). Abstract: Due to their popularity, there is a rich literature that provides different types of theoretical guarantees...
My Fair Bandit: Distributed Learning of Max-Min Fairness with Multi-player Bandits. Ilai Bistritz (1), Tavor Z. Baharav (1), Amir Leshem (2), Nicholas Bambos (1). Abstract: the environment. Is there an alternative that lies in the gap between the tw...
Multinomial Logit Bandit with Low Switching Cost. Kefan Dong (1), Yingkai Li (2), Qin Zhang (3), Yuan Zhou (4). Abstract: that no-purchase is the most frequent choice, which is very natural in retailing. W.l.o.g., we assume v_0 = 1, and v_i ≤ 1 ... We study mul...
Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition. Chi Jin (1), Tiancheng Jin (2), Haipeng Luo (2), Suvrit Sra (3), Tiancheng Yu (3). Abstract: The majority of the literature in learning MDPs assumes stationar...
Combinatorial Pure Exploration for Dueling Bandits. Wei Chen (1), Yihan Du (2), Longbo Huang (2), Haoyu Zhao (2). Abstract: tradeoff in online learning. The pure exploration task (Even-Dar et al., 2006; Chen & Li, 2016; Sabato, 2019) is an ... In this pape...
On the Design of Estimators for Bandit Off-Policy Evaluation. Nikos Vlassis (1), Aurelien Bibaut (2), Maria Dimakopoulou (1), Tony Jebara (1). Abstract: of great interest: Given a bandit model, what is a low-risk estimator of the counterfactual tar...
Contextual Multi-armed Bandit Algorithm for Semiparametric Reward Model. Gi-Soo Kim (1), Myunghee Cho Paik (1). Abstract: (Langford et al., 2008), news article placement algorithms (Li et al., 2010), revenue management (Ferreira et al., 2...
Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case. Alina Beygelzimer (1), Dávid Pál (1), Balázs Szörényi (1), Devanathan Thiruvenkatachari (2), Chen-Yu Wei (3), Chicheng Zhang (4). Abstract: and reveals...
Minimax Concave Penalized Multi-Armed Bandit Model with High-Dimensional Convariates. Xue Wang (1), Mike Mingcheng Wei (2), Tao Yao (1). Abstract: example, doctors (i.e., decision-makers) can personalize treatments for patients (i.e., use...
Safety-Aware Algorithms for Adversarial Contextual Bandit. Wen Sun (1), Debadeepta Dey (2), Ashish Kapoor (2). Abstract: side-effect of a new treatment must be taken into consideration for patients’ safety. In general these applications wi...
Efficient Online Bandit Multiclass Learning with Õ(√T) Regret. Alina Beygelzimer (1), Francesco Orabona (2), Chicheng Zhang (3). Abstract: takes of the best predictor in the class. Kakade et al. (2008) proposed a bandit modification of the M...