Adversarial Dueling Bandits. Aadirupa Saha, Tomer Koren, Yishay Mansour. Abstract: regret with respect to the best item in hindsight, according to a certain score function. We introduce the problem of regret minimization in Adversar...
Adversarial Combinatorial Bandits with General Non-linear Reward Functions. Xi Chen, Yanjun Han, Yining Wang. Abstract: chooses a reward vector $v_t = (v_{t1}, \cdots, v_{tN}) \in [0,1]^N$ not revealed to the algorithm. The algorithm chooses a s...
Adapting to Misspecification in Contextual Bandits with Offline Regression Oracles. Sanath Kumar Krishnamurthy, Vitor Hadad, Susan Athey. Abstract: whose distribution may depend on the context and action. The objective of the algo...
Thompson Sampling Algorithms for Mean-Variance Bandits. Qiuyu Zhu, Vincent Y. F. Tan. Abstract: The primary concern of this body of literature is to find a learning algorithm which can maximize the expected cu- The multi-armed bandi...
The Intrinsic Robustness of Stochastic Bandits to Strategic Manipulation. Zhe Feng, David C. Parkes, Haifeng Xu. Abstract: able to modulate its own reward feedback in order to further its own objective, e.g., increasing the number of t...
Structured Linear Contextual Bandits: A Sharp and Geometric Smoothed Analysis. Vidyashankar Sivakumar, Zhiwei Steven Wu, Arindam Banerjee. Abstract: selects a context $x_t^{i_t}$ from $k$ available contexts $x_t^1, \ldots, x_t^k$ Bandit learning a...
Structure Adaptive Algorithms for Stochastic Bandits. Rémy Degenne, Han Shao, Wouter M. Koolen. Abstract: starting with asymptotic results in the 80s and 90s (Lai & Robbins, 1985; Graves & Lai, 1997) and moving to the fi- We study rewa...
Stochastic Bandits with arm-dependent delays. Anne Gael Manegueu, Claire Vernade, Alexandra Carpentier, Michal Valko. Abstract: As a result, we study stochastic delayed bandits for which the delay distributions are arm-dependent...
Preselection Bandits. Viktor Bengs, Eyke Hüllermeier. Abstract: advertising, where advertisements recommended to users can be seen as a preselection. As a concrete application, we In this paper, we introduce the Preselection Ban...
Non-Stationary Delayed Bandits with Intermediate Observations. Claire Vernade, András György, Timothy A. Mann. Abstract: Delayed feedback in online learning has been addressed both in the full information setting (see, e.g. ...
Neural Contextual Bandits with UCB-based Exploration. Dongruo Zhou, Lihong Li, Quanquan Gu. Abstract: the expected reward at each round is linear in the feature vector. While successful in both theory and practice (Li We study the stoc...
Meta-learning with Stochastic Linear Bandits. Leonardo Cella, Alessandro Lazaric, Massimiliano Pontil. Abstract: solidated MAB setting in which each arm is associated with a vector of features and the arm payoff function is mod- We...
Linear Bandits with Stochastic Delayed Feedback. Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, Michael Brueckner. Abstract: most adopted as they allow to take into account the structure of th...
Learning with Good Feature Representations in Bandits and in RL with a Generative Model. Tor Lattimore, Csaba Szepesvári, Gellért Weisz. Abstract: for learning in bandits. The ideas by Du et al. (2019) suggest that the answer is al...
Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems. Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel. Abstract: and new algorithms are necessary even when the modeling assum...
Improved Sleeping Bandits with Stochastic Actions Sets and Adversarial Rewards. Aadirupa Saha, Pierre Gaillard, Michal Valko. Abstract: et al., 2012). However, in various real world applications, the decision space (set of arms A) of...
Improved Optimistic Algorithms for Logistic Bandits. Louis Faury, Marc Abeille, Clément Calauzènes, Olivier Fercoq. Abstract: et al. (2017) and references therein), its practical interest is limited by the linear structure of...
Gamification of Pure Exploration for Linear Bandits. Rémy Degenne, Pierre Ménard, Xuedong Shang, Michal Valko. Abstract: high confidence to a given query using as few samples as possible. We investigate an active pure-explorati...
Fiduciary Bandits. Gal Bahar, Omer Ben-Porat, Kevin Leyton-Brown, Moshe Tennenholtz. Abstract: sarial (Auer et al., 1995) and non-stationary (Besbes et al., 2014; Levine et al., 2017) bandits. Recommendation systems often face exp...
Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles. Dylan J. Foster, Alexander Rakhlin. Abstract: ible, general purpose algorithms that work for arbitrary, user-specified classes of policies and come wit...