Optimal Thompson Sampling strategies for support-aware CVaR bandits. Dorian Baudry¹, Romain Gautron²,³, Emilie Kaufmann¹, Odalric-Ambrym Maillard¹. Abstract: Value at Risk (CVaR) as well as more generic coherent spectral risk measur...
MOTS: Minimax Optimal Thompson Sampling. Tianyuan Jin¹, Pan Xu², Jieming Shi³, Xiaokui Xiao¹, Quanquan Gu². Abstract: playing the best arm and playing the arm according to the strategy, which is also called the regret of a bandit strategy. Th...
Thompson Sampling via Local Uncertainty. Zhendong Wang¹, Mingyuan Zhou¹. Abstract: become a common practice. Since the model training and data collection usually happen at the same time, the model Thompson sampling is an efficient algo...
Thompson Sampling Algorithms for Mean-Variance Bandits. Qiuyu Zhu¹, Vincent Y. F. Tan¹,²,³. Abstract: The primary concern of this body of literature is to find a learning algorithm which can maximize the expected cu- The multi-armed bandi...
On Thompson Sampling with Langevin Algorithms. Eric Mazumdar¹, Aldo Pacchiano¹, Yi-An Ma²,³, Peter L. Bartlett¹,⁴, Michael I. Jordan¹,⁴. Abstract: exploitation tradeoffs (Auer et al., 2002; Lattimore and Szepesvári, 2020), wherein an al...
Thompson Sampling for Combinatorial Semi-Bandits. Siwei Wang¹, Wei Chen². Abstract: difference over T steps between always playing the arm with the optimal expected reward and playing the arms We study the application of the Thompson sa...
Racing Thompson: an Efficient Algorithm for Thompson Sampling with Non-conjugate Priors. Yichi Zhou¹, Jun Zhu¹, Jingwe Zhuo¹. Abstract: As one of the most important problems in learning and decision-making in unknown environments, MAB...
Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems. Marc Abeille¹, Alessandro Lazaric². Abstract: has been mostly addressed following two main approaches: optimism-in-face-of-uncertainty (OFU) and...
Parallel and Distributed Thompson Sampling for Large-scale Accelerated Exploration of Chemical Space. José Miguel Hernández-Lobato¹, James Requeima¹,², Edward O. Pyzer-Knapp³,⁴, Alán Aspuru-Guzik³. Abstract: compounds and po...