Optimistic Policy Optimization with Bandit Feedback. Yonathan Efroni (1), Lior Shani (1), Aviv Rosenberg (2), Shie Mannor (1). Abstract: Due to their popularity, there is a rich literature that provides different types of theoretical guarantees...
Online Multi-Kernel Learning with Graph-Structured Feedback. Pouya M Ghari (1), Yanning Shen (1). Abstract: while the data-driven multi-kernel learning (MKL) approach is more powerful, as it learns the optimal kernel from a dic- Multi-ker...
Online Learning with Dependent Stochastic Feedback Graphs. Corinna Cortes (1), Giulia DeSalvo (1), Claudio Gentile (1), Mehryar Mohri (1), Ningshan Zhang (2). Abstract: of online learning introduced by Mannor & Shamir (2011), where loss observabili...
Online Dense Subgraph Discovery via Blurred-Graph Feedback. Yuko Kuroki (1, 2), Atsushi Miyauchi (1, 2), Junya Honda (1, 2), Masashi Sugiyama (2, 1). Abstract: sity), which is defined as half the average degree of the subgraph induced by the subset. Unli...
Linear Bandits with Stochastic Delayed Feedback. Claire Vernade (1), Alexandra Carpentier (2), Tor Lattimore (1), Giovanni Zappella (3), Beyza Ermis (3), Michael Brueckner (3). Abstract: most adopted as they allow to take into account the structure of th...
Graph-based, Self-Supervised Program Repair from Diagnostic Feedback. Michihiro Yasunaga (1), Percy Liang (1). [Figure labels: LSTM; Broken Program; Evaluator (compiler)] Abstract: We consider the problem of learning to repair pro- (`ch...
Online Learning with Sleeping Experts and Feedback Graphs. Corinna Cortes (1), Giulia DeSalvo (1), Claudio Gentile (1), Mehryar Mohri (1, 2), Scott Yang (3). Abstract: work for online learning where the action losses that are observable to the learner ar...
Error Feedback Fixes SignSGD and other Gradient Compression Schemes. Sai Praneeth Karimireddy (1), Quentin Rebjock (1), Sebastian U. Stich (1), Martin Jaggi (1). Abstract: [Algorithm 1: EF-SIGNSGD (SIGNSGD with Error-Feedb.)] Sign-based algorith...
Bandits with Delayed, Aggregated Anonymous Feedback. Ciara Pike-Burke (1), Shipra Agrawal (2), Csaba Szepesvári (3, 4), Steffen Grünewälder (1). Abstract: of the K possible arms. In the classic stochastic MAB setting, the player immediately o...
Interactive Learning from Policy-Dependent Human Feedback. James MacGlashan (1), Mark K Ho (2), Robert Loftin (3), Bei Peng (4), Guan Wang (2), David L. Roberts (3), Matthew E. Taylor (4), Michael L. Littman (2). Abstract: behavior using these simple signals. Ind...