Provably Efficient Exploration in Policy Optimization. Qi Cai 1, Zhuoran Yang 2, Chi Jin 3, Zhaoran Wang 1. Abstract: of iterations, even given infinite data. Meanwhile, from the statistical perspective, it remains unclear how to attain Wh...
No-Regret Exploration in Goal-Oriented Reinforcement Learning. Jean Tarbouriech 1 2, Evrard Garcelon 1, Michal Valko 2, Matteo Pirotta 1, Alessandro Lazaric 1. Abstract: length of an episode (i.e., the time to reach the goal state) is unkno...
Neural Contextual Bandits with UCB-based Exploration. Dongruo Zhou 1, Lihong Li 2, Quanquan Gu 1. Abstract: the expected reward at each round is linear in the feature vector. While successful in both theory and practice (Li We study the stoc...
Naive Exploration is Optimal for Online LQR. Max Simchowitz 1, Dylan J. Foster 2. Abstract: develop a non-asymptotic theory of data-driven continuous control, with an emphasis on understanding key algorithmic We consider the problem of...
Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning. Silviu Pitis 1 2, Harris Chan 1 2, Stephen Zhao 1, Bradly Stadie 2, Jimmy Ba 1 2. Abstract: In this paper, we improve upon existing approaches to intrinsic g...
Implicit Generative Modeling for Efficient Exploration. Neale Ratzlaff 1, Qinxun Bai 2, Li Fuxin 1, Wei Xu 2. Abstract: ficiency. Agents are often trained for millions, or even billions of simulation steps before achieving reasonable per...
Gamification of Pure Exploration for Linear Bandits. Rémy Degenne 1, Pierre Ménard 2, Xuedong Shang 3, Michal Valko 4. Abstract: high confidence to a given query using as few samples as possible. We investigate an active pure-explorati...
Exploration Through Reward Biasing: Reward-Biased Maximum Likelihood Estimation for Stochastic Multi-Armed Bandits. Xi Liu 1, Ping-Chun Hsieh 2, Yu-Heng Hung 2, Anirban Bhattacharya 3, P. R. Kumar 1. Abstract: and then applies the action...
Efficient Optimistic Exploration in Linear-Quadratic Regulators via Lagrangian Relaxation. Marc Abeille 1, Alessandro Lazaric 2. Abstract: Confidence-based Exploration. Bittanti et al. (2006) introduced an adaptive control sys...
Efficient Continuous Pareto Exploration in Multi-Task Learning. Pingchuan Ma 1, Tao Du 1, Wojciech Matusik 1. Abstract: give rise to a set of solutions, known as the Pareto set, with varying preferences on different objectives. Tasks in mul...
Combinatorial Pure Exploration for Dueling Bandits. Wei Chen 1, Yihan Du 2, Longbo Huang 2, Haoyu Zhao 2. Abstract: tradeoff in online learning. The pure exploration task (Even-Dar et al., 2006; Chen & Li, 2016; Sabato, 2019) is an In this pape...
Self-Supervised Exploration via Disagreement. Deepak Pathak 1, Dhiraj Gandhi 2, Abhinav Gupta 2 3. Abstract: to the agent are sparse. The common approach to exploration has been to generate “intrinsic” rewards, i.e., rewards Efficie...
Provably Efficient Maximum Entropy Exploration. Elad Hazan 1 2, Sham M. Kakade 3 4 2, Karan Singh 1 2, Abby Van Soest 1 2. Abstract: such as learning with intrinsic reward and curiosity driven methods, surveyed below. Our work studies a class of...
Off-Policy Deep Reinforcement Learning without Exploration. Scott Fujimoto 1 2, David Meger 1 2, Doina Precup 1 2. Abstract: require further interactions with the environment to compensate (Hester et al., 2017; Sun et al., 2018; Cheng et...
Model-Based Active Exploration. Pranav Shyam 1, Wojciech Jaśkowski 1, Faustino Gomez 1. Abstract: This approach is inherently more powerful than reactive exploration, but requires a method to predict the consequences Efficient exp...
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits. Branislav Kveton 1, Csaba Szepesvári 2 3, Sharan Vaswani 4, Zheng Wen 5, Mohammad Ghavamzadeh 6, Tor Lattimore 2. Abstract: 2013b) is a generalization of a multi-ar...
Exploration Conscious Reinforcement Learning Revisited. Lior Shani 1, Yonathan Efroni 1, Shie Mannor 1. Abstract: RL, i.e., when using function approximation, remains an open problem. On the practical side, recent works com- The Explora...
EMI: Exploration with Mutual Information. Hyoungseok Kim 1 2, Jaekyeom Kim 1 2, Yeonwoo Jeong 1 2, Sergey Levine 3, Hyun Oh Song 1 2. Abstract: Reinforcement learning algorithms struggle when the reward signal is very sparse. In these cas...
Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN. Dror Freirich 1, Tzahi Shimkin 1, Ron Meir 1, Aviv Tamar 2. Abstract: ing (DiRL) approach, where the value distribution, rather than the expectation are l...
Distributional Reinforcement Learning for Efficient Exploration. Borislav Mavrin 1 2, Hengshuai Yao 3, Linglong Kong 1 2, Kaiwen Wu 4, Yaoliang Yu 4. Abstract: Deterministic environment In distributional reinforcement learning (RL), th...