UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning. Tarun Gupta, Anuj Mahajan, Bei Peng, Wendelin Böhmer, Shimon Whiteson. Abstract: factorization, the joint action value function can be decentrally m...
Task-Optimal Exploration in Linear Dynamical Systems. Andrew Wagenmaker, Max Simchowitz, Kevin Jamieson. Abstract: tainty about the environment, a naive strategy might be to explore the environment until it is uniformly understoo...
Skill Discovery for Exploration and Planning using Deep Skill Graphs. Akhil Bagaria, Jason Senthil, George Konidaris. Abstract: We introduce a new skill-discovery algorithm that builds a discrete graph representation of large con-...
Robust Pure Exploration in Linear Bandits with Limited Budget. Ayya Alieva, Ashok Cutkosky, Abhimanyu Das. Abstract: the exploration phase should be somehow efficient - we wish to make the best use of our limited budget in order to We con...
Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism. Brijen Thananjeyan, Kirthevasan Kandasamy, Ion Stoica, Michael I. Jordan, Ken Goldberg, Joseph E. Gonzalez. Abstr...
Pure Exploration and Regret Minimization in Matching Bandits. Flore Sentenac, Jialin Yi, Clément Calauzènes, Vianney Perchet, Milan Vojnović. Abstract: online advertising, where the probability that a user clicks on an ad dep...
Provably Correct Optimization and Exploration with Non-linear Policies. Fei Feng, Wotao Yin, Alekh Agarwal, Lin Yang. Abstract: rer & Geist, 2014; Geist et al., 2019; Abbasi-Yadkori et al., 2019; Agarwal et al., 2020c; Bhandari & Russ...
Principled Exploration via Optimistic Bootstrapping and Backward Induction. Chenjia Bai, Lingxiao Wang, Lei Han, Jianye Hao, Animesh Garg, Peng Liu, Zhaoran Wang. Abstract: 2007; Jin et al., 2018) is a principled approach for effici...
Randomized Exploration for Reinforcement Learning with General Value Function Approximation. Haque Ishfaq, Qiwen Cui, Viet Nguyen, Alex Ayoub, Zhuoran Yang, Zhaoran Wang, Doina Precup, Lin F. Yang. Abstract: when general func...
Multi-layered Network Exploration via Random Walks: From Offline Optimization to Online Learning. Xutong Liu, Jinhang Zuo, Xiaowei Chen, Wei Chen, John C.S. Lui. Abstract: used as an effective tool for network exploration (Lv et al.,...
MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration. Jin Zhang, Jianhao Wang, Hao Hu, Tong Chen, Yingfeng Chen, Changjie Fan, Chongjie Zhang. Abstract: with sparse rewards remains challenging, as task-rele...
Locally Persistent Exploration in Continuous Control Tasks with Sparse Rewards. Susan Amin, Maziar Gomrokchi, Hossein Aboutalebi, Harsh Satija, Doina Precup. Abstract: call for a clever exploration strategy that exposes the...
Guided Exploration with Proximal Policy Optimization using a Single Demonstration. Gabriele Libardi, Sebastian Dittert, Gianni De Fabritiis. Abstract: Learning from demonstrations allows to directly bypass this problem but it o...
Fast active learning for pure exploration in reinforcement learning. Pierre Ménard, Omar Darwiche Domingues, Emilie Kaufmann, Anders Jonsson, Edouard Leurent, Michal Valko. Abstract: how to explore efficiently. In particula...
Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning. Luisa Zintgraf, Leo Feng, Cong Lu, Maximilian Igl, Kristian Hartikainen, Katja Hofmann, Shimon Whiteson. Abstract: Figure 1. Illustration of the Meta-...
Deep Coherent Exploration for Continuous Control. Yijie Zhang, Herke van Hoof. Abstract: strategies and undirected strategies (Thrun, 1992; Plappert et al., 2018). While directed strategies aim to extract use- In policy search meth...
Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices. Evan Zheran Liu, Aditi Raghunathan, Percy Liang, Chelsea Finn. Abstract: a new kitchen (the environment) after it has learned to cook oth...
Cooperative Exploration for Multi-Agent Deep Reinforcement Learning. Iou-Jen Liu, Unnat Jain, Raymond A. Yeh, Alexander G. Schwing. Abstract: (MADDPG) (Lowe et al., 2017), and counterfactual multi-agent policy gradients (COMA) (...
Tightening Exploration in Upper Confidence Reinforcement Learning. Hippolyte Bourel, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi. Abstract: 1. Introduction The upper confidence reinforcement learning In this paper, we con...
Reward-Free Exploration for Reinforcement Learning. Chi Jin, Akshay Krishnamurthy, Max Simchowitz, Tiancheng Yu. Abstract: Exploration is widely regarded as the most significant challenge in RL, because the agent may have to take...