Hierarchical Imitation and Reinforcement Learning. Hoang M. Le, Nan Jiang, Alekh Agarwal, Miroslav Dudík, Yisong Yue, Hal Daumé III. Abstract: ...ficiency in RL over long time horizons is to exploit hierarchical structure of the...
Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents. Kaiqing Zhang, Zhuoran Yang, Han Liu, Tong Zhang, Tamer Başar. Abstract: ...agent. In addition, the agents are allowed to observe only its own reward, whi...
Feedback-Based Tree Search for Reinforcement Learning. Daniel R. Jiang, Emmanuel Ekwedike, Han Liu. Abstract: ...leaf-node evaluators (either a policy function (Chaslot et al., 2006) rollout, a value function evaluation (Campbell...
End-to-end Active Object Tracking via Reinforcement Learning. Wenhan Luo, Peng Sun, Fangwei Zhong, Wei Liu, Tong Zhang, Yizhou Wang. Abstract: We study active object tracking, where a trac...
Efficient Model–Based Deep Reinforcement Learning with Variational State Tabulation. Dane Corneil, Wulfram Gerstner, Johanni Brea. Abstract: ...states (e.g. Mnih et al. (2015; 2016)) and learning approximate dynamics to perform p...
Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning. Ronan Fruit, Matteo Pirotta, Alessandro Lazaric, Ronald Ortner. Abstract: ...and, at each step, it executes the policy with highest optimistic...
Deep Variational Reinforcement Learning for POMDPs. Maximilian Igl, Luisa Zintgraf, Tuan Anh Le, Frank Wood, Shimon Whiteson. Abstract: (a) RNN-based approach. The RNN acts as an encoder for the action-observation history, on which...
Deep Reinforcement Learning in Continuous Action Spaces: a Case Study in the Game of Simulated Curling. Kyowoon Lee, Sol-A Kim, Jaesik Choi, Seong-Whan Lee. Abstract: ...1992), and othello (Buro, 1999). Recently, deep convolutional ne...
Coordinated Exploration in Concurrent Reinforcement Learning. Maria Dimakopoulou, Benjamin Van Roy. Abstract: ...and refines estimates as data is gathered. At the start of each episode, the agent samples an MDP from its current poste- W...
Continual Reinforcement Learning with Complex Synapses. Christos Kaplanis, Murray Shanahan, Claudia Clopath. Abstract: ...old memories - a paradox often referred to as the stability-plasticity dilemma (Carpenter & Grossberg, 198...
Competitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations. Xingyu Wang, Diego Klabjan. Abstract: ...of the reward function, or at least observations of immediate reward. Some learning tasks, however, p...
Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games? Maithra Raghu, Alex Irpan, Jacob Andreas, Robert Kleinberg, Quoc Le, Jon Kleinberg. Abstract: ...behavior is difficult. Optimal behavior in these environment...
Beyond the One-Step Greedy Approach in Reinforcement Learning. Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor. Abstract: ...suggested that greedy approaches w.r.t. multiple steps perform better than w.r.t. 1-step. Notable...
Automatic Goal Generation for Reinforcement Learning Agents. Carlos Florensa, David Held, Xinyang Geng, Pieter Abbeel. Abstract: ...to defeat a champion Go player (Silver et al., 2016), to outperform humans in 49 Atari games (Guo et al....
A Laplacian Framework for Option Discovery in Reinforcement Learning. Marlos C. Machado, Marc G. Bellemare, Michael Bowling. Abstract: ...the optimal policy for that reward function. In this paper we introduce an algorithm for option di...
A Distributional Perspective on Reinforcement Learning. Marc G. Bellemare, Will Dabney, Rémi Munos. Abstract: ...ment learning. Specifically, the main object of our study is the random return Z whose expectation is the value Q. This In...
Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning. Junhyuk Oh, Satinder Singh, Honglak Lee, Pushmeet Kohli. Abstract: Figure 1: Example of 3D world and instructions. The agent is tasked to execute longer se...
Unifying Task Specification in Reinforcement Learning. Martha White. Abstract: ...jectives, including options (Sutton et al., 1999), state-based discounting (Sutton, 1995; Sutton et al., 2011) and inter- Reinforcement learning tas...
Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning. Jakob Foerster, Nantas Nardelli, Gregory Farquhar, Triantafyllos Afouras, Philip H. S. Torr, Pushmeet Kohli, Shimon Whiteson. Abstract: ...multi-agents...
Robust Adversarial Reinforcement Learning. Lerrel Pinto, James Davidson, Rahul Sukthankar, Abhinav Gupta. Abstract: ...policy-learning methods is their reliance on data: training high-capacity models requires huge amounts of tr...