UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning Tarun Gupta 1 Anuj Mahajan 1 Bei Peng 1 Wendelin Böhmer 2 Shimon Whiteson 1 Abstract factorization, the joint action value function can be decentrally m...
Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning Yue Wu 1 2 Shuangfei Zhai 1 Nitish Srivastava 1 Joshua Susskind 1 Jian Zhang 1 Ruslan Salakhutdinov 2 Hanlin Goh 1 Abstract leveraging prior experience (Lange et a...
Towards Better Laplacian Representation in Reinforcement Learning with Generalized Graph Drawing Kaixin Wang 1 Kuangqi Zhou 1 Qixin Zhang 2 Jie Shao 3 Bryan Hooi 1 Jiashi Feng 1 Abstract Figure 1. Visualization of environment and Lap...
Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning Anuj Mahajan 1 Mikayel Samvelyan 2 Lei Mao 3 Viktor Makoviychuk 3 Animesh Garg 3 Jean Kossaifi 3 Shimon Whiteson 1 Yuke Zhu 3 Animashree Anandkumar 3 Abstract arise...
Structured World Belief for Reinforcement Learning in POMDP Gautam Singh 1 Skand Peri 1 Junghyun Kim 1 Hyunseok Kim 2 Sungjin Ahn 1 3 Abstract generalization to novel scenes (Chen et al., 2020). Object-centric world models provide str...
Spectral Normalisation for Deep Reinforcement Learning: An Optimisation Perspective Florin Gogianu 1 2 Tudor Berariu 3 Mihaela Rosca 4 5 Claudia Clopath 3 4 Lucian Busoniu 2 Razvan Pascanu 4 Abstract Figure 1: Optimisation rivals al...
Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient Botao Hao 1 Yaqi Duan 2 Tor Lattimore 1 Csaba Szepesvári 1 3 Mengdi Wang 2 1 Abstract 1. Introduction This paper provides a statistical analysis of hi...
Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks Sungryull Sohn 1 2 Sungtae Lee 3 Jongwook Choi 1 Harm van Seijen 4 Mehdi Fatemi 4 Honglak Lee 2 1 Abstract Moreover, the success of the RL algorithm heavily hinge...
Self-Paced Context Evaluation for Contextual Reinforcement Learning Theresa Eimer 1 André Biedenkapp 2 Frank Hutter 2 3 Marius Lindauer 1 Abstract Figure 1: Example instances of the contextual PointMass environment. The agent...
SCC: an Efficient Deep Reinforcement Learning Agent Mastering the Game of StarCraft II Xiangjun Wang 1 Junxiao Song 1 Penghui Qi 1 Peng Peng 1 Zhenkun Tang 1 Wei Zhang 1 Weimin Li 1 Xiongjun Pi 1 Jujie He 1 Chao Gao 1 Haitao Long 1 Quan Yuan 1 Ab...
Scaling Multi-Agent Reinforcement Learning with Selective Parameter Sharing Filippos Christianos 1 Georgios Papoudakis 1 Arrasy Rahman 1 Stefano V. Albrecht 1 Abstract (e.g. (Gupta et al., 2017)) whereby agents share some or all p...
Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot Joel Z. Leibo 1 Edgar Duéñez-Guzmán 1 Alexander Sasha Vezhnevets 1 John P. Agapiou 1 Peter Sunehag 1 Raphael Koster 1 Jayd Matyas 1 Charles Beattie 1 Ig...
Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity Dhruv Malik 1 Aldo Pacchiano 2 Vishwak Srinivasan 1 Yuanzhi Li 1 Abstract such a benchmark (Bellemare et al., 2013). Agents trained...
Safe Reinforcement Learning with Linear Function Approximation Sanae Amani 1 Christos Thrampoulidis 2 Lin F. Yang 1 Abstract action may lead to catastrophic results. Thus, safety in RL has become a serious issue that restricts the ap...
Safe Reinforcement Learning Using Advantage-Based Intervention Nolan Wagener 1 Byron Boots 2 Ching-An Cheng 3 Abstract Figure 1. Advantage-based intervention of SAILR and construction of the surrogate MDP M. In M, whenever the po...
RRL: Resnet as representation for Reinforcement Learning Rutav Shah 1 Vikash Kumar 2 Abstract Supervised Learning The ability to autonomously learn behaviors via Reinforcement direct interactions in uninstrumented environ- Lea...
Robust Reinforcement Learning using Least Squares Policy Iteration with Provable Performance Guarantees Kishan Panaganti 1 Dileep Kalathil 1 Abstract This mismatch between the training and testing environment parameters can si...
Risk-Sensitive Reinforcement Learning with Function Approximation: A Debiasing Approach Yingjie Fei 1 Zhuoran Yang 2 Zhaoran Wang 1 Abstract risk-seeking objective and β < 0 induces a risk-averse one. It can also be seen that Vβte...
Reward Identification in Inverse Reinforcement Learning Kuno Kim 1 Kirankumar Shiragur 1 Shivam Garg 1 Stefano Ermon 1 Abstract MDPs to build computational models (Niv, 2009) of real-world, rational decision makers such as investo...
Revisiting Peng’s Q(λ) for Modern Reinforcement Learning Tadashi Kozuno 1 Yunhao Tang 2 Mark Rowland 3 Rémi Munos 4 Steven Kapturowski 3 Will Dabney 3 Michal Valko 4 David Abel 3 Abstract 1996; Watkins, 1989; Peng & Williams, 1994;...