An Optimistic Perspective on Offline Reinforcement Learning. Rishabh Agarwal 1, Dale Schuurmans 1 2, Mohammad Norouzi 1. Abstract: unsafe, or require a high-fidelity simulator that is often difficult to build (Dulac-Arnold et al., 201...
Adaptive Reward-Poisoning Attacks against Reinforcement Learning. Xuezhou Zhang 1, Yuzhe Ma 1, Adish Singla 2, Xiaojin Zhu 1. Abstract: group of Twitter users who deliberately taught it misogynistic and racist remarks shortly after its...
Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation. Shani Gamrian 1, Yoav Goldberg 1 2. Abstract: process of a new task has to be performed from scratch even for a related one. Recent works have trie...
Trajectory-Based Off-Policy Deep Reinforcement Learning. Andreas Doerr 1 2 3, Michael Volpp 1, Marc Toussaint 3, Sebastian Trimpe 2, Christian Daniel 1. Abstract: standard algorithms are vastly data-inefficient and rely on millions of da...
Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds. Andrea Zanette 1, Emma Brunskill 2. Abstract: Fortunately in practice Reinforcement learning algorithms of-t...
The Value Function Polytope in Reinforcement Learning. Robert Dadashi 1, Adrien Ali Taïga 1 2, Nicolas Le Roux 1, Dale Schuurmans 1 3, Marc G. Bellemare 1. Abstract: Line theorem. We show that policies that agree on all but one state generate...
Task-Agnostic Dynamics Priors for Deep Reinforcement Learning. Yilun Du 1, Karthik Narasimhan 2. [Figure 1. Two different environments with object dynamics that ...] Abstract: While model-based deep reinforcement learning (RL) holds...
Statistics and Samples in Distributional Reinforcement Learning. Mark Rowland 1, Robert Dadashi 2, Saurabh Kumar 2, Rémi Munos 1, Marc G. Bellemare 2, Will Dabney 1. Abstract: that DRL algorithms can be viewed as combining a statistical es...
SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning. Marvin Zhang 1, Sharad Vikram 2, Laura Smith 1, Pieter Abbeel 1, Matthew J. Johnson 3, Sergey Levine 1. Abstract: Figure 1. Our method can learn policies for comp...
Reinforcement Learning in Configurable Continuous Environments. Alberto Maria Metelli 1, Emanuele Ghelfi 1, Marcello Restelli 1. Abstract: as a Configurable Markov Decision Process (Conf-MDP, Metelli et al., 2018). As in traditional...
Quantifying Generalization in Reinforcement Learning. Karl Cobbe 1, Oleg Klimov 1, Chris Hesse 1, Taehoon Kim 1, John Schulman 1. Abstract: (Nichol et al., 2018), we seek to better quantify an agent’s ability to generalize. In this paper, we...
Policy Consolidation for Continual Reinforcement Learning. Christos Kaplanis 1 2, Murray Shanahan 1 3, Claudia Clopath 2. Abstract: way that cannot be discretised easily into separate tasks. In reinforcement learning (RL), for example...
Policy Certificates: Towards Accountable Reinforcement Learning. Christoph Dann 1, Lihong Li 2, Wei Wei 2, Emma Brunskill 3. Abstract: ploration. Even sharp drops in policy performance during learning are common, e.g., when the agent sta...
On the Generalization Gap in Reparameterizable Reinforcement Learning. Huan Wang 1, Stephan Zheng 1, Caiming Xiong 1, Richard Socher 1. Abstract: 2018a). A model that performs well in the training environment, may or may not perform well w...
Off-Policy Deep Reinforcement Learning without Exploration. Scott Fujimoto 1 2, David Meger 1 2, Doina Precup 1 2. Abstract: require further interactions with the environment to compensate (Hester et al., 2017; Sun et al., 2018; Cheng et...
Neural Logic Reinforcement Learning. Zhengyao Jiang 1, Shan Luo 1. Abstract: (Doshi-Velez & Kim, 2017) defines interpretability as the ability to explain or to present the decision in understand- Deep reinforcement learning (DRL) has a...
Multi-Agent Adversarial Inverse Reinforcement Learning. Lantao Yu 1, Jiaming Song 1, Stefano Ermon 1. Abstract: ever, the success of RL crucially depends on careful reward design (Amodei et al., 2016). As Reinforcement learning Reinfor...
Maximum Entropy-Regularized Multi-Goal Reinforcement Learning. Rui Zhao 1 2, Xudong Sun 1, Volker Tresp 1 2. Abstract: One of the biggest challenges in RL is to make the agent learn efficiently in applications with sparse rewards. To In Mul...
Learning Action Representations for Reinforcement Learning. Yash Chandak 1, Georgios Theocharous 2, James E. Kostas 1, Scott M. Jordan 1, Philip S. Thomas 1. Abstract: Figure 1. The structure of the proposed overall policy, πo, consist-in...
Learning a Prior over Intent via Meta-Inverse Reinforcement Learning. Kelvin Xu 1, Ellis Ratner 1, Anca Dragan 1, Sergey Levine 1, Chelsea Finn 1. Abstract: Figure 1. A diagram of our meta-inverse RL approach. Our approach attempts to remedy...