REPAINT: Knowledge Transfer in Deep Reinforcement Learning. Yunzhe Tao, Sahika Genc, Jonathan Chung, Tao Sun, Sunil Mallya. Abstract: improve performance on other tasks. Accelerating learning processes for complex tasks Transfer...
Reinforcement Learning with Prototypical Representations. Denis Yarats, Rob Fergus, Alessandro Lazaric, Lerrel Pinto. Abstract: from rewards alone is sample inefficient and leads to poor performance. Prior work (Srinivas et al....
Reinforcement Learning Under Moral Uncertainty. Adrien Ecoffet, Joel Lehman. Abstract: While such accomplishments are significant, progress has been less straightforward in applying RL to unstructured An ambitious goal for ma...
Reinforcement Learning of Implicit and Explicit Control Flow in Instructions. Ethan A. Brooks, Janarthanan Rajendran, Richard L. Lewis, Satinder Singh. Abstract: task instructions that require the agent to learn control flow eithe...
Reinforcement Learning for Cost-Aware Markov Decision Processes. Wesley A. Suttle, Kaiqing Zhang, Zhuoran Yang, David N. Kraemer, Ji Liu. Abstract: quently used in practice. Nevertheless, alternative objectives have seen increas...
Quantum Algorithms for Reinforcement Learning with a Generative Model. Daochen Wang, Aarthi Sundaram, Robin Kothari, Ashish Kapoor, Martin Roetteler. Abstract: faster algorithms for certain tasks like search and factoring (Grov...
PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning. Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar. Abstract...
Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping. Dongruo Zhou, Jiafan He, Quanquan Gu. Abstract: linear functions or neural networks to map states and actions to a low-dimensional space and solve t...
PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training. Kimin Lee, Laura Smith, Pieter Abbeel. Abstract: Kober & Peters, 2011; Kober et al., 2013; Silver et al., 201...
PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration. Yuda Song, Wen Sun. [Figure labels: "success rate", "HandEgg", "Deep PC-MPL".] Abstract: Model-based Reinforcement Learning (RL) is a popular learning parad...
On-Policy Deep Reinforcement Learning for the Average-Reward Criterion. Yiming Zhang, Keith W. Ross. Abstract: Haarnoja et al., 2018) or in a queuing scenario (Tadepalli & Ok, 1994; Sutton & Barto, 2018), there is no natural sep- We de...
Recomposing the Reinforcement Learning Building Blocks with Hypernetworks. Elad Sarafian, Shai Keynan, Sarit Kraus. [Figure labels: "ResBlock", "meta variable", "Primary net", "Linear Block", "256".] Abstract: The Reinforcement Learning (RL) building...
Randomized Exploration for Reinforcement Learning with General Value Function Approximation. Haque Ishfaq, Qiwen Cui, Viet Nguyen, Alex Ayoub, Zhuoran Yang, Zhaoran Wang, Doina Precup, Lin F. Yang. Abstract: when general func...
Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning. Shariq Iqbal, Christian A. Schroeder de Witt, Bei Peng, Wendelin Böhmer, Shimon Whiteson, Fei Sha. Abstract: Figure 1: Breakaway sub-scenario in so...
On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP. Tianhao Wu, Yunchang Yang, Simon S. Du, Liwei Wang. Abstract: is vulnerable to corrupted data stemming from malicious entities (Huang et al., 2...
Offline Reinforcement Learning with Fisher Divergence Critic Regularization. Ilya Kostrikov, Jonathan Tompson, Rob Fergus, Ofir Nachum. Abstract: where deploying a new policy to interact with the live environment is expensive...
Offline Reinforcement Learning with Pseudometric Learning. Robert Dadashi, Shideh Rezaeifar, Nino Vieillard, Léonard Hussenot, Olivier Pietquin, Matthieu Geist. Abstract: that generated these experiences (Pomerleau, 199...
Near-Optimal Model-Free Reinforcement Learning in Non-Stationary Episodic MDPs. Weichao Mao, Kaiqing Zhang, Ruihao Zhu, David Simchi-Levi, Tamer Başar. Abstract: through sequential interactions with an initially unknown but...
Nearly Optimal Reward-Free Reinforcement Learning. Zihan Zhang, Simon S. Du, Xiangyang Ji. Abstract: RL is exploration for which the agent needs to strategically visit new states to learn transition and reward information We study th...
MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning. Kevin Li, Abhishek Gupta, Vitchyr Pong, Ashwin Reddy, Aurick Zhou, Justin Yu, Sergey Levine. Abstract: Figure 1. MURAL: Our method trains...