PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning. Angelos Filos (1), Clare Lyle (1), Yarin Gal (1), Sergey Levine (2), Natasha Jaques (2, 3), Gregory Farquhar (4). Abstract...
Learning to Weight Imperfect Demonstrations. Yunke Wang (1), Chang Xu (2), Bo Du (1), Honglak Lee (3, 4). Abstract any access to reward signal, has achieved great success in many sequential decision making problems (Stadie et al., This paper investig...
Variational Imitation Learning with Diverse-quality Demonstrations. Voot Tangkaratt (1), Bo Han (2, 1), Mohammad Emtiyaz Khan (1), Masashi Sugiyama (1, 3). Abstract an assumption that diversity is caused by noise-densities. Learning from demons...
Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations. Daniel S. Brown (1), Wonjoon Goo (1), Prabhat Nagarajan (2), Scott Niekum (1). Abstract Figure 1. T-REX takes a sequence of ranked demonstrat...
Policy Optimization with Demonstrations. Bingyi Kang (1), Zequn Jie (2), Jiashi Feng (1). Abstract on heuristic exploration strategies, e.g., ε-greedy for value-based methods (Van Hasselt et al., 2016) and noise-based Exploration remains a si...