Density Constrained Reinforcement Learning. Zengyi Qin¹, Yuxiao Chen², Chuchu Fan¹. Abstract. We study constrained reinforcement learning (CRL) from a novel perspective by setting constraints directl...
Detecting Rewards Deterioration in Episodic Reinforcement Learning. Ido Greenberg¹, Shie Mannor¹,². Abstract. RL tasks is the safety and reliability of the system (Dulac-Arnold et al., 2019; Chan et al., 2020), arising in both of- In man...
Demonstration-Conditioned Reinforcement Learning for Few-Shot Imitation. Théo Cachet¹, Julien Perez¹, Christopher R. Dance¹. Abstract. Figure 1. The proposed DCRL algorithm, which uses both expert demonstrations and environment...
Deep Reinforcement Learning amidst Continual Structured Non-Stationarity. Annie Xie¹, James Harrison¹, Chelsea Finn¹. Abstract. stationarity manifests itself in changing terrains and weather conditions. In some situations, not ev...
Decoupling Representation Learning from Reinforcement Learning. Adam Stooke¹, Kimin Lee¹, Pieter Abbeel¹, Michael Laskin¹. Abstract. Haarnoja et al., 2018) and have been successfully applied to domains ranging from real-world (Levin...
CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee. Tengyu Xu¹, Yingbin Liang¹, Guanghui Lan². Abstract. Mind, 2019) and recommendation system (Zheng et al., 2018), etc. In these settings, the agent is allowe...
Counterfactual Credit Assignment in Model-Free Reinforcement Learning. Thomas Mesnard¹, Théophane Weber¹, Fabio Viola¹, Shantanu Thakoor¹, Alaa Saade¹, Anna Harutyunyan¹, Will Dabney¹, Tom Stepleton¹, Nicolas Heess¹, Arthur Guez¹, Ér...
Cooperative Exploration for Multi-Agent Deep Reinforcement Learning. Iou-Jen Liu¹, Unnat Jain¹, Raymond A. Yeh¹, Alexander G. Schwing¹. Abstract. (MADDPG) (Lowe et al., 2017), and counterfactual multi-agent policy gradients (COMA) (...
DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning. Daochen Zha¹, Jingru Xie², Wenye Ma², Sheng Zhang³, Xiangru Lian², Xia Hu¹, Ji Liu². Abstract. example, AlphaGo (Silver et al., 2016), AlphaZero (Silver et al., 2018)...
Coach-Player Multi-Agent Reinforcement Learning for Dynamic Team Composition. Bo Liu¹, Qiang Liu¹, Peter Stone¹, Animesh Garg²,³, Yuke Zhu¹,³, Animashree Anandkumar³,⁴. Abstract. In real-world multi-ag...
Controlling Graph Dynamics with Reinforcement Learning and Graph Neural Networks. Eli A. Meirom¹, Haggai Maron¹, Shie Mannor¹, Gal Chechik¹. Abstract. Figure 1. A viral infection process on a graph and an intervention aimed to stop its spr...
Continuous-Time Model-Based Reinforcement Learning. Çağatay Yıldız¹, Markus Heinonen¹, Harri Lähdesmäki¹. Abstract. Figure 1: A comparison of true solution of the CartPole system against discrete and continuous-time traje...
Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills. Yevgen Chebotar¹, Karol Hausman¹, Yao Lu¹, Ted Xiao¹, Dmitry Kalashnikov¹, Jake Varley¹, Alex Irpan¹, Benjamin Eysenbach¹,², Ryan Julian¹,³, Chelsea Finn¹,⁴...
Accelerating Safe Reinforcement Learning with Constraint-mismatched Baseline Policies. Tsung-Yen Yang¹, Justinian Rosca², Karthik Narasimhan¹, Peter J. Ramadge¹. Abstract. or other costs. For instance, when you drive an unfamiliar...
A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation. Scott Fujimoto¹, David Meger¹, Doina Precup¹. Abstract. approach can have significantly lower variance than traditiona...
A Sharp Analysis of Model-based Reinforcement Learning with Self-Play. Qinghua Liu¹, Tiancheng Yu², Yu Bai³, Chi Jin¹. Abstract. Model-based algorithms, algorithms that explore ... 1. Introduction. This paper is concerned with the problem...
Tightening Exploration in Upper Confidence Reinforcement Learning. Hippolyte Bourel¹, Odalric-Ambrym Maillard¹, Mohammad Sadegh Talebi². Abstract. The upper confidence reinforcement learning ... 1. Introduction. In this paper, we con...
A Natural Lottery Ticket Winner: Reinforcement Learning with Ordinary Neural Circuits. Ramin Hasani¹,², Mathias Lechner³, Alexander Amini², Daniela Rus², Radu Grosu¹. Abstract. nected through approximately 8000 chemical and electrica...
Student-Teacher Curriculum Learning via Reinforcement Learning: Predicting Hospital Inpatient Admission Location. Rasheed el-Bouri¹, David Eyre¹,², Peter Watkinson², Tingting Zhu¹, David A. Clifton¹. Abstract. requested after the 4...
Sub-Goal Trees – a Framework for Goal-Based Reinforcement Learning. Tom Jurgenson¹, Or Avner¹, Edward Groshev², Aviv Tamar¹. Abstract. popular framework for trajectory optimization based on Bellman’s dynamic programming (DP) equat...