Time Limits in Reinforcement Learning. Fabio Pardo¹, Arash Tavakoli¹, Vitaly Levdik¹, Petar Kormushev¹. Abstract: in the environment which in turn provides a representation S_{t+1} of the successor state and a reward signal R_{t+1}. In reinfo...
The Mirage of Action-Dependent Baselines in Reinforcement Learning. George Tucker¹, Surya Bhupatiraju¹,², Shixiang Gu¹,³,⁴, Richard E. Turner³, Zoubin Ghahramani³,⁵, Sergey Levine¹,⁶. Abstract: et al., 2015a; 2017) are a class of model-free...
Structured Control Nets for Deep Reinforcement Learning. Mario Srouji¹, Jian Zhang², Ruslan Salakhutdinov¹,². Abstract: In recent years, Deep Reinforcement Learning [Figure 1. The proposed Structured Control Net (SCN) for policy] has ma...
State Abstractions for Lifelong Reinforcement Learning. David Abel¹, Dilip Arumugam¹, Lucas Lehnert¹, Michael L. Littman¹. Abstract: ...
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. Tuomas Haarnoja¹, Aurick Zhou¹, Pieter Abbeel¹, Sergey Levine¹. Abstract: networks holds the promise of automating a wide range of deci...
Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings. John D. Co-Reyes¹, YuXuan Liu¹, Abhishek Gupta¹, Benjamin Eysenbach², Pieter Abbeel¹, Sergey Levine¹. Abstract: involve tempo...
SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation. Bo Dai¹, Albert Shaw¹, Lihong Li², Lin Xiao³, Niao He⁴, Zhen Liu¹, Jianshu Chen⁵, Le Song¹. Abstract: are referred to the textbook of Puterman (2014) for details...
RLlib: Abstractions for Distributed Reinforcement Learning. Eric Liang¹, Richard Liaw¹, Philipp Moritz¹, Robert Nishihara¹, Roy Fox¹, Ken Goldberg¹, Joseph E. Gonzalez¹, Michael I. Jordan¹, Ion Stoica¹. Abstract: In the absence of a single d...
Regret Minimization for Partially Observable Deep Reinforcement Learning. Peter Jin¹, Kurt Keutzer¹, Sergey Levine¹. Abstract: function-based methods. Some policy gradient methods such as advantage actor-critic (Mnih et al., 2016)...
Reinforcement Learning with Function-Valued Action Spaces for Partial Differential Equation Control. Yangchen Pan¹,², Amir-massoud Farahmand³,², Martha White¹, Saleh Nabi², Piyush Grover², Daniel Nikovski². Abstract: namic system (Li...
Programmatically Interpretable Reinforcement Learning. Abhinav Verma¹, Vijayaraghavan Murali¹, Rishabh Singh², Pushmeet Kohli³, Swarat Chaudhuri¹. Abstract: makes them difficult to interpret or to be checked for consistency for so...
Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs. Andrea Zanette¹, Emma Brunskill¹. Abstract: (MDPs) and partially observable MDPs (POMDPs). Bandits assume that the actions taken do not im...
Policy and Value Transfer in Lifelong Reinforcement Learning. David Abel†¹, Yuu Jinnai†¹, Yue Guo¹, George Konidaris¹, Michael L. Littman¹. Abstract: computed policies from related tasks (Fernández & Veloso, 2006; Taylor & Stone, 20...
Modeling Others using Oneself in Multi-Agent Reinforcement Learning. Roberta Raileanu¹, Emily Denton¹, Arthur Szlam², Rob Fergus¹,². Abstract: of understanding what the other player is trying to achieve, an agent should ask itself “wha...
Mix & Match – Agent Curricula for Reinforcement Learning. Wojciech Marian Czarnecki¹, Siddhant M. Jayakumar¹, Max Jaderberg¹, Leonard Hasenclever¹, Yee Whye Teh¹, Simon Osindero¹, Nicolas Heess¹, Razvan Pascanu¹. Abstract: [Figure 1. Sche...
Mean Field Multi-Agent Reinforcement Learning. Yaodong Yang¹, Rui Luo¹, Minne Li¹, Ming Zhou², Weinan Zhang², Jun Wang¹. Abstract: Instead, accounting for the extra information from conjecturing the policies of other agents is beneficia...
Lipschitz Continuity in Model-based Reinforcement Learning. Kavosh Asadi¹, Dipendra Misra², Michael L. Littman¹. Abstract: introduce a novel characterization of models, referred to as a Lipschitz model class, that represents stocha...
Latent Space Policies for Hierarchical Reinforcement Learning. Tuomas Haarnoja¹, Kristian Hartikainen², Pieter Abbeel¹, Sergey Levine¹. Abstract: resentations into RL is the potential for the emergence of hierarchies, which can ena...
Importance Weighted Transfer of Samples in Reinforcement Learning. Andrea Tirinzoni¹, Andrea Sessa¹, Matteo Pirotta², Marcello Restelli¹. Abstract: tions, parameters, policies, etc.) and in the criteria used to establish whether suc...
Implicit Quantile Networks for Distributional Reinforcement Learning. Will Dabney¹, Georg Ostrovski¹, David Silver¹, Rémi Munos¹. Abstract: this, it assumes returns are bounded in a known range and trades off mean-preservation at th...