Stochastically Dominant Distributional Reinforcement Learning. John D. Martin¹, Michal Lyskawinski¹, Xiaohu Li¹, Brendan Englot¹. Abstract: The Conditional Value at Risk (CVaR_α) is a popular statistic that measures uncertainty wit...
Stabilizing Transformers for Reinforcement Learning. Emilio Parisotto¹, H. Francis Song², Jack W. Rae², Razvan Pascanu², Caglar Gulcehre², Siddhant M. Jayakumar², Max Jaderberg², Raphaël Lopez Kaufman², Aidan Clark², Seb Noury², Matthe...
Skew-Fit: State-Covering Self-Supervised Reinforcement Learning. Vitchyr H. Pong¹, Murtaza Dalal¹, Steven Lin¹, Ashvin Nair¹, Shikhar Bahl¹, Sergey Levine¹. Abstract: Figure 1. Left: Robot learning to open a door with Skew-Fit, without...
Sequential Transfer in Reinforcement Learning with a Generative Model. Andrea Tirinzoni¹, Riccardo Poiani¹, Marcello Restelli¹. Abstract: A key question is what and how knowledge should be transferred (Taylor & Stone, 2009). As for th...
Safe Reinforcement Learning in Constrained Markov Decision Processes. Akifumi Wachi¹, Yanan Sui². Abstract: essential requirement, the primary objective is nonetheless to obtain rewards (e.g., scientific gain). Safe reinforcemen...
ROMA: Multi-Agent Reinforcement Learning with Emergent Roles. Tonghan Wang¹, Heng Dong¹, Victor Lesser², Chongjie Zhang¹. Abstract: The role concept provides a useful tool to design and understand complex multi-agent systems...
Reward-Free Exploration for Reinforcement Learning. Chi Jin¹, Akshay Krishnamurthy², Max Simchowitz³, Tiancheng Yu⁴. Abstract: Exploration is widely regarded as the most significant challenge in RL, because the agent may have to take...
Responsive Safety in Reinforcement Learning by PID Lagrangian Methods. Adam Stooke¹,², Joshua Achiam¹,², Pieter Abbeel¹. Abstract: on a robot's components or its surroundings. It may not be possible to impose such limits by prescribing c...
Representations for Stable Off-Policy Reinforcement Learning. Dibya Ghosh¹, Marc Bellemare¹. Abstract: 1995; Tsitsiklis & Roy, 1996). Despite this potential for failure, Q-learning and other temporal-difference algorithms Reinf...
Private Reinforcement Learning with PAC and Regret Guarantees. Giuseppe Vietri¹, Borja Balle², Akshay Krishnamurthy³, Steven Wu¹. Abstract: ing data is available beforehand. While these techniques cover a large number of applications...
Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound. Lin F. Yang¹, Mengdi Wang². Abstract: play an action a ∈ A, where S and A are the state and action spaces. Then the system transitions to another state Explo...
Reinforcement Learning for Integer Programming: Learning to Cut. Yunhao Tang¹, Shipra Agrawal¹, Yuri Faenza¹. Abstract: sical results in polyhedral theory (see e.g. Conforti et al. (2014)) imply that any combinatorial optimization pr...
Reinforcement Learning for Molecular Design Guided by Quantum Mechanics. Gregor N. C. Simm¹, Robert Pinsler¹, José Miguel Hernández-Lobato¹. Abstract: Figure 1. Visualization of the molecular design process presented in this wor...
Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism. Wang Chi Cheung¹, David Simchi-Levi², Ruihao Zhu². Abstract: imizes its cumulative rewards, while facing the following challenge...
Q-value Path Decomposition for Deep Multiagent Reinforcement Learning. Yaodong Yang¹, Jianye Hao¹,², Guangyong Chen³, Hongyao Tang¹, Yingfeng Chen⁴, Yujing Hu⁴, Changjie Fan⁴, Zhongyu Wei⁵. Abstract: 1. Introduction. Recently, deep multia...
Provable Self-Play Algorithms for Competitive Reinforcement Learning. Yu Bai¹, Chi Jin². Abstract: conflicting rewards (so that they essentially compete with each other) yet can be trained in a centralized fashion (i.e. Self-play, wh...
Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Control. Jie Xu¹, Yunsheng Tian¹, Pingchuan Ma¹, Daniela Rus¹, Shinjiro Sueda², Wojciech Matusik¹. Abstract: Many real-world control problems involv...
No-Regret Exploration in Goal-Oriented Reinforcement Learning. Jean Tarbouriech¹,², Evrard Garcelon¹, Michal Valko², Matteo Pirotta¹, Alessandro Lazaric¹. Abstract: length of an episode (i.e., the time to reach the goal state) is unkno...
Multi-step Greedy Reinforcement Learning Algorithms. Manan Tomar¹, Yonathan Efroni², Mohammad Ghavamzadeh³. Abstract: estimations (Greensmith et al., 2004) and to have difficulties in handling function approximation (e.g., Thrun...
Model-Based Reinforcement Learning with Value-Targeted Regression. Alex Ayoub¹, Zeyu Jia², Csaba Szepesvári¹,³, Mengdi Wang⁴,³, Lin F. Yang⁵. Abstract: mains, such as games, robotics and science, has witnessed phenomenal empirical ad...