Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes. Chen-Yu Wei¹, Mehdi Jafarnia-Jahromi¹, Haipeng Luo¹, Hiteshi Sharma¹, Rahul Jain¹. Abstract: ... and model-free. Model-based algorithms...
Leveraging Procedural Generation to Benchmark Reinforcement Learning. Karl Cobbe¹, Christopher Hesse¹, Jacob Hilton¹, John Schulman¹. Abstract: ... or are they approximately memorizing specific trajectories? We introduce ProcgenBenc...
Learning Fair Policies in Multiobjective (Deep) Reinforcement Learning with Average and Discounted Rewards. Umer Siddique¹, Paul Weng¹,², Matthieu Zimmer¹. Abstract: ... current AI methods do not handle well situations where they impact m...
Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions. Omer Gottesman¹, Joseph Futoma¹, Yao Liu², Sonali Parbhoo¹, Leo Anthony Celi³, Emma Brunskill², Finale Doshi-Velez¹. Abstract: an...
Inductive-bias-driven Reinforcement Learning for Efficient Schedules in Heterogeneous Clusters. Subho S. Banerjee¹, Saurabh Jha¹, Zbigniew T. Kalbarczyk¹, Ravishankar K. Iyer¹. Abstract: ... haria et al. (2010)). Such heuristics are di...
Generalization to New Actions in Reinforcement Learning. Ayush Jain¹, Andrew Szot¹, Joseph J. Lim¹. Abstract: A fundamental trait of intelligence is the ability to achieve goals in the face of novel circumstances, such...
Evaluating the Performance of Reinforcement Learning Algorithms. Scott M. Jordan¹, Yash Chandak¹, Daniel Cohen¹, Mengxue Zhang¹, Philip S. Thomas¹. Abstract: ... usability of RL algorithms, we suggest that it should have four properties. Fi...
Enhanced POET: Open-ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions. Rui Wang¹, Joel Lehman¹, Aditya Rawal¹, Jiale Zhi¹, Yulun Li¹, Jeff Clune², Kenneth O. Stanley¹. Abstract: ... phistica...
Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation. Nathan Kallus¹, Masatoshi Uehara². Abstract: Off-policy evaluation (OPE) in Reinforcement ... Figure 1 (diagram over s0, a0, r0, s1, a1, r1, s2). Non-Markov decision process (NMDP...
Discount Factor as a Regularizer in Reinforcement Learning. Ron Amit¹, Ron Meir¹, Kamil Ciosek². Abstract: ... et al., 2019; Zhao et al., 2019). In particular, generalization is critical for successfully deploying RL agents that were ... Specif...
Designing Optimal Dynamic Treatment Regimes: A Causal Reinforcement Learning Approach. Junzhe Zhang¹, Elias Bareinboim¹. Abstract, 1. Introduction: A dynamic treatment regime (DTR) consists of a ... In medical practice, a patient typica...
Description Based Text Classification with Reinforcement Learning. Duo Chai¹, Wei Wu¹, Qinghong Han¹, Wu Fei², Jiwei Li¹. Abstract: Standardly, text classification is divided into the following two steps: (1) text feature extraction: as...
Deep Reinforcement Learning with Smooth Policy. Qianli Shen¹, Yan Li², Haoming Jiang², Zhaoran Wang³, Tuo Zhao². Abstract: ... quires a significant amount of training data, and suffers from numerous training difficulties such as overfitting...
Data Valuation using Reinforcement Learning. Jinsung Yoon¹, Sercan Ö. Arık¹, Tomas Pfister¹. Abstract: ... tained by removing a significant portion of training samples (Ferdowsi et al., 2013; Frenay &amp; Verleysen, 2014). More- ... Quantifyi...
CURL: Contrastive Unsupervised Representations for Reinforcement Learning. Michael Laskin*¹, Aravind Srinivas*¹, Pieter Abbeel¹. Abstract: ... Figure 1. Contrastive Unsupervised Representations for Reinforcement Learning (CURL...
Clinician-in-the-Loop Decision Making: Reinforcement Learning with Near-Optimal Set-Valued Policies. Shengpu Tang¹, Aditya Modi¹, Michael W. Sjoding²,³, Jenna Wiens¹. Abstract: ... reward signals via reward shaping (Lizotte et al., 201...
Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings. Jesse Zhang¹, Brian Cheung¹, Chelsea Finn², Sergey Levine¹, Dinesh Jayaraman³. Abstract: ... Figure 1. The Safety-Critical Adaptation (SCA) task framework. In a...
Can Increasing Input Dimensionality Improve Deep Reinforcement Learning? Kei Ota¹, Tomoaki Oiki¹, Devesh K. Jha², Toshisada Mariyama¹, Daniel Nikovski². Abstract, 1. Introduction: Deep Reinforcement learning (RL) algorithms ... Deep rei...
Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning. Daniel Guo¹, Bernardo Avila Pires¹, Bilal Piot¹, Jean Bastien Grill², Florent Altché², Rémi Munos², Mohammad Gheshlaghi Azar¹. Abstract: ... titask and p...
Batch Reinforcement Learning with Hyperparameter Gradients. Byung-Jun Lee¹, Jongmin Lee¹, Peter Vrancx², Dongho Kim², Kee-Eung Kim²,³. Abstract: ... real environment. However, this approach requires a lot of human effort including domain e...