Optimal Off-Policy Evaluation from Multiple Logging Policies. Nathan Kallus¹, Yuta Saito¹, Masatoshi Uehara¹. Abstract. In most of the above studies, the observations used to evaluate a new policy are assumed generated by a single logg...
Neuro-algorithmic Policies Enable Fast Combinatorial Generalization. Marin Vlastelica¹, Michal Rolínek¹, Georg Martius¹. Figure labels: input representation t+2 Dijkstra's shortest path predicted Hamming expert 2 frames learning t+1 trajec...
Learning Queueing Policies for Organ Transplantation Allocation using Interpretable Counterfactual Survival Analysis. Jeroen Berrevoets¹, Ahmed M. Alaa², Zhaozhi Qian¹, James Jordon³, Alexander Gimson⁴, Mihaela van der Schaar¹²⁵. A...
Learning Fair Policies in Decentralized Cooperative Multi-Agent Reinforcement Learning. Matthieu Zimmer¹, Claire Glanois¹, Umer Siddique¹, Paul Weng¹². Abstract. current main focus is on their performance with respect to the total (o...
Discovering symbolic policies with deep reinforcement learning. Mikel Landajuela¹, Brenden K. Petersen¹, Sookyung Kim¹, Claudio P. Santiago¹, Ruben Glatt¹, T. Nathan Mundhenk¹, Jacob F. Pettit¹, Daniel M. Faissol¹. Abstract. Figure 1: Alg...
Decision-Making Under Selective Labels: Optimal Finite-Domain Policies and Beyond. Dennis Wei¹. Abstract. to observe it if bail is denied. In hiring, a candidate’s job performance is observed only if they are hired. Selective labels...
Learning Near Optimal Policies with Low Inherent Bellman Error. Andrea Zanette¹, Alessandro Lazaric², Mykel Kochenderfer¹, Emma Brunskill¹. Abstract. 1. Introduction. We study the exploration problem with approx- Improving the sample...
Learning Fair Policies in Multiobjective (Deep) Reinforcement Learning with Average and Discounted Rewards. Umer Siddique¹, Paul Weng¹², Matthieu Zimmer¹. Abstract. current AI methods do not handle well situations where they impact m...
Learning Calibratable Policies using Programmatic Style-Consistency. Eric Zhan¹, Albert Tseng¹, Yisong Yue¹, Adith Swaminathan², Matthew Hausknecht². Abstract. that the behaviors can exhibit very diverse styles (e.g., from multiple...
Symbolic Network: Generalized Neural Policies for Relational MDPs. Sankalp Garg¹, Aniket Bajpai¹, Mausam¹. Abstract. 1. Introduction. A Relational Markov Decision Process (RMDP) A Relational Markov Decision Process (RMDP) (Boutilie...
Imitating Latent Policies from Observation. Ashley D. Edwards¹, Himanshu Sahni¹, Yannick Schroecker¹, Charles L. Isbell¹. Abstract. narios and costly to obtain. Thus, we need a mechanism for learning policies from observation alone wit...
Composing Entropic Policies using Divergence Correction. Jonathan J Hunt¹, Andre Barreto¹, Timothy P Lillicrap¹, Nicolas Heess¹. Abstract. et al., 2012; Haith & Krakauer, 2013) However, once such skills have been acquired humans rapidl...
Latent Space Policies for Hierarchical Reinforcement Learning. Tuomas Haarnoja¹, Kristian Hartikainen², Pieter Abbeel¹, Sergey Levine¹. Abstract. resentations into RL is the potential for the emergence of hierarchies, which can ena...
Reinforcement Learning with Deep Energy-Based Policies. Tuomas Haarnoja¹, Haoran Tang², Pieter Abbeel¹³⁴, Sergey Levine¹. Abstract. stochastic policies are desirable for exploration, this exploration is typically attained heuris...