Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks. Sungryull Sohn, Sungtae Lee, Jongwook Choi, Harm van Seijen, Mehdi Fatemi, Honglak Lee. Abstract: Moreover, the success of the RL algorithm heavily hinge...
Reward Identification in Inverse Reinforcement Learning. Kuno Kim, Kirankumar Shiragur, Shivam Garg, Stefano Ermon. Abstract: MDPs to build computational models (Niv, 2009) of real-world, rational decision makers such as investo...
Adversarial Combinatorial Bandits with General Non-linear Reward Functions. Xi Chen, Yanjun Han, Yining Wang. Abstract: chooses a reward vector $v_t = (v_{t1}, \cdots, v_{tN}) \in [0,1]^N$ not revealed to the algorithm. The algorithm chooses a s...
Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences. Daniel S. Brown, Russell Coleman, Ravi Srinivasan, Scott Niekum. Abstract: demonstrations, it is important for an agent to be able to provide high-confid...
Intrinsic Reward Driven Imitation Learning via Generative Model. 2020.02.05. Xingrui Yu, Yueming Lyu, Ivor W. Tsang. [Figure labels: Beyond Expert; Expert Level] Abstract: Imitation learning in a high-dimensional environment is challenging. Most i...
Identifying Reward Functions using Anchor Actions. Sinong Geng, Houssam Nassif, Carlos A. Manzanares, A. Max Reppen, Ronnie Sircar. Abstract: with firm profit functions (Abbring, 2010; Aguirregabiria and Nevo, 2013). We propose a r...
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits. Branislav Kveton, Csaba Szepesvári, Sharan Vaswani, Zheng Wen, Mohammad Ghavamzadeh, Tor Lattimore. Abstract: 2013b) is a generalization of a multi-ar...
Contextual Multi-armed Bandit Algorithm for Semiparametric Reward Model. Gi-Soo Kim, Myunghee Cho Paik. Abstract: (Langford et al., 2008), news article placement algorithms (Li et al., 2010), revenue management (Ferreira et al., 2...
Learning the Reward Function for a Misspecified Model. Erik Talvitie. [Figure 1. The Shooter domain.] Abstract: In model-based reinforcement learning it is typical to decouple the pr... [Adjacent-column fragment: in MBRL: learning a reward function. It is common for]
Learning by Playing – Solving Sparse Reward Tasks from Scratch. Martin Riedmiller, Roland Hafner, Thomas Lampe, Michael Neunert, Jonas Degrave, Tom Van de Wiele, Volodymyr Mnih, Nicolas Heess, Tobias Springenberg. Abstract: simul...