Provably Efficient Algorithms for Multi-Objective Competitive RL. Tiancheng Yu¹, Yi Tian¹, Jingzhao Zhang¹, Suvrit Sra¹. Abstract: average return to a target set small as long as this set satisfies a condition called approachability (Bl...
On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game. Shuang Qiu¹, Jieping Ye¹, Zhaoran Wang², Zhuoran Yang³. Abstract: is large and function approximators such as neural networks are employ...
LTL2Action: Generalizing LTL Instructions for Multi-Task RL. Pashootan Vaezipoor¹,², Andrew C. Li¹,², Rodrigo Toro Icarte¹,², Sheila McIlraith¹,²,³. Abstract: approaches do not scale well because they require (for every possible environm...
Is Pessimism Provably Efficient for Offline RL? Ying Jin¹, Zhuoran Yang², Zhaoran Wang³. Abstract: Vinyals et al., 2017) relies on two ingredients: (i) expressive function approximators, e.g., deep neural networks (LeCun We study offl...
Instabilities of Offline RL with Pre-Trained Neural Representation. Ruosong Wang¹, Yifan Wu¹, Ruslan Salakhutdinov¹, Sham M. Kakade²,³. Abstract: 2018; Wang et al., 2018; Yu et al., 2019); it is seeing much recent interest due to the large a...
Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL. Andrea Zanette¹. Abstract: we consider two classical batch RL problems: 1) the off-policy evaluation (OPE) problem, wher...
Causal Curiosity: RL Agents Discovering Self-supervised Experiments for Causal Representation Learning. Sumedh A Sontakke¹, Arash Mehrjou², Laurent Itti¹, Bernhard Schölkopf². Abstract: form. Thus, there has been recent interest i...
Provably efficient RL with Rich Observations via Latent State Decoding. Simon S. Du¹, Akshay Krishnamurthy², Nan Jiang³, Alekh Agarwal⁴, Miroslav Dudík², John Langford². Abstract: 2010; Lattimore & Hutter, 2012). Consequently, treat...