UCB Momentum Q-learning: Correcting the bias without forgetting. Pierre Ménard, Omar Darwiche Domingues, Xuedong Shang, Michal Valko. Abstract: balance the exploration of the environment and exploitation of the current knowl...
Ensemble Bootstrapping for Q-learning. Oren Peer, Chen Tessler, Nadav Merlis, Ron Meir. Abstract: focuses on learning the value-function. The value represents the expected, discounted, reward-to-go that the agent will Q-learning...
EMaQ: Expected-Max Q-learning Operator for Simple Yet Effective Offline and Online RL. Seyed Kamyar Seyed Ghasemipour, Dale Schuurmans, Shixiang Shane Gu. Abstract: 1. Introduction. Off-policy reinforcement learning (RL) holds t...
Multi-Agent Determinantal Q-learning. Yaodong Yang, Ying Wen, Liheng Chen, Jun Wang, Kun Shao, David Mguni, Weinan Zhang. Abstract: A full spectrum of MARL algorithms has been developed to solve cooperative tasks (Panait & Luke, 20...
Lookahead-Bounded Q-learning. Ibrahim El Shar, Daniel R. Jiang. Abstract: in the following sense: writing the transition dynamics as $s_{t+1} = f(s_t, a_t, w_{t+1})$, where $s_t$ and $a_t$ are the current We introduce the lookahead-bounded Q-learnin...
ConQUR: Mitigating Delusional Bias in Deep Q-learning. DiJia (Andy) Su, Jayden Ooi, Tyler Lu, Dale Schuurmans, Craig Boutilier. Abstract: & Smart, 2004; Melo & Ribeiro, 2007; Maei et al., 2010; Munos et al., 2016); but it remains dif...
A Finite-Time Analysis of Q-learning with Neural Network Function Approximation. Pan Xu, Quanquan Gu. Abstract: which triggers a line of research on deep reinforcement learning such as Double Deep Q-learning (Van Hasselt Q-learning...
Sample-Optimal Parametric Q-learning Using Linearly Additive Features. Lin F. Yang, Mengdi Wang. Abstract: this theoretical-sharp result does not generalize to practical problems where S, A can be arbitrarily large or infinite. Co...
Making Deep Q-learning Methods Robust to Time Discretization. Corentin Tallec, Léonard Blier, Yann Ollivier. Abstract: prevents transfer from imperfect simulators to real world scenarios. Despite remarkable successes, Deep R...
Diagnosing Bottlenecks in Deep Q-learning Algorithms. Justin Fu, Aviral Kumar, Matthew Soh, Sergey Levine. Abstract: which potential issues with Q-learning manifest in practice. We empirically analyze aspects of the Q-learning me...