Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning. Authors: Yue Wu¹,², Shuangfei Zhai¹, Nitish Srivastava¹, Joshua Susskind¹, Jian Zhang¹, Ruslan Salakhutdinov², Hanlin Goh¹. Abstract: leveraging prior experience (Lange et a...
Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision. Authors: Johan Bjorck¹, Xiangyu Chen¹, Christopher De Sa¹, Carla P. Gomes¹, Kilian Q. Weinberger¹. Abstract: learning, an emerging trend for accelerating deep l...
GMAC: A Distributional Perspective on Actor-Critic Framework. Authors: Daniel Wontae Nam¹, Younghoon Kim¹, Chan Y. Park¹. Abstract: In this paper, we devise a distributional frame- [Figure panels: (a) The observation input; (b) The evaluated value distribution...]
Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm. Authors: Sajad Khodadadian*¹, Zaiwei Chen*², Siva Theja Maguluri¹. Abstract: An AC algorithm can be thought of as a generalized policy iteration (Puterman, 1995), and cons...
Decentralized Single-Timescale Actor-Critic on Zero-Sum Two-Player Stochastic Games. Authors: Hongyi Guo¹, Zuyue Fu¹, Zhuoran Yang², Zhaoran Wang¹. Abstract: as Markov decision process (Puterman, 2014, MDP), where an agent aims to learn an opti...
Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality. Authors: Tengyu Xu¹, Zhuoran Yang², Zhaoran Wang³, Yingbin Liang¹. Abstract: (Haarnoja et al., 2018), etc. However, these successes usually rely on the access to on-policy sam...
Diversity Actor-Critic: Sample-Aware Entropy Regularization for Sample-Efficient Exploration. Authors: Seungyul Han¹, Youngchul Sung¹. Abstract: for challenging continuous control tasks. In this paper, sample-aware policy entropy regu...
Characterizing the Gap Between Actor-Critic and Policy Gradient. Authors: Junfeng Wen¹, Saurabh Kumar², Ramki Gummadi³, Dale Schuurmans¹,³. Abstract: on a range of challenging tasks. Despite the success of AC methods, AC and PG have subtle differe...
Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation. Authors: Shangtong Zhang¹, Bo Liu², Hengshuai Yao³, Shimon Whiteson¹. Abstract: a two-timescale convergent analysis under function approximation (Ko...
Off-Policy Actor-Critic with Shared Experience Replay. Authors: Simon Schmitt¹, Matteo Hessel¹, Karen Simonyan¹. Abstract: [Table 1. Comparison of model-free state-of-the-art agents on 57 Atari games in the standard regime: Here no experience...]
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. Authors: Tuomas Haarnoja¹, Aurick Zhou¹, Pieter Abbeel¹, Sergey Levine¹. Abstract: networks holds the promise of automating a wide range of deci...
Addressing Function Approximation Error in Actor-Critic Methods. Authors: Scott Fujimoto¹, Herke van Hoof², David Meger¹. Abstract: means using an imprecise estimate within each update will lead to an accumulation of error. Due to overestimati...