Generalized Doubly-Reparameterized Gradient Estimators. Matthias Bauer¹, Andriy Mnih¹. Abstract. usually prefer unbiased estimators as they tend to be better-behaved and are better understood. Lower variance is also Efficient low...
Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality. Tengyu Xu¹, Zhuoran Yang², Zhaoran Wang³, Yingbin Liang¹. Abstract. (Haarnoja et al., 2018), etc. However, these successes usually rely on the access to on-policy sam...
From Importance Sampling to Doubly Robust Policy Gradient. Jiawei Huang¹, Nan Jiang¹. Abstract. Summary of the Paper. We provide a simple and positive answer to the above question in the episodic RL setting. In We show that on-policy policy...
Doubly Stochastic Variational Inference for Neural Processes with Hierarchical Latent Variables. Qi Wang¹, Herke van Hoof¹. Abstract. decision-making (Gal & Ghahramani, 2016). Neural processes (NPs) constitute a family of vari- Fac...
Doubly robust off-policy evaluation with shrinkage. Yi Su¹, Maria Dimakopoulou², Akshay Krishnamurthy³, Miroslav Dudík³. Abstract. subroutines for optimizing a policy (Dudík et al., 2011). We propose a new framework for designi...
Doubly Robust Joint Learning for Recommendation on Data Missing Not at Random. Xiaojie Wang¹, Rui Zhang¹, Yu Sun², Jianzhong Qi¹. Abstract. (MNAR). For example, a recent study in song recommendation shows that the probability of a rating b...
More Robust Doubly Robust Off-policy Evaluation. Mehrdad Farajtabar¹, Yinlam Chow², Mohammad Ghavamzadeh². Abstract. Swaminathan et al. 2017) and reinforcement learning (RL) (e.g., Precup et al. 2000a; 2001; Paduraru 2013; Mahmood W...
Doubly Accelerated Methods for Faster CCA and Generalized Eigendecomposition. Zeyuan Allen-Zhu¹, Yuanzhi Li². Abstract. tradition of (Wang et al., 2016; Garber & Hazan, 2015), we assume without loss of generality that λ_i ∈ [−1, 1]. W...
Doubly Greedy Primal-Dual Coordinate Descent for Sparse Empirical Risk Minimization. Qi Lei¹, Ian E.H. Yen², Chao-yuan Wu³, Inderjit S. Dhillon¹,³,⁴, Pradeep Ravikumar². Abstract. when φ_i(z) = max{0, 1 − b_i z} and g(x) = (µ/2)‖x‖₂², (1) We consid...