On the Noisy Gradient Descent that Generalizes as SGD. Jingfeng Wu, Wenqing Hu, Haoyi Xiong, Jun Huan, Vladimir Braverman, Zhanxing Zhu. Abstract. et al., 2017; Keskar et al., 2017). To gain intuitions, one can compare the generalizati...
On the Convergence of Nesterov’s Accelerated Gradient Method in Stochastic Settings. Mahmoud Assran, Michael Rabbat. Abstract. However, the theoretical understanding of accelerated methods remains limited when used with st...
On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems. Tianyi Lin, Chi Jin, Michael I. Jordan. Abstract. including generative adversarial networks (GANs) (Goodfellow et al., 2014), statistics (Xu et al., 2009; Abade...
Non-convex Learning via Replica Exchange Stochastic Gradient MCMC. Wei Deng, Qi Feng, Liyao Gao, Faming Liang, Guang Lin. Abstract. in DNNs by injecting noises to stochastic gradients. Since then, various high-order SGMCMC algorith...
NGBoost: Natural Gradient Boosting for Probabilistic Prediction. Tony Duan, Anand Avati, Daisy Yi Ding, Khanh K. Thai, Sanjay Basu, Andrew Ng, Alejandro Schuler. Abstract. We present Natural...
Multi-Task Learning with User Preferences: Gradient Descent with Controlled Ascent in Pareto Optimization. Debabrata Mahapatra, Vaibhav Rajan. Abstract. 2019a), natural language processing (Liu et al., 2019b) and bioinformatic...
Momentum-Based Policy Gradient Methods. Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang. Abstract. timesteps, and then maximizes the long-term cumulative rewards to obtain an optimal policy. Due to easy imple- In the paper, we prop...
Low Bias Low Variance Gradient Estimates for Hierarchical Boolean Stochastic Networks. Adeel Pervez, Taco Cohen, Efstratios Gavves. Abstract. gradients, one can sample gradients from some distribution. The sample estimates can b...
Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization. Pan Zhou, Xiao-Tong Yuan. Abstract. 1. Introduction. Stochastic variance-reduced gradien...
High-dimensional Robust Mean Estimation via Gradient Descent. Yu Cheng, Ilias Diakonikolas, Rong Ge, Mahdi Soltanolkotabi. Abstract. a small constant fraction of arbitrary outliers. We study the problem of high-dimensional ro- Th...
Gradient Temporal-Difference Learning with Regularized Corrections. Sina Ghiassian, Andrew Patterson, Shivam Garg, Dhawal Gupta, Adam White, Martha White. Abstract. learn a policy from human demonstrations. In general, many al...
Explicit Gradient Learning for Black-Box Optimization. Elad Sarafian, Mor Sinay, Yoram Louzoun, Noa Agmon, Sarit Kraus. Abstract. real-world physics or numerical simulation. Black-Box Optimization (BBO) algorithms (Audet & Har...
Convergence of a Stochastic Gradient Method with Momentum for Non-Smooth Non-Convex Optimization. Vien V. Mai, Mikael Johansson. Abstract. This function class is very rich and important in optimization (Rockafellar, 1982; Vial, 1...
Conditional gradient methods for stochastically constrained convex minimization. Maria-Luiza Vladarean, Ahmet Alacaoglu, Ya-Ping Hsieh, Volkan Cevher. Abstract. and optimal control problems have variables lying in a possibly...
Compressive sensing with un-trained neural networks: Gradient descent finds the smoothest approximation. Reinhard Heckel, Mahdi Soltanolkotabi. Abstract. opposed to trained convolutional neural networks, that learn an image pr...
AR-DAE: Towards Unbiased Neural Entropy Gradient Estimation. Jae Hyun Lim, Aaron Courville, Christopher Pal, Chin-Wei Huang. Abstract. control this quantity as part of the optimization objective. In light of this, we propos...
Anderson Acceleration of Proximal Gradient Methods. Vien V. Mai, Mikael Johansson. Abstract. rameters; slightly over- or under-estimating the strong convexity constant can have a severe effect on the overall per- Anderson acceler...
Adaptive Gradient Descent without Descent. Yura Malitsky, Konstantin Mishchenko. Abstract. where $f\colon \mathbb{R}^d \to \mathbb{R}$ is a differentiable function. Throughout the paper we assume that (1) has a solution and we denote We present a strikingly simp...
Adaptive Checkpoint Adjoint Method for Gradient Estimation in Neural ODE. Juntang Zhuang, Nicha Dvornek, Xiaoxiao Li, Sekhar Tatikonda, Xenophon Papademetris, James Duncan. Abstract. differential equation (NODE) (Chen et a...
Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization. Zhize Li, Dmitry Kovalev, Xun Qian, Peter Richtárik. Abstract. 1. Introduction. Due to the high communication cost in distributed With the pr...